Distributed Processing System

A generic distributed processing system that uses NATS JetStream for message routing and supports multiple service types (PDF processing, image analysis, text processing, and so on).

🏗️ Architecture Overview

Infrastructure Server    Processing Servers         Client Applications
     (NATS)              (GPU, CPU, etc.)           (Laptop, Web, etc.)
        |                       |                           |
   ┌────────────┐        ┌─────────────┐             ┌──────────────┐
   │    NATS    │◄──────►│ PDF Worker  │             │   Your App   │
   │ JetStream  │        │ Image Worker│◄───────────►│  (services.py│
   │  Message   │        │ Text Worker │             │  thinktank2) │
   │   Broker   │        │     ...     │             │      ...     │
   └────────────┘        └─────────────┘             └──────────────┘
        │                       │                           │
   Pure Messaging         Business Logic              Submit Requests

📁 Directory Structure

ct/
├── infrastructure/          # 🏗️ Infrastructure components
│   └── nats-server/        # Pure NATS server (dedicated server)
├── pdf/                    # 📄 PDF processing service (GPU server)
│   ├── docling_worker.py   # Worker process
│   ├── services.py         # Client library
│   └── tests/              # Service tests
└── future_services/        # 🔮 Add more services as needed
    ├── image_processing/
    ├── text_analysis/
    └── audio_transcription/

🚀 Quick Start

1. Infrastructure Setup (NATS Server)

On your dedicated NATS server:

cd infrastructure/nats-server/
./setup.sh
# Save the generated token - you'll need it for all services!

2. PDF Processing Service Setup (GPU Server)

On your GPU server:

cd pdf/
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure environment
cp environment_config.txt .env
# Edit .env with NATS server IP and token

# Start the worker
python docling_worker.py

3. Client Integration (Your Laptop)

Connect your existing services.py:

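# In your thinktank2 project
import asyncio

from pdf.services import DocumentService


async def main():
    # Configure to point to your NATS server
    doc_service = DocumentService()
    await doc_service.setup()

    # process_document is a coroutine, so it must be awaited inside an async function
    result = await doc_service.process_document(
        s3_key="documents/my-file.pdf",
        docling_options={...}
    )
    return result


asyncio.run(main())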

🎛️ Configuration

Infrastructure Server (.env)

# Pure NATS configuration - no service specifics
NATS_TOKEN=your-generated-secure-token

Processing Services (.env)

# Points to your infrastructure
NATS_URL=nats://your-nats-server-ip:4222
NATS_TOKEN=your-generated-secure-token

# Service-specific settings
AWS_ACCESS_KEY_ID=your-s3-credentials
# ... etc
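
As a rough illustration of how a service might consume these settings, the sketch below reads NATS_URL and NATS_TOKEN and fails early when either is missing. It assumes the values from .env have already been exported into the process environment (by the shell or a dotenv loader); `NatsSettings` and `load_nats_settings` are hypothetical names, not part of this repository.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class NatsSettings:
    """Connection settings shared by every processing service (hypothetical type)."""
    url: str
    token: str


def load_nats_settings(env=None) -> NatsSettings:
    """Read NATS_URL and NATS_TOKEN, raising if either is missing or empty."""
    # Accept any mapping so the function can be exercised without touching os.environ.
    env = os.environ if env is None else env
    missing = [name for name in ("NATS_URL", "NATS_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return NatsSettings(url=env["NATS_URL"], token=env["NATS_TOKEN"])
```

Failing at startup with the names of the missing variables tends to be easier to debug than a connection error from the worker later on.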

🔧 Service Types & Namespacing

Each service type gets its own namespace on the shared NATS server:

| Service Type | Stream Name | Subject Prefix | Worker Group |
|---|---|---|---|
| PDF Docling | `PDF_PROCESSING` | `pdf.docling.*` | `pdf_docling_workers` |
| Image Processing | `IMAGE_PROCESSING` | `image.process.*` | `image_workers` |
| Text Analysis | `TEXT_ANALYSIS` | `text.analyze.*` | `text_workers` |
| Audio Transcription | `AUDIO_TRANSCRIPTION` | `audio.transcribe.*` | `audio_workers` |
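
To make the namespacing concrete, here is a small sketch that encodes the table as data and checks which namespace a subject belongs to. The stream names, subject prefixes, and worker groups are taken from the table; the dictionary keys, the `ServiceNamespace` type, the helper functions, and example subjects such as `pdf.docling.request` are assumptions for illustration and may not match generic_config.py. The matcher follows the standard NATS wildcard rules: `*` matches exactly one token and `>` matches one or more trailing tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceNamespace:
    stream: str
    subject_pattern: str
    worker_group: str


# Values copied from the table above; the dictionary keys are illustrative.
NAMESPACES = {
    "pdf_docling": ServiceNamespace("PDF_PROCESSING", "pdf.docling.*", "pdf_docling_workers"),
    "image_processing": ServiceNamespace("IMAGE_PROCESSING", "image.process.*", "image_workers"),
    "text_analysis": ServiceNamespace("TEXT_ANALYSIS", "text.analyze.*", "text_workers"),
    "audio_transcription": ServiceNamespace(
        "AUDIO_TRANSCRIPTION", "audio.transcribe.*", "audio_workers"
    ),
}


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs something to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def owner_of(subject: str):
    """Return the key of the namespace whose pattern matches, or None."""
    for key, ns in NAMESPACES.items():
        if subject_matches(ns.subject_pattern, subject):
            return key
    return None
```

Because each prefix ends in a single `*`, a subject with an extra level (for example a hypothetical `pdf.docling.request.urgent`) would not fall inside the PDF namespace under these rules.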

📋 Deployment Scenarios

Scenario 1: Simple Setup

NATS Server: 1 dedicated server
PDF Processing: 1 GPU server
Clients: Your laptop

Scenario 2: Production Setup

NATS Cluster: 3 servers (HA)
PDF Workers: Multiple GPU servers (auto-scaling)
Image Workers: Multiple CPU servers
Clients: Web applications, mobile apps, etc.

Scenario 3: Development

NATS: Local Docker container
Workers: Local processes
Clients: Local development

🛡️ Security

Token Authentication: Secure token for NATS access
Network Isolation: Firewall rules for known IPs only
TLS: Optional TLS encryption for production
Separate Concerns: Infrastructure vs. business logic
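
One way the token and optional TLS settings could be combined on the client side is sketched below. It builds keyword arguments for the nats-py `nats.connect()` call and adds an `ssl.SSLContext` only when the URL uses the `tls://` scheme. The helper name `build_connect_options` and the `ca_file` parameter are assumptions for illustration; the repository's own connection code may differ.

```python
import ssl
from typing import Optional


def build_connect_options(url: str, token: str, ca_file: Optional[str] = None) -> dict:
    """Build keyword arguments for nats.connect(); TLS is added only for tls:// URLs."""
    options = {"servers": [url], "token": token}
    if url.startswith("tls://"):
        # Verify the server certificate against the system trust store by default.
        context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH)
        if ca_file:
            # Optionally trust a private CA, e.g. for an internal NATS deployment.
            context.load_verify_locations(cafile=ca_file)
        options["tls"] = context
    return options
```

With nats-py installed, the result would be used as `nc = await nats.connect(**build_connect_options(url, token))` inside an async function.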

🔄 Adding New Services

1. Create service directory: mkdir new_service/
2. Implement worker: Use existing patterns from pdf/
3. Configure namespace: Add to generic_config.py
4. Deploy: On appropriate servers (GPU, CPU, etc.)
5. Connect: All services use the same NATS infrastructure
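
The steps above can be sketched as a minimal worker skeleton. This is not the pattern used in pdf/docling_worker.py, which is not shown here; it is a generic JetStream pull consumer written against the nats-py client, assuming a stream covering the subject already exists and that the worker group maps to a durable consumer name. `handle_request`, `run_worker`, and the JSON payload shape are hypothetical.

```python
import json


def handle_request(payload: bytes) -> dict:
    """Hypothetical business logic: decode a JSON request and report completion."""
    request = json.loads(payload)
    return {"job_id": request.get("job_id"), "status": "done"}


async def run_worker(url: str, token: str, subject: str, durable: str) -> None:
    # nats-py is imported lazily so handle_request can be tested without it.
    import nats
    from nats.errors import TimeoutError as NatsTimeoutError

    nc = await nats.connect(url, token=token)
    js = nc.jetstream()
    # Assumes a stream that covers `subject` already exists on the server.
    sub = await js.pull_subscribe(subject, durable=durable)
    try:
        while True:
            try:
                msgs = await sub.fetch(1, timeout=5)
            except NatsTimeoutError:
                continue  # No work available; poll again.
            for msg in msgs:
                try:
                    result = handle_request(msg.data)
                    print(f"processed {result}")
                    await msg.ack()
                except Exception:
                    # Negative-acknowledge so JetStream can redeliver the message.
                    await msg.nak()
    finally:
        await nc.drain()
```

To run it, call `asyncio.run(run_worker(url, token, subject, durable))` with the subject and durable name chosen for the new service. Keeping the business logic in a plain function separate from the messaging loop makes it testable without a NATS server.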

📖 Documentation

Infrastructure Setup - NATS server deployment
Architecture Guide - Detailed system design
PDF Service - PDF processing specifics

🧪 Testing

# Test infrastructure
cd infrastructure/nats-server/
# Connection tests included in setup

# Test PDF service
cd pdf/
pytest tests/ -v

# Test end-to-end
python -c "
import asyncio
from services import DocumentService

async def test():
    service = DocumentService()
    await service.setup()
    print('✅ Connected to distributed system!')

asyncio.run(test())
"

🎯 Key Benefits

✅ Scalable: Add processing power by adding servers
✅ Flexible: Mix different service types on same infrastructure
✅ Reliable: Dedicated message infrastructure
✅ Maintainable: Clear separation of concerns
✅ Future-proof: Easy to add new processing capabilities

Perfect for: Multi-modal AI processing, distributed computing, microservices architecture
