A generic distributed processing system using NATS JetStream for message routing and supporting multiple service types (PDF processing, image analysis, text processing, etc.).
Infrastructure Server     Processing Servers          Client Applications
(NATS)                    (GPU, CPU, etc.)            (Laptop, Web, etc.)
      |                          |                           |
┌────────────┐            ┌─────────────┐             ┌──────────────┐
│ NATS       │◄──────────►│ PDF Worker  │             │ Your App     │
│ JetStream  │            │ Image Worker│◄───────────►│ (services.py │
│ Message    │            │ Text Worker │             │ thinktank2)  │
│ Broker     │            │ ...         │             │ ...          │
└────────────┘            └─────────────┘             └──────────────┘
      │                          │                           │
Pure Messaging            Business Logic              Submit Requests
ct/
├── infrastructure/          # 🏗️ Infrastructure components
│   └── nats-server/         # Pure NATS server (dedicated server)
├── pdf/                     # 📄 PDF processing service (GPU server)
│   ├── docling_worker.py    # Worker process
│   ├── services.py          # Client library
│   └── tests/               # Service tests
└── future_services/         # 🔮 Add more services as needed
    ├── image_processing/
    ├── text_analysis/
    └── audio_transcription/

On your dedicated NATS server:
cd infrastructure/nats-server/
./setup.sh
# Save the generated token - you'll need it for all services!

On your GPU server:
cd pdf/
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Configure environment
cp environment_config.txt .env
# Edit .env with NATS server IP and token
# Start the worker
python docling_worker.py

Connect your existing services.py:
# In your thinktank2 project
from pdf.services import DocumentService
# Configure to point to your NATS server
doc_service = DocumentService()
await doc_service.setup()
result = await doc_service.process_document(
    s3_key="documents/my-file.pdf",
    docling_options={...}
)

# Pure NATS configuration - no service specifics
NATS_TOKEN=your-generated-secure-token

# Points to your infrastructure
NATS_URL=nats://your-nats-server-ip:4222
NATS_TOKEN=your-generated-secure-token
# Service-specific settings
AWS_ACCESS_KEY_ID=your-s3-credentials
# ... etc

Each service type gets its own namespace on the shared NATS server:

| Service Type | Stream Name | Subject Prefix | Worker Group |
|---|---|---|---|
| PDF Docling | PDF_PROCESSING | `pdf.docling.*` | `pdf_docling_workers` |
| Image Processing | IMAGE_PROCESSING | `image.process.*` | `image_workers` |
| Text Analysis | TEXT_ANALYSIS | `text.analyze.*` | `text_workers` |
| Audio Transcription | AUDIO_TRANSCRIPTION | `audio.transcribe.*` | `audio_workers` |

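The namespace table can be expressed as a small lookup structure. The sketch below is illustrative only: `ServiceNamespace`, `NAMESPACES`, and `find_namespace` are hypothetical names, not the contents of `generic_config.py`, and the dictionary keys are made up for the example. It shows how a stream name, subject prefix, and worker group travel together, and how a concrete subject maps back to its service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceNamespace:
    """One row of the namespace table (hypothetical helper type)."""
    stream: str
    subject_prefix: str  # stored without the trailing ".*"
    worker_group: str

    @property
    def wildcard(self) -> str:
        # Subject filter covering every single-token suffix under the prefix.
        return f"{self.subject_prefix}.*"

    def subject(self, action: str) -> str:
        # A NATS subject token may not contain '.', '*', '>' or whitespace.
        if not action or any(ch in action for ch in ".*> \t"):
            raise ValueError(f"invalid subject token: {action!r}")
        return f"{self.subject_prefix}.{action}"


# Values copied from the table above; the keys are assumptions.
NAMESPACES = {
    "pdf_docling": ServiceNamespace("PDF_PROCESSING", "pdf.docling", "pdf_docling_workers"),
    "image": ServiceNamespace("IMAGE_PROCESSING", "image.process", "image_workers"),
    "text": ServiceNamespace("TEXT_ANALYSIS", "text.analyze", "text_workers"),
    "audio": ServiceNamespace("AUDIO_TRANSCRIPTION", "audio.transcribe", "audio_workers"),
}


def find_namespace(subject: str) -> Optional[ServiceNamespace]:
    """Return the namespace whose '<prefix>.*' filter matches the subject."""
    for ns in NAMESPACES.values():
        prefix = ns.subject_prefix + "."
        rest = subject[len(prefix):]
        # '*' matches exactly one token, so the remainder must not contain '.'.
        if subject.startswith(prefix) and rest and "." not in rest:
            return ns
    return None
```

For example, `NAMESPACES["pdf_docling"].subject("request")` yields `pdf.docling.request`, which falls under the `pdf.docling.*` filter of the `PDF_PROCESSING` stream.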
Minimal setup:
- NATS Server: 1 dedicated server
- PDF Processing: 1 GPU server
- Clients: Your laptop

Production setup:
- NATS Cluster: 3 servers (HA)
- PDF Workers: Multiple GPU servers (auto-scaling)
- Image Workers: Multiple CPU servers
- Clients: Web applications, mobile apps, etc.

Local development:
- NATS: Local Docker container
- Workers: Local processes
- Clients: Local development

- Token Authentication: Secure token for NATS access
- Network Isolation: Firewall rules for known IPs only
- TLS: Optional TLS encryption for production
- Separate Concerns: Infrastructure vs. business logic
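
Token authentication depends on every worker and client reading the same `NATS_URL` and `NATS_TOKEN` values. A minimal sketch of loading them, assuming they arrive as environment variables (for instance exported from `.env`). The function name `load_nats_settings`, the localhost default, and the fail-fast behavior are assumptions for illustration, not the project's actual loader.

```python
import os
from typing import Mapping, Optional


def load_nats_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect NATS connection settings, failing early if the token is missing."""
    env = os.environ if env is None else env

    token = env.get("NATS_TOKEN", "").strip()
    if not token:
        raise RuntimeError("NATS_TOKEN is not set; use the token generated during NATS setup")

    # Assumed default for local development; real deployments set NATS_URL.
    url = env.get("NATS_URL", "nats://localhost:4222").strip()
    if not url.startswith(("nats://", "tls://")):
        raise ValueError(f"unexpected NATS_URL scheme: {url!r}")

    return {"servers": [url], "token": token}
```

With the nats-py client, the returned dictionary could be passed along as `await nats.connect(**load_nats_settings())`, since `connect` accepts `servers` and `token` keyword arguments. A `tls://` URL is one way to opt into the optional TLS encryption.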
- Create service directory: `mkdir new_service/`
- Implement worker: Use existing patterns from `pdf/`
- Configure namespace: Add to `generic_config.py`
- Deploy: On appropriate servers (GPU, CPU, etc.)
- Connect: All services use the same NATS infrastructure

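
As a rough illustration of the steps above, a new worker might look like the following. This is a sketch under stated assumptions, not the pattern used in `pdf/docling_worker.py`: the stream name, subject, and worker group are hypothetical, `handle_request` is a placeholder for real business logic, and the nats-py client is assumed to be installed.

```python
import json

# Hypothetical namespace for the new service (not defined in the project).
STREAM = "NEW_SERVICE"
SUBJECT = "new_service.process.*"
WORKER_GROUP = "new_service_workers"


def handle_request(payload: bytes) -> dict:
    """Placeholder business logic: decode the JSON request and echo it back."""
    request = json.loads(payload)
    return {"status": "ok", "echo": request}


async def run_worker(url: str, token: str) -> None:
    # Imported lazily so the pure logic above can be tested without nats-py.
    import nats
    from nats.errors import TimeoutError as NatsTimeout

    nc = await nats.connect(url, token=token)
    js = nc.jetstream()

    # Succeeds if the stream is new or already exists with this same config.
    await js.add_stream(name=STREAM, subjects=[SUBJECT])

    # Workers sharing one durable name split the messages between them.
    sub = await js.pull_subscribe(SUBJECT, durable=WORKER_GROUP)
    try:
        while True:
            try:
                msgs = await sub.fetch(1, timeout=5)
            except NatsTimeout:
                continue  # nothing queued; poll again
            for msg in msgs:
                try:
                    result = handle_request(msg.data)
                    print(f"processed {msg.subject}: {result['status']}")
                    await msg.ack()
                except Exception:
                    await msg.nak()  # ask JetStream to redeliver
    finally:
        await nc.drain()
```

Running it would look like `asyncio.run(run_worker("nats://your-nats-server-ip:4222", token))`. A pull consumer is used here so that each worker takes only as much work as it can process, which suits GPU-bound jobs; a push subscription with a queue group is another option.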
- Infrastructure Setup - NATS server deployment
- Architecture Guide - Detailed system design
- PDF Service - PDF processing specifics
# Test infrastructure
cd infrastructure/nats-server/
# Connection tests included in setup
# Test PDF service
cd pdf/
pytest tests/ -v
# Test end-to-end
python -c "
import asyncio
from services import DocumentService

async def test():
    service = DocumentService()
    await service.setup()
    print('✅ Connected to distributed system!')

asyncio.run(test())
"

✅ Scalable: Add processing power by adding servers
✅ Flexible: Mix different service types on same infrastructure
✅ Reliable: Dedicated message infrastructure
✅ Maintainable: Clear separation of concerns
✅ Future-proof: Easy to add new processing capabilities
Perfect for: Multi-modal AI processing, distributed computing, microservices architecture