Voice Creator Studio packages all services into Docker containers and targets cloud environments that support container orchestration.
- Containers: Each service (FastAPI backend, React frontend, Celery workers, Redis) runs in a dedicated Docker image.
- Cloud Services: Production deployments run on Kubernetes (e.g., Amazon EKS or Google GKE). Object storage uses services like Amazon S3; PostgreSQL or Aurora host persistent data; Redis is provided by a managed service such as AWS ElastiCache.
- Local Development: Docker Compose starts the full stack on a single host for developers.
- FastAPI instances are replicated behind a load balancer (NGINX or a cloud load balancer).
- Horizontal Pod Autoscalers adjust replica counts based on CPU and memory utilization.
- Static assets from the React build are served via a CDN to offload traffic from the backend.
- Celery workers run in separate deployments that can scale independently from the web layer.
- Queue depth and processing time drive autoscaling decisions; GPU jobs use dedicated nodes with limited concurrency.
- Workers are labeled by capability (CPU vs. GPU) so tasks are routed to matching queues.
- GPU: NVIDIA GPU with at least 16 GB VRAM (e.g., Tesla V100/A100). Multi-GPU setups improve throughput for large voices.
- CPU/RAM: 8+ vCPUs and 32 GB RAM minimum to feed the GPU and handle preprocessing.
- Storage: 500 GB SSD for datasets, checkpoints, and intermediate artifacts.
- GPU (optional): 8 GB VRAM (e.g., T4) accelerates real-time synthesis, but CPU-only inference is possible for smaller models.
- CPU/RAM: 4 vCPUs and 8–16 GB RAM support typical interactive workloads.
- Storage: 50 GB SSD accommodates models and cached assets.