DPython is a custom Python-based launcher that enables automatic single-GPU or multi-machine distributed training over a local network (LAN) using Hugging Face Accelerate.
It is built for real-world setups, including low-VRAM GPUs (it runs safely on a GTX 1650), and removes the complexity of manual distributed configuration.
- ✅ Auto-detects local GPU
- 🌐 Checks remote GPU availability via SSH
- 🔁 Automatically falls back to single-GPU
- ⚡ Launches true multi-machine distributed training
- 🧠 No changes required inside your training logic
- 🖥️ Windows-friendly (batch file included)
- 🔥 Powered by Hugging Face Accelerate
```text
distributed_env/
│
├── dpython.py      # Distributed training launcher
├── cluster.json    # Cluster configuration
├── run.bat         # Windows one-click launcher
│
├── train.py        # Example Accelerate training script
└── README.md
```
- Windows or Linux
- NVIDIA GPU
- NVIDIA Driver + CUDA
- Python 3.9+
- SSH enabled (passwordless SSH recommended; a quick connectivity check is sketched after this list)
- Same project path on both machines
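Because the launcher drives the worker machine over SSH, it is worth confirming up front that SSH really does connect without a password prompt. The snippet below is a minimal illustration of such a check, not part of DPython itself; the function name and the example user/IP are placeholders.

```python
import subprocess


def ssh_works_without_prompt(user: str, host: str, timeout: int = 10) -> bool:
    """Return True if `ssh user@host` succeeds without asking for a password."""
    # BatchMode=yes makes ssh fail immediately instead of prompting interactively.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
         f"{user}@{host}", "echo ok"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "ok" in result.stdout


# Hypothetical values matching the cluster.json example below:
# ssh_works_without_prompt("vikas", "192.168.1.11")
```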
Install the dependencies:

```bash
pip install torch accelerate
```

Configure `cluster.json`:

```json
{
  "master_ip": "192.168.1.10",
  "worker_ip": "192.168.1.11",
  "port": 29500,
  "ssh_user": "vikas"
}
```

Notes:
- `master_ip` → the machine where you run the command
- `worker_ip` → the remote GPU machine
- `port` → any free port (default 29500)
- SSH must work without prompts
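To make the configuration concrete, here is a rough sketch of how a launcher could load these fields and run the local and remote GPU checks that produce the GPU STATUS output shown further down. The function names are assumptions for illustration; dpython.py's actual internals may differ.

```python
import json
import subprocess
from typing import Optional

import torch


def load_cluster(path: str = "cluster.json") -> dict:
    # Read the master/worker IPs, port, and SSH user from the JSON config above.
    with open(path) as f:
        return json.load(f)


def local_gpu_name() -> Optional[str]:
    # Detect the local GPU, if any.
    return torch.cuda.get_device_name(0) if torch.cuda.is_available() else None


def remote_gpu_name(cfg: dict) -> Optional[str]:
    # Ask the worker for its GPU name over SSH; None means "not available".
    cmd = [
        "ssh", "-o", "BatchMode=yes",
        f"{cfg['ssh_user']}@{cfg['worker_ip']}",
        "nvidia-smi --query-gpu=name --format=csv,noheader",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    except subprocess.TimeoutExpired:
        return None
    name = out.stdout.strip()
    return name if out.returncode == 0 and name else None


if __name__ == "__main__":
    cfg = load_cluster()
    print("LOCAL GPU:", local_gpu_name())
    print("REMOTE GPU:", remote_gpu_name(cfg) or "NOT AVAILABLE")
```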
Launch training with the Windows batch file, the `dpython` command, or the Python script directly:

```bash
dpython.bat train.py
dpython train.py
python dpython.py train.py --epochs 10 --batch_size 16
```

DPython works with any script that uses Hugging Face Accelerate.
Minimum required setup:
```python
from accelerate import Accelerator

accelerator = Accelerator()
device = accelerator.device
```

DPython automatically manages:
- Process ranks
- Device placement
- Multi-GPU synchronization
- Distributed launch across machines (see the launch sketch below)
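Under the hood, a multi-machine run boils down to running `accelerate launch` on each machine with matching rendezvous settings. The sketch below shows one plausible way to assemble those commands from `cluster.json`; the helper names are assumptions rather than dpython.py's actual implementation, but `--num_machines`, `--machine_rank`, `--main_process_ip`, and `--main_process_port` are the standard Accelerate flags for multi-machine launches.

```python
import shlex
import subprocess
from typing import List


def launch_cmd(cfg: dict, machine_rank: int, script: str, script_args: List[str]) -> List[str]:
    # One process per machine (one GPU each); machine_rank 0 is the master.
    return [
        "accelerate", "launch",
        "--multi_gpu",
        "--num_machines", "2",
        "--num_processes", "2",              # total processes across both machines
        "--machine_rank", str(machine_rank),
        "--main_process_ip", cfg["master_ip"],
        "--main_process_port", str(cfg["port"]),
        script, *script_args,
    ]


def start_worker(cfg: dict, script: str, script_args: List[str]) -> subprocess.Popen:
    # The worker (rank 1) is started over SSH. This assumes the same project
    # path exists on both machines, as listed in the requirements above.
    remote = " ".join(shlex.quote(a) for a in launch_cmd(cfg, 1, script, script_args))
    return subprocess.Popen(["ssh", f"{cfg['ssh_user']}@{cfg['worker_ip']}", remote])


# If the remote GPU check fails, a single-GPU fallback could be as simple as:
#   accelerate launch --num_processes 1 train.py
```

The master runs the rank-0 command locally while the worker command is sent over SSH, which is why passwordless SSH and identical project paths matter.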
Example output when both GPUs are available:

```text
================ GPU STATUS ================
LOCAL GPU: NVIDIA GTX 1650
REMOTE GPU: NVIDIA RTX 3050

[DPYTHON] Launching remote worker...
[DPYTHON] Launching local master...
Distributed training initialized
```

If the remote GPU cannot be reached, DPython falls back automatically:

```text
REMOTE GPU NOT AVAILABLE
Falling back to LOCAL GPU ONLY
[DPYTHON] Running single-GPU training
```
| Problem | Solution |
|---|---|
| Low VRAM GPUs | Multi-machine training |
| Manual Accelerate setup | Fully automated |
| Idle remote GPU | Auto utilization |
| Complex configs | Simple JSON |
| Research-grade setups | Production-ready script |
- Python environment must match on all machines
- Dataset paths must exist on both machines
- LAN latency affects scaling efficiency
MIT License. Free to use, modify, and distribute.
V R Vikash
AI & Distributed Systems Developer
Built for real-world, low-VRAM GPU environments
If this project helped you:
- ⭐ Star the repository
- 🍴 Fork and improve
- 🐛 Open issues or suggestions
Happy Distributed Training 🚀