DPython is a custom Python-based launcher that enables automatic single-GPU or multi-machine distributed training over a local network (LAN) using Hugging Face Accelerate.
It is built for real-world setups, including low-VRAM GPUs (it runs safely on a GTX 1650), and removes the complexity of manual distributed configuration.
- ✅ Auto-detects local GPU
- 🌐 Checks remote GPU availability via SSH
- 🔁 Automatically falls back to single-GPU
- ⚡ Launches true multi-machine distributed training
- 🧠 No changes required inside your training logic
- 🖥️ Windows-friendly (batch file included)
- 🔥 Powered by Hugging Face Accelerate
```text
distributed_env/
│
├── dpython.py      # Distributed training launcher
├── cluster.json    # Cluster configuration
├── run.bat         # Windows one-click launcher
│
├── train.py        # Example Accelerate training script
└── README.md
```
- Windows or Linux
- NVIDIA GPU
- NVIDIA Driver + CUDA
- Python 3.9+
- SSH enabled (passwordless SSH recommended; a quick connectivity check is sketched after this list)
- Same project path on both machines
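Because the launcher drives the worker machine over SSH, it is worth confirming up front that SSH really does connect without a password prompt. The snippet below is a minimal illustration of such a check, not part of DPython itself; the function name and the example user/IP are placeholders.

```python
import subprocess


def ssh_works_without_prompt(user: str, host: str, timeout: int = 10) -> bool:
    """Return True if `ssh user@host` succeeds without asking for a password."""
    # BatchMode=yes makes ssh fail immediately instead of prompting interactively.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
         f"{user}@{host}", "echo ok"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "ok" in result.stdout


# Hypothetical values matching the cluster.json example below:
# ssh_works_without_prompt("vikas", "192.168.1.11")
```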
Install the dependencies:

```bash
pip install torch accelerate
```

Configure `cluster.json`:

```json
{
  "master_ip": "192.168.1.10",
  "worker_ip": "192.168.1.11",
  "port": 29500,
  "ssh_user": "vikas"
}
```

Notes:
- `master_ip` → the machine where you run the command
- `worker_ip` → the remote GPU machine
- `port` → any free port (default 29500)
- SSH must work without prompts
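To make the configuration concrete, here is a rough sketch of how a launcher could load these fields and run the local and remote GPU checks that produce the GPU STATUS output shown further down. The function names are assumptions for illustration; dpython.py's actual internals may differ.

```python
import json
import subprocess
from typing import Optional

import torch


def load_cluster(path: str = "cluster.json") -> dict:
    # Read the master/worker IPs, port, and SSH user from the JSON config above.
    with open(path) as f:
        return json.load(f)


def local_gpu_name() -> Optional[str]:
    # Detect the local GPU, if any.
    return torch.cuda.get_device_name(0) if torch.cuda.is_available() else None


def remote_gpu_name(cfg: dict) -> Optional[str]:
    # Ask the worker for its GPU name over SSH; None means "not available".
    cmd = [
        "ssh", "-o", "BatchMode=yes",
        f"{cfg['ssh_user']}@{cfg['worker_ip']}",
        "nvidia-smi --query-gpu=name --format=csv,noheader",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    except subprocess.TimeoutExpired:
        return None
    name = out.stdout.strip()
    return name if out.returncode == 0 and name else None


if __name__ == "__main__":
    cfg = load_cluster()
    print("LOCAL GPU:", local_gpu_name())
    print("REMOTE GPU:", remote_gpu_name(cfg) or "NOT AVAILABLE")
```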
Launch training with the Windows batch file, the `dpython` command, or the Python script directly:

```bash
dpython.bat train.py
dpython train.py
python dpython.py train.py --epochs 10 --batch_size 16
```

DPython works with any script that uses Hugging Face Accelerate.
Minimum required setup:
```python
from accelerate import Accelerator

accelerator = Accelerator()
device = accelerator.device
```

DPython automatically manages:
- Process ranks
- Device placement
- Multi-GPU synchronization
- Distributed launch across machines (see the launch sketch below)
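Under the hood, a multi-machine run boils down to running `accelerate launch` on each machine with matching rendezvous settings. The sketch below shows one plausible way to assemble those commands from `cluster.json`; the helper names are assumptions rather than dpython.py's actual implementation, but `--num_machines`, `--machine_rank`, `--main_process_ip`, and `--main_process_port` are the standard Accelerate flags for multi-machine launches.

```python
import shlex
import subprocess
from typing import List


def launch_cmd(cfg: dict, machine_rank: int, script: str, script_args: List[str]) -> List[str]:
    # One process per machine (one GPU each); machine_rank 0 is the master.
    return [
        "accelerate", "launch",
        "--multi_gpu",
        "--num_machines", "2",
        "--num_processes", "2",              # total processes across both machines
        "--machine_rank", str(machine_rank),
        "--main_process_ip", cfg["master_ip"],
        "--main_process_port", str(cfg["port"]),
        script, *script_args,
    ]


def start_worker(cfg: dict, script: str, script_args: List[str]) -> subprocess.Popen:
    # The worker (rank 1) is started over SSH. This assumes the same project
    # path exists on both machines, as listed in the requirements above.
    remote = " ".join(shlex.quote(a) for a in launch_cmd(cfg, 1, script, script_args))
    return subprocess.Popen(["ssh", f"{cfg['ssh_user']}@{cfg['worker_ip']}", remote])


# If the remote GPU check fails, a single-GPU fallback could be as simple as:
#   accelerate launch --num_processes 1 train.py
```

The master runs the rank-0 command locally while the worker command is sent over SSH, which is why passwordless SSH and identical project paths matter.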
Example output when both GPUs are available:

```text
================ GPU STATUS ================
LOCAL GPU: NVIDIA GTX 1650
REMOTE GPU: NVIDIA RTX 3050

[DPYTHON] Launching remote worker...
[DPYTHON] Launching local master...
Distributed training initialized
```

If the remote GPU cannot be reached, DPython falls back automatically:

```text
REMOTE GPU NOT AVAILABLE
Falling back to LOCAL GPU ONLY
[DPYTHON] Running single-GPU training
```
| Problem | Solution |
|---|---|
| Low VRAM GPUs | Multi-machine training |
| Manual Accelerate setup | Fully automated |
| Idle remote GPU | Auto utilization |
| Complex configs | Simple JSON |
| Research-grade setups | Production-ready script |
- Python environment must match on all machines
- Dataset paths must exist on both machines
- LAN latency affects scaling efficiency
MIT License. Free to use, modify, and distribute.
V R Vikash
AI & Distributed Systems Developer
Built for real-world, low-VRAM GPU environments
If this project helped you:
- ⭐ Star the repository
- 🍴 Fork and improve
- 🐛 Open issues or suggestions
Happy Distributed Training 🚀