A comprehensive benchmarking tool for MLX models on Apple Silicon.
- 📊 Comprehensive Metrics: Measures throughput, TTFT, token latency
- 🔄 Multiple Test Scenarios: Short, medium, and long generation tests
- 📈 Streaming Benchmarks: Measures real-time streaming performance
- 🎯 Consistent Results: Multiple runs with statistical averaging
- 💻 Apple Silicon Optimized: Built specifically for MLX framework
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10 or higher
```bash
# Clone the repository
git clone <your-repo-url>
cd mlx-benchmark

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

Run a basic benchmark:

```bash
python benchmark.py --model-path /path/to/your/mlx-model
```

Increase the number of runs for tighter averages:

```bash
python benchmark.py --model-path /path/to/your/mlx-model --runs 5
```

Full example:

```bash
python benchmark.py \
  --model-path ~/.lmstudio/models/lmstudio-community/Qwen3-Coder-Next-MLX-6bit \
  --runs 3
```

The tool runs four types of benchmarks:
1. Short generation: tests quick code completions and snippets
2. Medium generation: tests function and class implementations
3. Long generation: tests complex code generation scenarios
4. Streaming: measures real-time streaming performance, including:
   - Time to First Token (TTFT)
   - Per-token latency
   - Streaming throughput
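Streaming metrics like these can be collected by timestamping each token as it arrives from the model's token stream. A minimal sketch; the `fake_stream` generator below is a stand-in for a real MLX token stream (such as one produced by `mlx_lm.stream_generate`):

```python
import time

def time_stream(token_stream):
    """Record an arrival timestamp (relative to start) for each streamed token."""
    start = time.perf_counter()
    stamps = []
    for _ in token_stream:
        stamps.append(time.perf_counter() - start)
    return stamps

def fake_stream(n, delay=0.001):
    """Stand-in generator simulating a model emitting n tokens."""
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

stamps = time_stream(fake_stream(5))
ttft = stamps[0]                                        # Time to First Token
gaps = [b - a for a, b in zip(stamps, stamps[1:])]
avg_latency = sum(gaps) / len(gaps)                     # average per-token latency
```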
- Throughput: Tokens generated per second (tokens/s)
- TTFT: Time to First Token - how quickly the model starts responding
- Token Latency: Average time between each token during streaming
- Total Time: Complete generation time including prompt processing
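The reported numbers relate to each other in a simple way: throughput is generated tokens divided by total time, and per-token latency is the post-TTFT time spread over the remaining tokens. A sketch of that arithmetic (the `summarize` helper is hypothetical; `benchmark.py`'s actual bookkeeping may differ):

```python
def summarize(prompt_tokens, generated_tokens, total_time, ttft):
    """Derive the headline metrics from raw counts and timings."""
    gen_time = total_time - ttft  # time spent emitting tokens after the first
    return {
        "throughput_tps": generated_tokens / total_time,
        "token_latency_s": gen_time / max(generated_tokens - 1, 1),
        "ttft_s": ttft,
        "total_time_s": total_time,
    }

# Numbers from the sample output below; the TTFT value is illustrative.
m = summarize(prompt_tokens=9, generated_tokens=40, total_time=0.820, ttft=0.12)
```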
Example output:

```
======================================================================
MLX Model Benchmark
======================================================================
Model: Qwen3-Coder-Next-MLX-6bit
Path:  ~/.lmstudio/models/lmstudio-community/Qwen3-Coder-Next-MLX-6bit
Runs:  3

Loading model... Done! (13.82s)

======================================================================
Short Code Generation (50 tokens)
======================================================================
Run 1:
  Prompt tokens:    9
  Generated tokens: 40
  Total time:       0.820s
  Throughput:       48.79 tokens/s

Average (excluding warmup):
  Throughput:   48.88 tokens/s
  Time per run: 0.818s
```
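The averaged figures discard the first run, since it typically includes warmup overhead (cache compilation, memory allocation). A sketch of that averaging, assuming the first run is the warmup:

```python
from statistics import mean

def average_excluding_warmup(per_run_throughput):
    """Average per-run results, dropping the first (warmup) run."""
    if len(per_run_throughput) < 2:
        return mean(per_run_throughput)  # nothing to discard
    return mean(per_run_throughput[1:])

# Illustrative per-run throughputs consistent with the sample output above.
avg = average_excluding_warmup([48.79, 48.92, 48.84])
```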
Typical performance on Apple Silicon:
| Chip | Model Size | Expected Throughput |
|---|---|---|
| M1 | 7B (6-bit) | 30-50 tokens/s |
| M2 | 7B (6-bit) | 40-60 tokens/s |
| M3 Pro/Max | 7B (6-bit) | 50-70 tokens/s |
| M3 Max | 14B (6-bit) | 35-55 tokens/s |
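Throughput and average per-token latency are reciprocals, which makes the table easy to read either way; for example, 50 tokens/s corresponds to 20 ms per token:

```python
def ms_per_token(tokens_per_sec):
    """Convert throughput to average per-token latency in milliseconds."""
    return 1000.0 / tokens_per_sec

ms_per_token(50.0)  # → 20.0 ms/token
```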
```
mlx-benchmark/
├── benchmark.py      # Main benchmark script
├── requirements.txt  # Python dependencies
├── README.md         # This file
└── .gitignore        # Git ignore patterns
```
To add a new benchmark test, edit the `test_cases` list in `benchmark.py`:

```python
test_cases = [
    {
        "name": "Your Test Name",
        "prompt": "Your prompt here",
        "max_tokens": 100
    }
]
```

Contributions are welcome! Please feel free to submit a Pull Request.
MIT License