
MLX Benchmark


A comprehensive benchmarking tool for MLX models on Apple Silicon.

Features

  • 📊 Comprehensive Metrics: Measures throughput, time to first token (TTFT), and per-token latency
  • 🔄 Multiple Test Scenarios: Short, medium, and long generation tests
  • 📈 Streaming Benchmarks: Measures real-time streaming performance
  • 🎯 Consistent Results: Multiple runs with statistical averaging
  • 💻 Apple Silicon Optimized: Built specifically for MLX framework

Installation

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.10 or higher

Setup

# Clone the repository
git clone <your-repo-url>
cd mlx-benchmark

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Usage

Basic Benchmark

python benchmark.py --model-path /path/to/your/mlx-model

Custom Number of Runs

python benchmark.py --model-path /path/to/your/mlx-model --runs 5

Example with LM Studio Model

python benchmark.py \
  --model-path ~/.lmstudio/models/lmstudio-community/Qwen3-Coder-Next-MLX-6bit \
  --runs 3

Benchmark Tests

The tool runs four types of benchmarks:

1. Short Code Generation (50 tokens)

Tests quick code completions and snippets

2. Medium Code Generation (150 tokens)

Tests function and class implementations

3. Long Code Generation (300 tokens)

Tests complex code generation scenarios

4. Streaming Benchmark

Measures real-time streaming performance:

  • Time to First Token (TTFT)
  • Per-token latency
  • Streaming throughput

Metrics Explained

  • Throughput: Tokens generated per second (tokens/s)
  • TTFT: Time to First Token - how quickly the model starts responding
  • Token Latency: Average time between each token during streaming
  • Total Time: Complete generation time including prompt processing
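Throughput is simply generated tokens divided by generation time. A minimal sketch (this helper is hypothetical, not the script's actual code):

```python
def throughput(generated_tokens: int, total_time_s: float,
               prompt_time_s: float = 0.0) -> float:
    """Tokens per second, optionally excluding prompt-processing time."""
    gen_time = total_time_s - prompt_time_s
    if gen_time <= 0:
        raise ValueError("generation time must be positive")
    return generated_tokens / gen_time
```

Plugging in the numbers from the example output below, throughput(40, 0.820) gives roughly 48.8 tokens/s, in line with the reported run.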

Example Output

======================================================================
  MLX Model Benchmark
======================================================================
Model: Qwen3-Coder-Next-MLX-6bit
Path:  ~/.lmstudio/models/lmstudio-community/Qwen3-Coder-Next-MLX-6bit
Runs:  3

Loading model... Done! (13.82s)

======================================================================
  Short Code Generation (50 tokens)
======================================================================

Run 1:
  Prompt tokens:     9
  Generated tokens:  40
  Total time:        0.820s
  Throughput:        48.79 tokens/s

Average (excluding warmup):
  Throughput:        48.88 tokens/s
  Time per run:      0.818s
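The "excluding warmup" average can be sketched as: treat the first run (where caches and compiled kernels are still cold) as warmup and average the remaining runs. This helper is an assumption about the approach, not the script's actual implementation:

```python
from statistics import mean

def average_excluding_warmup(throughputs, warmup_runs=1):
    """Average per-run throughput, discarding the initial warmup run(s)."""
    measured = throughputs[warmup_runs:]
    if not measured:
        raise ValueError("need at least one measured run after warmup")
    return mean(measured)
```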

Performance Expectations

Typical performance on Apple Silicon:

Chip         Model Size    Expected Throughput
M1           7B (6-bit)    30-50 tokens/s
M2           7B (6-bit)    40-60 tokens/s
M3 Pro/Max   7B (6-bit)    50-70 tokens/s
M3 Max       14B (6-bit)   35-55 tokens/s

Development

Project Structure

mlx-benchmark/
├── benchmark.py         # Main benchmark script
├── requirements.txt     # Python dependencies
├── README.md           # This file
└── .gitignore          # Git ignore patterns

Adding New Benchmarks

To add a new benchmark test, edit the test_cases list in benchmark.py:

test_cases = [
    {
        "name": "Your Test Name",
        "prompt": "Your prompt here",
        "max_tokens": 100
    }
]
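A benchmark loop over test_cases might look like the following sketch. Here run_generation is a stand-in stub for the actual MLX generation call in benchmark.py, which is an assumption about its structure:

```python
import time

test_cases = [
    {"name": "Your Test Name", "prompt": "Your prompt here", "max_tokens": 100},
]

def run_generation(prompt, max_tokens):
    # Stand-in for the real MLX call; returns the generated token count.
    return max_tokens

def run_benchmarks(cases, runs=3):
    """Run each test case `runs` times and report average throughput."""
    results = {}
    for case in cases:
        throughputs = []
        for _ in range(runs):
            start = time.perf_counter()
            tokens = run_generation(case["prompt"], case["max_tokens"])
            elapsed = time.perf_counter() - start
            throughputs.append(tokens / elapsed)
        results[case["name"]] = sum(throughputs) / len(throughputs)
    return results
```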

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Acknowledgments

  • Built with MLX by Apple
  • Uses mlx-lm for language model inference
