Releases: SharpAI/mlx-server
mlx-server b21
mlx-server b21-2d382ef
Merge pull request #3 from SharpAI/feature/api-parity-roadmap
Feature/api parity roadmap
Changelog
- fix: CI — install mlx.metallib from Python mlx package (4086ce9)
- feat: Prompt caching — reduce TTFT by reusing system prompt KV state (6b34c97)
- feat: API key authentication (--api-key flag) (75e927d)
- feat: Phase 3 — Memory limit, /metrics, enhanced /health, graceful shutdown, stats (433d90e)
- feat: Phase 2 — JSON mode, VLM vision support, multipart content, extra sampling params (bfc980a)
- feat: Phase 1 API parity with mlx-lm (519bfda)
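The `--api-key` flag above suggests standard OpenAI-style Bearer authentication. A minimal client sketch under that assumption; the endpoint path follows the OpenAI convention, and the key value and prompt are illustrative:

```python
import json
import urllib.request

# Hypothetical key; start the server with the same value via --api-key.
API_KEY = "sk-local-example"
BASE_URL = "http://127.0.0.1:5413"

def chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request with Bearer auth."""
    body = {
        "model": "mlx-community/Qwen2.5-3B-Instruct-4bit",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = chat_request("Hello")
print(req.get_header("Authorization"))  # → Bearer sk-local-example
```

Sending the request with `urllib.request.urlopen(req)` (server running) should return a standard chat completion JSON body.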
Download
Quick Start
tar -xzf mlx-server-b21-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b14
mlx-server b14-689df93
Merge pull request #2 from SharpAI/feature/qwen35-support
feat: auto-detect local directory path for --model flag
Changelog
- feat: auto-detect local directory path for --model flag (1232bc9)
Download
Quick Start
tar -xzf mlx-server-b14-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b12
mlx-server b12-1c7c512
Merge pull request #1 from SharpAI/feature/qwen35-support
Feature/qwen35 support
Changelog
- feat: add full OpenAI-compatible tool calling support (19717cf)
- feat: add --thinking flag to disable thinking mode by default (Qwen3.5) (91ee743)
- feat: update mlx-swift-lm to SharpAI fork main branch for Qwen3.5 support (3e1f923)
Download
Quick Start
tar -xzf mlx-server-b12-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b8
mlx-server b8-2dce45c
feat: add Aegis integration events
- Emit JSON-line ready event on stdout after the server starts:
  {"event":"ready","port":5413,"model":"...","engine":"mlx","vision":false}
- Add approximate token counting in non-streaming responses (prompt chars/4 heuristic + completion chunk count)
- Note: Aegis detects process exit for the stopped state automatically
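A supervisor like Aegis can watch stdout for the ready line described above. A small sketch of parsing that event, plus the stated chars/4 heuristic (the heuristic only approximates real tokenization; the line content here is an example):

```python
import json

# Example stdout line matching the documented ready-event shape.
line = ('{"event":"ready","port":5413,'
        '"model":"mlx-community/Qwen2.5-3B-Instruct-4bit",'
        '"engine":"mlx","vision":false}')

event = json.loads(line)
if event.get("event") == "ready":
    print(f"server ready on port {event['port']} (vision={event['vision']})")

def approx_prompt_tokens(prompt: str) -> int:
    """Rough token count via the prompt-chars/4 heuristic from the note."""
    return max(1, len(prompt) // 4)

print(approx_prompt_tokens("hello world"))  # → 2
```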
Changelog
- feat: add Aegis integration events (2dce45c)
Download
Quick Start
tar -xzf mlx-server-b8-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b7
mlx-server b7-6bd9224
Add context window, temperature, parallel requests, and sampling controls
New CLI flags (matching llama-server patterns):
--ctx-size N Context window / KV cache size (sliding window)
--temp N Default sampling temperature (default: 0.6)
--top-p N Top-p nucleus sampling (default: 1.0)
--repeat-penalty N Repetition penalty factor
--parallel N Max concurrent request slots (default: 1)
Per-request overrides via JSON body:
temperature, top_p, repetition_penalty, max_tokens
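The listed fields go straight into the request body, overriding the CLI defaults for that one request. A sketch of such a body, with illustrative values:

```python
import json

# Per-request sampling overrides; field names are those listed above.
payload = {
    "model": "mlx-community/Qwen2.5-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "temperature": 0.2,         # overrides --temp for this request only
    "top_p": 0.9,               # overrides --top-p
    "repetition_penalty": 1.1,  # overrides --repeat-penalty
    "max_tokens": 64,
}
body = json.dumps(payload)
```

POST this body to the chat completions endpoint; omitted fields fall back to the server's CLI-configured defaults.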
Concurrency control:
AsyncSemaphore actor limits concurrent inference tasks
Also: commit Package.resolved for reproducible builds
Changelog
- Add context window, temperature, parallel requests, and sampling controls (6bd9224)
Download
Quick Start
tar -xzf mlx-server-b7-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b6
mlx-server b6-78ff596
Auto-generate release notes from commit messages + changelog
- Release body now includes the triggering commit message (like llama.cpp)
- Auto-generates changelog from git log since last release tag
- Includes download link and quick start in release notes
Changelog
- Auto-generate release notes from commit messages + changelog (78ff596)
Download
Quick Start
tar -xzf mlx-server-b6-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b5
mlx-server b5-47916b8
Native Swift MLX server with OpenAI-compatible API for Apple Silicon.
Commit: 47916b8
Quick Start
tar -xzf mlx-server-b5-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.