Releases: SharpAI/mlx-server
mlx-server b21
mlx-server b21-2d382ef
Merge pull request #3 from SharpAI/feature/api-parity-roadmap
Feature/api parity roadmap
Changelog
- fix: CI — install mlx.metallib from Python mlx package (4086ce9)
- feat: Prompt caching — reduce TTFT by reusing system prompt KV state (6b34c97)
- feat: API key authentication (--api-key flag) (75e927d)
- feat: Phase 3 — Memory limit, /metrics, enhanced /health, graceful shutdown, stats (433d90e)
- feat: Phase 2 — JSON mode, VLM vision support, multipart content, extra sampling params (bfc980a)
- feat: Phase 1 API parity with mlx-lm (519bfda)
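The `--api-key` flag above suggests standard OpenAI-style Bearer authentication. A minimal client sketch under that assumption; the endpoint path follows the OpenAI convention, and the key value and prompt are illustrative:

```python
import json
import urllib.request

# Hypothetical key; start the server with the same value via --api-key.
API_KEY = "sk-local-example"
BASE_URL = "http://127.0.0.1:5413"

def chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request with Bearer auth."""
    body = {
        "model": "mlx-community/Qwen2.5-3B-Instruct-4bit",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = chat_request("Hello")
print(req.get_header("Authorization"))  # → Bearer sk-local-example
```

Sending the request with `urllib.request.urlopen(req)` (server running) should return a standard chat completion JSON body.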
Download
Quick Start
tar -xzf mlx-server-b21-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b14
mlx-server b14-689df93
Merge pull request #2 from SharpAI/feature/qwen35-support
feat: auto-detect local directory path for --model flag
Changelog
- feat: auto-detect local directory path for --model flag (1232bc9)
Download
Quick Start
tar -xzf mlx-server-b14-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b12
mlx-server b12-1c7c512
Merge pull request #1 from SharpAI/feature/qwen35-support
Feature/qwen35 support
Changelog
- feat: add full OpenAI-compatible tool calling support (19717cf)
- feat: add --thinking flag to disable thinking mode by default (Qwen3.5) (91ee743)
- feat: update mlx-swift-lm to SharpAI fork main branch for Qwen3.5 support (3e1f923)
Download
Quick Start
tar -xzf mlx-server-b12-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b8
mlx-server b8-2dce45c
feat: add Aegis integration events
- Emit JSON-line ready event on stdout after the server starts:
  {"event":"ready","port":5413,"model":"...","engine":"mlx","vision":false}
- Add approximate token counting in non-streaming responses (prompt chars/4 heuristic + completion chunk count)
- Note: Aegis detects process exit for the stopped state automatically
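A supervisor like Aegis can watch stdout for the ready line described above. A small sketch of parsing that event, plus the stated chars/4 heuristic (the heuristic only approximates real tokenization; the line content here is an example):

```python
import json

# Example stdout line matching the documented ready-event shape.
line = ('{"event":"ready","port":5413,'
        '"model":"mlx-community/Qwen2.5-3B-Instruct-4bit",'
        '"engine":"mlx","vision":false}')

event = json.loads(line)
if event.get("event") == "ready":
    print(f"server ready on port {event['port']} (vision={event['vision']})")

def approx_prompt_tokens(prompt: str) -> int:
    """Rough token count via the prompt-chars/4 heuristic from the note."""
    return max(1, len(prompt) // 4)

print(approx_prompt_tokens("hello world"))  # → 2
```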
Changelog
- feat: add Aegis integration events (2dce45c)
Download
Quick Start
tar -xzf mlx-server-b8-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b7
mlx-server b7-6bd9224
Add context window, temperature, parallel requests, and sampling controls
New CLI flags (matching llama-server patterns):
--ctx-size N Context window / KV cache size (sliding window)
--temp N Default sampling temperature (default: 0.6)
--top-p N Top-p nucleus sampling (default: 1.0)
--repeat-penalty N Repetition penalty factor
--parallel N Max concurrent request slots (default: 1)
Per-request overrides via JSON body:
temperature, top_p, repetition_penalty, max_tokens
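The listed fields go straight into the request body, overriding the CLI defaults for that one request. A sketch of such a body, with illustrative values:

```python
import json

# Per-request sampling overrides; field names are those listed above.
payload = {
    "model": "mlx-community/Qwen2.5-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "temperature": 0.2,         # overrides --temp for this request only
    "top_p": 0.9,               # overrides --top-p
    "repetition_penalty": 1.1,  # overrides --repeat-penalty
    "max_tokens": 64,
}
body = json.dumps(payload)
```

POST this body to the chat completions endpoint; omitted fields fall back to the server's CLI-configured defaults.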
Concurrency control:
AsyncSemaphore actor limits concurrent inference tasks
Also: commit Package.resolved for reproducible builds
Changelog
- Add context window, temperature, parallel requests, and sampling controls (6bd9224)
Download
Quick Start
tar -xzf mlx-server-b7-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b6
mlx-server b6-78ff596
Auto-generate release notes from commit messages + changelog
- Release body now includes the triggering commit message (like llama.cpp)
- Auto-generates changelog from git log since last release tag
- Includes download link and quick start in release notes
Changelog
- Auto-generate release notes from commit messages + changelog (78ff596)
Download
Quick Start
tar -xzf mlx-server-b6-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b5
mlx-server b5-47916b8
Native Swift MLX server with OpenAI-compatible API for Apple Silicon.
Commit: 47916b8
Quick Start
tar -xzf mlx-server-b5-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.