Skip to content

Releases: SharpAI/mlx-server

mlx-server b21

24 Mar 16:58
2d382ef

Choose a tag to compare

mlx-server b21-2d382ef

Merge pull request #3 from SharpAI/feature/api-parity-roadmap

Feature/api parity roadmap

Changelog

  • fix: CI — install mlx.metallib from Python mlx package (4086ce9)
  • feat: Prompt caching — reduce TTFT by reusing system prompt KV state (6b34c97)
  • feat: API key authentication (--api-key flag) (75e927d)
  • feat: Phase 3 — Memory limit, /metrics, enhanced /health, graceful shutdown, stats (433d90e)
  • feat: Phase 2 — JSON mode, VLM vision support, multipart content, extra sampling params (bfc980a)
  • feat: Phase 1 API parity with mlx-lm (519bfda)

Download

Quick Start

tar -xzf mlx-server-b21-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.

mlx-server b14

23 Mar 01:45
689df93

Choose a tag to compare

mlx-server b14-689df93

Merge pull request #2 from SharpAI/feature/qwen35-support

feat: auto-detect local directory path for --model flag

Changelog

  • feat: auto-detect local directory path for --model flag (1232bc9)

Download

Quick Start

tar -xzf mlx-server-b14-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.

mlx-server b12

22 Mar 19:59
1c7c512

Choose a tag to compare

mlx-server b12-1c7c512

Merge pull request #1 from SharpAI/feature/qwen35-support

Feature/qwen35 support

Changelog

  • feat: add full OpenAI-compatible tool calling support (19717cf)
  • feat: add --thinking flag to disable thinking mode by default (Qwen3.5) (91ee743)
  • feat: update mlx-swift-lm to SharpAI fork main branch for Qwen3.5 support (3e1f923)

Download

Quick Start

tar -xzf mlx-server-b12-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.

mlx-server b8

21 Mar 23:12

Choose a tag to compare

mlx-server b8-2dce45c

feat: add Aegis integration events

  • Emit JSON-line ready event on stdout after server starts:
    {"event":"ready","port":5413,"model":"...","engine":"mlx","vision":false}
  • Add approximate token counting in non-streaming responses
    (prompt chars/4 heuristic + completion chunk count)
  • Note: Aegis detects process exit for stopped state automatically

Changelog

  • feat: add Aegis integration events (2dce45c)

Download

Quick Start

tar -xzf mlx-server-b8-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.

mlx-server b7

21 Mar 16:25

Choose a tag to compare

mlx-server b7-6bd9224

Add context window, temperature, parallel requests, and sampling controls

New CLI flags (matching llama-server patterns):
--ctx-size N Context window / KV cache size (sliding window)
--temp N Default sampling temperature (default: 0.6)
--top-p N Top-p nucleus sampling (default: 1.0)
--repeat-penalty N Repetition penalty factor
--parallel N Max concurrent request slots (default: 1)

Per-request overrides via JSON body:
temperature, top_p, repetition_penalty, max_tokens

Concurrency control:
AsyncSemaphore actor limits concurrent inference tasks

Also: commit Package.resolved for reproducible builds

Changelog

  • Add context window, temperature, parallel requests, and sampling controls (6bd9224)

Download

Quick Start

tar -xzf mlx-server-b7-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.

mlx-server b6

21 Mar 15:34

Choose a tag to compare

mlx-server b6-78ff596

Auto-generate release notes from commit messages + changelog

  • Release body now includes the triggering commit message (like llama.cpp)
  • Auto-generates changelog from git log since last release tag
  • Includes download link and quick start in release notes

Changelog

  • Auto-generate release notes from commit messages + changelog (78ff596)

Download

Quick Start

tar -xzf mlx-server-b6-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.

mlx-server b5

21 Mar 07:08

Choose a tag to compare

mlx-server b5-47916b8

Native Swift MLX server with OpenAI-compatible API for Apple Silicon.

Commit: 47916b8

Quick Start

tar -xzf mlx-server-b5-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413

Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.