MCP server that dispatches prompts to multiple AI models in parallel. Built in Rust for Claude Code.
No single model finds everything. Different models have different strengths, different blind spots, and different failure modes. When you send the same review to five models, you get additive signal — one catches concurrency bugs, another spots auth gaps, a third finds edge cases in error handling.
The overlap gives confidence. The divergence gives coverage. One model consistently finds resource leaks while another catches configuration gaps neither would find alone. The redundancy isn't waste — it's the point.
Squall tracks every model's performance — latency, success rate, failure modes — and Claude uses those metrics to pick the best ensemble for each review. A model that keeps timing out gets benched. A model that shines on security-sensitive code gets picked when auth files change. The selection adapts over time.
Squall was used to build and validate itself — ~10k lines of Rust and ~13k lines of tests, written and reviewed over the course of a few days with continuous multi-model feedback at every step.
- Rust toolchain (stable)
- Claude Code CLI installed (`claude` command available)
- At least one API key (see below)
- Optional: Gemini CLI and/or Codex CLI for free CLI models
```sh
git clone https://github.com/DSado88/squall.git
cd squall
cp .env.example .env
# Fill in your API keys — see .env.example for signup links
./install.sh
```

The install script:

- Checks that `.env` exists and has at least one API key
- Builds the release binary and copies it to `~/.local/bin/squall`
- Registers Squall as a global MCP server in `~/.claude.json` (injects your API keys)
- Installs skills (slash commands) to `~/.claude/skills/`
Restart Claude Code (or run /mcp) to pick up the new server.
Fill in what you have, skip what you don't. Models only load when their key is set.
| Variable | Unlocks | Signup |
|---|---|---|
| `TOGETHER_API_KEY` | Kimi K2.5, DeepSeek V3.1, Qwen 3.5, Qwen3 Coder | together.xyz |
| `XAI_API_KEY` | Grok | console.x.ai |
| `OPENROUTER_API_KEY` | GLM-5 | openrouter.ai |
| `DEEPSEEK_API_KEY` | DeepSeek R1 | platform.deepseek.com |
| `MISTRAL_API_KEY` | Mistral Large | console.mistral.ai |
| `OPENAI_API_KEY` | o3-deep-research, o4-mini-deep-research | platform.openai.com |
| `GOOGLE_API_KEY` | deep-research-pro | aistudio.google.com |
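For example, a `.env` with only the Together and xAI keys filled in (values below are placeholders):

```sh
# .env — only Together and xAI models will load; the rest stay dormant
TOGETHER_API_KEY=your-together-key
XAI_API_KEY=your-xai-key
```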
CLI models (gemini, codex) use their respective CLI tools with OAuth authentication — no API key needed, but usage may be subject to each provider's terms and rate limits. Install and authenticate the Gemini CLI and Codex CLI separately.
Ask Claude Code: "list the available squall models". If Squall is connected, it will call `listmodels` and show what's available.
After pulling new changes:
```sh
./install.sh            # rebuild + reinstall
./install.sh --skills   # skills only (skip build)
./install.sh --build    # build only (skip skills)
```

Squall exposes seven tools to Claude Code.
The flagship tool. Fan out a prompt to multiple models in parallel. Each model can get a different expertise lens via `per_model_system_prompts` — one focused on security, another on correctness, another on architecture.

Returns when all models finish or the straggler cutoff fires (default 180s). Models that don't finish in time return partial results. Results persist to `.squall/reviews/` so they survive context compaction — if Claude's context window resets, the `results_file` path still works.
Key parameters:
- `models` — which models to query (defaults to config if omitted)
- `per_model_system_prompts` — map of model name to expertise lens
- `deep: true` — raises timeout to 600s, reasoning effort to high, max tokens to 16384
- `diff` — unified diff text to include in the prompt
- `file_paths` + `working_directory` — source files injected as context
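Putting these together, a review call might carry arguments like the following. The JSON shape is illustrative (parameter names are the ones documented above; the lens texts and file paths are made up):

```json
{
  "models": ["grok", "codex", "kimi-k2.5"],
  "per_model_system_prompts": {
    "codex": "You are a correctness reviewer. Focus on logic and error handling.",
    "kimi-k2.5": "You are a security reviewer. Focus on auth, injection, unsafe input."
  },
  "diff": "--- a/src/auth.rs\n+++ b/src/auth.rs\n...",
  "file_paths": ["src/auth.rs"],
  "working_directory": "."
}
```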
Models with less than 70% success rate (over 5+ reviews) are automatically excluded by a hard gate. This prevents known-broken models from wasting dispatch slots.
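The hard gate amounts to a simple predicate; a sketch of the rule, not Squall's actual code:

```rust
/// Returns true if a model may be dispatched to.
/// Models with 5+ recorded reviews and under 70% success are excluded;
/// models with fewer than 5 reviews always pass (not enough evidence yet).
fn passes_gate(successes: u32, total_reviews: u32) -> bool {
    if total_reviews < 5 {
        return true;
    }
    (successes as f64) / (total_reviews as f64) >= 0.70
}

fn main() {
    assert!(passes_gate(4, 5));  // 80% over 5 reviews: allowed
    assert!(!passes_gate(3, 5)); // 60% over 5 reviews: benched
    assert!(passes_gate(1, 3));  // too few reviews to judge: allowed
    println!("gate checks pass");
}
```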
Query a single model via HTTP (OpenAI-compatible API). Pass `file_paths` and `working_directory` to inject source files as context. Good for one-off questions to a specific model.

Query a single CLI model (`gemini`, `codex`) as a subprocess. The model gets filesystem access via its native CLI — it can read your code directly. Useful when you need a model that can see the full project, not just the files you pass.
List all available models with metadata: provider, backend, speed tier, precision tier, strengths, and weaknesses. Call this before review to see what's available.
Save a learning to persistent memory. Three categories:
- `pattern` — a recurring finding across reviews (e.g., "JoinError after abort silently drops panics")
- `tactic` — a prompt strategy that works (e.g., "Kimi needs a security lens to find real bugs")
- `recommend` — a model recommendation (e.g., "deepseek-v3.1 is fastest for Rust reviews")
Duplicate patterns auto-merge with evidence counting. Patterns reaching 5 occurrences get confirmed status. Scoped to branch or codebase, auto-detected from git context.
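A memorize call could look like this. Field names here are inferred from the inputs the server sanitizes (content, tags, scope) plus the three categories above; the exact schema may differ:

```json
{
  "category": "pattern",
  "content": "JoinError after abort silently drops panics",
  "tags": ["tokio", "concurrency"],
  "scope": "branch"
}
```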
Read persistent memory. Returns model performance stats, recurring patterns, proven prompt tactics, or model recommendations with recency-weighted confidence scores. Call this before reviews to inform model selection and lens assignment.
Clean up branch-scoped memory after a PR merge. Graduates high-evidence patterns to codebase scope, archives the rest, and prunes model events older than 30 days.
Three dispatch backends: HTTP (OpenAI-compatible), CLI (subprocess, OAuth), and async-poll (deep research, launch-then-poll).
| Model | Provider | Backend | Speed | Best for |
|---|---|---|---|---|
| `grok` | xAI | HTTP | fast | Quick triage, broad coverage |
| `gemini` | Google | CLI (OAuth) | medium | Systems-level bugs, concurrency |
| `codex` | OpenAI | CLI (OAuth) | medium | Highest precision, zero false positives |
| `kimi-k2.5` | Together | HTTP | medium | Edge cases, adversarial scenarios |
| `deepseek-v3.1` | Together | HTTP | medium | Strong coder, finds real bugs |
| `deepseek-r1` | DeepSeek | HTTP | medium | Deep reasoning, logic-heavy analysis |
| `qwen-3.5` | Together | HTTP | medium | Pattern matching, multilingual |
| `qwen3-coder` | Together | HTTP | medium | Purpose-built for code review |
| `z-ai/glm-5` | OpenRouter | HTTP | medium | Architectural framing |
| `mistral-large` | Mistral | HTTP | fast | Efficient, multilingual |
| `o3-deep-research` | OpenAI | async-poll | minutes | Deep web research |
| `o4-mini-deep-research` | OpenAI | async-poll | minutes | Faster deep research |
| `deep-research-pro` | Google | async-poll | minutes | Google-powered deep research |
All models are configurable via TOML. Add your own models, swap providers, or override defaults.
Squall uses a three-layer TOML config system. Later layers override earlier ones:
- Built-in defaults — 13 models, 5 providers, shipped with the binary
- User config (`~/.config/squall/config.toml`) — personal overrides
- Project config (`.squall/config.toml`) — project-specific settings
```toml
[providers.custom]
base_url = "https://my-api.example.com/v1/chat/completions"
api_key_env = "CUSTOM_API_KEY"

[models.my-model]
provider = "custom"
backend = "http"
description = "My custom model"
speed_tier = "fast"
strengths = ["domain expertise"]
```

When `models` is omitted from a review call, Squall dispatches to these defaults:
```toml
[review]
default_models = ["gemini", "codex", "grok"]
```

Override in your user or project config to change the default ensemble.
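For example, a user-level override swapping in a different ensemble (model names taken from the table above; the choice itself is arbitrary):

```toml
# ~/.config/squall/config.toml
[review]
default_models = ["grok", "deepseek-v3.1", "qwen3-coder"]
```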
Squall learns from every review and uses what it learns to make better decisions next time.
Three files in `.squall/memory/`:

- `models.md` — Per-model performance stats (latency, success rate, common failures). Updated automatically after every review. Claude reads this before each review to pick models, and Squall's hard gate uses it to auto-exclude models below 70% success rate.
- `patterns.md` — Recurring findings across reviews with evidence counting. Patterns found by multiple models in multiple reviews get confirmed status. Capped at 50 entries with automatic pruning.
- `tactics.md` — Proven system prompts and model+lens combinations. Claude reads this to assign the right expertise lens to each model — e.g., "Kimi performs best with a security-focused lens on Rust code."
- Before review — Claude calls `memory` to check which models are performing well, which lenses work, and what patterns keep recurring. This drives model selection and prompt assignment.
- After review — Claude calls `memorize` to record what worked: which model found what, which lens was effective, which model missed obvious things.
- After PR merge — call `flush` with the branch name. Graduates high-evidence patterns to codebase scope, archives the rest.
The result: reviews get better over time. Models that consistently fail get excluded. Lens assignments that produce good results get reused. The system adapts without manual tuning.
When built with the `global-memory` feature (enabled by default), Squall also maintains a DuckDB database at `~/.local/share/squall/global.duckdb`. This tracks model performance across all projects:
- Cross-project intelligence — latency percentiles, success rates, and token costs aggregated across every project you use Squall in. A model that's fast for Python reviews but slow for Rust reviews will show different stats per project.
- Automatic recording — every `chat`, `clink`, and `review` call records a model event (latency, tokens, success/failure, project context). No manual action needed.
- Global recommendations — `memory` with category `recommend` returns recency-weighted model recommendations informed by all your projects, not just the current one.
- Local-first — the database lives on your machine. Nothing is sent anywhere.
To disable it, build with `--no-default-features`. The file-based memory (`.squall/memory/`) works independently and is always available.
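That is, to build from source without the DuckDB-backed global memory:

```sh
cargo build --release --no-default-features
```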
Squall ships with Claude Code skills — prompt templates that teach Claude how to orchestrate the tools. You trigger them with natural language or slash commands:
| You say | Skill | What happens |
|---|---|---|
| "review", "review this diff", "code review" | `squall-unified-review` | Auto-depth code review — Claude scores the diff and picks the right depth |
| "deep review", "thorough review" | `squall-unified-review` | Forces DEEP depth — full investigation + more models + longer timeouts |
| "swarm review", "team review" | `squall-unified-review` | Forces SWARM depth — 3 independent agent teams (security, correctness, architecture) |
| "quick review", "quick check" | `squall-unified-review` | Forces QUICK depth — single fast model, instant triage |
| "research [topic]" | `squall-research` | Team swarm — multiple agents investigating different vectors in parallel |
| "deep research [question]" | `squall-deep-research` | Web-sourced research via Codex and Gemini deep research |
Claude automatically picks the right review intensity based on what changed:
| Depth | When | Models | What's different |
|---|---|---|---|
| QUICK | Small non-critical changes | 1 (grok) | Fast triage, no parallel dispatch |
| STANDARD | Normal PRs | 5 (3 core + 2 picked by memory stats) | Per-model lenses, Claude agent for local investigation |
| DEEP | Security, auth, critical infra | 5+ models, deep mode | Claude investigates first, forms hypotheses, then models + agent validate in parallel |
| SWARM | Large + security + memory patterns | 3 agents × 3 models each | 3 independent investigation agents (security, correctness, architecture), each with local shell access + its own Squall review dispatch |
For STANDARD and DEEP, Claude spawns a background agent alongside the external model dispatch. The external models only see what's in the prompt — the agent has full access to your codebase. It reads changed files, traces callers, checks test coverage, runs git blame, and greps for related patterns. This is the perspective that static text analysis can't provide: cross-file interactions, test gaps, and git history context.
For SWARM, Claude spawns 3 independent agents via agent teams — each with a different lens (security, correctness, architecture). Each agent does its own local investigation AND dispatches its own Squall review with 3 models. The team lead synthesizes across all agents using a cross-reference matrix. SWARM degrades gracefully to DEEP if agent teams are unavailable.
Claude reads memory before each review to check model success rates, proven tactics, and recurring patterns — then picks the best ensemble for this specific diff. You can always override: "deep review" forces DEEP, "quick review" forces QUICK, "swarm review" forces SWARM.
Skills are markdown files in `.claude/skills/`. They teach Claude how to use the tools — they don't change the server.
SWARM reviews and research swarms require Claude Code's experimental agent teams feature:
```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```

Without this, SWARM auto-degrades to DEEP. Research swarms won't work at all.
```
Claude Code (orchestrator)
  |
  +-- review -----> fan out to N models in parallel
  |     |-- HTTP models get file content injected as context
  |     |-- CLI models get filesystem access via subprocess
  |     +-- straggler cutoff returns partial results for slow models
  |
  +-- memory/memorize/flush --> .squall/memory/ (per-project learning)
  |                         \-> ~/.local/share/squall/global.duckdb (cross-project stats)
  |
  +-- chat/clink --> single model query
  |
  +-- listmodels --> model discovery with metadata
```
Claude is the intelligence. Squall is transport + memory. Claude decides what to ask, which models to query, and how to synthesize results. Squall handles authenticated dispatch, file context injection, parallel fan-out, and persistent learning — both per-project (markdown files) and cross-project (DuckDB).
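The parallel fan-out with a straggler cutoff can be sketched in plain Rust (illustrative only; Squall's real dispatch is async and HTTP/CLI-backed):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Stand-in for one model dispatch; the real thing is an HTTP call or CLI subprocess.
fn query_model(name: &'static str) -> String {
    format!("{name}: ok")
}

// Fan out to all models, then collect until every worker reports or the
// straggler cutoff fires. Slow models are simply absent from the result,
// which is how partial results arise.
fn fan_out(models: &[&'static str], cutoff: Duration) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    for &m in models {
        let tx = tx.clone();
        thread::spawn(move || {
            let _ = tx.send(query_model(m));
        });
    }
    drop(tx); // receiver sees a disconnect once all workers are done

    let deadline = Instant::now() + cutoff;
    let mut results = Vec::new();
    loop {
        let Some(remaining) = deadline.checked_duration_since(Instant::now()) else {
            break; // cutoff fired: return whatever finished in time
        };
        match rx.recv_timeout(remaining) {
            Ok(r) => results.push(r),
            Err(_) => break, // all workers done, or cutoff hit mid-wait
        }
    }
    results
}

fn main() {
    let results = fan_out(&["grok", "codex", "gemini"], Duration::from_millis(500));
    println!("collected {} of 3 results", results.len());
}
```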
- Path sandboxing — rejects absolute paths, `..` traversal, and symlink escapes
- No shell — CLI dispatch uses direct exec with discrete args, no shell interpolation
- Process group kill — timeouts kill the entire process tree via `kill(-pgid)`, not just the leader
- Five-layer timeouts — per-model (configurable), straggler cutoff, MCP deadline, HTTP client timeout, process group kill
- Capped reads — HTTP responses: 2MB. CLI output: capped. File context: pre-checked via metadata
- Concurrency limits — semaphores: 8 HTTP, 4 CLI, 4 async-poll. Prevents resource exhaustion under parallel fan-out
- No cascade errors — MCP results never set `is_error: true`, preventing Claude Code sibling tool failures
- Error sanitization — user-facing messages never leak internal URLs or credentials
- Input sanitization — all user inputs (content, tags, metadata, scope) are sanitized against newline injection in memory files
```sh
cargo build
cargo test
cargo clippy --all-targets -- -D warnings
```

All tests must pass and clippy must come back clean: every lint is denied via `-D warnings`.
Run the full check suite locally before pushing:
```sh
./scripts/pre-commit.sh
```

This runs `rustfmt --check`, clippy (default + no-default-features), and tests (default + no-default-features). The same checks run in CI on every push and PR.
To the built-in defaults — add a `[models.name]` entry to `BUILTIN_DEFAULTS` in `src/config.rs`. HTTP models need a provider with `base_url` and `api_key_env`. CLI models need a parser in `src/dispatch/cli.rs`.

For personal use — add to `~/.config/squall/config.toml`. Same TOML format, no code changes needed.
- One feature per PR
- Tests for new behavior
- `./scripts/pre-commit.sh` clean before submitting