
Squall

MCP server that dispatches prompts to multiple AI models in parallel. Built in Rust for Claude Code.

Why multiple models

No single model finds everything. Different models have different strengths, different blind spots, and different failure modes. When you send the same review to five models, you get additive signal — one catches concurrency bugs, another spots auth gaps, a third finds edge cases in error handling.

The overlap gives confidence. The divergence gives coverage. In practice, one model consistently finds resource leaks while another catches configuration gaps; neither finding would surface with a single model. The redundancy isn't waste — it's the point.

Squall tracks every model's performance — latency, success rate, failure modes — and Claude uses those metrics to pick the best ensemble for each review. A model that keeps timing out gets benched. A model that shines on security-sensitive code gets picked when auth files change. The selection adapts over time.

Squall was used to build and validate itself — ~10k lines of Rust and ~13k lines of tests, written and reviewed over the course of a few days with continuous multi-model feedback at every step.

Quick start

Prerequisites

  • A Rust toolchain (cargo) — the install script builds a release binary
  • Claude Code — Squall registers itself as a global MCP server
  • At least one API key (see the table below)

Install

git clone https://github.com/DSado88/squall.git
cd squall
cp .env.example .env
# Fill in your API keys — see .env.example for signup links
./install.sh

The install script:

  1. Checks that .env exists and has at least one API key
  2. Builds the release binary and copies it to ~/.local/bin/squall
  3. Registers Squall as a global MCP server in ~/.claude.json (injects your API keys)
  4. Installs skills (slash commands) to ~/.claude/skills/

Restart Claude Code (or run /mcp) to pick up the new server.

API keys

Fill in what you have, skip what you don't. Models only load when their key is set.

| Variable | Unlocks | Signup |
| --- | --- | --- |
| TOGETHER_API_KEY | Kimi K2.5, DeepSeek V3.1, Qwen 3.5, Qwen3 Coder | together.xyz |
| XAI_API_KEY | Grok | console.x.ai |
| OPENROUTER_API_KEY | GLM-5 | openrouter.ai |
| DEEPSEEK_API_KEY | DeepSeek R1 | platform.deepseek.com |
| MISTRAL_API_KEY | Mistral Large | console.mistral.ai |
| OPENAI_API_KEY | o3-deep-research, o4-mini-deep-research | platform.openai.com |
| GOOGLE_API_KEY | deep-research-pro | aistudio.google.com |

CLI models (gemini, codex) use their respective CLI tools with OAuth authentication — no API key needed, but usage may be subject to each provider's terms and rate limits. Install and authenticate the Gemini CLI and Codex CLI separately.

Verify

Ask Claude Code: "list the available squall models". If Squall is connected, it will call listmodels and show what's available.

Updating

After pulling new changes:

./install.sh          # rebuild + reinstall
./install.sh --skills # skills only (skip build)
./install.sh --build  # build only (skip skills)

Tools

Squall exposes seven tools to Claude Code.

review

The flagship tool. Fan out a prompt to multiple models in parallel. Each model can get a different expertise lens via per_model_system_prompts — one focused on security, another on correctness, another on architecture.

Returns when all models finish or the straggler cutoff fires (default 180s). Models that don't finish in time return partial results. Results persist to .squall/reviews/ so they survive context compaction — if Claude's context window resets, the results_file path still works.

Key parameters:

  • models — which models to query (defaults to config if omitted)
  • per_model_system_prompts — map of model name to expertise lens
  • deep: true — raises timeout to 600s, reasoning effort to high, max tokens to 16384
  • diff — unified diff text to include in the prompt
  • file_paths + working_directory — source files injected as context

Models with less than 70% success rate (over 5+ reviews) are automatically excluded by a hard gate. This prevents known-broken models from wasting dispatch slots.

chat

Query a single model via HTTP (OpenAI-compatible API). Pass file_paths and working_directory to inject source files as context. Good for one-off questions to a specific model.

clink

Query a single CLI model (gemini, codex) as a subprocess. The model gets filesystem access via its native CLI — it can read your code directly. Useful when you need a model that can see the full project, not just the files you pass.

listmodels

List all available models with metadata: provider, backend, speed tier, precision tier, strengths, and weaknesses. Call this before review to see what's available.

memorize

Save a learning to persistent memory. Three categories:

  • pattern — a recurring finding across reviews (e.g., "JoinError after abort silently drops panics")
  • tactic — a prompt strategy that works (e.g., "Kimi needs a security lens to find real bugs")
  • recommend — a model recommendation (e.g., "deepseek-v3.1 is fastest for Rust reviews")

Duplicate patterns auto-merge with evidence counting. Patterns reaching 5 occurrences get confirmed status. Scoped to branch or codebase, auto-detected from git context.

memory

Read persistent memory. Returns model performance stats, recurring patterns, proven prompt tactics, or model recommendations with recency-weighted confidence scores. Call this before reviews to inform model selection and lens assignment.

flush

Clean up branch-scoped memory after a PR merge. Graduates high-evidence patterns to codebase scope, archives the rest, and prunes model events older than 30 days.

Models

Three dispatch backends: HTTP (OpenAI-compatible), CLI (subprocess, OAuth), and async-poll (deep research, launch-then-poll).

| Model | Provider | Backend | Speed | Best for |
| --- | --- | --- | --- | --- |
| grok | xAI | HTTP | fast | Quick triage, broad coverage |
| gemini | Google | CLI (OAuth) | medium | Systems-level bugs, concurrency |
| codex | OpenAI | CLI (OAuth) | medium | Highest precision, zero false positives |
| kimi-k2.5 | Together | HTTP | medium | Edge cases, adversarial scenarios |
| deepseek-v3.1 | Together | HTTP | medium | Strong coder, finds real bugs |
| deepseek-r1 | DeepSeek | HTTP | medium | Deep reasoning, logic-heavy analysis |
| qwen-3.5 | Together | HTTP | medium | Pattern matching, multilingual |
| qwen3-coder | Together | HTTP | medium | Purpose-built for code review |
| z-ai/glm-5 | OpenRouter | HTTP | medium | Architectural framing |
| mistral-large | Mistral | HTTP | fast | Efficient, multilingual |
| o3-deep-research | OpenAI | async-poll | minutes | Deep web research |
| o4-mini-deep-research | OpenAI | async-poll | minutes | Faster deep research |
| deep-research-pro | Google | async-poll | minutes | Google-powered deep research |

All models are configurable via TOML. Add your own models, swap providers, or override defaults.

Configuration

Squall uses a three-layer TOML config system. Later layers override earlier ones:

  1. Built-in defaults — 13 models, 5 providers, shipped with the binary
  2. User config (~/.config/squall/config.toml) — personal overrides
  3. Project config (.squall/config.toml) — project-specific settings

Adding a custom model

[providers.custom]
base_url = "https://my-api.example.com/v1/chat/completions"
api_key_env = "CUSTOM_API_KEY"

[models.my-model]
provider = "custom"
backend = "http"
description = "My custom model"
speed_tier = "fast"
strengths = ["domain expertise"]

Review defaults

When models is omitted from a review call, Squall dispatches to these defaults:

[review]
default_models = ["gemini", "codex", "grok"]

Override in your user or project config to change the default ensemble.

Memory

Squall learns from every review and uses what it learns to make better decisions next time.

Three files in .squall/memory/:

  • models.md — Per-model performance stats (latency, success rate, common failures). Updated automatically after every review. Claude reads this before each review to pick models, and Squall's hard gate uses it to auto-exclude models below 70% success rate.

  • patterns.md — Recurring findings across reviews with evidence counting. Patterns found by multiple models in multiple reviews get confirmed status. Capped at 50 entries with automatic pruning.

  • tactics.md — Proven system prompts and model+lens combinations. Claude reads this to assign the right expertise lens to each model — e.g., "Kimi performs best with a security-focused lens on Rust code."

The learning loop

  1. Before review — Claude calls memory to check which models are performing well, which lenses work, and what patterns keep recurring. This drives model selection and prompt assignment.
  2. After review — Claude calls memorize to record what worked: which model found what, which lens was effective, which model missed obvious things.
  3. After PR merge — call flush with the branch name. Graduates high-evidence patterns to codebase scope, archives the rest.

The result: reviews get better over time. Models that consistently fail get excluded. Lens assignments that produce good results get reused. The system adapts without manual tuning.

Global memory

When built with the global-memory feature (enabled by default), Squall also maintains a DuckDB database at ~/.local/share/squall/global.duckdb. This tracks model performance across all projects:

  • Cross-project intelligence — latency percentiles, success rates, and token costs aggregated across every project you use Squall in. A model that's fast for Python reviews but slow for Rust reviews will show different stats per project.
  • Automatic recording — every chat, clink, and review call records a model event (latency, tokens, success/failure, project context). No manual action needed.
  • Global recommendations — memory with category recommend returns recency-weighted model recommendations informed by all your projects, not just the current one.
  • Local-first — the database lives on your machine. Nothing is sent anywhere.

To disable: build with --no-default-features. The file-based memory (.squall/memory/) works independently and is always available.

Skills

Squall ships with Claude Code skills — prompt templates that teach Claude how to orchestrate the tools. You trigger them with natural language or slash commands:

| You say | Skill | What happens |
| --- | --- | --- |
| "review", "review this diff", "code review" | squall-unified-review | Auto-depth code review — Claude scores the diff and picks the right depth |
| "deep review", "thorough review" | squall-unified-review | Forces DEEP depth — full investigation + more models + longer timeouts |
| "swarm review", "team review" | squall-unified-review | Forces SWARM depth — 3 independent agent teams (security, correctness, architecture) |
| "quick review", "quick check" | squall-unified-review | Forces QUICK depth — single fast model, instant triage |
| "research [topic]" | squall-research | Team swarm — multiple agents investigating different vectors in parallel |
| "deep research [question]" | squall-deep-research | Web-sourced research via Codex and Gemini deep research |

Auto-depth review

Claude automatically picks the right review intensity based on what changed:

| Depth | When | Models | What's different |
| --- | --- | --- | --- |
| QUICK | Small non-critical changes | 1 (grok) | Fast triage, no parallel dispatch |
| STANDARD | Normal PRs | 5 (3 core + 2 picked by memory stats) | Per-model lenses, Claude agent for local investigation |
| DEEP | Security, auth, critical infra | 5+ models, deep mode | Claude investigates first, forms hypotheses, then models + agent validate in parallel |
| SWARM | Large + security + memory patterns | 3 agents × 3 models each | 3 independent investigation agents (security, correctness, architecture), each with local shell access + its own Squall review dispatch |

For STANDARD and DEEP, Claude spawns a background agent alongside the external model dispatch. The external models only see what's in the prompt — the agent has full access to your codebase. It reads changed files, traces callers, checks test coverage, runs git blame, and greps for related patterns. This is the perspective that static text analysis can't provide: cross-file interactions, test gaps, and git history context.

For SWARM, Claude spawns 3 independent agents via agent teams — each with a different lens (security, correctness, architecture). Each agent does its own local investigation AND dispatches its own Squall review with 3 models. The team lead synthesizes across all agents using a cross-reference matrix. SWARM degrades gracefully to DEEP if agent teams are unavailable.

Claude reads memory before each review to check model success rates, proven tactics, and recurring patterns — then picks the best ensemble for this specific diff. You can always override: "deep review" forces DEEP, "quick review" forces QUICK, "swarm review" forces SWARM.

Skills are markdown files in .claude/skills/. They teach Claude how to use the tools — they don't change the server.

SWARM reviews and research swarms require Claude Code's experimental agent teams feature:

{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}

Without this, SWARM auto-degrades to DEEP. Research swarms won't work at all.

How it works

Claude Code (orchestrator)
    |
    +-- review -----> fan out to N models in parallel
    |                   |-- HTTP models get file content injected as context
    |                   |-- CLI models get filesystem access via subprocess
    |                   +-- straggler cutoff returns partial results for slow models
    |
    +-- memory/memorize/flush --> .squall/memory/ (per-project learning)
    |                        \-> ~/.local/share/squall/global.duckdb (cross-project stats)
    |
    +-- chat/clink --> single model query
    |
    +-- listmodels --> model discovery with metadata

Claude is the intelligence. Squall is transport + memory. Claude decides what to ask, which models to query, and how to synthesize results. Squall handles authenticated dispatch, file context injection, parallel fan-out, and persistent learning — both per-project (markdown files) and cross-project (DuckDB).

Safety

  • Path sandboxing — rejects absolute paths, .. traversal, and symlink escapes
  • No shell — CLI dispatch uses direct exec with discrete args, no shell interpolation
  • Process group kill — timeouts kill the entire process tree via kill(-pgid), not just the leader
  • Five-layer timeouts — per-model (configurable), straggler cutoff, MCP deadline, HTTP client timeout, process group kill
  • Capped reads — HTTP responses: 2MB. CLI output: capped. File context: pre-checked via metadata
  • Concurrency limits — semaphores: 8 HTTP, 4 CLI, 4 async-poll. Prevents resource exhaustion under parallel fan-out
  • No cascade errors — MCP results never set is_error: true, preventing Claude Code sibling tool failures
  • Error sanitization — user-facing messages never leak internal URLs or credentials
  • Input sanitization — all user inputs (content, tags, metadata, scope) are sanitized against newline injection in memory files

Contributing

Setup

cargo build
cargo test
cargo clippy --all-targets -- -D warnings

All tests must pass with zero clippy warnings — the clippy run denies all warnings (-D warnings).

Pre-commit

Run the full check suite locally before pushing:

./scripts/pre-commit.sh

This runs: rustfmt --check, clippy (default + no-default-features), tests (default + no-default-features). The same checks run in CI on every push and PR.

Adding a model

To the built-in defaults — add a [models.name] entry to BUILTIN_DEFAULTS in src/config.rs. HTTP models need a provider with base_url and api_key_env. CLI models need a parser in src/dispatch/cli.rs.

For personal use — add to ~/.config/squall/config.toml. Same TOML format, no code changes needed.

Pull requests

  • One feature per PR
  • Tests for new behavior
  • ./scripts/pre-commit.sh clean before submitting

About

Lean Rust MCP server for fast async dispatch to external AI models via HTTP and CLI subprocesses
