
Squall

MCP server that dispatches prompts to multiple AI models in parallel. Built in Rust for Claude Code.

Why multiple models

No single model finds everything. Different models have different strengths, different blind spots, and different failure modes. When you send the same review to five models, you get additive signal — one catches concurrency bugs, another spots auth gaps, a third finds edge cases in error handling.

The overlap gives confidence. The divergence gives coverage. In practice, one model consistently finds resource leaks while another catches configuration gaps; neither finding would surface with a single model. The redundancy isn't waste — it's the point.

Squall tracks every model's performance — latency, success rate, failure modes — and Claude uses those metrics to pick the best ensemble for each review. A model that keeps timing out gets benched. A model that shines on security-sensitive code gets picked when auth files change. The selection adapts over time.

Squall was used to build and validate itself — ~10k lines of Rust and ~13k lines of tests, written and reviewed over the course of a few days with continuous multi-model feedback at every step.

Quick start

Prerequisites

  • A Rust toolchain (cargo) — the install script builds a release binary
  • Claude Code — Squall registers itself as a global MCP server
  • At least one API key (see the table below)

Install

git clone https://github.com/DSado88/squall.git
cd squall
cp .env.example .env
# Fill in your API keys — see .env.example for signup links
./install.sh

The install script:

  1. Checks that .env exists and has at least one API key
  2. Builds the release binary and copies it to ~/.local/bin/squall
  3. Registers Squall as a global MCP server in ~/.claude.json (injects your API keys)
  4. Installs skills (slash commands) to ~/.claude/skills/

Restart Claude Code (or run /mcp) to pick up the new server.

API keys

Fill in what you have, skip what you don't. Models only load when their key is set.

| Variable | Unlocks | Signup |
| --- | --- | --- |
| TOGETHER_API_KEY | Kimi K2.5, DeepSeek V3.1, Qwen 3.5, Qwen3 Coder | together.xyz |
| XAI_API_KEY | Grok | console.x.ai |
| OPENROUTER_API_KEY | GLM-5 | openrouter.ai |
| DEEPSEEK_API_KEY | DeepSeek R1 | platform.deepseek.com |
| MISTRAL_API_KEY | Mistral Large | console.mistral.ai |
| OPENAI_API_KEY | o3-deep-research, o4-mini-deep-research | platform.openai.com |
| GOOGLE_API_KEY | deep-research-pro | aistudio.google.com |

CLI models (gemini, codex) use their respective CLI tools with OAuth authentication — no API key needed, but usage may be subject to each provider's terms and rate limits. Install and authenticate the Gemini CLI and Codex CLI separately.

Verify

Ask Claude Code: "list the available squall models". If Squall is connected, it will call listmodels and show what's available.

Updating

After pulling new changes:

./install.sh          # rebuild + reinstall
./install.sh --skills # skills only (skip build)
./install.sh --build  # build only (skip skills)

Tools

Squall exposes seven tools to Claude Code.

review

The flagship tool. Fan out a prompt to multiple models in parallel. Each model can get a different expertise lens via per_model_system_prompts — one focused on security, another on correctness, another on architecture.

Returns when all models finish or the straggler cutoff fires (default 180s). Models that don't finish in time return partial results. Results persist to .squall/reviews/ so they survive context compaction — if Claude's context window resets, the results_file path still works.

Key parameters:

  • models — which models to query (defaults to config if omitted)
  • per_model_system_prompts — map of model name to expertise lens
  • deep: true — raises timeout to 600s, reasoning effort to high, max tokens to 16384
  • diff — unified diff text to include in the prompt
  • file_paths + working_directory — source files injected as context

Models with less than 70% success rate (over 5+ reviews) are automatically excluded by a hard gate. This prevents known-broken models from wasting dispatch slots.

chat

Query a single model via HTTP (OpenAI-compatible API). Pass file_paths and working_directory to inject source files as context. Good for one-off questions to a specific model.

clink

Query a single CLI model (gemini, codex) as a subprocess. The model gets filesystem access via its native CLI — it can read your code directly. Useful when you need a model that can see the full project, not just the files you pass.

listmodels

List all available models with metadata: provider, backend, speed tier, precision tier, strengths, and weaknesses. Call this before review to see what's available.

memorize

Save a learning to persistent memory. Three categories:

  • pattern — a recurring finding across reviews (e.g., "JoinError after abort silently drops panics")
  • tactic — a prompt strategy that works (e.g., "Kimi needs a security lens to find real bugs")
  • recommend — a model recommendation (e.g., "deepseek-v3.1 is fastest for Rust reviews")

Duplicate patterns auto-merge with evidence counting. Patterns reaching 5 occurrences get confirmed status. Scoped to branch or codebase, auto-detected from git context.

memory

Read persistent memory. Returns model performance stats, recurring patterns, proven prompt tactics, or model recommendations with recency-weighted confidence scores. Call this before reviews to inform model selection and lens assignment.

flush

Clean up branch-scoped memory after a PR merge. Graduates high-evidence patterns to codebase scope, archives the rest, and prunes model events older than 30 days.

Models

Three dispatch backends: HTTP (OpenAI-compatible), CLI (subprocess, OAuth), and async-poll (deep research, launch-then-poll).

| Model | Provider | Backend | Speed | Best for |
| --- | --- | --- | --- | --- |
| grok | xAI | HTTP | fast | Quick triage, broad coverage |
| gemini | Google | CLI (OAuth) | medium | Systems-level bugs, concurrency |
| codex | OpenAI | CLI (OAuth) | medium | Highest precision, zero false positives |
| kimi-k2.5 | Together | HTTP | medium | Edge cases, adversarial scenarios |
| deepseek-v3.1 | Together | HTTP | medium | Strong coder, finds real bugs |
| deepseek-r1 | DeepSeek | HTTP | medium | Deep reasoning, logic-heavy analysis |
| qwen-3.5 | Together | HTTP | medium | Pattern matching, multilingual |
| qwen3-coder | Together | HTTP | medium | Purpose-built for code review |
| z-ai/glm-5 | OpenRouter | HTTP | medium | Architectural framing |
| mistral-large | Mistral | HTTP | fast | Efficient, multilingual |
| o3-deep-research | OpenAI | async-poll | minutes | Deep web research |
| o4-mini-deep-research | OpenAI | async-poll | minutes | Faster deep research |
| deep-research-pro | Google | async-poll | minutes | Google-powered deep research |

All models are configurable via TOML. Add your own models, swap providers, or override defaults.

Configuration

Squall uses a three-layer TOML config system. Later layers override earlier ones:

  1. Built-in defaults — 13 models, 5 providers, shipped with the binary
  2. User config (~/.config/squall/config.toml) — personal overrides
  3. Project config (.squall/config.toml) — project-specific settings

Adding a custom model

[providers.custom]
base_url = "https://my-api.example.com/v1/chat/completions"
api_key_env = "CUSTOM_API_KEY"

[models.my-model]
provider = "custom"
backend = "http"
description = "My custom model"
speed_tier = "fast"
strengths = ["domain expertise"]

Review defaults

When models is omitted from a review call, Squall dispatches to these defaults:

[review]
default_models = ["gemini", "codex", "grok"]

Override in your user or project config to change the default ensemble.

Memory

Squall learns from every review and uses what it learns to make better decisions next time.

Three files in .squall/memory/:

  • models.md — Per-model performance stats (latency, success rate, common failures). Updated automatically after every review. Claude reads this before each review to pick models, and Squall's hard gate uses it to auto-exclude models below 70% success rate.

  • patterns.md — Recurring findings across reviews with evidence counting. Patterns found by multiple models in multiple reviews get confirmed status. Capped at 50 entries with automatic pruning.

  • tactics.md — Proven system prompts and model+lens combinations. Claude reads this to assign the right expertise lens to each model — e.g., "Kimi performs best with a security-focused lens on Rust code."

The learning loop

  1. Before review — Claude calls memory to check which models are performing well, which lenses work, and what patterns keep recurring. This drives model selection and prompt assignment.
  2. After review — Claude calls memorize to record what worked: which model found what, which lens was effective, which model missed obvious things.
  3. After PR merge — call flush with the branch name. Graduates high-evidence patterns to codebase scope, archives the rest.

The result: reviews get better over time. Models that consistently fail get excluded. Lens assignments that produce good results get reused. The system adapts without manual tuning.

Global memory

When built with the global-memory feature (enabled by default), Squall also maintains a DuckDB database at ~/.local/share/squall/global.duckdb. This tracks model performance across all projects:

  • Cross-project intelligence — latency percentiles, success rates, and token costs aggregated across every project you use Squall in. A model that's fast for Python reviews but slow for Rust reviews will show different stats per project.
  • Automatic recording — every chat, clink, and review call records a model event (latency, tokens, success/failure, project context). No manual action needed.
  • Global recommendations — memory with category recommend returns recency-weighted model recommendations informed by all your projects, not just the current one.
  • Local-first — the database lives on your machine. Nothing is sent anywhere.

To disable: build with --no-default-features. The file-based memory (.squall/memory/) works independently and is always available.

Skills

Squall ships with Claude Code skills — prompt templates that teach Claude how to orchestrate the tools. You trigger them with natural language or slash commands:

| You say | Skill | What happens |
| --- | --- | --- |
| "review", "review this diff", "code review" | squall-unified-review | Auto-depth code review — Claude scores the diff and picks the right depth |
| "deep review", "thorough review" | squall-unified-review | Forces DEEP depth — full investigation + more models + longer timeouts |
| "swarm review", "team review" | squall-unified-review | Forces SWARM depth — 3 independent agent teams (security, correctness, architecture) |
| "quick review", "quick check" | squall-unified-review | Forces QUICK depth — single fast model, instant triage |
| "research [topic]" | squall-research | Team swarm — multiple agents investigating different vectors in parallel |
| "deep research [question]" | squall-deep-research | Web-sourced research via Codex and Gemini deep research |

Auto-depth review

Claude automatically picks the right review intensity based on what changed:

| Depth | When | Models | What's different |
| --- | --- | --- | --- |
| QUICK | Small non-critical changes | 1 (grok) | Fast triage, no parallel dispatch |
| STANDARD | Normal PRs | 5 (3 core + 2 picked by memory stats) | Per-model lenses, Claude agent for local investigation |
| DEEP | Security, auth, critical infra | 5+ models, deep mode | Claude investigates first, forms hypotheses, then models + agent validate in parallel |
| SWARM | Large + security + memory patterns | 3 agents × 3 models each | 3 independent investigation agents (security, correctness, architecture), each with local shell access + its own Squall review dispatch |

For STANDARD and DEEP, Claude spawns a background agent alongside the external model dispatch. The external models only see what's in the prompt — the agent has full access to your codebase. It reads changed files, traces callers, checks test coverage, runs git blame, and greps for related patterns. This is the perspective that static text analysis can't provide: cross-file interactions, test gaps, and git history context.

For SWARM, Claude spawns 3 independent agents via agent teams — each with a different lens (security, correctness, architecture). Each agent does its own local investigation AND dispatches its own Squall review with 3 models. The team lead synthesizes across all agents using a cross-reference matrix. SWARM degrades gracefully to DEEP if agent teams are unavailable.

Claude reads memory before each review to check model success rates, proven tactics, and recurring patterns — then picks the best ensemble for this specific diff. You can always override: "deep review" forces DEEP, "quick review" forces QUICK, "swarm review" forces SWARM.

Skills are markdown files in .claude/skills/. They teach Claude how to use the tools — they don't change the server.

SWARM reviews and research swarms require Claude Code's experimental agent teams feature:

{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}

Without this, SWARM auto-degrades to DEEP. Research swarms won't work at all.

How it works

Claude Code (orchestrator)
    |
    +-- review -----> fan out to N models in parallel
    |                   |-- HTTP models get file content injected as context
    |                   |-- CLI models get filesystem access via subprocess
    |                   +-- straggler cutoff returns partial results for slow models
    |
    +-- memory/memorize/flush --> .squall/memory/ (per-project learning)
    |                        \-> ~/.local/share/squall/global.duckdb (cross-project stats)
    |
    +-- chat/clink --> single model query
    |
    +-- listmodels --> model discovery with metadata

Claude is the intelligence. Squall is transport + memory. Claude decides what to ask, which models to query, and how to synthesize results. Squall handles authenticated dispatch, file context injection, parallel fan-out, and persistent learning — both per-project (markdown files) and cross-project (DuckDB).

Safety

  • Path sandboxing — rejects absolute paths, .. traversal, and symlink escapes
  • No shell — CLI dispatch uses direct exec with discrete args, no shell interpolation
  • Process group kill — timeouts kill the entire process tree via kill(-pgid), not just the leader
  • Five-layer timeouts — per-model (configurable), straggler cutoff, MCP deadline, HTTP client timeout, process group kill
  • Capped reads — HTTP responses: 2MB. CLI output: capped. File context: pre-checked via metadata
  • Concurrency limits — semaphores: 8 HTTP, 4 CLI, 4 async-poll. Prevents resource exhaustion under parallel fan-out
  • No cascade errors — MCP results never set is_error: true, preventing Claude Code sibling tool failures
  • Error sanitization — user-facing messages never leak internal URLs or credentials
  • Input sanitization — all user inputs (content, tags, metadata, scope) are sanitized against newline injection in memory files

Contributing

Setup

cargo build
cargo test
cargo clippy --all-targets -- -D warnings

All tests must pass with zero clippy warnings — the clippy run denies all warnings (-D warnings).

Pre-commit

Run the full check suite locally before pushing:

./scripts/pre-commit.sh

This runs: rustfmt --check, clippy (default + no-default-features), tests (default + no-default-features). The same checks run in CI on every push and PR.

Adding a model

To the built-in defaults — add a [models.name] entry to BUILTIN_DEFAULTS in src/config.rs. HTTP models need a provider with base_url and api_key_env. CLI models need a parser in src/dispatch/cli.rs.

For personal use — add to ~/.config/squall/config.toml. Same TOML format, no code changes needed.

Pull requests

  • One feature per PR
  • Tests for new behavior
  • ./scripts/pre-commit.sh clean before submitting

About

Lean Rust MCP server for fast async dispatch to external AI models via HTTP and CLI subprocesses
