LLM Chess Arena

Watch AI models battle it out on the chessboard — against each other, against humans, or against Stockfish — with real-time evaluation, table talk, and deep post-game analysis.

Overview

LLM Chess Arena is a full-stack web application that pits large language models against each other, against Stockfish, or against a human player at chess. LLM moves are requested through OpenRouter in real time, every move is evaluated by Stockfish, and AI-generated table talk accompanies each game, all streamed live to a dark-themed war room UI.

Highlights

Any LLM vs Any LLM — Supports any model on OpenRouter (GPT-4o, Claude, Gemini, Llama, etc.)
Human vs LLM — Play against any AI model with drag-and-drop, click-to-move, legal move highlighting, and promotion dialogs
Stockfish as opponent — Benchmark LLM chess ability against the strongest classical engine with configurable ELO (1320–3190)
Chaos Mode — Illegal LLM moves are force-pushed to the board instead of retried, creating wild and impossible positions
Real-time streaming — WebSocket-powered live updates for moves, evaluations, table talk, and spectator counts
Deep analysis — Stockfish-powered accuracy scores, ACPL, win probability graphs, move classifications, and critical moments
Table talk — LLMs provide honest, natural reactions to each position — confident when ahead, frustrated when behind
ELO ratings — Unified leaderboard ranking all LLMs, Human, and Stockfish together
Embeddable replays — Share finished games on any site with <iframe src="/embed/:gameId">
Per-move time controls — Configurable 5–600s time limit with forfeit on timeout
Cost tracking — Per-move and per-game token usage and API cost tracking
Rate limiting — Per-IP sliding-window rate limiter with game queueing and configurable concurrency
10 board themes + 14 piece styles — Persisted to localStorage
Sound effects + move animations — Audio feedback for moves, captures, checks, and game-over
Docker-ready — One command to deploy the entire stack

Game Modes

Mode	Description
LLM vs LLM	Two AI models play each other. Both provide table talk and narration.
Human vs LLM	You play against an AI model with an interactive board — legal move highlights, click-to-move, drag-and-drop, and pawn promotion.
LLM vs Stockfish	An AI model plays against the Stockfish engine at configurable strength. A pure chess skill benchmark.
Chaos Mode	Any game mode with at least one LLM. Illegal LLM moves are force-pushed to the board — creating impossible positions. Excluded from ELO.

Note: Every game must have at least one LLM. Human vs Stockfish games are not allowed — this is an LLM arena, not a chess website.
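
The matchup rules above can be expressed as a small validation step. This is a minimal sketch under stated assumptions, not the project's actual code: the `PlayerType` enum and the `validate_matchup` and `counts_for_elo` helpers are hypothetical names introduced for illustration.

```python
from enum import Enum


class PlayerType(Enum):
    # Hypothetical enum for the three kinds of player this README describes.
    LLM = "llm"
    HUMAN = "human"
    STOCKFISH = "stockfish"


def validate_matchup(white: PlayerType, black: PlayerType) -> None:
    """Raise ValueError unless at least one side is an LLM."""
    if PlayerType.LLM not in (white, black):
        raise ValueError("Every game must have at least one LLM player")


def counts_for_elo(chaos_mode: bool) -> bool:
    """Chaos Mode games are excluded from ELO; all other games count."""
    return not chaos_mode
```

With these helpers, `validate_matchup(PlayerType.HUMAN, PlayerType.STOCKFISH)` raises, while any pairing that includes `PlayerType.LLM` passes.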

Screenshots

Live Game Viewer

Interactive chessboard with eval bar, win probability graph, classified move list, playback controls, table talk, and AI commentary.

Post-Game Analysis

Accuracy comparison, ACPL, classification breakdown, critical moments, token usage per move, and cost analysis.

Games List

Browse active and completed games with search, filtering by outcome/opening, and URL-synced state.

New Game Dialog

Three-way player type toggle (LLM / Human / Stockfish) per side with a searchable model dropdown from OpenRouter. Supports temperature, reasoning effort, time controls, Stockfish ELO, and chaos mode.

Leaderboard

Unified ELO rankings for LLMs, Human, and Stockfish — with accuracy, ACPL, average cost, and response time stats.

Model Detail

Deep dive into any model's performance: ELO history graph, win rates, head-to-head records, classification distribution, and recent games.

Cost & Performance Dashboard

Platform-wide cost tracking with per-model breakdowns, token usage analysis, and response time comparisons.

Tech Stack

Layer	Technology
Backend	FastAPI + SQLModel + aiosqlite
LLM Orchestration	pydantic-ai via OpenRouter
Chess Engine	Stockfish (depth 18 live, depth 22 post-game)
Chess Logic	python-chess
Real-time	WebSocket (FastAPI to React)
Frontend	React 19 + TypeScript + Vite
Board UI	react-chessboard + chess.js
Charts	Recharts
Deployment	Docker Compose (nginx + uvicorn)

Getting Started

Prerequisites

Docker and Docker Compose (recommended), or:
Python 3.12+, Node.js 20+, and Stockfish (apt install stockfish)
OpenRouter API key — openrouter.ai

Quick Start with Docker

git clone https://github.com/DeadPackets/LLMChessArena.git
cd LLMChessArena

cp .env.example .env
# Edit .env and set OPENROUTER_API_KEY

docker compose up --build -d

The app will be available at http://localhost. The frontend container starts only after the backend passes its health check.

Local Development

Backend:

cd backend
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp ../.env.example .env   # edit with your API key
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend:

cd frontend
npm install
npm run dev

The dev server runs at http://localhost:5173 and proxies /api and /ws to the backend.

Production Deployment

The Docker Compose setup is production-ready and designed to sit behind a reverse proxy like Cloudflare.

Architecture

Internet -> Cloudflare (TLS, DDoS, HSTS) -> nginx (port 80) -> backend (port 8000)
                                                             -> static assets

What's built in

CORS locked to ALLOWED_ORIGINS (default: https://llmchess.deadpackets.pw)
Rate limiting — per-IP sliding window with 4 tiers (game creation, API reads, game stop, WebSocket). Returns 429 with Retry-After and X-RateLimit-* headers
Game queueing — MAX_CONCURRENT_GAMES enforced by asyncio semaphore. Excess games queue with position tracking
Cloudflare IP detection — rate limiter reads CF-Connecting-IP for real client IPs
Security headers — X-Content-Type-Options, X-XSS-Protection, Referrer-Policy, Permissions-Policy via nginx
SQLite WAL mode — enabled at startup for concurrent read/write performance
Health checks — Docker health checks on both services; frontend waits for backend to be healthy
Global exception handler — unhandled errors return {"detail": "Internal server error"} instead of stack traces
Proxy headers — uvicorn runs with --proxy-headers --forwarded-allow-ips=*

Cloudflare settings

If deploying behind Cloudflare with proxy enabled:

SSL/TLS: Full (strict) if you have an origin cert, or Flexible for HTTP-only origins
HSTS: Enable in Cloudflare dashboard (edge-level, no nginx config needed)
WebSockets: Enabled by default on all Cloudflare plans
Caching: nginx serves static assets with a 30-day Cache-Control: public, immutable header

Embedding Games

Share finished game replays on any website:

<iframe
  src="https://llmchess.deadpackets.pw/embed/GAME_ID"
  width="800" height="500"
  style="border: none; border-radius: 8px;"
></iframe>

The embed viewer includes a chessboard, move list, playback controls, and a "View full game" link back to the main site. Supports ?move=N to start at a specific position.

Active games show a "Watch Live" link instead of the replay viewer.

Configuration

All settings are read from environment variables. Every setting except OPENROUTER_API_KEY has a sensible default.

Required

Variable	Description
`OPENROUTER_API_KEY`	Your OpenRouter API key

Optional

Variable	Default	Description
`STOCKFISH_PATH`	`/usr/games/stockfish`	Path to Stockfish binary
`MAX_MOVES_PER_SIDE`	`200`	Maximum moves per side before forced draw
`MAX_CONSECUTIVE_ILLEGAL_MOVES`	`10`	Consecutive illegal moves before forfeit
`MAX_CONCURRENT_GAMES`	`3`	Simultaneous games allowed (excess queued)
`ELO_K_FACTOR`	`32`	ELO rating K-factor
`DEFAULT_MODEL_ELO`	`1500.0`	Starting ELO for new models
`STOCKFISH_THREADS`	`2`	Stockfish eval engine threads
`STOCKFISH_HASH_MB`	`128`	Stockfish eval engine hash table (MB)
`STOCKFISH_DEPTH_LIVE`	`18`	Stockfish search depth during live games
`STOCKFISH_DEPTH_DEEP`	`22`	Stockfish search depth for post-game analysis
`STOCKFISH_PLAYER_THREADS`	`1`	Stockfish player engine threads
`STOCKFISH_PLAYER_HASH_MB`	`64`	Stockfish player engine hash table (MB)
`STOCKFISH_PLAYER_MOVE_TIME`	`1.0`	Stockfish player time per move (seconds)
`STOCKFISH_MIN_ELO`	`1320`	Minimum selectable Stockfish ELO
`STOCKFISH_MAX_ELO`	`3190`	Maximum selectable Stockfish ELO
`DRAW_ADJUDICATION_CP`	`20`	Centipawn threshold for draw adjudication
`DRAW_ADJUDICATION_MOVES`	`30`	Consecutive moves within threshold to declare draw
`NARRATION_CHAR_CAP`	`128`	Maximum characters for LLM narration/table talk
`RATE_LIMIT_GAME_CREATE`	`5`	Game creation requests per minute per IP
`RATE_LIMIT_API_READ`	`60`	API read requests per minute per IP
`RATE_LIMIT_GAME_STOP`	`10`	Game stop requests per minute per IP
`RATE_LIMIT_WS_CONNECT`	`20`	WebSocket connections per minute per IP
`ALLOWED_ORIGINS`	`https://llmchess.deadpackets.pw`	Comma-separated CORS origins
`LOG_LEVEL`	`INFO`	Python logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
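
To show how `ELO_K_FACTOR` and `DEFAULT_MODEL_ELO` interact, here is a sketch of a rating update. It assumes the standard Elo formula with a 400-point scale; this README does not state the exact formula used in `elo_service.py`, and the function names `expected_score` and `update_elo` are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (standard Elo, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return new ratings for A and B.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k defaults to the documented ELO_K_FACTOR default of 32.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b
```

Under these assumptions, two new models starting at 1500.0 move to 1516.0 and 1484.0 after a decisive game, and a draw between them leaves both ratings unchanged.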

API

Method	Endpoint	Description
`GET`	`/api/games`	List games (filter by status, outcome, model, opening, search)
`POST`	`/api/games`	Create a new game
`GET`	`/api/games/queue-status`	Active/queued game counts
`GET`	`/api/games/:id`	Game detail with moves and analysis
`GET`	`/api/games/:id/pgn`	Download PGN
`POST`	`/api/games/:id/stop`	Stop an active game (requires player secret)
`GET`	`/api/models`	List all models
`GET`	`/api/models/leaderboard`	ELO leaderboard with accuracy, ACPL, cost, response time
`GET`	`/api/models/compare`	Head-to-head comparison (`?model_a=...&model_b=...`)
`GET`	`/api/models/:id`	Model detail with stats and recent games
`GET`	`/api/models/:id/elo-history`	ELO rating progression
`GET`	`/api/models/:id/head-to-head`	Head-to-head records vs all opponents
`GET`	`/api/stats/overview`	Platform-wide cost and token overview
`GET`	`/api/stats/openings`	Opening statistics across all games
`GET`	`/api/openrouter/models`	Cached OpenRouter model list
`GET`	`/health`	Service health check
`WS`	`/ws/game/:id`	Real-time game stream (moves, eval, table talk, spectators)

All API responses include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers.

Project Structure

LLMChessArena/
├── backend/
│   ├── app/
│   │   ├── main.py                # FastAPI app, lifespan, exception handler
│   │   ├── config.py              # All env-var-driven configuration
│   │   ├── database.py            # SQLModel tables + WAL init
│   │   ├── middleware/
│   │   │   └── rate_limiter.py    # Per-IP sliding-window rate limiter
│   │   ├── models/
│   │   │   ├── api_models.py      # Pydantic request/response schemas
│   │   │   └── chess_models.py    # GameConfig, ChessMove, MoveRecord, GameResult
│   │   ├── routers/
│   │   │   ├── games.py           # Game CRUD, stop, queue status
│   │   │   ├── models_router.py   # Leaderboard, model detail, H2H, ELO history
│   │   │   ├── stats_router.py    # Platform overview, opening stats
│   │   │   ├── ws.py              # WebSocket game streaming + human moves
│   │   │   └── openrouter_proxy.py
│   │   └── services/
│   │       ├── chess_agent.py     # pydantic-ai LLM agent (structured output)
│   │       ├── game_engine.py     # Game loop (LLM, Human, Stockfish, Chaos)
│   │       ├── game_manager.py    # Concurrency, queueing, ELO updates
│   │       ├── stockfish_service.py       # Async Stockfish UCI (eval)
│   │       ├── stockfish_player_service.py # Strength-limited Stockfish player
│   │       ├── move_classifier.py  # Move classification (best/good/inaccuracy/mistake/blunder)
│   │       ├── elo_service.py      # ELO calculation
│   │       ├── stats_service.py    # ACPL, accuracy, aggregates, H2H
│   │       └── opening_detector.py # ECO opening book (3600+ positions)
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── pages/                 # GameList, GameViewer, GameEmbed, Leaderboard,
│   │   │                          # ModelDetail, CostDashboard, HeadToHead, OpeningExplorer
│   │   ├── components/            # ChessboardPanel, EvalBar, MoveList, WinProbGraph,
│   │   │                          # TableTalkPanel, AnalysisPanel, GameControls, etc.
│   │   ├── hooks/                 # useGameWebSocket, useReplayControls, useBoardTheme,
│   │   │                          # useOpenRouterModels, useSoundEffects
│   │   ├── utils/                 # formatModel (shared), sound helpers
│   │   ├── api/client.ts          # REST API client with timeout
│   │   └── types/                 # TypeScript interfaces (API + WebSocket)
│   ├── Dockerfile                 # Multi-stage: npm build -> nginx
│   ├── nginx.conf                 # Reverse proxy + security headers
│   └── package.json
├── docker-compose.yml             # Backend + frontend with health checks
└── .env.example

License

This project is licensed under the MIT License — see the LICENSE file for details.

Built with Claude Code
