LLM Chess Arena

Watch AI models battle it out on the chessboard — against each other, against humans, or against Stockfish — with real-time evaluation, table talk, and deep post-game analysis.


Overview

LLM Chess Arena is a full-stack web application that pits large language models against each other (or against Stockfish or a human player) in chess. Games are played in real-time via OpenRouter, evaluated move-by-move with Stockfish, and accompanied by AI-generated table talk — all streamed live to a dark-themed war room UI.

Highlights

  • Any LLM vs Any LLM — Supports any model on OpenRouter (GPT-4o, Claude, Gemini, Llama, etc.)
  • Human vs LLM — Play against any AI model with drag-and-drop, click-to-move, legal move highlighting, and promotion dialogs
  • Stockfish as opponent — Benchmark LLM chess ability against the strongest classical engine with configurable ELO (1320–3190)
  • Chaos Mode — Illegal LLM moves are force-pushed to the board instead of retried, creating wild and impossible positions
  • Real-time streaming — WebSocket-powered live updates for moves, evaluations, table talk, and spectator counts
  • Deep analysis — Stockfish-powered accuracy scores, ACPL, win probability graphs, move classifications, and critical moments
  • Table talk — LLMs provide honest, natural reactions to each position — confident when ahead, frustrated when behind
  • ELO ratings — Unified leaderboard ranking all LLMs, Human, and Stockfish together
  • Embeddable replays — Share finished games on any site with <iframe src="/embed/:gameId">
  • Per-move time controls — Configurable 5–600s time limit with forfeit on timeout
  • Cost tracking — Per-move and per-game token usage and API cost tracking
  • Rate limiting — Per-IP sliding-window rate limiter with game queueing and configurable concurrency
  • 10 board themes + 14 piece styles — Persisted to localStorage
  • Sound effects + move animations — Audio feedback for moves, captures, checks, and game-over
  • Docker-ready — One command to deploy the entire stack

Game Modes

| Mode | Description |
| --- | --- |
| LLM vs LLM | Two AI models play each other. Both provide table talk and narration. |
| Human vs LLM | You play against an AI model with an interactive board: legal move highlights, click-to-move, drag-and-drop, and pawn promotion. |
| LLM vs Stockfish | An AI model plays against the Stockfish engine at configurable strength. A pure chess skill benchmark. |
| Chaos Mode | Any game mode with at least one LLM. Illegal LLM moves are force-pushed to the board, creating impossible positions. Excluded from ELO. |

Note: Every game must have at least one LLM. Human vs Stockfish games are not allowed — this is an LLM arena, not a chess website.
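
The pairing rules above can be sketched as a small validation step. This is a minimal illustration, not the project's actual code: the `Matchup` class, the player-type strings, and both function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical player-type labels; the real schema may use different names.
PLAYER_TYPES = {"llm", "human", "stockfish"}


@dataclass(frozen=True)
class Matchup:
    white: str
    black: str
    chaos_mode: bool = False


def validate_matchup(m: Matchup) -> None:
    """Raise ValueError unless the pairing includes at least one LLM."""
    for side in (m.white, m.black):
        if side not in PLAYER_TYPES:
            raise ValueError(f"unknown player type: {side!r}")
    if "llm" not in (m.white, m.black):
        raise ValueError("every game must include at least one LLM")


def counts_for_elo(m: Matchup) -> bool:
    """Chaos Mode games are excluded from ELO; all other games count."""
    return not m.chaos_mode
```

Under these assumptions, `Matchup("human", "stockfish")` is rejected, while `Matchup("llm", "human", chaos_mode=True)` is accepted but does not affect ratings.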


Screenshots

Live Game Viewer

Interactive chessboard with eval bar, win probability graph, classified move list, playback controls, table talk, and AI commentary.

Post-Game Analysis

Accuracy comparison, ACPL, classification breakdown, critical moments, token usage per move, and cost analysis.
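
For readers unfamiliar with ACPL (average centipawn loss), the sketch below shows the usual definition: for each move, how much the engine evaluation dropped from the mover's point of view, averaged over the game. The function names and the input format are assumptions made for illustration; the project's stats service may compute it differently (for example, by capping extreme losses).

```python
def centipawn_losses(evals_before: list[int], evals_after: list[int]) -> list[int]:
    """Per-move loss in centipawns.

    Both lists hold engine evaluations from the mover's perspective,
    taken just before and just after each of that player's moves.
    A move that improves the evaluation counts as zero loss.
    """
    return [max(0, before - after) for before, after in zip(evals_before, evals_after)]


def acpl(losses: list[int]) -> float:
    """Average centipawn loss; 0.0 for a player with no moves."""
    return sum(losses) / len(losses) if losses else 0.0
```

With evaluations of 30, 50, and -20 before three moves and 10, 80, and -120 after them, the losses are 20, 0, and 100, so the ACPL is 40.0.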

Games List

Browse active and completed games with search, filtering by outcome/opening, and URL-synced state.

New Game Dialog

Three-way player type toggle (LLM / Human / Stockfish) per side with a searchable model dropdown from OpenRouter. Supports temperature, reasoning effort, time controls, Stockfish ELO, and chaos mode.

Leaderboard

Unified ELO rankings for LLMs, Human, and Stockfish — with accuracy, ACPL, average cost, and response time stats.

Model Detail

Deep dive into any model's performance: ELO history graph, win rates, head-to-head records, classification distribution, and recent games.

Cost & Performance Dashboard

Platform-wide cost tracking with per-model breakdowns, token usage analysis, and response time comparisons.


Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | FastAPI + SQLModel + aiosqlite |
| LLM Orchestration | pydantic-ai via OpenRouter |
| Chess Engine | Stockfish (depth 18 live, depth 22 post-game) |
| Chess Logic | python-chess |
| Real-time | WebSocket (FastAPI to React) |
| Frontend | React 19 + TypeScript + Vite |
| Board UI | react-chessboard + chess.js |
| Charts | Recharts |
| Deployment | Docker Compose (nginx + uvicorn) |

Getting Started

Prerequisites

  • Docker and Docker Compose (recommended), or:
  • Python 3.12+, Node.js 20+, and Stockfish (apt install stockfish)
  • OpenRouter API key (from openrouter.ai)

Quick Start with Docker

git clone https://github.com/DeadPackets/LLMChessArena.git
cd LLMChessArena

cp .env.example .env
# Edit .env and set OPENROUTER_API_KEY

docker compose up --build -d

The app will be available at http://localhost. The frontend container starts only after the backend passes its health check.

Local Development

Backend:

cd backend
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp ../.env.example .env   # edit with your API key
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend:

cd frontend
npm install
npm run dev

The dev server runs at http://localhost:5173 and proxies /api and /ws to the backend.


Production Deployment

The Docker Compose setup is production-ready and designed to sit behind a reverse proxy like Cloudflare.

Architecture

Internet -> Cloudflare (TLS, DDoS, HSTS) -> nginx (port 80) -> backend (port 8000)
                                                             -> static assets

What's built in

  • CORS locked to ALLOWED_ORIGINS (default: https://llmchess.deadpackets.pw)
  • Rate limiting — per-IP sliding window with 4 tiers (game creation, API reads, game stop, WebSocket). Returns 429 with Retry-After and X-RateLimit-* headers
  • Game queueing: MAX_CONCURRENT_GAMES enforced by asyncio semaphore. Excess games queue with position tracking
  • Cloudflare IP detection — rate limiter reads CF-Connecting-IP for real client IPs
  • Security headers: X-Content-Type-Options, X-XSS-Protection, Referrer-Policy, Permissions-Policy via nginx
  • SQLite WAL mode — enabled at startup for concurrent read/write performance
  • Health checks — Docker health checks on both services; frontend waits for backend to be healthy
  • Global exception handler — unhandled errors return {"detail": "Internal server error"} instead of stack traces
  • Proxy headers — uvicorn runs with --proxy-headers --forwarded-allow-ips=*
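
The game-queueing item in the list above can be illustrated with a small asyncio sketch. It assumes a semaphore sized by MAX_CONCURRENT_GAMES and a simple waiting list for position tracking; the `GameQueue` class and its method names are hypothetical and do not reflect the actual game_manager.py.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class GameQueue:
    """Runs at most `max_concurrent` games at once; the rest wait in order."""

    def __init__(self, max_concurrent: int = 3) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)
        self._waiting: list[str] = []   # game ids queued for a slot, oldest first
        self.active: set[str] = set()   # game ids currently being played

    def position(self, game_id: str) -> int:
        """1-based queue position, or 0 if the game is not waiting."""
        return self._waiting.index(game_id) + 1 if game_id in self._waiting else 0

    async def run(self, game_id: str, play: Callable[[], Awaitable[T]]) -> T:
        self._waiting.append(game_id)
        try:
            await self._slots.acquire()      # blocks while all slots are taken
        finally:
            self._waiting.remove(game_id)    # also runs if the wait is cancelled
        self.active.add(game_id)
        try:
            return await play()
        finally:
            self.active.discard(game_id)
            self._slots.release()            # hand the slot to the next waiter
```

Calling `await queue.run("game-1", play_game)` from several tasks keeps the number of games in `active` at or below the configured limit, and `position()` can be polled to report a queued game's place in line.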

Cloudflare settings

If deploying behind Cloudflare with proxy enabled:

  • SSL/TLS: Full (strict) if you have an origin cert, or Flexible for HTTP-only origins
  • HSTS: Enable in Cloudflare dashboard (edge-level, no nginx config needed)
  • WebSockets: Enabled by default on all Cloudflare plans
  • Caching: Static assets get 30-day Cache-Control: public, immutable from nginx

Embedding Games

Share finished game replays on any website:

<iframe
  src="https://llmchess.deadpackets.pw/embed/GAME_ID"
  width="800" height="500"
  style="border: none; border-radius: 8px;"
></iframe>

The embed viewer includes a chessboard, move list, playback controls, and a "View full game" link back to the main site. Supports ?move=N to start at a specific position.

Active games show a "Watch Live" link instead of the replay viewer.


Configuration

All settings are configurable via environment variables with sensible defaults.

Required

| Variable | Description |
| --- | --- |
| OPENROUTER_API_KEY | Your OpenRouter API key |

Optional

| Variable | Default | Description |
| --- | --- | --- |
| STOCKFISH_PATH | /usr/games/stockfish | Path to Stockfish binary |
| MAX_MOVES_PER_SIDE | 200 | Maximum moves per side before forced draw |
| MAX_CONSECUTIVE_ILLEGAL_MOVES | 10 | Illegal moves before forfeit |
| MAX_CONCURRENT_GAMES | 3 | Simultaneous games allowed (excess queued) |
| ELO_K_FACTOR | 32 | ELO rating K-factor |
| DEFAULT_MODEL_ELO | 1500.0 | Starting ELO for new models |
| STOCKFISH_THREADS | 2 | Stockfish eval engine threads |
| STOCKFISH_HASH_MB | 128 | Stockfish eval engine hash table (MB) |
| STOCKFISH_DEPTH_LIVE | 18 | Stockfish search depth during live games |
| STOCKFISH_DEPTH_DEEP | 22 | Stockfish search depth for post-game analysis |
| STOCKFISH_PLAYER_THREADS | 1 | Stockfish player engine threads |
| STOCKFISH_PLAYER_HASH_MB | 64 | Stockfish player engine hash table (MB) |
| STOCKFISH_PLAYER_MOVE_TIME | 1.0 | Stockfish player time per move (seconds) |
| STOCKFISH_MIN_ELO | 1320 | Minimum selectable Stockfish ELO |
| STOCKFISH_MAX_ELO | 3190 | Maximum selectable Stockfish ELO |
| DRAW_ADJUDICATION_CP | 20 | Centipawn threshold for draw adjudication |
| DRAW_ADJUDICATION_MOVES | 30 | Consecutive moves within threshold to declare draw |
| NARRATION_CHAR_CAP | 128 | Maximum characters for LLM narration/table talk |
| RATE_LIMIT_GAME_CREATE | 5 | Game creation requests per minute per IP |
| RATE_LIMIT_API_READ | 60 | API read requests per minute per IP |
| RATE_LIMIT_GAME_STOP | 10 | Game stop requests per minute per IP |
| RATE_LIMIT_WS_CONNECT | 20 | WebSocket connections per minute per IP |
| ALLOWED_ORIGINS | https://llmchess.deadpackets.pw | Comma-separated CORS origins |
| LOG_LEVEL | INFO | Python logging level (DEBUG, INFO, WARNING, ERROR) |
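
To show what ELO_K_FACTOR and DEFAULT_MODEL_ELO control, here is a sketch of the standard Elo update using those defaults. It assumes the project follows the textbook formula; the function names are illustrative and the actual elo_service.py may differ.

```python
K_FACTOR = 32          # mirrors the ELO_K_FACTOR default
DEFAULT_ELO = 1500.0   # mirrors the DEFAULT_MODEL_ELO default


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = K_FACTOR
) -> tuple[float, float]:
    """Return both new ratings; score_a is 1.0 (win), 0.5 (draw), or 0.0 (loss)."""
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the rating pool stays constant.
    return rating_a + delta, rating_b - delta
```

Two new models start at 1500.0 each, so a decisive first game moves the winner to 1516.0 and the loser to 1484.0. A larger K-factor makes ratings react faster to each result, at the cost of more noise.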

API

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/games | List games (filter by status, outcome, model, opening, search) |
| POST | /api/games | Create a new game |
| GET | /api/games/queue-status | Active/queued game counts |
| GET | /api/games/:id | Game detail with moves and analysis |
| GET | /api/games/:id/pgn | Download PGN |
| POST | /api/games/:id/stop | Stop an active game (requires player secret) |
| GET | /api/models | List all models |
| GET | /api/models/leaderboard | ELO leaderboard with accuracy, ACPL, cost, response time |
| GET | /api/models/compare | Head-to-head comparison (?model_a=...&model_b=...) |
| GET | /api/models/:id | Model detail with stats and recent games |
| GET | /api/models/:id/elo-history | ELO rating progression |
| GET | /api/models/:id/head-to-head | Head-to-head records vs all opponents |
| GET | /api/stats/overview | Platform-wide cost and token overview |
| GET | /api/stats/openings | Opening statistics across all games |
| GET | /api/openrouter/models | Cached OpenRouter model list |
| GET | /health | Service health check |
| WS | /ws/game/:id | Real-time game stream (moves, eval, table talk, spectators) |

All API responses include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers.
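
As a rough picture of how a per-IP sliding-window limiter could produce these headers, consider the sketch below. It is not the project's rate_limiter.py: the class name is hypothetical, and it assumes X-RateLimit-Reset and Retry-After are expressed as seconds until the oldest request leaves the window, which the README does not specify.

```python
import math
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allows at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque] = defaultdict(deque)  # ip -> request times

    def check(self, ip: str, now: Optional[float] = None) -> tuple[bool, dict[str, str]]:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        allowed = len(hits) < self.limit
        if allowed:
            hits.append(now)
        # Seconds until the oldest recorded request expires (assumed semantics).
        reset = math.ceil(hits[0] + self.window - now) if hits else 0
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, self.limit - len(hits))),
            "X-RateLimit-Reset": str(reset),
        }
        if not allowed:
            headers["Retry-After"] = str(reset)  # would accompany a 429 response
        return allowed, headers
```

A middleware would call `check()` with the client IP (for example, the CF-Connecting-IP value), attach the returned headers to the response, and answer 429 when `allowed` is false. One limiter instance per tier, such as `SlidingWindowLimiter(limit=5)` for game creation, would match the per-minute defaults in the configuration table.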


Project Structure

LLMChessArena/
├── backend/
│   ├── app/
│   │   ├── main.py                # FastAPI app, lifespan, exception handler
│   │   ├── config.py              # All env-var-driven configuration
│   │   ├── database.py            # SQLModel tables + WAL init
│   │   ├── middleware/
│   │   │   └── rate_limiter.py    # Per-IP sliding-window rate limiter
│   │   ├── models/
│   │   │   ├── api_models.py      # Pydantic request/response schemas
│   │   │   └── chess_models.py    # GameConfig, ChessMove, MoveRecord, GameResult
│   │   ├── routers/
│   │   │   ├── games.py           # Game CRUD, stop, queue status
│   │   │   ├── models_router.py   # Leaderboard, model detail, H2H, ELO history
│   │   │   ├── stats_router.py    # Platform overview, opening stats
│   │   │   ├── ws.py              # WebSocket game streaming + human moves
│   │   │   └── openrouter_proxy.py
│   │   └── services/
│   │       ├── chess_agent.py     # pydantic-ai LLM agent (structured output)
│   │       ├── game_engine.py     # Game loop (LLM, Human, Stockfish, Chaos)
│   │       ├── game_manager.py    # Concurrency, queueing, ELO updates
│   │       ├── stockfish_service.py       # Async Stockfish UCI (eval)
│   │       ├── stockfish_player_service.py # Strength-limited Stockfish player
│   │       ├── move_classifier.py  # Move classification (best/good/inaccuracy/mistake/blunder)
│   │       ├── elo_service.py      # ELO calculation
│   │       ├── stats_service.py    # ACPL, accuracy, aggregates, H2H
│   │       └── opening_detector.py # ECO opening book (3600+ positions)
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── pages/                 # GameList, GameViewer, GameEmbed, Leaderboard,
│   │   │                          # ModelDetail, CostDashboard, HeadToHead, OpeningExplorer
│   │   ├── components/            # ChessboardPanel, EvalBar, MoveList, WinProbGraph,
│   │   │                          # TableTalkPanel, AnalysisPanel, GameControls, etc.
│   │   ├── hooks/                 # useGameWebSocket, useReplayControls, useBoardTheme,
│   │   │                          # useOpenRouterModels, useSoundEffects
│   │   ├── utils/                 # formatModel (shared), sound helpers
│   │   ├── api/client.ts          # REST API client with timeout
│   │   └── types/                 # TypeScript interfaces (API + WebSocket)
│   ├── Dockerfile                 # Multi-stage: npm build -> nginx
│   ├── nginx.conf                 # Reverse proxy + security headers
│   └── package.json
├── docker-compose.yml             # Backend + frontend with health checks
└── .env.example

License

This project is licensed under the MIT License — see the LICENSE file for details.


Built with Claude Code
