An LLM capability mapping framework for systematic evaluation of language models in computer science problem-solving
This branch contains the actively maintained PrismBench framework.
For the paper replication package, use the replication-package branch.
PrismBench evaluates LLM coding ability through a three-phase Monte Carlo Tree Search (MCTS) workflow:
- Phase 1: Initial capability mapping across concepts and difficulties
- Phase 2: Challenge discovery for weak regions of the search space
- Phase 3: Deep evaluation of challenging combinations
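The node-selection step at the core of any MCTS workflow can be sketched with the standard UCB1 rule. This is a generic illustration, not PrismBench's actual scoring: the reward definition, the exploration constant `c`, and the `(concept, difficulty)` node statistics below are all assumptions made up for the example.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited nodes are always tried first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


def select_child(children, parent_visits):
    """Return the key of the child with the highest UCB1 score."""
    return max(
        children,
        key=lambda k: ucb1(children[k]["reward"], children[k]["visits"], parent_visits),
    )


# Hypothetical nodes keyed by (concept, difficulty); the numbers are invented.
children = {
    ("loops", "easy"): {"reward": 9.0, "visits": 10},
    ("graphs", "hard"): {"reward": 1.0, "visits": 2},
    ("dp", "medium"): {"reward": 0.0, "visits": 0},
}
chosen = select_child(children, parent_visits=12)
```

The unvisited node is selected first; once every node has been visited, the exploration bonus favors rarely sampled nodes over frequently sampled ones with similar mean reward.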
PrismBench runs as four cooperating services:
- GUI (localhost:3000) - web interface for starting and monitoring runs
- Search (localhost:8002) - async MCTS orchestration (/initialize, /run, /tasks/{task_id})
- Environment (localhost:8001) - challenge execution runtime (/run-challenge)
- LLM Interface (localhost:8000) - role-based LLM gateway (/interact, /session_history/{id})
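The service layout above can be captured in a small lookup table for scripts that talk to the stack. The sketch below only builds URLs from the hosts, ports, and paths listed here; the `http` scheme and the helper name `endpoint_url` are assumptions, and HTTP methods and request bodies are left out because they are not described in this document (the per-service `/docs` pages are the place to check them).

```python
# Base URLs taken from the service list; the http scheme is an assumption.
SERVICES = {
    "gui": "http://localhost:3000",
    "search": "http://localhost:8002",
    "environment": "http://localhost:8001",
    "llm_interface": "http://localhost:8000",
}

# Endpoint paths exactly as listed for each service.
ENDPOINTS = {
    "search": ["/initialize", "/run", "/tasks/{task_id}"],
    "environment": ["/run-challenge"],
    "llm_interface": ["/interact", "/session_history/{id}"],
}


def endpoint_url(service, path, **params):
    """Build a full URL for a listed endpoint, filling in path parameters."""
    if path not in ENDPOINTS.get(service, []):
        raise ValueError(f"unknown endpoint {path!r} for service {service!r}")
    return SERVICES[service] + path.format(**params)


# Example: URL for a task status lookup with a placeholder task id.
task_url = endpoint_url("search", "/tasks/{task_id}", task_id="abc123")
```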
PrismBench/
├── src/services/
│ ├── gui/ # Next.js frontend
│ ├── search/ # MCTS orchestrator service
│ ├── environment/ # Challenge execution service
│ └── llm_interface/ # LLM interaction gateway
├── src/analysis/ # Analysis utilities
├── configs/ # Agent/environment/tree/phase configs
├── docs/ # Wiki source files
├── docker/ # Dockerfiles and compose stack
├── Makefile # Common dev and runtime commands
└── apis.key.template # API key template
git clone https://github.com/CommissarSilver/PrismBench
cd PrismBench
# Show available commands
make help
# Bootstrap environment + keys template
make setup
# Edit apis.key, then start services
make start
After startup:
- GUI: http://localhost:3000
- Search API docs: http://localhost:8002/docs
- Environment API docs: http://localhost:8001/docs
- LLM Interface API docs: http://localhost:8000/docs
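To confirm that the stack came up, a short script can probe each of the URLs above. This is a minimal sketch using only the Python standard library; the function names, the 2-second timeout, and treating any 2xx or 3xx response as "up" are assumptions, not part of PrismBench.

```python
import urllib.error
import urllib.request

# URLs copied from the list above.
URLS = {
    "GUI": "http://localhost:3000",
    "Search API docs": "http://localhost:8002/docs",
    "Environment API docs": "http://localhost:8001/docs",
    "LLM Interface API docs": "http://localhost:8000/docs",
}


def is_reachable(url, timeout=2.0):
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def check_all(urls=URLS, probe=is_reachable):
    """Map each service name to whether its URL responded."""
    return {name: probe(url) for name, url in urls.items()}
```

Calling `check_all()` after `make start` returns a dictionary of service names to booleans; any `False` entry points to a service that has not finished starting or failed to start.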
PrismBench uses docker/docker-compose.yaml with a shared base image.
# Build all services
make build
# Build base image only
make build-base
# Rebuild all services from scratch
make rebuild
# Stop services
make stop
Contributions are welcome. See CONTRIBUTING.md.
This project is licensed under the MIT License.
If you use PrismBench in your research, please cite:
@article{
majdinasab2026prismbench,
title={PrismBench: Dynamic and Flexible Benchmarking of LLMs' Code Generation with Monte Carlo Tree Search},
author={Vahid Majdinasab and Amin Nikanjam and Foutse Khomh},
journal={Transactions on Machine Learning Research},
year={2026},
url={https://openreview.net/forum?id=O0bsC6FDly},
}