PrismBench

An LLM capability mapping framework for systematic evaluation of language models in computer science problem-solving

This branch contains the actively maintained PrismBench framework.
For the paper replication package, use the replication-package branch.

What Is PrismBench?

PrismBench evaluates LLM coding ability through a three-phase Monte Carlo Tree Search (MCTS) workflow:

Phase 1: Initial capability mapping across concepts and difficulties
Phase 2: Challenge discovery for weak regions of the search space
Phase 3: Deep evaluation of challenging combinations
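
To make the search idea concrete, here is a minimal sketch of how an MCTS loop might pick the next concept/difficulty combination to evaluate. It uses the generic UCB1 rule, which is an assumption: this README does not state which selection rule PrismBench uses, and the `Node` fields, `ucb1`, and `select` names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """One concept/difficulty combination in the search tree (hypothetical shape)."""
    concept: str
    difficulty: str
    visits: int = 0
    total_score: float = 0.0


def ucb1(node: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Generic UCB1 score: mean score plus an exploration bonus."""
    if node.visits == 0:
        # Unvisited nodes are always tried first.
        return math.inf
    mean = node.total_score / node.visits
    return mean + c * math.sqrt(math.log(parent_visits) / node.visits)


def select(children: list[Node], parent_visits: int) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(children, key=lambda n: ucb1(n, parent_visits))
```

Under this rule a rarely visited node can outrank a well-explored one with a higher mean score, which is how the search spreads across the space before concentrating on particular regions.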

Architecture (4 Services)

PrismBench runs as four cooperating services:

GUI (localhost:3000) - web interface for starting and monitoring runs
Search (localhost:8002) - async MCTS orchestration (/initialize, /run, /tasks/{task_id})
Environment (localhost:8001) - challenge execution runtime (/run-challenge)
LLM Interface (localhost:8000) - role-based LLM gateway (/interact, /session_history/{id})
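
The service map above can be captured in a small helper that builds endpoint URLs. This is only a sketch: the ports and paths come from the list above, while the `SERVICES` dictionary and the `endpoint` function are hypothetical names. HTTP methods and request payloads are not stated in this README, so the helper stops at URL construction.

```python
# Base URLs taken from the service list; the dictionary keys are illustrative.
SERVICES = {
    "gui": "http://localhost:3000",
    "search": "http://localhost:8002",
    "environment": "http://localhost:8001",
    "llm_interface": "http://localhost:8000",
}


def endpoint(service: str, path: str, **params: str) -> str:
    """Return the full URL for a service path, filling {placeholders} from params."""
    return SERVICES[service] + path.format(**params)
```

For example, `endpoint("search", "/tasks/{task_id}", task_id="abc123")` yields `http://localhost:8002/tasks/abc123`.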

Repository Layout

PrismBench/
├── src/services/
│   ├── gui/                    # Next.js frontend
│   ├── search/                 # MCTS orchestrator service
│   ├── environment/            # Challenge execution service
│   └── llm_interface/          # LLM interaction gateway
├── src/analysis/               # Analysis utilities
├── configs/                    # Agent/environment/tree/phase configs
├── docs/                       # Wiki source files
├── docker/                     # Dockerfiles and compose stack
├── Makefile                    # Common dev and runtime commands
└── apis.key.template           # API key template

Quick Start

git clone https://github.com/CommissarSilver/PrismBench
cd PrismBench

# Show available commands
make help

# Bootstrap environment + keys template
make setup

# Edit apis.key, then start services
make start

After startup:

GUI: http://localhost:3000
Search API docs: http://localhost:8002/docs
Environment API docs: http://localhost:8001/docs
LLM Interface API docs: http://localhost:8000/docs
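
A quick way to confirm that all four services came up is to probe the URLs listed above. The sketch below uses only the Python standard library; the `is_up` and `check_services` helpers are hypothetical and not part of PrismBench, and treating any 2xx or 3xx response as "up" is an assumption.

```python
from urllib.request import urlopen

# URLs copied from the list above.
STARTUP_URLS = {
    "GUI": "http://localhost:3000",
    "Search": "http://localhost:8002/docs",
    "Environment": "http://localhost:8001/docs",
    "LLM Interface": "http://localhost:8000/docs",
}


def is_up(url: str, timeout: float = 2.0, fetch=urlopen) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with fetch(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


def check_services(urls=STARTUP_URLS, probe=is_up) -> dict[str, bool]:
    """Probe every service URL and report which ones responded."""
    return {name: probe(url) for name, url in urls.items()}
```

Calling `check_services()` after `make start` returns a dictionary such as `{"GUI": True, ...}`; a `False` entry suggests that service has not finished starting or failed to start.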

Docker Build/Run Commands

PrismBench uses docker/docker-compose.yaml with a shared base image.

# Build all services
make build

# Build base image only
make build-base

# Rebuild all services from scratch
make rebuild

# Stop services
make stop

Documentation

Service READMEs

Contributing

Contributions are welcome. See CONTRIBUTING.md.

License

This project is licensed under the MIT License.

Citation

If you use PrismBench in your research, please cite:

@article{
majdinasab2026prismbench,
title={PrismBench: Dynamic and Flexible Benchmarking of LLMs' Code Generation with Monte Carlo Tree Search},
author={Vahid Majdinasab and Amin Nikanjam and Foutse Khomh},
journal={Transactions on Machine Learning Research},
year={2026},
url={https://openreview.net/forum?id=O0bsC6FDly},
}
