Vahid Majdinasab edited this page Mar 6, 2026 · 5 revisions

PrismBench

Systematic evaluation of language models through Monte Carlo Tree Search



Overview

PrismBench is a comprehensive framework for evaluating Large Language Model capabilities in computer science problem-solving. Using a three-phase Monte Carlo Tree Search approach, it systematically maps model strengths, discovers challenging areas, and provides detailed performance analysis.

Core Approach:

  • Phase 1: Maps initial capabilities across CS concepts
  • Phase 2: Discovers challenging concept combinations
  • Phase 3: Conducts comprehensive evaluation of weaknesses
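The three phases above all rest on MCTS-style node selection. As a rough illustration, here is a minimal sketch of the selection step using the standard UCB1 rule; the class and function names (`ConceptNode`, `ucb1`, `select_child`) and the exploration constant are illustrative assumptions, not PrismBench's actual API:

```python
import math

class ConceptNode:
    """Hypothetical node tracking how a model performs on one CS concept."""

    def __init__(self, concept):
        self.concept = concept
        self.visits = 0      # how many times this concept was evaluated
        self.value = 0.0     # cumulative reward (e.g. challenge solve rate)

def ucb1(node, parent_visits, c=1.4):
    """Upper Confidence Bound: balances exploiting concepts with known
    scores against exploring rarely visited ones."""
    if node.visits == 0:
        return float("inf")  # always try unvisited concepts first
    exploit = node.value / node.visits
    explore = c * math.sqrt(math.log(parent_visits) / node.visits)
    return exploit + explore

def select_child(children, parent_visits):
    """Pick the child concept with the highest UCB1 score."""
    return max(children, key=lambda n: ucb1(n, parent_visits))
```

In Phase 2, for example, a rule like this would steer the search toward concept combinations whose low average reward signals a model weakness.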

Getting Started

New to PrismBench? Follow our quick start guide to get up and running in five minutes.

Quick Start Guide →

Need detailed setup? See our comprehensive configuration documentation.

Configuration Guide →


Core Documentation

Framework Components

| Component | Description | Documentation |
| --- | --- | --- |
| MCTS Algorithm | Three-phase search strategy for capability mapping | MCTS Algorithm → |
| Agent System | Multi-agent architecture for challenge creation and evaluation | Agent System → |
| Environment System | Pluggable evaluation environments for different scenarios | Environment System → |
| Architecture | System design and component interactions | Architecture Overview → |

Analysis & Results

| Topic | Description | Documentation |
| --- | --- | --- |
| Results Analysis | Understanding and interpreting evaluation results | Results Analysis → |
| Tree Structure | Search tree implementation and concept organization | Tree Structure → |

Extending PrismBench

PrismBench is designed to be extensible, allowing you to add custom agents, environments, and MCTS phases.
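To make the extension point concrete, here is a toy sketch of what a pluggable evaluation environment could look like. The base class, method name, and the `StringReversalEnv` example are all hypothetical; the actual interface is defined in the Environment System documentation linked above:

```python
class Environment:
    """Hypothetical stand-in for a pluggable evaluation environment."""

    def run_challenge(self, solution: str) -> bool:
        """Execute a candidate solution and report pass/fail."""
        raise NotImplementedError

class StringReversalEnv(Environment):
    """Toy environment: the challenge passes if the model's code
    defines a `reverse` function that reverses a fixed input."""

    def run_challenge(self, solution: str) -> bool:
        namespace = {}
        exec(solution, namespace)  # load the candidate solution
        return namespace["reverse"]("abc") == "cba"
```

A custom environment in this style would be registered with the framework and exercised by the search engine during evaluation.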


System Architecture

PrismBench follows a microservices architecture with four services:

┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ GUI          │    │ Search          │    │ Environment      │    │ LLM Interface   │
│ Port 3000    │◄──►│ Port 8002       │◄──►│ Port 8001        │◄──►│ Port 8000       │
│ Web Frontend │    │ MCTS Engine     │    │ Challenge Exec   │    │ Model Comm      │
└──────────────┘    └─────────────────┘    └──────────────────┘    └─────────────────┘

Detailed Architecture →
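For a default local (Docker Compose) deployment, the diagram above maps to a service table like the following. The base URLs assume everything runs on localhost with the ports shown; this dictionary is illustrative, not PrismBench's actual configuration format:

```python
# Assumed localhost base URLs for the four services in the diagram.
SERVICES = {
    "gui": "http://localhost:3000",            # web frontend
    "search": "http://localhost:8002",         # MCTS engine
    "environment": "http://localhost:8001",    # challenge execution
    "llm_interface": "http://localhost:8000",  # model communication
}
```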


Key Features

  • Systematic Evaluation through MCTS-driven exploration
  • Challenge Discovery that automatically identifies model weaknesses
  • Comprehensive Analysis with detailed performance metrics
  • Containerized Deployment with Docker support
  • API Compatibility with any OpenAI-compatible endpoint
  • Extensible Architecture for custom components

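Because the framework targets OpenAI-compatible endpoints, any server that implements the standard chat-completions API can be evaluated. As a sketch, the request below is built with only the standard library; the base URL points at the LLM Interface port from the diagram above, and the model name and route are assumptions about a generic OpenAI-compatible server, not PrismBench specifics:

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt):
    """Build a chat-completions request for an OpenAI-compatible server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Example: target the LLM Interface service on its default port.
req = build_chat_request("http://localhost:8000", "my-model", "Reverse a list in Python.")
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) requires the service to be running.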
Support

| Resource | Description |
| --- | --- |
| Troubleshooting | Common issues and solutions |
| GitHub Discussions | Community support and questions |
| Issue Tracker | Bug reports and feature requests |

Contributing

We welcome contributions to PrismBench! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.

Contributing Guide →


Related Pages

Get Started

Core Framework

Advanced Usage


Made with enough ☕️ to fell an elephant and a whole lot of ❤️ by Ahura Majdinasab - SWAT Lab - Polytechnique Montreal
