-
Notifications
You must be signed in to change notification settings - Fork 0
Home
Systematic evaluation of language models through Monte Carlo Tree Search
PrismBench is a comprehensive framework for evaluating Large Language Model capabilities in computer science problem-solving. Using a three-phase Monte Carlo Tree Search approach, it systematically maps model strengths, discovers challenging areas, and provides detailed performance analysis.
Core Approach:
- Phase 1: Maps initial capabilities across CS concepts
- Phase 2: Discovers challenging concept combinations
- Phase 3: Conducts comprehensive evaluation of weaknesses
New to PrismBench? Follow our quick start guide to get running in 5 minutes.
Need detailed setup? See our comprehensive configuration documentation.
| Component | Description | Documentation |
|---|---|---|
| MCTS Algorithm | Three-phase search strategy for capability mapping | MCTS Algorithm → |
| Agent System | Multi-agent architecture for challenge creation and evaluation | Agent System → |
| Environment System | Pluggable evaluation environments for different scenarios | Environment System → |
| Architecture | System design and component interactions | Architecture Overview → |
| Topic | Description | Documentation |
|---|---|---|
| Results Analysis | Understanding and interpreting evaluation results | Results Analysis → |
| Tree Structure | Search tree implementation and concept organization | Tree Structure → |
PrismBench is designed to be extensible, allowing you to add custom agents, environments, and MCTS phases.
- Extending PrismBench →
- Custom Agents →
- Custom Environments →
- Custom MCTS Phases →
- Extension Combinations →
PrismBench follows a microservices architecture with four services:
┌──────────────┐ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ GUI │ │ Search │ │ Environment │ │ LLM Interface │
│ Port 3000 │◄──►│ Port 8002 │◄──►│ Port 8001 │◄──►│ Port 8000 │
│ Web Frontend │ │ MCTS Engine │ │ Challenge Exec │ │ Model Comm │
└──────────────┘ └─────────────────┘ └──────────────────┘ └─────────────────┘
- Systematic Evaluation through MCTS-driven exploration
- Challenge Discovery automatically identifies model weaknesses
- Comprehensive Analysis with detailed performance metrics
- Containerized Deployment with Docker support
- API Compatible with OpenAI-compatible endpoints
- Extensible Architecture for custom components
| Resource | Description |
|---|---|
| Troubleshooting | Common issues and solutions |
| GitHub Discussions | Community support and questions |
| Issue Tracker | Bug reports and feature requests |
We welcome contributions to PrismBench! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.
- Quick Start - Setup and first run
- Configuration Overview - Complete configuration guide
- Architecture Overview - System design and components
- MCTS Algorithm - Monte Carlo Tree Search implementation
- Agent System - Multi-agent architecture
- Environment System - Evaluation environments
- Extending PrismBench - Framework extensions
- Results Analysis - Understanding evaluation results
- Troubleshooting - Common issues and solutions
Made with enough ☕️ to fell an elephant and a whole lot of ❤️ by Ahura Majdinasab - SWAT Lab - Polytechnique Montreal
MCTS System
- MCTS Algorithm
- Core MCTS Process
- Key Components
- PrismBench's Three-Phase MCTS
- Tree Structure
- Node Structure
Agent System
Environment System
- Environment Overview
- Environment Types
- Environment Registry
- Agent Integration
- Environment Configuration
Main Configuration
- Configuration Overview
- Agent Configurations
- Environment Configurations
- Phase Configurations
- Tree Configurations
Extension