This repository implements the 3-stage experiment described in the paper.
- `/generator`: Unified LLM generation module (calls vLLM-served models via an OpenAI-compatible API)
- `/risk`: Tests the baseline PT parameters
- `/marker`: Implements the experiment for marker mappings
- `/risk_marker`: Replaces numeric probabilities with epistemic markers and runs the PT parameter measurement again
```
pip install -r requirements.txt
```

Start a vLLM server with your target model. For example:
```
vllm serve Qwen/Qwen3-32B --port 8000
```

The experiments expect an OpenAI-compatible API at `http://localhost:8000/v1` by default.
You can override this by setting the `VLLM_BASE_URL` environment variable:

```
export VLLM_BASE_URL=http://your-server:8000/v1
```

Navigate to the directory corresponding to the stage you want to run. For example, to run the baseline PT measurement:
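Assuming the endpoint is resolved with a standard environment lookup (the exact handling in the repo's code may differ), the override works like this:

```python
import os

# Prefer VLLM_BASE_URL if set; otherwise fall back to the default
# local endpoint mentioned above.
base_url = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
print(base_url)
```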
```
cd risk
```

Each stage directory has the following structure:
```
risk/
|---plot/
|---processed/
|---result/
|---analyze.py
|---elicitation.py
|---mle.py
|---prompt.py
|---values_probs.py
```
- Open `elicitation.py`
- Edit `models_to_test` to select your target model(s)
- Optionally adjust `base_url`, `sample_num`, and `batchsize`
```
python elicitation.py
```

Results will be saved under `result/`.
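As a sketch, the edited settings in `elicitation.py` might look like the following; the variable names are the ones listed above, but these particular values are illustrative placeholders, not recommendations:

```python
# Illustrative values only; adjust to your own setup.
models_to_test = ["Qwen/Qwen3-32B"]    # model(s) served by your vLLM instance
base_url = "http://localhost:8000/v1"  # OpenAI-compatible endpoint
sample_num = 50                        # samples drawn per elicitation prompt
batchsize = 8                          # requests issued per batch
```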
```
python process.py > result.txt
```

- Measured parameters will be saved in `processed/`
- Results will also be shown in `result.txt`
To customize marker substitution:

- Open `risk_marker/prompt.py`
- Modify the function `safe_sub` as needed
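To illustrate what such a substitution can look like, here is a minimal sketch that maps percentage strings to epistemic markers; the actual thresholds, phrases, and logic of `safe_sub` in `risk_marker/prompt.py` will differ:

```python
import re

# Placeholder mapping from probability thresholds to epistemic markers;
# the repo's real mapping lives in risk_marker/prompt.py.
MARKER_MAP = [
    (0.95, "almost certain"),
    (0.75, "likely"),
    (0.50, "possible"),
    (0.25, "unlikely"),
    (0.00, "almost impossible"),
]

def safe_sub(text: str) -> str:
    """Replace numeric probabilities like '80%' with epistemic markers."""
    def repl(match: re.Match) -> str:
        p = float(match.group(1)) / 100.0
        for threshold, marker in MARKER_MAP:
            if p >= threshold:
                return marker
        return match.group(0)  # leave unmatched values untouched
    return re.sub(r"(\d+(?:\.\d+)?)\s*%", repl, text)
```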
- The `generator/` module provides a unified `VLLMGenerator` class used by all three stages.
- `risk/` and `risk_marker/` use chat completion (`/v1/chat/completions`) with multi-turn conversations.
- `marker/` uses text completion (`/v1/completions`) with raw prompts.
- For Qwen3 models, thinking mode is automatically disabled via `chat_template_kwargs`.
- For QwQ models, a system message suppressing reasoning is automatically prepended.
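To make the branching concrete, here is a hedged sketch of how such a generator might assemble request payloads; `build_request` and its payload shapes are assumptions for illustration, not the actual `VLLMGenerator` interface:

```python
# Sketch only: the real VLLMGenerator lives in generator/ and may differ.
def build_request(model: str, prompt, mode: str = "chat") -> dict:
    if mode == "chat":
        # risk/ and risk_marker/ use /v1/chat/completions with message lists
        messages = list(prompt)
        if "QwQ" in model:
            # QwQ: prepend a system message that suppresses reasoning
            messages = [{"role": "system",
                         "content": "Answer directly; do not show reasoning."}] + messages
        payload = {"model": model, "messages": messages}
        if "Qwen3" in model:
            # Qwen3: disable thinking mode via chat_template_kwargs
            payload["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
        return payload
    # marker/ uses /v1/completions with a raw prompt string
    return {"model": model, "prompt": prompt}
```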