This repository implements the 3-stage experiment described in the paper.
- `/generator`: Unified LLM generation module (calls vLLM-served models via an OpenAI-compatible API)
- `/risk`: Tests the baseline PT parameters
- `/marker`: Implements the experiment for marker mappings
- `/risk_marker`: Replaces numeric probabilities with epistemic markers and runs the PT parameter measurement again
```
pip install -r requirements.txt
```

Start a vLLM server with your target model. For example:
```
vllm serve Qwen/Qwen3-32B --port 8000
```

The experiments expect an OpenAI-compatible API at `http://localhost:8000/v1` by default.
You can override this by setting the `VLLM_BASE_URL` environment variable:

```
export VLLM_BASE_URL=http://your-server:8000/v1
```

Navigate to the directory corresponding to the stage you want to run. For example, to run the baseline PT measurement:
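Assuming the endpoint is resolved with a standard environment lookup (the exact handling in the repo's code may differ), the override works like this:

```python
import os

# Prefer VLLM_BASE_URL if set; otherwise fall back to the default
# local endpoint mentioned above.
base_url = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
print(base_url)
```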
```
cd risk
```

Each stage directory has the following structure:
```
risk/
|---plot/
|---processed/
|---result/
|---analyze.py
|---elicitation.py
|---mle.py
|---prompt.py
|---values_probs.py
```
- Open `elicitation.py`
- Edit `models_to_test` to select your target model(s)
- Optionally adjust `base_url`, `sample_num`, and `batchsize`
```
python elicitation.py
```

Results will be saved under `result/`.
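As a sketch, the edited settings in `elicitation.py` might look like the following; the variable names are the ones listed above, but these particular values are illustrative placeholders, not recommendations:

```python
# Illustrative values only; adjust to your own setup.
models_to_test = ["Qwen/Qwen3-32B"]    # model(s) served by your vLLM instance
base_url = "http://localhost:8000/v1"  # OpenAI-compatible endpoint
sample_num = 50                        # samples drawn per elicitation prompt
batchsize = 8                          # requests issued per batch
```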
```
python process.py > result.txt
```

- Measured parameters will be saved in `processed/`
- Results will also be shown in `result.txt`
To customize marker substitution:

- Open `risk_marker/prompt.py`
- Modify the function `safe_sub` as needed
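To illustrate what such a substitution can look like, here is a minimal sketch that maps percentage strings to epistemic markers; the actual thresholds, phrases, and logic of `safe_sub` in `risk_marker/prompt.py` will differ:

```python
import re

# Placeholder mapping from probability thresholds to epistemic markers;
# the repo's real mapping lives in risk_marker/prompt.py.
MARKER_MAP = [
    (0.95, "almost certain"),
    (0.75, "likely"),
    (0.50, "possible"),
    (0.25, "unlikely"),
    (0.00, "almost impossible"),
]

def safe_sub(text: str) -> str:
    """Replace numeric probabilities like '80%' with epistemic markers."""
    def repl(match: re.Match) -> str:
        p = float(match.group(1)) / 100.0
        for threshold, marker in MARKER_MAP:
            if p >= threshold:
                return marker
        return match.group(0)  # leave unmatched values untouched
    return re.sub(r"(\d+(?:\.\d+)?)\s*%", repl, text)
```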
- The `generator/` module provides a unified `VLLMGenerator` class used by all three stages.
- `risk/` and `risk_marker/` use chat completion (`/v1/chat/completions`) with multi-turn conversations.
- `marker/` uses text completion (`/v1/completions`) with raw prompts.
- For Qwen3 models, thinking mode is automatically disabled via `chat_template_kwargs`.
- For QwQ models, a system message suppressing reasoning is automatically prepended.
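To make the branching concrete, here is a hedged sketch of how such a generator might assemble request payloads; `build_request` and its payload shapes are assumptions for illustration, not the actual `VLLMGenerator` interface:

```python
# Sketch only: the real VLLMGenerator lives in generator/ and may differ.
def build_request(model: str, prompt, mode: str = "chat") -> dict:
    if mode == "chat":
        # risk/ and risk_marker/ use /v1/chat/completions with message lists
        messages = list(prompt)
        if "QwQ" in model:
            # QwQ: prepend a system message that suppresses reasoning
            messages = [{"role": "system",
                         "content": "Answer directly; do not show reasoning."}] + messages
        payload = {"model": model, "messages": messages}
        if "Qwen3" in model:
            # Qwen3: disable thinking mode via chat_template_kwargs
            payload["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
        return payload
    # marker/ uses /v1/completions with a raw prompt string
    return {"model": model, "prompt": prompt}
```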