add option to save top-k logprobs with vf-eval #903

Closed
hallerite wants to merge 6 commits into main from hallerite/save-logps

Conversation

@hallerite (Member) commented Feb 12, 2026

Description

For synthetic data generation, we may want to save the logprobs. This PR adds this functionality to vf-eval.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches request parameter normalization and rollout state/output persistence; main risk is provider-specific response shape differences causing extraction/assertion failures when --top-logprobs is enabled.

Overview
Adds a --top-logprobs K / top_logprobs eval config option to persist per-token top‑K alternatives during prime eval runs, automatically enabling --save-results and saving completion_top_tokens/completion_top_logprobs into output JSONL.

Implements end-to-end wiring: CLI/TOML config parsing injects logprobs/top_logprobs into sampling args and auto-adds the new state columns; multi-turn rollouts extract and accumulate top‑K data from both chat and completion responses; sampling args are normalized so the completions API receives integer logprobs and strips unsupported top_logprobs.

Includes a small robustness fix for token parsing when token_ids/prompt_token_ids are present but None, plus new unit tests covering the sampling-arg normalization behavior and updated docs/skill guidance.

Written by Cursor Bugbot for commit 9d5979f.
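To make the sampling-arg normalization described above concrete, here is a minimal sketch; the helper name normalize_sampling_args and the exact key handling are assumptions, not the PR's actual code. The chat API takes a boolean logprobs plus an integer top_logprobs, while the legacy completions API expects a single integer logprobs and does not accept top_logprobs at all.

```python
from typing import Any


def normalize_sampling_args(args: dict[str, Any], message_type: str) -> dict[str, Any]:
    """Illustrative sketch: adapt chat-style logprob args for the completions API."""
    args = dict(args)  # copy so the caller's dict is not mutated
    if message_type == "completion":
        # the completions API has no top_logprobs parameter, so strip it...
        top_k = args.pop("top_logprobs", None)
        # ...and it expects logprobs to be an integer count, not a boolean
        if args.get("logprobs") is True:
            args["logprobs"] = top_k if top_k is not None else 1
    return args
```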

Comment thread on verifiers/scripts/eval.py (Outdated)
)
parser.add_argument(
    "--top-logprobs",
    "-L",
Member

I would drop the one-letter shortcut here; ideally we reserve one-letter shortcuts for only the most-used params
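For reference, a long-form-only version of the flag might look like the sketch below; the type, default, and help text are assumptions, since the diff above only shows the first two arguments.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--top-logprobs",
    type=int,
    default=None,
    help="Number of top-K alternatives to request and save per token.",
)
```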

Comment on lines +150 to +156
async def extract_top_logprobs(
    response: ModelResponse, message_type: MessageType
) -> TopLogprobs:
    """Extract top-k logprobs from a standard OpenAI/vLLM response.

    Returns a ``TopLogprobs`` with two parallel ``list[list[...]]``
    (tokens and logprobs), coupled by index.
Member

all the hasattr stuff seems a bit shady tbh, isn't there a way to have more certainty about the attributes of the object that goes into this function?

Member Author

agreed, we should be able to assume that response is not None here, so I'm typing it as ChatCompletion | Completion now.
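A sketch of what the revised function could look like; the body is illustrative and assumes the standard OpenAI SDK response shapes rather than the PR's exact code (it is also written sync and dispatches on isinstance instead of the message_type parameter, purely for self-containment).

```python
from openai.types.chat import ChatCompletion
from openai.types.completion import Completion


def extract_top_logprobs(
    response: ChatCompletion | Completion,
) -> tuple[list[list[str]], list[list[float]]]:
    """Return parallel per-token lists of top-K tokens and logprobs."""
    tokens: list[list[str]] = []
    logprobs: list[list[float]] = []
    choice = response.choices[0]
    assert choice.logprobs is not None, "request must set logprobs/top_logprobs"
    if isinstance(response, ChatCompletion):
        # chat responses: one entry per generated token, each holding a list
        # of top-K alternatives with .token and .logprob attributes
        for entry in choice.logprobs.content or []:
            tokens.append([alt.token for alt in entry.top_logprobs])
            logprobs.append([alt.logprob for alt in entry.top_logprobs])
    else:
        # completion responses: top_logprobs is a list of {token: logprob}
        # dicts, one per generated token
        for top in choice.logprobs.top_logprobs or []:
            tokens.append(list(top.keys()))
            logprobs.append(list(top.values()))
    return tokens, logprobs
```

With the union type, attribute access is checked statically and the hasattr guards become unnecessary.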

Member

ah yeah, that makes stuff so much better!

Comment thread on verifiers/utils/response_utils.py (Outdated)
Comment on lines +231 to +236
top_logprobs_list = getattr(choice.logprobs, "top_logprobs", None)
if not top_logprobs_list:
    raise ValueError(
        "Response logprobs has no top_logprobs. "
        "The endpoint may not support the top_logprobs parameter."
    )
Member

this pattern doesn't make sense imo. we can just access choice.logprobs directly, which will fail if it doesn't exist
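A minimal sketch of the suggested simplification, reusing the variable names from the quoted snippet (illustrative, not the final patch):

```python
# access the typed attribute directly; if the endpoint returned no logprobs,
# this raises immediately instead of being re-wrapped in a ValueError
top_logprobs_list = choice.logprobs.top_logprobs
assert top_logprobs_list, "endpoint did not return top_logprobs"
```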

@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


@samsja (Member) left a comment

lgtm but will wait for @willccbb @mikasenghaas to check tho

@mikasenghaas (Member) left a comment

as said on discord: not convinced we need to make this first-class. you can populate any request args via sampling_args and get the full serialized response, which includes the top logprobs.

@hallerite (Member Author) commented Feb 13, 2026

Closing this after realizing that the existing --state-columns trajectory path already saves all the logprob data when the right sampling args are passed:

--sampling-args '{"logprobs": true, "top_logprobs": 5, "extra_body": {"return_token_ids": true, "return_prompt_token_ids": true}}' --state-columns trajectory --save-results

Making this first-class would save disk space by not including the full trajectory, but for now the generic path works without any code changes, so it's fine to keep it as is.
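For anyone reproducing this, a rough sketch of pulling the logprobs back out of the saved results; the file path and the trajectory/response field names below are assumptions and depend on the endpoint and eval config.

```python
import json

# hypothetical reader: walk each saved rollout's trajectory and print any
# logprobs blocks found in the serialized responses
with open("outputs/eval-results.jsonl") as f:  # path is illustrative
    for line in f:
        row = json.loads(line)
        for step in row.get("trajectory", []):
            for choice in step.get("response", {}).get("choices", []):
                if choice.get("logprobs"):
                    print(choice["logprobs"])
```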

@hallerite closed this Feb 13, 2026
@hallerite deleted the hallerite/save-logps branch February 13, 2026 15:46
