add option to save top-k logprobs with vf-eval #903

Closed
hallerite wants to merge 6 commits into main from hallerite/save-logps

Conversation

@hallerite (Member) commented Feb 12, 2026

Description

For synthetic data generation, we may want to save the logprobs. This PR adds this functionality to vf-eval.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches request parameter normalization and rollout state/output persistence; main risk is provider-specific response shape differences causing extraction/assertion failures when --top-logprobs is enabled.

Overview
Adds a --top-logprobs K / top_logprobs eval config option to persist per-token top‑K alternatives during prime eval runs, automatically enabling --save-results and saving completion_top_tokens/completion_top_logprobs into output JSONL.

Implements end-to-end wiring: CLI/TOML config parsing injects logprobs/top_logprobs into sampling args and auto-adds the new state columns; multi-turn rollouts extract and accumulate top‑K data from both chat and completion responses; sampling args are normalized so the completions API receives integer logprobs and strips unsupported top_logprobs.

Includes a small robustness fix for token parsing when token_ids/prompt_token_ids are present but None, plus new unit tests covering the sampling-arg normalization behavior and updated docs/skill guidance.

Written by Cursor Bugbot for commit 9d5979f.
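To make the sampling-arg normalization described above concrete, here is a minimal sketch; the helper name normalize_sampling_args and the exact key handling are assumptions, not the PR's actual code. The chat API takes a boolean logprobs plus an integer top_logprobs, while the legacy completions API expects a single integer logprobs and does not accept top_logprobs at all.

```python
from typing import Any


def normalize_sampling_args(args: dict[str, Any], message_type: str) -> dict[str, Any]:
    """Illustrative sketch: adapt chat-style logprob args for the completions API."""
    args = dict(args)  # copy so the caller's dict is not mutated
    if message_type == "completion":
        # the completions API has no top_logprobs parameter, so strip it...
        top_k = args.pop("top_logprobs", None)
        # ...and it expects logprobs to be an integer count, not a boolean
        if args.get("logprobs") is True:
            args["logprobs"] = top_k if top_k is not None else 1
    return args
```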

Comment thread on verifiers/scripts/eval.py (Outdated)
)
parser.add_argument(
    "--top-logprobs",
    "-L",
Member

I would drop the one-letter shortcut here; ideally we reserve one-letter shortcuts for only the most-used params
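For reference, a long-form-only version of the flag might look like the sketch below; the type, default, and help text are assumptions, since the diff above only shows the first two arguments.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--top-logprobs",
    type=int,
    default=None,
    help="Number of top-K alternatives to request and save per token.",
)
```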

Comment on lines +150 to +156
async def extract_top_logprobs(
    response: ModelResponse, message_type: MessageType
) -> TopLogprobs:
    """Extract top-k logprobs from a standard OpenAI/vLLM response.

    Returns a ``TopLogprobs`` with two parallel ``list[list[...]]``
    (tokens and logprobs), coupled by index.
Member

all the hasattr stuff seems a bit shady tbh, isn't there a way to have more certainty about the attributes of the object that goes into this function?

Member Author

agreed, we should be able to assume that response is not None here, so I'm typing it as ChatCompletion | Completion now.
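A sketch of what the revised function could look like; the body is illustrative and assumes the standard OpenAI SDK response shapes rather than the PR's exact code (it is also written sync and dispatches on isinstance instead of the message_type parameter, purely for self-containment).

```python
from openai.types.chat import ChatCompletion
from openai.types.completion import Completion


def extract_top_logprobs(
    response: ChatCompletion | Completion,
) -> tuple[list[list[str]], list[list[float]]]:
    """Return parallel per-token lists of top-K tokens and logprobs."""
    tokens: list[list[str]] = []
    logprobs: list[list[float]] = []
    choice = response.choices[0]
    assert choice.logprobs is not None, "request must set logprobs/top_logprobs"
    if isinstance(response, ChatCompletion):
        # chat responses: one entry per generated token, each holding a list
        # of top-K alternatives with .token and .logprob attributes
        for entry in choice.logprobs.content or []:
            tokens.append([alt.token for alt in entry.top_logprobs])
            logprobs.append([alt.logprob for alt in entry.top_logprobs])
    else:
        # completion responses: top_logprobs is a list of {token: logprob}
        # dicts, one per generated token
        for top in choice.logprobs.top_logprobs or []:
            tokens.append(list(top.keys()))
            logprobs.append(list(top.values()))
    return tokens, logprobs
```

With the union type, attribute access is checked statically and the hasattr guards become unnecessary.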

Member

ah yeah, that makes stuff so much better!

Comment thread on verifiers/utils/response_utils.py (Outdated)
Comment on lines +231 to +236
top_logprobs_list = getattr(choice.logprobs, "top_logprobs", None)
if not top_logprobs_list:
    raise ValueError(
        "Response logprobs has no top_logprobs. "
        "The endpoint may not support the top_logprobs parameter."
    )
Member

this pattern doesn't make sense imo. we can just access choice.logprobs directly, which will fail if it doesn't exist
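A minimal sketch of the suggested simplification, reusing the variable names from the quoted snippet (illustrative, not the final patch):

```python
# access the typed attribute directly; if the endpoint returned no logprobs,
# this raises immediately instead of being re-wrapped in a ValueError
top_logprobs_list = choice.logprobs.top_logprobs
assert top_logprobs_list, "endpoint did not return top_logprobs"
```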

@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


@samsja (Member) left a comment

lgtm but will wait for @willccbb @mikasenghaas to check tho

@mikasenghaas (Member) left a comment

as said on discord: not convinced we need to make this first-class. you can populate any request args via sampling_args and get the full serialized response, which includes the top logprobs.

@hallerite (Member Author) commented Feb 13, 2026

Closing this after realizing that the existing --state-columns trajectory path already saves all the logprob data when the right sampling args are passed:

--sampling-args '{"logprobs": true, "top_logprobs": 5, "extra_body": {"return_token_ids": true, "return_prompt_token_ids": true}}' --state-columns trajectory --save-results

Making this first-class would save disk space by not including the full trajectory, but for now the generic path works without any code changes, so it's fine to keep it as is.
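For anyone reproducing this, a rough sketch of pulling the logprobs back out of the saved results; the file path and the trajectory/response field names below are assumptions and depend on the endpoint and eval config.

```python
import json

# hypothetical reader: walk each saved rollout's trajectory and print any
# logprobs blocks found in the serialized responses
with open("outputs/eval-results.jsonl") as f:  # path is illustrative
    for line in f:
        row = json.loads(line)
        for step in row.get("trajectory", []):
            for choice in step.get("response", {}).get("choices", []):
                if choice.get("logprobs"):
                    print(choice["logprobs"])
```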

@hallerite closed this Feb 13, 2026
@hallerite deleted the hallerite/save-logps branch February 13, 2026 15:46
