add option to save top-k logprobs with vf-eval #903
Conversation
```python
)
parser.add_argument(
    "--top-logprobs",
    "-L",
```
I would drop the one-letter shortcut here; ideally we keep one-letter shortcuts for only the most used params.
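For illustration, the long-only form of the flag this comment suggests might look like the following; the argument name comes from the diff, while the type, default, and help text are assumptions rather than the PR's actual parser code.

```python
import argparse

parser = argparse.ArgumentParser(prog="vf-eval")
parser.add_argument(
    "--top-logprobs",  # long-only: no "-L" shortcut, reserving single letters for common params
    type=int,
    default=None,
    help="Number of top-k alternative tokens to save per generated token.",
)

args = parser.parse_args(["--top-logprobs", "5"])
print(args.top_logprobs)  # -> 5
```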
```python
async def extract_top_logprobs(
    response: ModelResponse, message_type: MessageType
) -> TopLogprobs:
    """Extract top-k logprobs from a standard OpenAI/vLLM response.

    Returns a ``TopLogprobs`` with two parallel ``list[list[...]]``
    (tokens and logprobs), coupled by index.
```
all the getattr stuff seems a bit shady tbh; isn't there a way to have more certainty about the attributes of the object that goes into this function?
agreed, we should be able to assume that response is not None here, so I'm typing it as ChatCompletion | Completion now.
ah yeah, that makes stuff so much better!
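For concreteness, here is a minimal sketch of what the typed version discussed above could look like, assuming the OpenAI client's ChatCompletion and Completion types. The TopLogprobs container and the field handling are illustrative rather than the PR's actual implementation, and the sketch is synchronous for brevity.

```python
from dataclasses import dataclass, field

from openai.types import Completion
from openai.types.chat import ChatCompletion


@dataclass
class TopLogprobs:
    # Parallel lists coupled by index: tokens[i][k] pairs with logprobs[i][k].
    tokens: list[list[str]] = field(default_factory=list)
    logprobs: list[list[float]] = field(default_factory=list)


def extract_top_logprobs(response: ChatCompletion | Completion) -> TopLogprobs:
    # Assumes the request was made with logprobs enabled, so choice.logprobs is set.
    out = TopLogprobs()
    choice = response.choices[0]
    if isinstance(response, ChatCompletion):
        # Chat API: choice.logprobs.content is a list of per-token entries,
        # each carrying its own list of top_logprobs alternatives.
        for entry in choice.logprobs.content:
            out.tokens.append([alt.token for alt in entry.top_logprobs])
            out.logprobs.append([alt.logprob for alt in entry.top_logprobs])
    else:
        # Completions API: choice.logprobs.top_logprobs is a list of
        # {token: logprob} dicts, one per generated token.
        for alternatives in choice.logprobs.top_logprobs:
            out.tokens.append(list(alternatives.keys()))
            out.logprobs.append(list(alternatives.values()))
    return out
```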
```python
top_logprobs_list = getattr(choice.logprobs, "top_logprobs", None)
if not top_logprobs_list:
    raise ValueError(
        "Response logprobs has no top_logprobs. "
        "The endpoint may not support the top_logprobs parameter."
    )
```
this pattern doesn't make sense imo. we can just call choice.logprobs, which will fail if it doesn't exist
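A small sketch of the simplification being suggested, assuming choice is an OpenAI-style completion choice from a request made with logprobs enabled; the helper name is hypothetical.

```python
def require_top_logprobs(choice) -> list:
    """Hypothetical helper: let attribute access fail loudly instead of guarding.

    If ``choice`` has no ``logprobs`` attribute, the plain access below raises
    AttributeError on its own, so no getattr default is needed; only the check
    for an empty/None top_logprobs value is kept.
    """
    top_logprobs_list = choice.logprobs.top_logprobs
    if not top_logprobs_list:
        raise ValueError(
            "Response logprobs has no top_logprobs. "
            "The endpoint may not support the top_logprobs parameter."
        )
    return top_logprobs_list
```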
Cursor Bugbot has reviewed your changes and found 1 potential issue.
samsja left a comment
lgtm but will wait for @willccbb @mikasenghaas to check tho
mikasenghaas left a comment
as said on discord: not convinced we need to make this first-class. you can populate any request args via sampling_args and get the full serialized response, which includes the top logprobs.
Closing this after realizing that the existing sampling_args path already covers this. Making this first-class would save disk by not including the full trajectory, but for now the generic path works without any code changes, so it's fine to keep it as is.
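For reference, a rough sketch of that generic path: pass the logprob-related request args through as ordinary sampling args and read the alternatives back from the full serialized response, with no vf-eval code changes. The endpoint URL and model name are placeholders, and the dict simply mirrors what would go into sampling_args; this is not code from the PR.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Chat API form of the request args: boolean logprobs plus an integer top_logprobs.
sampling_args = {"logprobs": True, "top_logprobs": 5}

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Hello"}],
    **sampling_args,
)

# The serialized response (what saving full results would persist) already
# carries the per-token alternatives:
for entry in response.choices[0].logprobs.content:
    print(entry.token, [(alt.token, alt.logprob) for alt in entry.top_logprobs])
```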
Description
For synthetic data generation, we may want to save the logprobs. This PR adds this functionality to vf-eval.

Type of Change
Testing

Ran uv run pytest locally.

Checklist
Additional Notes
Note
Medium Risk
Touches request parameter normalization and rollout state/output persistence; the main risk is provider-specific response shape differences causing extraction/assertion failures when --top-logprobs is enabled.

Overview
Adds a --top-logprobs K / top_logprobs eval config option to persist per-token top-K alternatives during prime eval runs, automatically enabling --save-results and saving completion_top_tokens / completion_top_logprobs into the output JSONL.

Implements end-to-end wiring: CLI/TOML config parsing injects logprobs / top_logprobs into sampling args and auto-adds the new state columns; multi-turn rollouts extract and accumulate top-K data from both chat and completion responses; sampling args are normalized so the completions API receives an integer logprobs and strips the unsupported top_logprobs.

Includes a small robustness fix for token parsing when token_ids / prompt_token_ids are present but None, plus new unit tests covering the sampling-arg normalization behavior and updated docs/skill guidance.

Written by Cursor Bugbot for commit 9d5979f. This will update automatically on new commits.
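As a sketch of the normalization behavior summarized above: the chat API takes a boolean logprobs plus an integer top_logprobs, while the legacy completions API expects an integer logprobs and rejects top_logprobs. The helper name and dict layout below are illustrative, not the PR's actual code.

```python
def normalize_sampling_args(sampling_args: dict, is_chat: bool) -> dict:
    """Adapt chat-style logprob args for the legacy completions API (sketch)."""
    args = dict(sampling_args)
    if is_chat:
        return args
    # Completions API does not accept top_logprobs; fold it into an integer logprobs.
    top_k = args.pop("top_logprobs", None)
    if top_k is not None:
        args["logprobs"] = int(top_k)
    return args


chat_args = {"logprobs": True, "top_logprobs": 5, "temperature": 0.7}
print(normalize_sampling_args(chat_args, is_chat=False))
# -> {'logprobs': 5, 'temperature': 0.7}
```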