Show conversations instead of decoded text in the completions table#5309
Show conversations instead of decoded text in the completions table#5309qgallouedec wants to merge 11 commits intomainfrom
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
| self._logs["completion"].extend(gather_object(completions_text)) | ||
| # Log prompts and completions | ||
| self._logs["prompt"].extend(gather_object(prompts)) | ||
| self._logs["completion"].extend(gather_object(completions)) |
There was a problem hiding this comment.
Logging conversation lists breaks drop_duplicates on unhashable type
High Severity
Previously self._logs["prompt"] contained flat strings from batch_decode, but now it stores conversation lists (list of dicts). Downstream, when log_unique_prompts=True, df.drop_duplicates(subset=["prompt"]) is called on these values. Lists are unhashable in Python, so this will raise a TypeError and crash the logging step. This affects both grpo_trainer.py and rloo_trainer.py.
Additional Locations (1)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30a67b60a8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 00a5fd0. Configure here.
| axis=1, | ||
| copy=False, | ||
| ) | ||
| images.append([logging_backend.Image(image) for image in image_list]) |
There was a problem hiding this comment.
Removed null guard crashes on None image lists
High Severity
Removing the if image_list: guard in the image logging loop causes a TypeError. In VLM batches, self._logs["images"] can contain None entries for prompts without images. Iterating over these None values raises TypeError: 'NoneType' object is not iterable, crashing logging for mixed-image batches.
Reviewed by Cursor Bugbot for commit 00a5fd0. Configure here.
|
I think showing decoded text is better but I terminal based debugging can be limited especially for very long outputs. IMHO, we'll need to separate the debug info from how we vizualize it. We can decide to use UI based debugging (using |


Two related fixes to make the completions table more readable, especially for tool-calling and multi-turn setups.
Changes
Log conversations instead of decoded strings (WDYT?)
Previously, prompts and completions were decoded with
batch_decode(flat strings) before being logged:Now the raw conversation lists are logged directly and passed to
print_prompt_completions_sample(which already knows how to render them turn-by-turn):Tradeoff: the rendered system prompt (including tool definitions injected by the chat template) is no longer visible in the table, since we show the original messages rather than the fully-templated string. The gain in readability for multi-turn and tool-calling conversations outweighs this.
And for VLMs:
What do you guys think? What's the most useful?
Fix: render
tool_callsTool-calling turns were silently blank (the assistant message has
tool_callsinstead ofcontent), which the old code ignored. Now renders asname(arg=value, ...):Note
Medium Risk
Changes the shape/serialization of logged
prompt/completiondata (strings -> message lists) and alters how multimodal/tool-call content is rendered, which may affect downstream logging/analysis expecting decoded text.Overview
Completions logging now records full chat conversations instead of decoded strings in
GRPOTrainerandRLOOTrainer, improving readability for multi-turn and tool-calling rollouts.The completions table/console rendering is updated to display
tool_callsand VLM blocks (showing text blocks and a[IMAGE]placeholder), and a new_strip_images_from_messageshelper removes non-serializable PIL objects before writing parquet / logging tables. Tests are updated to assert the new tool-call rendering output.Reviewed by Cursor Bugbot for commit 00a5fd0. Bugbot is set up for automated code reviews on this repo. Configure here.