Skip to content

Show conversations instead of decoded text in the completions table#5309

Draft
qgallouedec wants to merge 11 commits intomainfrom
log-conversations-and-tool-calls
Draft

Show conversations instead of decoded text in the completions table#5309
qgallouedec wants to merge 11 commits intomainfrom
log-conversations-and-tool-calls

Conversation

@qgallouedec
Copy link
Copy Markdown
Member

@qgallouedec qgallouedec commented Mar 19, 2026

Two related fixes to make the completions table more readable, especially for tool-calling and multi-turn setups.

Changes

Log conversations instead of decoded strings (WDYT?)

Previously, prompts and completions were decoded with batch_decode (flat strings) before being logged:

Screenshot 2026-03-18 at 8 38 08 PM

Now the raw conversation lists are logged directly and passed to print_prompt_completions_sample (which already knows how to render them turn-by-turn):

Screenshot 2026-03-18 at 8 39 23 PM

Tradeoff: the rendered system prompt (including tool definitions injected by the chat template) is no longer visible in the table, since we show the original messages rather than the fully-templated string. The gain in readability for multi-turn and tool-calling conversations outweighs this.

And for VLMs:

Screenshot 2026-03-19 at 12 12 25 PM

What do you guys think? What's the most useful?

Fix: render tool_calls

Tool-calling turns were silently blank (the assistant message has tool_calls instead of content), which the old code ignored. Now renders as name(arg=value, ...):


Note

Medium Risk
Changes the shape/serialization of logged prompt/completion data (strings -> message lists) and alters how multimodal/tool-call content is rendered, which may affect downstream logging/analysis expecting decoded text.

Overview
Completions logging now records full chat conversations instead of decoded strings in GRPOTrainer and RLOOTrainer, improving readability for multi-turn and tool-calling rollouts.

The completions table/console rendering is updated to display tool_calls and VLM blocks (showing text blocks and a [IMAGE] placeholder), and a new _strip_images_from_messages helper removes non-serializable PIL objects before writing parquet / logging tables. Tests are updated to assert the new tool-call rendering output.

Reviewed by Cursor Bugbot for commit 00a5fd0. Bugbot is set up for automated code reviews on this repo. Configure here.

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

self._logs["completion"].extend(gather_object(completions_text))
# Log prompts and completions
self._logs["prompt"].extend(gather_object(prompts))
self._logs["completion"].extend(gather_object(completions))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging conversation lists breaks drop_duplicates on unhashable type

High Severity

Previously self._logs["prompt"] contained flat strings from batch_decode, but now it stores conversation lists (list of dicts). Downstream, when log_unique_prompts=True, df.drop_duplicates(subset=["prompt"]) is called on these values. Lists are unhashable in Python, so this will raise a TypeError and crash the logging step. This affects both grpo_trainer.py and rloo_trainer.py.

Additional Locations (1)
Fix in Cursor Fix in Web

Comment thread trl/trainer/utils.py
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30a67b60a8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread trl/trainer/grpo_trainer.py
Comment thread trl/trainer/rloo_trainer.py Outdated
Comment thread trl/trainer/utils.py
Comment thread trl/trainer/rloo_trainer.py Outdated
Comment thread trl/trainer/rloo_trainer.py Outdated
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
@qgallouedec qgallouedec marked this pull request as draft April 4, 2026 03:34
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 00a5fd0. Configure here.

axis=1,
copy=False,
)
images.append([logging_backend.Image(image) for image in image_list])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed null guard crashes on None image lists

High Severity

Removing the if image_list: guard in the image logging loop causes a TypeError. In VLM batches, self._logs["images"] can contain None entries for prompts without images. Iterating over these None values raises TypeError: 'NoneType' object is not iterable, crashing logging for mixed-image batches.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 00a5fd0. Configure here.

@AmineDiro
Copy link
Copy Markdown
Member

I think showing decoded text is better but I terminal based debugging can be limited especially for very long outputs. IMHO, we'll need to separate the debug info from how we vizualize it. We can decide to use UI based debugging (using trackio for example) or build specific TUI for the debug info. wdyt @qgallouedec ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants