Add support for logging extra columns in reward functions and update related tests by qgallouedec · Pull Request #5308 · huggingface/trl

qgallouedec · 2026-03-19T00:53:15Z

This PR uses the log_extra kwarg (added in #5233, cc @manueldeprada) in TRL's built-in reward functions and updates the console table to display the extra columns.

Changes

accuracy_reward: logs solution, gold_parsed, and answer_parsed via log_extra
reasoning_accuracy_reward: same extra columns, plus an "[incomplete reasoning]" sentinel for completions missing the reasoning delimiter

All new parameters default to None so the functions can still be called directly outside a trainer (e.g., in tests).
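As a minimal sketch of that calling convention, the toy reward function below takes an optional `log_extra` callback that defaults to `None`. The function `toy_accuracy_reward`, its exact-match scoring, and the assumed callback shape `log_extra(column_name, values)` are illustrative only and are not TRL's actual implementation.

```python
def toy_accuracy_reward(completions, solution, log_extra=None, **kwargs):
    """Hypothetical reward: 1.0 when the completion matches the solution exactly."""
    answers = [completion.strip() for completion in completions]
    rewards = [float(answer == gold.strip()) for answer, gold in zip(answers, solution)]

    # Only emit extra columns when a callback was supplied (assumed signature).
    if log_extra is not None:
        log_extra("solution", list(solution))
        log_extra("answer_parsed", answers)
    return rewards


# Called directly, outside any trainer: no callback is needed.
rewards = toy_accuracy_reward(["42", " 7 "], solution=["42", "8"])

# Called with a simple collecting callback, standing in for a trainer.
collected = {}
logged_rewards = toy_accuracy_reward(
    ["42", " 7 "],
    solution=["42", "8"],
    log_extra=lambda name, values: collected.setdefault(name, []).extend(values),
)
```

Because the callback is optional, the reward values are the same with or without it; the callback only adds per-completion metadata on the side.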

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from trl.rewards import accuracy_reward

dataset = load_dataset("trl-lib/DeepMath-103K", split="train").select(range(64))

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(
        num_generations=4,
        max_completion_length=512,
        log_completions=True,
        logging_steps=1,
        max_steps=10,
    ),
    train_dataset=dataset,
)
trainer.train()

Note

Low Risk
Low risk: changes are limited to logging/diagnostic paths and optional parameters, with minimal impact on training logic aside from additional metadata collection.

Overview
Adds support for displaying arbitrary extra per-completion metadata in the rich console completions table via a new extra argument to print_prompt_completions_sample, including subsampling behavior.
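One way to picture the subsampling requirement: when only some rows are shown, the same indices have to be applied to every column, extra ones included, so that rows stay aligned. The helper below is a hypothetical, standard-library sketch of that idea. `subsample_rows` and its parameters are assumptions for illustration and do not reflect the signature of `print_prompt_completions_sample`.

```python
import random


def subsample_rows(prompts, completions, extra=None, num_samples=None, seed=0):
    """Pick the same row indices from every column so rows stay aligned."""
    extra = extra or {}
    total = len(prompts)
    if num_samples is None or num_samples >= total:
        indices = list(range(total))
    else:
        # Sorted so the surviving rows keep their original order.
        indices = sorted(random.Random(seed).sample(range(total), num_samples))

    def pick(column):
        return [column[i] for i in indices]

    return pick(prompts), pick(completions), {name: pick(values) for name, values in extra.items()}


prompts = ["p0", "p1", "p2", "p3"]
completions = ["c0", "c1", "c2", "c3"]
extra = {"gold_parsed": ["g0", "g1", "g2", "g3"]}

sub_prompts, sub_completions, sub_extra = subsample_rows(prompts, completions, extra, num_samples=2)
```

Sampling indices once and reusing them for every column avoids the failure mode where a prompt is paired with metadata from a different completion.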

Extends accuracy_reward and reasoning_accuracy_reward to accept an optional log_extra callback and emit solution, gold_parsed, and answer_parsed (with sentinel values for unparseable/skipped/incomplete reasoning cases) for trainer logging.
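The incomplete-reasoning case can be sketched as follows. The delimiter `</think>` and the helper `split_reasoning` are assumptions chosen for illustration; only the sentinel string `"[incomplete reasoning]"` comes from this PR's description.

```python
INCOMPLETE = "[incomplete reasoning]"  # sentinel string named in the PR description


def split_reasoning(completion, delimiter="</think>"):
    """Return the text after the reasoning delimiter, or the sentinel if it is missing.

    The default delimiter is an assumption made for this sketch.
    """
    if delimiter not in completion:
        return INCOMPLETE
    # Keep only what follows the last delimiter: the final answer.
    return completion.rsplit(delimiter, 1)[1].strip()


answers = [
    split_reasoning("<think>2 + 2 = 4</think> 4"),
    split_reasoning("<think>still thinking"),
]
```

Logging a readable sentinel instead of an empty value makes it obvious in the table why a completion received no credit.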

Updates GRPOTrainer and RLOOTrainer to pass their collected extra logs into the console printer, and adds/updates tests to validate the new extra-column table output.

Reviewed by Cursor Bugbot for commit 69daa80. Bugbot is set up for automated code reviews on this repo.

HuggingFaceDocBuilderDev · 2026-03-19T00:55:54Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 678d96dfa5

qgallouedec · 2026-04-04T01:20:50Z

This PR is a minor change and hasn't been reviewed after 2 weeks, so I'll go ahead and merge. Feel free to suggest modifications in a subsequent PR if needed.

Add support for logging extra columns in reward functions and update related tests

678d96d

qgallouedec requested review from AmineDiro and albertvillanova March 19, 2026 00:53

chatgpt-codex-connector Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread trl/rewards/accuracy_rewards.py

Merge branch 'main' into reward-logging

7a69060

qgallouedec requested a review from kashif March 25, 2026 13:54

qgallouedec added 4 commits March 27, 2026 10:57

Merge branch 'main' into reward-logging

2eea83b

Merge branch 'main' into reward-logging

b0201c6

Merge branch 'main' into reward-logging

9742221

Merge branch 'main' into reward-logging

69daa80

qgallouedec enabled auto-merge (squash) April 4, 2026 01:20

qgallouedec disabled auto-merge April 4, 2026 01:49

qgallouedec merged commit 767595d into main Apr 4, 2026
14 checks passed

qgallouedec deleted the reward-logging branch April 4, 2026 01:49

qgallouedec merged 6 commits into main from reward-logging

2 participants
