Add support for logging extra columns in reward functions and update related tests #5308

Merged
qgallouedec merged 6 commits into main from reward-logging
Apr 4, 2026

qgallouedec (Member) commented Mar 19, 2026

This PR uses the log_extra kwarg (added in #5233, cc @manueldeprada) in TRL's built-in reward functions, and updates the console table to display the extra columns.

Changes

  • accuracy_reward: logs solution, gold_parsed, and answer_parsed via log_extra
  • reasoning_accuracy_reward: logs the same extra columns, plus an "[incomplete reasoning]" sentinel for completions missing the reasoning delimiter

All new parameters default to None so the functions can still be called directly outside a trainer (e.g., in tests).
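
A minimal sketch of that pattern, not TRL's actual implementation: the reward function below is hypothetical, and the log_extra(name, values) call shape is an assumption about the callback added in #5233. It only illustrates how defaulting the callback to None keeps the function callable on its own.

```python
def exact_match_reward(completions, solution, log_extra=None, **kwargs):
    """Hypothetical reward: 1.0 if the completion equals the solution, else 0.0."""
    answers = [c.strip() for c in completions]
    rewards = [float(a == s.strip()) for a, s in zip(answers, solution)]
    # Only emit extra columns when a callback was supplied (e.g. by a trainer).
    if log_extra is not None:
        # Assumed call shape: a column name plus one value per completion.
        log_extra("solution", list(solution))
        log_extra("answer_parsed", answers)
    return rewards


# Direct call, as in a unit test: no callback, nothing is logged.
print(exact_match_reward(["42 ", "7"], ["42", "8"]))

# Stand-in for a trainer-side collector (hypothetical).
columns = {}
exact_match_reward(
    ["42 ", "7"],
    ["42", "8"],
    log_extra=lambda name, values: columns.update({name: values}),
)
print(columns)
```

With the callback omitted the function behaves like a plain reward function; with it supplied, each column carries one value per completion.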

[Screenshot 2026-03-18 at 6 43 23 PM]

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from trl.rewards import accuracy_reward

dataset = load_dataset("trl-lib/DeepMath-103K", split="train").select(range(64))

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(
        num_generations=4,
        max_completion_length=512,
        log_completions=True,
        logging_steps=1,
        max_steps=10,
    ),
    train_dataset=dataset,
)
trainer.train()
```

Note

Low Risk
Low risk: changes are limited to logging/diagnostic paths and optional parameters, with minimal impact on training logic aside from additional metadata collection.

Overview
Adds support for displaying arbitrary extra per-completion metadata in the rich console completions table via a new extra argument to print_prompt_completions_sample, including subsampling behavior.
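
As a rough illustration of the subsampling concern (a sketch only; sample_rows and its parameters are hypothetical and do not reproduce print_prompt_completions_sample): when only some rows are shown, the same indices have to be applied to every extra column so that values stay aligned with their completions.

```python
import random


def sample_rows(prompts, completions, extra=None, num_samples=None, seed=0):
    """Hypothetical helper: pick rows and keep extra columns aligned with them."""
    extra = extra or {}
    indices = list(range(len(prompts)))
    if num_samples is not None and num_samples < len(indices):
        # Draw indices once, then reuse them for every column.
        indices = sorted(random.Random(seed).sample(indices, num_samples))
    header = ["prompt", "completion", *extra]
    rows = [
        [prompts[i], completions[i], *(str(col[i]) for col in extra.values())]
        for i in indices
    ]
    return header, rows


header, rows = sample_rows(
    prompts=["p0", "p1", "p2", "p3"],
    completions=["c0", "c1", "c2", "c3"],
    extra={"gold_parsed": ["g0", "g1", "g2", "g3"]},
    num_samples=2,
)
print(header)
for row in rows:
    print(row)
```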

Extends accuracy_reward and reasoning_accuracy_reward to accept an optional log_extra callback and emit solution, gold_parsed, and answer_parsed (with sentinel values for unparseable/skipped/incomplete reasoning cases) for trainer logging.
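
The incomplete-reasoning sentinel could look roughly like the sketch below. The helper name and the </think> delimiter are assumptions made for illustration; only the "[incomplete reasoning]" string comes from the PR description.

```python
INCOMPLETE = "[incomplete reasoning]"  # sentinel string named in the PR description


def answer_after_reasoning(completion, delimiter="</think>"):
    """Hypothetical helper: return the text after the reasoning delimiter.

    The delimiter value is an assumption. If it is missing, return the
    sentinel so a logged column still has one entry per completion.
    """
    if delimiter not in completion:
        return INCOMPLETE
    # Keep only what follows the last occurrence of the delimiter.
    return completion.rsplit(delimiter, 1)[1].strip()


print(answer_after_reasoning("<think>2+2</think> 4"))
print(answer_after_reasoning("<think>still thinking"))
```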

Updates GRPOTrainer and RLOOTrainer to pass their collected extra logs into the console printer, and adds/updates tests to validate the new extra-column table output.

Reviewed by Cursor Bugbot for commit 69daa80. Bugbot is set up for automated code reviews on this repo.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 678d96dfa5

Comment thread trl/rewards/accuracy_rewards.py
@qgallouedec qgallouedec requested a review from kashif March 25, 2026 13:54
@qgallouedec qgallouedec enabled auto-merge (squash) April 4, 2026 01:20
qgallouedec (Member, Author) commented Apr 4, 2026

This PR is a minor change and hasn't been reviewed after 2 weeks, so I'll go ahead and merge. Feel free to suggest modifications in a subsequent PR if needed.

@qgallouedec qgallouedec disabled auto-merge April 4, 2026 01:49
@qgallouedec qgallouedec merged commit 767595d into main Apr 4, 2026
14 checks passed
@qgallouedec qgallouedec deleted the reward-logging branch April 4, 2026 01:49