
(5/5) async grpo metrics #5322

Open
AmineDiro wants to merge 3 commits into main from feature/async-grpo-metrics

Conversation


@AmineDiro AmineDiro commented Mar 20, 2026

🔴 🔴 IMPORTANT: Depends on: feature/async-grpo-data-classes

What does this PR do?

  • Introduces _extract_tool_metrics and _compute_rollout_metrics to the async rollout worker.
  • Leverages the TurnRecord dataclass (from feature/async-grpo-data-classes) to compute and log granular execution times for individual tool calls and generations.
  • Logs system metrics (like generation_tok_per_s, scoring_time_ms, wait_scoring_ms) and appends them to the trainer's metrics dictionary.
  • Surfaces reward_mean and reward_std directly inside the worker's output logs.
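To make the metrics flow concrete, here is a minimal sketch of what a helper like `_compute_rollout_metrics` could do. The field names (`metrics`, the timing arguments) are illustrative assumptions, not the PR's exact signatures:

```python
from dataclasses import dataclass, field


@dataclass
class RolloutSample:
    """Hypothetical stand-in for the PR's RolloutSample; only `metrics` is used here."""
    metrics: dict = field(default_factory=dict)


def compute_rollout_metrics(samples, total_completion_tokens,
                            generation_time_s, scoring_time_s, wait_scoring_s):
    # Derive throughput and latency figures and attach them to every sample
    # so they land in the trainer's metrics dictionary downstream.
    tok_per_s = (total_completion_tokens / generation_time_s
                 if generation_time_s > 0 else 0.0)
    for sample in samples:
        sample.metrics["generation_tok_per_s"] = tok_per_s
        sample.metrics["scoring_time_ms"] = scoring_time_s * 1000.0
        sample.metrics["wait_scoring_ms"] = wait_scoring_s * 1000.0
    return samples
```

The sketch assumes timings are measured in seconds and converted to milliseconds at logging time, matching the `*_ms` metric names listed above.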

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Refactors async GRPO rollout outputs from flat arrays into structured per-turn records and changes how scoring builds masks/logprobs, which could affect training data integrity. It also adds new timing/metrics logging paths that may introduce subtle double logging or overhead if not validated.

Overview
Refactors async GRPO rollouts to return a structured RolloutCompletion (with TurnRecord/ToolCallRecord) instead of parallel lists of completion ids/logprobs/tool masks and tool call counters.

Generation now records per-turn token ids/logprobs, tool-result suffix ids, tool call failures, and wall-clock durations, and scoring derives completion_mask/logprobs and computes new per-tool duration/failure metrics via _extract_tool_metrics.

Adds _compute_rollout_metrics to attach system-level throughput/latency metrics (generation_tok_per_s, scoring_time_ms, wait_scoring_ms, buffer size) to each RolloutSample, and logs reward_mean/reward_std from within the worker.
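The structured-record shape described above can be sketched with dataclasses. These definitions are illustrative assumptions about `TurnRecord`/`RolloutCompletion` (the real ones live in `feature/async-grpo-data-classes`); the `completion_ids` property shows how scoring could recover the old flat view from per-turn records:

```python
from dataclasses import dataclass, field


@dataclass
class ToolCallRecord:
    # Illustrative fields: one record per tool invocation within a turn.
    name: str
    duration_s: float
    failed: bool = False


@dataclass
class TurnRecord:
    # Per-turn token ids/logprobs plus tool-result suffix ids and timings.
    token_ids: list
    logprobs: list
    tool_result_ids: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    duration_s: float = 0.0


@dataclass
class RolloutCompletion:
    turns: list
    truncated: bool = False
    total_duration_s: float = 0.0

    @property
    def completion_ids(self):
        # Flatten per-turn ids (generation + tool-result suffix) back into
        # the legacy flat list that mask/logprob derivation expects.
        ids = []
        for turn in self.turns:
            ids.extend(turn.token_ids)
            ids.extend(turn.tool_result_ids)
        return ids
```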

Written by Cursor Bugbot for commit 17dd54c. This will update automatically on new commits.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 4 potential issues.


```python
t0 = time.monotonic()
samples = await self._score_group(group)
scoring_time = time.monotonic() - t0
self._compute_rollout_metrics(samples, scoring_time, wait_scoring)
```

Duplicate _compute_rollout_metrics calls produce double logging

Medium Severity

_compute_rollout_metrics is called twice in _score_loop — once at line 553 (newly added) and again at line 559 (pre-existing). This causes duplicate log output for every scoring iteration and redundant metric overwrites. The new call was likely intended to replace the old one, but the old call was not removed.
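The intended fix is to keep a single metrics call per scoring iteration. This is a runnable sketch of that flow with a minimal stand-in worker (the class body here is a test fixture, not the PR's implementation):

```python
import asyncio
import time


class Worker:
    """Minimal stand-in for the async rollout worker; method names mirror the PR."""

    def __init__(self):
        self.metric_calls = 0

    async def _score_group(self, group):
        # Placeholder scoring: the real worker derives masks/logprobs here.
        return [f"scored:{item}" for item in group]

    def _compute_rollout_metrics(self, samples, scoring_time, wait_scoring):
        self.metric_calls += 1  # must end up at exactly 1 per iteration

    async def score_iteration(self, group, wait_scoring):
        t0 = time.monotonic()
        samples = await self._score_group(group)
        scoring_time = time.monotonic() - t0
        # Compute metrics exactly once; the pre-existing call later in the
        # loop body should be deleted rather than kept alongside this one.
        self._compute_rollout_metrics(samples, scoring_time, wait_scoring)
        return samples
```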


```python
f"[inference] total_completion_tokens={self._total_completion_tokens}, "
f"generation_tok/s={generation_tok_per_sec:.1f}, scoring_time={scoring_time_ms:.1f}ms, "
f"wait_scoring={wait_scoring_ms:.1f}ms"
)
```

Duplicate _compute_rollout_metrics method definition in class

Low Severity

_compute_rollout_metrics is defined twice in the class — at lines 494–512 (pre-existing) and again at lines 514–532 (newly added, identical body). The second definition silently shadows the first. One definition needs to be removed.


```python
tool_execution_duration_s=0.0,
    )
)
return _build_completion(turns, truncated=False, total_duration=time.monotonic() - t_start)
```

truncated is always False, never reflects actual truncation

Medium Severity

_generate_one always returns truncated=False, even when the generation loop exits because iteration_num >= max_num_turns while tool calls were still requested. The rollout/truncated metric logged downstream will always be 0.0, making it useless for detecting truncated multi-turn trajectories.
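A fix along the lines this comment suggests could compute the flag from the loop's exit condition. This predicate is a hypothetical helper (name and parameters are illustrative), not code from the PR:

```python
def is_truncated(iteration_num, max_num_turns, tool_calls_requested):
    # A multi-turn rollout is truncated when the generation loop exits on the
    # turn budget while the model was still asking for another tool call.
    return iteration_num >= max_num_turns and tool_calls_requested
```

Passing its result as `truncated=` in `_build_completion` would make the downstream `rollout/truncated` metric reflect real truncation instead of a constant 0.0.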


```python
**{
    f"rewards/{name}": float(func_reward)
    for name, func_reward in zip(self.reward_func_names, per_func_rewards[:, i], strict=True)
logger.info(f"Rollout metrics: reward_mean={reward_mean:.4f}, reward_std={reward_std:.4f}")
```

Duplicate reward metrics log line in _score_group

Low Severity

The logger.info call for reward_mean/reward_std appears twice in _score_group — at line 774 (pre-existing) and again at line 799 (newly added). This produces duplicate log lines for every scored group.


@AmineDiro changed the title from "Async grpo metrics" to "(5/5) async grpo metrics" on Mar 20, 2026