Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| t0 = time.monotonic() | ||
| samples = await self._score_group(group) | ||
| scoring_time = time.monotonic() - t0 | ||
| self._compute_rollout_metrics(samples, scoring_time, wait_scoring) |
There was a problem hiding this comment.
Duplicate _compute_rollout_metrics calls produce double logging
Medium Severity
_compute_rollout_metrics is called twice in _score_loop — once at line 553 (newly added) and again at line 559 (pre-existing). This causes duplicate log output for every scoring iteration and redundant metric overwrites. The new call was likely intended to replace the old one, but the old call was not removed.
| f"[inference] total_completion_tokens={self._total_completion_tokens}, " | ||
| f"generation_tok/s={generation_tok_per_sec:.1f}, scoring_time={scoring_time_ms:.1f}ms, " | ||
| f"wait_scoring={wait_scoring_ms:.1f}ms" | ||
| ) |
There was a problem hiding this comment.
Duplicate _compute_rollout_metrics method definition in class
Low Severity
_compute_rollout_metrics is defined twice in the class — at lines 494–512 (pre-existing) and again at lines 514–532 (newly added, identical body). The second definition silently shadows the first. One definition needs to be removed.
| tool_execution_duration_s=0.0, | ||
| ) | ||
| ) | ||
| return _build_completion(turns, truncated=False, total_duration=time.monotonic() - t_start) |
There was a problem hiding this comment.
truncated is always False, never reflects actual truncation
Medium Severity
_generate_one always returns truncated=False, even when the generation loop exits because iteration_num >= max_num_turns while tool calls were still requested. The rollout/truncated metric logged downstream will always be 0.0, making it useless for detecting truncated multi-turn trajectories.
| **{ | ||
| f"rewards/{name}": float(func_reward) | ||
| for name, func_reward in zip(self.reward_func_names, per_func_rewards[:, i], strict=True) | ||
| logger.info(f"Rollout metrics: reward_mean={reward_mean:.4f}, reward_std={reward_std:.4f}") |
There was a problem hiding this comment.


🔴 🔴 IMPORTANT: Depends on: feature/async-grpo-data-classes
What does this PR do?
_extract_tool_metricsand_compute_rollout_metricsto the async rollout worker.TurnRecorddataclass (from feature/async-grpo-data-classes) to compute and log granular execution times for individual tool calls and generations.generation_tok_per_s,scoring_time_ms,wait_scoring_ms) and appends them to the trainer's metrics dictionary.reward_meanandreward_stddirectly inside the worker's output logs.Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Note
Medium Risk
Refactors async GRPO rollout outputs from flat arrays into structured per-turn records and changes how scoring builds masks/logprobs, which could affect training data integrity. Also adds new timing/metrics logging paths that may introduce subtle double-logging/overhead if not validated.
Overview
Refactors async GRPO rollouts to return a structured
RolloutCompletion(withTurnRecord/ToolCallRecord) instead of parallel lists of completion ids/logprobs/tool masks and tool call counters.Generation now records per-turn token ids/logprobs, tool-result suffix ids, tool call failures, and wall-clock durations, and scoring derives
completion_mask/logprobs and computes new per-tool duration/failure metrics via_extract_tool_metrics.Adds
_compute_rollout_metricsto attach system-level throughput/latency metrics (generation_tok_per_s,scoring_time_ms,wait_scoring_ms, buffer size) to eachRolloutSample, and logsreward_mean/reward_stdfrom within the worker.Written by Cursor Bugbot for commit 17dd54c. This will update automatically on new commits. Configure here.