Skip to content

(2/5) Refactor RolloutCompletion in Async Rollout Worker #5318

Open
AmineDiro wants to merge 3 commits intomainfrom
feature/async-grpo-data-classes
Open

(2/5) Refactor RolloutCompletion in Async Rollout Worker #5318
AmineDiro wants to merge 3 commits intomainfrom
feature/async-grpo-data-classes

Conversation

@AmineDiro
Copy link
Copy Markdown
Member

@AmineDiro AmineDiro commented Mar 20, 2026

What does this PR do?

  • Introduces structured dataclasses for the async rollout pipeline: RolloutCompletion, TurnRecord, ToolCallRecord, and TaggedMessage.
  • Refactors RolloutGroup to store a unified list of RolloutCompletion objects instead of maintaining five separate parallel arrays (completions, completions_ids, tool_mask, etc.).
  • Updates _generate_one and _score_group to construct and interact with these new objects natively, drastically improving code readability and making it easier to track multi-turn properties (like execution time and tool arguments) in future PRs.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Moderate refactor of async rollout generation/scoring data flow (including tool-call accounting), which could affect token/logprob masks and metrics if any edge cases are missed; coverage is improved with new unit tests.

Overview
Refactors the async GRPO rollout pipeline to use structured dataclasses (RolloutCompletion, TurnRecord, ToolCallRecord, TaggedMessage) instead of parallel arrays for completion text/ids/logprobs/tool masks and tool-call counts.

_generate_one now returns a RolloutCompletion with per-turn timing, tool-call records (including args, failures, and durations), and a trajectory that avoids duplicating prompt/context messages across turns; _score_group is updated to consume these helpers for completion_ids, logprobs/masks, and tool metrics.

Adds a comprehensive test_async_rollout_worker.py suite covering completion flattening/masks, multi-turn trajectory uniqueness, truncation at max tool-calling turns, and tool metric computation.

Written by Cursor Bugbot for commit 21360b6. This will update automatically on new commits. Configure here.

@AmineDiro AmineDiro requested a review from qgallouedec March 20, 2026 09:32
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment thread trl/experimental/async_grpo/async_rollout_worker.py Outdated
Comment thread trl/experimental/async_grpo/async_rollout_worker.py
@AmineDiro AmineDiro changed the title Refactor RolloutCompletion to store turns and compute fields lazily Refactor RolloutCompletion in Async Rollout Worker Mar 20, 2026
@AmineDiro AmineDiro changed the title Refactor RolloutCompletion in Async Rollout Worker (2/5) Refactor RolloutCompletion in Async Rollout Worker Mar 20, 2026
Tests cover RolloutCompletion message/ID handling, multi-turn
trajectory consistency, group scoring, and end-to-end generation flows
with tool calling and truncation scenarios.
Add comprehensive tests for AsyncRolloutWorker

Tests cover RolloutCompletion message/ID handling, multi-turn trajectory
construction, score group validation, and end-to-end generation flows
with tool calling. Also remove unused variable in _generate_one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants