fix: fix a bug in vLLM weight synchronization when vllm_enable_sleep_mode=True #5313

Open
muupan wants to merge 3 commits into huggingface:main from muupan:fix/sleep-mode-weight-sync
Conversation

Contributor

@muupan muupan commented Mar 19, 2026

What does this PR do?

Fixes #5312

For the description of the bug and how this PR resolves it, see #5312.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Touches vLLM sleep-mode weight lifecycle during generation; mistakes could cause stale weights or crashes/OOMs on colocated backends.

Overview
Fixes colocated vLLM generation when enable_sleep_mode is on by explicitly tracking whether vLLM weights are currently slept/discarded.

Introduces a _llm_weights_sleeping flag that is set on init/sleep and cleared on wake, and changes generate() to call sync_weights() (which wakes weights safely) only when weights are actually sleeping, instead of unconditionally waking/reloading weights.
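The lifecycle described above can be sketched as follows. This is a minimal illustrative sketch, not the PR's actual diff: the `FakeLLM` stub, the `VLLMGenerationSketch` class, and the `sleep()`/`generate()` signatures are stand-ins for the real vLLM engine and TRL code; only the `_llm_weights_sleeping` flag and the `sync_weights`/`generate` names come from the PR description.

```python
class FakeLLM:
    """Stand-in for the vLLM engine; records wake/sleep calls for inspection."""

    def __init__(self):
        self.calls = []

    def sleep(self, level):
        self.calls.append(("sleep", level))

    def wake_up(self, tags=None):
        self.calls.append(("wake_up", tuple(tags or ())))


class VLLMGenerationSketch:
    """Sketch of the flag-tracking pattern the PR introduces."""

    def __init__(self, enable_sleep_mode=True):
        self.llm = FakeLLM()
        self.enable_sleep_mode = enable_sleep_mode
        # Weights start in the slept/discarded state when sleep mode is on.
        self._llm_weights_sleeping = enable_sleep_mode

    def sleep(self):
        self.llm.sleep(level=2)
        self._llm_weights_sleeping = True

    def sync_weights(self):
        # Wake weights before copying, so the copy writes to mapped memory.
        if self.enable_sleep_mode and self._llm_weights_sleeping:
            self.llm.wake_up(tags=["weights"])
            self._llm_weights_sleeping = False
        # ... copy the latest training weights into vLLM here ...

    def generate(self, prompts):
        # Resync only when weights are actually sleeping, instead of
        # unconditionally waking/reloading on every call.
        if self._llm_weights_sleeping:
            self.sync_weights()
        return [f"generated:{p}" for p in prompts]
```

With this shape, repeated `generate()` calls between sleeps perform no redundant wake-ups, and a `sleep()` followed by `generate()` triggers exactly one wake-plus-sync.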

Written by Cursor Bugbot for commit 7388350.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread on trl/generation/vllm_generation.py
Contributor Author

muupan commented Mar 19, 2026

I confirmed that 9ec02cd did not affect the results.

[Screenshot attached: 2026-03-20 at 0:00:32]

Member

qgallouedec commented Mar 21, 2026

Thanks for the fix. I have a concern: sync_weights copies updated training weights into vLLM and is called by the trainer after each optimizer step. generate should only handle memory management (waking up sleeping weights), not trigger a full weight copy. Would the following work instead?

    def sync_weights(self):
        """Synchronize model weights to vLLM.

        Handles FSDP, DeepSpeed, PEFT weight synchronization.
        """
        # Wake up vLLM weights before loading to ensure device memory is mapped. Without this, load_weights() writes to
        # freed/unmapped memory when sleep mode is active, which crashes on backends with strict physical memory
        # management (e.g., Ascend NPU). See https://github.com/huggingface/trl/issues/5142
        if self.mode == "colocate" and self.enable_sleep_mode:
            empty_cache()  # required to avoid OOM in some cases
            self.llm.wake_up(tags=["weights"])
            # Workaround for https://github.com/vllm-project/vllm/issues/29341
            try:
                self.llm.collective_rpc("reload_weights")
            except NotImplementedError:
                # Non-CUDA vLLM backends (e.g., vllm-ascend's NPUWorkerV1) don't implement reload_weights
                pass
  • and remove the self.llm.collective_rpc("reload_weights") call from generate

Contributor Author

muupan commented Mar 21, 2026

generate should only handle memory management (wake up sleeping weights), not trigger a full weight copy.

According to vllm-project/vllm#29341, as long as we use sleep level 2, calling wake_up alone is insufficient to make the model ready to generate, correct? So I guess generate must call either self.llm.collective_rpc("reload_weights") or sync_weights after wake_up to perform generation when the weights are sleeping. Since the former just reloads the initial weights, the latter seems to be the only option to use the latest weights.
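The distinction argued above can be made concrete with a toy model. This is purely illustrative: `ToyEngine` and its methods are hypothetical stand-ins, not vLLM's API. At sleep level 2 the weight contents are discarded, so after `wake_up` the memory is mapped again but holds no usable weights; `reload_weights` would restore the initial checkpoint, while a full sync is what brings in the latest training weights.

```python
class ToyEngine:
    """Toy stand-in for a sleep-level-2 engine; not the vLLM API."""

    def __init__(self, initial_weights):
        self.initial_weights = dict(initial_weights)
        self.weights = dict(initial_weights)

    def sleep_level_2(self):
        # Level 2 discards the weight contents, not just the activations.
        self.weights = None

    def wake_up(self):
        # Memory is mapped again, but the contents are gone.
        if self.weights is None:
            self.weights = {}

    def reload_weights(self):
        # Restores the *initial* checkpoint: stale after training has advanced.
        self.weights = dict(self.initial_weights)

    def sync_weights(self, latest):
        # Copies the *latest* training weights: the only way to be up to date.
        self.weights = dict(latest)


engine = ToyEngine({"w": 0.0})
latest = {"w": 1.5}  # pretend an optimizer step updated the weights

engine.sleep_level_2()
engine.wake_up()
print(engine.weights)        # {} -> wake_up alone cannot generate correctly

engine.reload_weights()
print(engine.weights)        # {'w': 0.0} -> initial weights, not the latest

engine.sync_weights(latest)
print(engine.weights)        # {'w': 1.5} -> ready with the latest weights
```

This is why, in the slept state, generate must ultimately go through a full sync rather than stop at `wake_up` or fall back to reloading the initial weights.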

Contributor Author

muupan commented Apr 12, 2026

@qgallouedec @albertvillanova Gentle ping on this PR when you have a chance.


Development

Successfully merging this pull request may close these issues.

vLLM weights are not synchronized with vllm_enable_sleep_mode=True

2 participants