fix: fix a bug in vLLM weight synchronization when vllm_enable_sleep_mode=True #5313

Open
muupan wants to merge 3 commits into huggingface:main from muupan:fix/sleep-mode-weight-sync
Conversation

Contributor

@muupan muupan commented Mar 19, 2026

What does this PR do?

Fixes #5312

For the description of the bug and how this PR resolves it, see #5312.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Touches vLLM sleep-mode weight lifecycle during generation; mistakes could cause stale weights or crashes/OOMs on colocated backends.

Overview
Fixes colocated vLLM generation when enable_sleep_mode is on by explicitly tracking whether vLLM weights are currently slept/discarded.

Introduces a _llm_weights_sleeping flag that is set on init/sleep and cleared on wake, and changes generate() to call sync_weights() (which wakes weights safely) only when weights are actually sleeping, instead of unconditionally waking/reloading weights.
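The lifecycle described above can be sketched as follows. This is a minimal illustrative sketch, not the PR's actual diff: the `FakeLLM` stub, the `VLLMGenerationSketch` class, and the `sleep()`/`generate()` signatures are stand-ins for the real vLLM engine and TRL code; only the `_llm_weights_sleeping` flag and the `sync_weights`/`generate` names come from the PR description.

```python
class FakeLLM:
    """Stand-in for the vLLM engine; records wake/sleep calls for inspection."""

    def __init__(self):
        self.calls = []

    def sleep(self, level):
        self.calls.append(("sleep", level))

    def wake_up(self, tags=None):
        self.calls.append(("wake_up", tuple(tags or ())))


class VLLMGenerationSketch:
    """Sketch of the flag-tracking pattern the PR introduces."""

    def __init__(self, enable_sleep_mode=True):
        self.llm = FakeLLM()
        self.enable_sleep_mode = enable_sleep_mode
        # Weights start in the slept/discarded state when sleep mode is on.
        self._llm_weights_sleeping = enable_sleep_mode

    def sleep(self):
        self.llm.sleep(level=2)
        self._llm_weights_sleeping = True

    def sync_weights(self):
        # Wake weights before copying, so the copy writes to mapped memory.
        if self.enable_sleep_mode and self._llm_weights_sleeping:
            self.llm.wake_up(tags=["weights"])
            self._llm_weights_sleeping = False
        # ... copy the latest training weights into vLLM here ...

    def generate(self, prompts):
        # Resync only when weights are actually sleeping, instead of
        # unconditionally waking/reloading on every call.
        if self._llm_weights_sleeping:
            self.sync_weights()
        return [f"generated:{p}" for p in prompts]
```

With this shape, repeated `generate()` calls between sleeps perform no redundant wake-ups, and a `sleep()` followed by `generate()` triggers exactly one wake-plus-sync.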

Written by Cursor Bugbot for commit 7388350.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread on trl/generation/vllm_generation.py
Contributor Author

muupan commented Mar 19, 2026

I confirmed that 9ec02cd did not affect the results.

[Screenshot attached: 2026-03-20 at 0:00:32]

Member

qgallouedec commented Mar 21, 2026

Thanks for the fix. I have a concern: sync_weights copies updated training weights into vLLM and is called by the trainer after each optimizer step. generate should only handle memory management (waking up sleeping weights), not trigger a full weight copy. Would the following work instead?

    def sync_weights(self):
        """Synchronize model weights to vLLM.

        Handles FSDP, DeepSpeed, PEFT weight synchronization.
        """
        # Wake up vLLM weights before loading to ensure device memory is mapped. Without this, load_weights() writes to
        # freed/unmapped memory when sleep mode is active, which crashes on backends with strict physical memory
        # management (e.g., Ascend NPU). See https://github.com/huggingface/trl/issues/5142
        if self.mode == "colocate" and self.enable_sleep_mode:
            empty_cache()  # required to avoid OOM in some cases
            self.llm.wake_up(tags=["weights"])
            # Workaround for https://github.com/vllm-project/vllm/issues/29341
            try:
                self.llm.collective_rpc("reload_weights")
            except NotImplementedError:
                # Non-CUDA vLLM backends (e.g., vllm-ascend's NPUWorkerV1) don't implement reload_weights
                pass
  • and remove the self.llm.collective_rpc("reload_weights") call from generate

Contributor Author

muupan commented Mar 21, 2026

generate should only handle memory management (wake up sleeping weights), not trigger a full weight copy.

According to vllm-project/vllm#29341, as long as we use sleep level 2, calling wake_up alone is insufficient to make the model ready to generate, correct? So I guess generate must call either self.llm.collective_rpc("reload_weights") or sync_weights after wake_up to perform generation when the weights are sleeping. Since the former just reloads the initial weights, the latter seems to be the only option to use the latest weights.
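The distinction argued above can be made concrete with a toy model. This is purely illustrative: `ToyEngine` and its methods are hypothetical stand-ins, not vLLM's API. At sleep level 2 the weight contents are discarded, so after `wake_up` the memory is mapped again but holds no usable weights; `reload_weights` would restore the initial checkpoint, while a full sync is what brings in the latest training weights.

```python
class ToyEngine:
    """Toy stand-in for a sleep-level-2 engine; not the vLLM API."""

    def __init__(self, initial_weights):
        self.initial_weights = dict(initial_weights)
        self.weights = dict(initial_weights)

    def sleep_level_2(self):
        # Level 2 discards the weight contents, not just the activations.
        self.weights = None

    def wake_up(self):
        # Memory is mapped again, but the contents are gone.
        if self.weights is None:
            self.weights = {}

    def reload_weights(self):
        # Restores the *initial* checkpoint: stale after training has advanced.
        self.weights = dict(self.initial_weights)

    def sync_weights(self, latest):
        # Copies the *latest* training weights: the only way to be up to date.
        self.weights = dict(latest)


engine = ToyEngine({"w": 0.0})
latest = {"w": 1.5}  # pretend an optimizer step updated the weights

engine.sleep_level_2()
engine.wake_up()
print(engine.weights)        # {} -> wake_up alone cannot generate correctly

engine.reload_weights()
print(engine.weights)        # {'w': 0.0} -> initial weights, not the latest

engine.sync_weights(latest)
print(engine.weights)        # {'w': 1.5} -> ready with the latest weights
```

This is why, in the slept state, generate must ultimately go through a full sync rather than stop at `wake_up` or fall back to reloading the initial weights.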

Contributor Author

muupan commented Apr 12, 2026

@qgallouedec @albertvillanova Gentle ping on this PR when you have a chance.


Development

Successfully merging this pull request may close these issues.

vLLM weights are not synchronized with vllm_enable_sleep_mode=True

2 participants