fix: fix a bug in vLLM weight synchronization when vllm_enable_sleep_mode=True #5313
muupan wants to merge 3 commits into huggingface:main

Conversation
Cursor Bugbot has reviewed your changes and found 1 potential issue.
I confirmed that 9ec02cd did not affect the results.
thanks for the fix. I have a concern:

```python
def sync_weights(self):
    """Synchronize model weights to vLLM.

    Handles FSDP, DeepSpeed, PEFT weight synchronization.
    """
    # Wake up vLLM weights before loading to ensure device memory is mapped. Without this, load_weights() writes to
    # freed/unmapped memory when sleep mode is active, which crashes on backends with strict physical memory
    # management (e.g., Ascend NPU). See https://github.com/huggingface/trl/issues/5142
    if self.mode == "colocate" and self.enable_sleep_mode:
        empty_cache()  # required to avoid OOM in some cases
        self.llm.wake_up(tags=["weights"])
        # Workaround for https://github.com/vllm-project/vllm/issues/29341
        try:
            self.llm.collective_rpc("reload_weights")
        except NotImplementedError:
            # Non-CUDA vLLM backends (e.g., vllm-ascend's NPUWorkerV1) don't implement reload_weights
            pass
```
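The wake-before-load ordering discussed in this snippet can be illustrated with a small, self-contained mock. Everything here (`MockLLM`, `weights_mapped`) is hypothetical and not vLLM's real API; it only models the failure mode where `load_weights()` writes into memory that sleep mode has freed:

```python
class MockLLM:
    """Hypothetical stand-in for a vLLM engine with sleep mode (not the real API)."""

    def __init__(self):
        self.weights_mapped = True  # device memory backing the weights is mapped
        self.weights = None

    def sleep(self, level=2):
        # Sleep level 2 discards weights; their device memory is freed/unmapped.
        self.weights_mapped = False

    def wake_up(self, tags=None):
        # Re-maps device memory so weights can be written again.
        self.weights_mapped = True

    def load_weights(self, weights):
        if not self.weights_mapped:
            # Mirrors the crash on backends with strict physical memory management.
            raise RuntimeError("load_weights() into freed/unmapped device memory")
        self.weights = weights


llm = MockLLM()
llm.sleep(level=2)
try:
    llm.load_weights({"w": 1.0})  # bug: loading while weights are asleep
except RuntimeError as e:
    print("crash:", e)

llm.wake_up(tags=["weights"])  # the fix: wake weights first
llm.load_weights({"w": 1.0})
```

The mock makes the ordering constraint explicit: `wake_up(tags=["weights"])` must precede `load_weights()` whenever sleep mode is active.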
According to vllm-project/vllm#29341, as long as we use sleep level 2, calling
@qgallouedec @albertvillanova Gentle ping on this PR when you have a chance.


What does this PR do?
Fixes #5312
For the description of the bug and how this PR resolves it, see #5312.
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Note
Medium Risk
Touches the vLLM sleep-mode weight lifecycle during generation; mistakes could cause stale weights or crashes/OOMs on colocated backends.

Overview
Fixes colocated vLLM generation when enable_sleep_mode is on by explicitly tracking whether vLLM weights are currently slept/discarded. Introduces a _llm_weights_sleeping flag that is set on init/sleep and cleared on wake, and changes generate() to call sync_weights() (which wakes weights safely) only when weights are actually sleeping, instead of unconditionally waking/reloading weights.

Written by Cursor Bugbot for commit 7388350.