cd /home/lzx/Diffulex-remote-main
# Set CUDA_HOME on its own line: inside a single export, "$CUDA_HOME" would be
# expanded before the assignment takes effect.
export CUDA_HOME=$HOME/cuda-12.2
export PYTHONFAULTHANDLER=1 \
http_proxy=http://127.0.0.1:17780 https_proxy=http://127.0.0.1:17780 \
HTTP_PROXY=http://127.0.0.1:17780 HTTPS_PROXY=http://127.0.0.1:17780 \
all_proxy=http://127.0.0.1:17780 ALL_PROXY=http://127.0.0.1:17780 \
no_proxy=localhost,127.0.0.1,::1 NO_PROXY=localhost,127.0.0.1,::1 \
PATH="$CUDA_HOME/bin:$PATH" \
LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
CUDA_VISIBLE_DEVICES=0,1,2,3 UV_HTTP_TIMEOUT=180
uv run python examples/test_dream_dvllm_human_eval.py > log/test_dvllm_dream_human_eval.remote_main.log 2>&1
[rank0]: File "/home/lzx/Diffulex-remote-main/examples/test_dream_dvllm_human_eval.py", line 74, in <module>
[rank0]: outputs = LLM.generate(prompts[:], sampling_params)
[rank0]: File "/home/lzx/Diffulex/diffulex/legacy/engine/llm_engine.py", line 118, in generate
[rank0]: output, num_tokens, is_prefill, cur_n_diff_steps, _ = self.step()
[rank0]: File "/home/lzx/Diffulex/diffulex/legacy/engine/llm_engine.py", line 77, in step
[rank0]: sample_output = self.model_runner.call("run", seqs, is_prefill)
[rank0]: File "/home/lzx/Diffulex/diffulex/legacy/engine/model_runner.py", line 678, in run
[rank0]: input_ids, positions = self.prepare_prefill(seqs) if is_prefill else self.prepare_decode(seqs)
[rank0]: File "/home/lzx/Diffulex/diffulex/legacy/engine/model_runner.py", line 586, in prepare_decode
[rank0]: if cur_map[local_start_idx()] == seq.num_diffusion_blocks - 1:
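A stack like the one above can be pulled from a hung process with Python's standard faulthandler module, which PYTHONFAULTHANDLER=1 already enables at startup. The following is a minimal sketch, not code from the Diffulex repository: install_hang_diagnostics and its parameters are hypothetical names, and it assumes a Unix-like system where SIGUSR1 is available.

```python
import faulthandler
import signal


def install_hang_diagnostics(log_file, timeout_s: float = 600.0) -> None:
    """Hypothetical helper: dump Python stacks when the process appears stuck.

    log_file must be an open file with a real file descriptor and must stay
    open for as long as the handlers are installed.
    """
    # Dump tracebacks of all threads on fatal signals (what PYTHONFAULTHANDLER=1 does).
    faulthandler.enable(file=log_file, all_threads=True)

    # On-demand dump: `kill -USR1 <pid>` writes all thread stacks to log_file.
    # faulthandler.register is not available on Windows, hence the guards.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)

    # Periodic dump: if the process is still running after timeout_s seconds,
    # write the stacks, and repeat every timeout_s seconds after that.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
```

Calling this once near the top of the example script, with a log file opened for writing, would let a stalled decode step be inspected without killing the run. Two dumps taken some time apart that both point at the same while loop are a strong hint that the loop is spinning rather than waiting on the GPU.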
while start_idx < end_idx and not is_last_block and not meet_active_block:
    local_start_idx = lambda: start_idx % seq.block_size
    diffusion_block = seq.diffusion_blocks[cur_map[local_start_idx()]]
    ...
    if diffusion_block.is_in_cache:
        ...
        start_idx += step
    elif diffusion_block.is_to_cache:
        ...
        start_idx += step
    elif diffusion_block.is_active:
        meet_active_block = True
    # Any other state is unhandled → start_idx stays unchanged and the loop never exits
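Description
Running examples/test_dream_dvllm_human_eval.py directly on origin/main (commit e9e9bb08ad8396646c8c1378d252c0facdfabeb9) repeatedly hangs in the decode phase, inside the inner while loop of model_runner.prepare_decode. In the decode path of diffulex/legacy/engine/model_runner.py, the block selected by cur_map[local_start_idx()] is neither is_in_cache, nor is_to_cache, nor is_active, so start_idx never advances and the loop never exits.
Reproduction environment
Working tree: /home/lzx/Diffulex-remote-main, checked out from origin/main (the commit above), with no local modifications (only .venv is untracked).
CUDA: CUDA_HOME=$HOME/cuda-12.2, PATH="$CUDA_HOME/bin:$PATH", LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}".
GPUs: CUDA_VISIBLE_DEVICES=0,1,2,3.
Proxy: http_proxy/https_proxy/all_proxy=http://127.0.0.1:17780 (local proxy).
Launcher: uv run (the Python venv lives in the repository's .venv/).
Other variables: PYTHONFAULTHANDLER=1, UV_HTTP_TIMEOUT=180.
Reproduction steps
Run the shell commands listed at the top of this report.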
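Observed behavior
After Generating: 79%|█████ | 130/164 ... no further output appears, GPU utilization drops to 0%, and the process keeps consuming CPU. The while loop in prepare_decode involved in the hang is the excerpt above that begins with while start_idx < end_idx.
Expected behavior
Preliminary hypothesis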
else: break, or log the anomaly and advance start_idx), and also log the state of the block that was encountered, to help confirm the correct semantics.
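The fallback suggested above can be sketched in isolation. This is an illustration under stated assumptions, not a patch for model_runner.py: BlockState, Block, and scan_blocks are hypothetical names, the UNKNOWN state stands in for whatever state the real block is in when the hang occurs, and the block lookup is simplified to start_idx // block_size instead of the real cur_map indexing.

```python
import logging
from dataclasses import dataclass
from enum import Enum, auto

logger = logging.getLogger("prepare_decode_sketch")


class BlockState(Enum):
    """Hypothetical states; UNKNOWN stands in for the unhandled case."""
    IN_CACHE = auto()
    TO_CACHE = auto()
    ACTIVE = auto()
    UNKNOWN = auto()


@dataclass
class Block:
    """Hypothetical stand-in for a diffusion block."""
    state: BlockState

    @property
    def is_in_cache(self) -> bool:
        return self.state is BlockState.IN_CACHE

    @property
    def is_to_cache(self) -> bool:
        return self.state is BlockState.TO_CACHE

    @property
    def is_active(self) -> bool:
        return self.state is BlockState.ACTIVE


def scan_blocks(blocks, block_size, end_idx):
    """Walk the blocks and return (start_idx, stop_reason).

    Every branch either advances start_idx or returns, so the loop
    terminates for any block state.
    """
    start_idx = 0
    while start_idx < end_idx:
        block_id = start_idx // block_size  # simplified; not the real cur_map lookup
        block = blocks[block_id]
        if block.is_in_cache or block.is_to_cache:
            start_idx += block_size
        elif block.is_active:
            return start_idx, "active"
        else:
            # Fallback branch: report the unexpected state instead of spinning.
            logger.warning(
                "unhandled block state %s at start_idx=%d (block %d)",
                block.state, start_idx, block_id,
            )
            return start_idx, "unhandled"
    return start_idx, "end"
```

Without the final else branch, a block in the UNKNOWN state would leave start_idx untouched and the while condition true forever, which matches the symptom of a busy CPU and an idle GPU. Whether the right reaction is to stop, skip the block, or raise depends on what the unhandled state means in the real engine, which is why logging the state comes first.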