Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion fastdeploy/model_executor/layers/normalization.py
Original file line number Diff line number Diff line change
Expand Up @@ -341,7 +341,7 @@ def forward(
forward_meta,
proxy_rmsnorm=None,
) -> paddle.Tensor:
if proxy_rmsnorm is None and self.qk_norm_fused and forward_meta.step_use_cudagraph:
if proxy_rmsnorm is None and self.qk_norm_fused:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 建议 建议补充 Prefill 阶段的精度验证结果。

虽然 qk_rmsnorm_fused 已有单元测试验证,但该测试主要针对小 batch size(如 128)。Prefill 阶段 batch size 通常更大(可能达 4096+),建议补充大 batch 下的精度对比测试,确保 Triton kernel 在 Prefill 场景下的数值正确性。

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 建议 缺少准确性测试结果

PR 声称有性能提升("部分模型 单Kernel部分加速2~7倍"),但 Checklist 中未提供具体的准确性测试结果。

需要说明:

  1. prefill 阶段使用 triton kernel 与原来 paddle 实现的数值精度是否一致
  2. 是否有端到端测试对比修改前后的模型输出准确性

建议提供具体的 accuracy 测试数据,例如使用标准数据集(如 MMLU)的对比结果。

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ 疑问 历史限制原因未说明

根据 git 历史,提交 #6080fix opt qknorm)添加了 forward_meta.step_use_cudagraph 条件,限制了只在 decode 阶段使用 qk_rmsnorm_fused

PR 描述没有说明:

  1. 为什么 [BugFix] Fix qk_norm optimization #6080 要限制为 decode 阶段(是否有已知的 bug 或性能问题?)
  2. 现在移除限制是否安全,是否解决了 [BugFix] Fix qk_norm optimization #6080 当时遇到的问题

建议在 PR 描述中补充相关背景信息。

qkv_out = qk_rmsnorm_fused(
qkv_out,
self.q_norm.weight,
Expand Down
2 changes: 1 addition & 1 deletion tests/e2e/test_Qwen3VL_serving.py
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@ def test_consistency_between_runs(api_url, headers, consistent_payload):
content1 = result1["choices"][0]["message"]["content"]

# base result
content2 = "视频中手机支架的颜色是黑色的。"
content2 = "视频中手机支架的颜色是黑色。"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 建议 测试文件变更应与核心修改分离到单独 PR

测试文件中将预期结果从"黑色的"改为"黑色"是一个独立的文本格式调整(去掉助词"的"),与本次 PR 的核心优化(移除 step_use_cudagraph 条件)无关。

建议将此文本格式调整移到单独的 PR,保持 PR 的单一职责原则。


# Verify that result is same as the base result
assert content1.startswith(content2), content1
Expand Down
Loading