-
Notifications
You must be signed in to change notification settings - Fork 736
[Optimization] Use triton qk_norm both in Prefill and Decode. #7213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -341,7 +341,7 @@ def forward( | |
| forward_meta, | ||
| proxy_rmsnorm=None, | ||
| ) -> paddle.Tensor: | ||
| if proxy_rmsnorm is None and self.qk_norm_fused and forward_meta.step_use_cudagraph: | ||
| if proxy_rmsnorm is None and self.qk_norm_fused: | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 建议 缺少准确性测试结果 PR 声称有性能提升("部分模型 单Kernel部分加速2~7倍"),但 Checklist 中未提供具体的准确性测试结果。 需要说明:
建议提供具体的 accuracy 测试数据,例如使用标准数据集(如 MMLU)的对比结果。 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ❓ 疑问 历史限制原因未说明 根据 git 历史,提交 #6080( PR 描述没有说明:
建议在 PR 描述中补充相关背景信息。 |
||
| qkv_out = qk_rmsnorm_fused( | ||
| qkv_out, | ||
| self.q_norm.weight, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -173,7 +173,7 @@ def test_consistency_between_runs(api_url, headers, consistent_payload): | |
| content1 = result1["choices"][0]["message"]["content"] | ||
|
|
||
| # base result | ||
| content2 = "视频中手机支架的颜色是黑色的。" | ||
| content2 = "视频中手机支架的颜色是黑色。" | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 建议 测试文件变更应与核心修改分离到单独 PR 测试文件中将预期结果从 建议将此文本格式调整移到单独的 PR,保持 PR 的单一职责原则。 |
||
|
|
||
| # Verify that result is same as the base result | ||
| assert content1.startswith(content2), content1 | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 建议 建议补充 Prefill 阶段的精度验证结果。
虽然
qk_rmsnorm_fused已有单元测试验证,但该测试主要针对小 batch size(如 128)。Prefill 阶段 batch size 通常更大(可能达 4096+),建议补充大 batch 下的精度对比测试,确保 Triton kernel 在 Prefill 场景下的数值正确性。