support bf16 moe tp permute, group_gemm, unpermute #7194
ckl117 wants to merge 1 commit into PaddlePaddle:release/2.5 from
Conversation
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             release/2.5    #7194   +/- ##
==============================================
  Coverage           ?      69.47%
==============================================
  Files              ?         390
  Lines              ?       54384
  Branches           ?        8575
==============================================
  Hits               ?       37786
  Misses             ?       13869
  Partials           ?        2729
```

Flags with carried forward coverage won't be shown.
fastdeploy-bot left a comment
🤖 AI Code Review | 2026-04-07 14:42 CST
📋 Review Summary
PR overview: adds bf16 support for moe_permute, group_gemm, and moe_unpermute in the Cutlass MoE backend; the new code path is gated by the FD_USE_PHI_MOE_PERMUTE environment variable.
Scope of change: model_executor/layers/moe/
Impact tags: OP, Models
📝 PR Convention Check
The PR title is missing a valid feature Tag.
Suggested title (copy-paste ready):
[OP] support bf16 moe tp permute, group_gemm, unpermute
Suggested description: add the following to the Motivation and Modifications sections:
- Motivation: explain why the Cutlass backend needs a bf16 MoE permute path
- Modifications: describe the design of the new deep_batch_gemm function and the FD_USE_PHI_MOE_PERMUTE branch
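The env-var gating described above might look like the following minimal sketch. Only the flag name FD_USE_PHI_MOE_PERMUTE comes from the PR; the function and the path labels are hypothetical, for illustration only:

```python
import os

def select_moe_permute_path():
    # Hypothetical dispatch sketch, not the PR's actual code: the new bf16
    # permute/group_gemm/unpermute path is taken only when the env var is "1".
    if os.getenv("FD_USE_PHI_MOE_PERMUTE", "0") == "1":
        return "phi_moe_permute"   # new bf16 path
    return "legacy_cutlass"        # existing Cutlass path

print(select_moe_permute_path())
```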
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fused_moe_cutlass_backend.py:59 | paddlefleet_ops may be None; missing defensive check |
| 🟡 Suggestion | fused_moe_cutlass_backend.py:352 | permute_scale variable is unused |
Overall Assessment
The logic is correct and consistent with the implementation pattern in fused_moe_deepgemm_backend.py. Adding a defensive check for paddlefleet_ops would improve robustness. Test coverage already exists in tests/layers/test_deepgemm_fused_moe.py.
```python
def deep_batch_gemm(x, y, expert_idx_per_token):
    out = paddle.empty([x.shape[0], y.shape[-1]], dtype=x.dtype)
    paddlefleet_ops.deep_gemm.m_grouped_bf16_gemm_nn_contiguous(x, y, out, expert_idx_per_token)
```
🟡 Suggestion: paddlefleet_ops may be None; missing defensive check
paddlefleet_ops is imported via try_import(["paddlefleet.ops"]), which returns None when the paddlefleet module is unavailable. If a user enables FD_USE_PHI_MOE_PERMUTE in an environment without paddlefleet installed, this call raises an AttributeError.
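The hazard can be reproduced with a minimal try_import-style helper. This is an illustrative sketch, not FastDeploy's actual import utility:

```python
import importlib

def try_import(names):
    # Sketch of a tolerant importer: return the first module that imports
    # successfully, or None if none of them are available.
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

paddlefleet_ops = try_import(["paddlefleet.ops"])
# If paddlefleet is not installed, paddlefleet_ops is None, and any later
# attribute access such as paddlefleet_ops.deep_gemm raises:
#   AttributeError: 'NoneType' object has no attribute 'deep_gemm'
```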
Suggested defensive check:

```python
def deep_batch_gemm(x, y, expert_idx_per_token):
    if paddlefleet_ops is None:
        raise RuntimeError(
            "paddlefleet.ops is required for FD_USE_PHI_MOE_PERMUTE=1. "
            "Please install paddlefleet or disable this feature."
        )
    out = paddle.empty([x.shape[0], y.shape[-1]], dtype=x.dtype)
    paddlefleet_ops.deep_gemm.m_grouped_bf16_gemm_nn_contiguous(x, y, out, expert_idx_per_token)
    return out
```

```python
permute_input,
permute_indices_per_token,  # == zipped_expertwise_rowmap
topk_weights,
permute_scale,
```
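Based on the kernel name and call signature, m_grouped_bf16_gemm_nn_contiguous presumably multiplies each token row by its expert's weight matrix, with tokens laid out contiguously per expert. A NumPy reference sketch of those assumed semantics (an illustrative model, not the Paddle kernel):

```python
import numpy as np

def grouped_gemm_reference(x, y, expert_idx_per_token):
    # x: [num_tokens, hidden], y: [num_experts, hidden, out],
    # expert_idx_per_token: [num_tokens]. Row i of x is multiplied by the
    # weight matrix of the expert it was routed to.
    out = np.empty((x.shape[0], y.shape[-1]), dtype=x.dtype)
    for i, e in enumerate(expert_idx_per_token):
        out[i] = x[i] @ y[e]
    return out

x = np.arange(8, dtype=np.float32).reshape(4, 2)    # 4 tokens, hidden=2
y = np.stack([np.eye(2, dtype=np.float32),          # expert 0: identity
              2 * np.eye(2, dtype=np.float32)])     # expert 1: scale by 2
idx = np.array([0, 0, 1, 1])                        # tokens grouped contiguously by expert
out = grouped_gemm_reference(x, y, idx)
```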
🟡 Suggestion: permute_scale variable is unused
The permute_scale returned by moe_permute is never used in the subsequent code. If it is genuinely unneeded, replace it with _ to make the intentional discard explicit:

```python
(
    permute_input,
    permute_indices_per_token,
    topk_weights,
    _,  # permute_scale not used in bf16 path
    expert_idx_per_token,
) = paddle.nn.functional.moe_permute(...)
```

Note: the same applies at line 423.
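For context, the permute/unpermute pair bracketing the grouped GEMM follows a gather/scatter round trip: tokens are gathered into expert-contiguous order, processed, then scattered back. A hypothetical NumPy sketch of that round trip (not the paddle.nn.functional.moe_permute API; names and shapes are illustrative):

```python
import numpy as np

def permute(x, order):
    # Gather token rows into expert-contiguous order.
    return x[order]

def unpermute(x_permuted, order):
    # Scatter processed rows back to their original token positions.
    out = np.empty_like(x_permuted)
    out[order] = x_permuted
    return out

x = np.arange(6, dtype=np.float32).reshape(3, 2)
order = np.array([2, 0, 1])                     # expert-contiguous token order
restored = unpermute(permute(x, order), order)  # round trip recovers the input
```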
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- PR title tag options: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a PR targeting a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.