[RL] support moe-topk use topk_reduce_func #7218
zoooo0820 wants to merge 4 commits into PaddlePaddle:develop
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
Coverage Diff
| Metric | develop | #7218 |
|---|---|---|
| Coverage | ? | 72.82% |
| Files | ? | 377 |
| Lines | ? | 53217 |
| Branches | ? | 8311 |
| Hits | ? | 38755 |
| Misses | ? | 11739 |
| Partials | ? | 2723 |
fastdeploy-bot left a comment
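📋 Review summary
PR overview: Adds support for custom MoE TopK normalization through topk_reduce_func. When FD_USE_PHI_MOE_TOPK is in effect, normalize and scaling are computed outside the operator, and the moe_topk_select function previously used for numerical alignment is removed.
Scope of changes: fastdeploy/model_executor/layers/moe/, fastdeploy/model_executor/models/
Impact tags: [RL] [OP]
📝 PR convention check
The PR title includes the [RL] tag, as required. The PR description fills in Motivation and Modifications, but Accuracy Tests is not ticked in the Checklist.
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/model_executor/layers/moe/moe.py:136 | With FD_USE_PHI_MOE_TOPK=True and renormalize=True, normalization is not performed if topk_reduce_func=None |
| 🟡 Suggestion | fastdeploy/model_executor/models/glm4_moe.py:185 | Other models do not pass topk_reduce_func, which may cause numerical issues in certain environments |
Overall assessment
The logic of the change is clear: accepting a topk_reduce_func parameter gives MoE TopK a custom normalization mechanism. Two points need attention:
- Risk of inconsistent behavior: with FD_USE_PHI_MOE_TOPK=True and renormalize=True, normalization runs only when topk_reduce_func is not None. If the caller does not pass this parameter, normalization is silently skipped, and the numerical results may differ from what is expected.
- Model compatibility: currently only glm4_moe.py passes topk_reduce_func. Other models (such as deepseek_v3.py, gpt_oss.py, and ernie4_5_moe.py) do not pass it when using FusedMoE. If these models run with FD_USE_PHI_MOE_TOPK=True and renormalize=True, they may hit numerical issues.
Recommendation: when the FD_USE_PHI_MOE_TOPK environment variable is enabled, either add an explicit warning or error for the case where topk_reduce_func is None, or provide a default normalization behavior.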
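The recommendation above could look roughly like the following. This is a minimal pure-Python sketch, not code from this PR: `apply_topk_reduce`, `default_renormalize`, the `use_phi_moe_topk` argument, and the list-of-lists weight layout are all hypothetical names chosen for illustration, and the real code in moe.py operates on tensors.

```python
import warnings
from typing import Callable, List, Optional

Weights = List[List[float]]  # one row of top-k routing weights per token


def default_renormalize(topk_weights: Weights, eps: float = 1e-20) -> Weights:
    """Hypothetical fallback: rescale each token's top-k weights to sum to 1."""
    return [[w / (sum(row) + eps) for w in row] for row in topk_weights]


def apply_topk_reduce(
    topk_weights: Weights,
    renormalize: bool,
    use_phi_moe_topk: bool,
    topk_reduce_func: Optional[Callable[[Weights], Weights]] = None,
) -> Weights:
    """Post-process top-k weights outside the operator (illustrative only)."""
    if not (use_phi_moe_topk and renormalize):
        # Not the case discussed in the review: leave the weights untouched.
        return topk_weights
    if topk_reduce_func is None:
        # Make the missing callback visible instead of silently skipping it.
        warnings.warn(
            "renormalize=True but topk_reduce_func is None; "
            "falling back to plain sum normalization",
            stacklevel=2,
        )
        topk_reduce_func = default_renormalize
    return topk_reduce_func(topk_weights)
```

Raising an error instead of warning would be the stricter variant; which one fits depends on whether models that do not pass the callback are expected to keep running unchanged.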
4d8f70c to 36658ed
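Motivation
The moe-topk computation differs slightly between models at the network-construction level. This PR therefore supports passing topk_reduce_func when building the network to keep the numerical results accurate. When FD_USE_PHI_MOE_TOPK is in effect, normalize and scaling are no longer computed inside the noaux_ac operator; the passed-in topk_reduce_func computes them outside the operator instead.
Because this combination has been verified to preserve numerical accuracy, the earlier moe_topk_select implementation, which was built from separate small ops for numerical alignment, is removed.
Modifications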
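As an illustration of the mechanism described under Motivation (not code taken from this PR), a model-side reduce function that performs normalize and scaling outside the operator might look like the sketch below. The factory name `make_topk_reduce_func`, the `routed_scaling_factor` parameter, the epsilon value, and the list-of-lists layout are assumptions for illustration; the actual callback receives tensors.

```python
from typing import Callable, List

Weights = List[List[float]]  # one row of top-k routing weights per token


def make_topk_reduce_func(
    routed_scaling_factor: float, eps: float = 1e-20
) -> Callable[[Weights], Weights]:
    """Build a hypothetical reduce function: normalize per token, then scale."""

    def topk_reduce_func(topk_weights: Weights) -> Weights:
        reduced = []
        for row in topk_weights:
            denom = sum(row) + eps  # eps guards against an all-zero row
            reduced.append([w / denom * routed_scaling_factor for w in row])
        return reduced

    return topk_reduce_func


# Usage: each token's weights sum to routed_scaling_factor after the call.
reduce_fn = make_topk_reduce_func(routed_scaling_factor=2.0)
scaled = reduce_fn([[1.0, 3.0]])
```

Keeping this step in a callback lets each model choose its own order of normalization and scaling, which is the model-to-model difference the Motivation refers to.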
Usage or Command
Accuracy Tests
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.