[Cherry-pick][Optimization] enable trtllm_all_reduce fusion kernel in glm model #7219
BingooYang wants to merge 18 commits into PaddlePaddle:release/2.5
Conversation
Thanks for your contribution! |
fastdeploy-bot left a comment
📋 Review Summary
PR overview: enables the trtllm_all_reduce fusion kernel for the GLM-4.5-Air model, improving performance through newly added flashinfer fused operators.
Scope of changes: model_executor/layers/, model_executor/models/glm4_moe.py, config/, engine/
Impact tags: [Optimization] [Models] [OP]
📝 PR Convention Check
The PR complies with the conventions:
- The title contains a valid tag ([Optimization])
- Motivation and Modifications are clearly described
- Checklist items are complete
- Usage and Accuracy Tests are provided
Issues
| Level | File | Summary |
|---|---|---|
| 🔴 Bug | normalization.py:249-252 | AttributeError when the fusion call returns None |
| 🟡 Suggestion | normalization.py:249, linear.py:941, flashinfer_comm_fusion.py:87,118 | max_token_num is hardcoded to 2048; read it from the config instead |
Overall Assessment
The PR wires in the trtllm_all_reduce fusion kernel, and the tests cover the main paths. However, there is one critical bug: when flashinfer is unavailable, the fusion function returns (None, None), but the caller does not handle this case correctly, which causes a runtime error. Recommend merging after the fix.
```python
# enable trtllm all reduce fusion
elif self.enable_all_reduce_fusion and x.shape[0] <= 2048:
    norm_out = flashinfer_allreduce_residual_rmsnorm(
        fd_config=self.fd_config, input_tensor=x, residual=residual_input, weight=self.weight, eps=self.eps
```
🔴 Bug: when flashinfer_allreduce_residual_rmsnorm returns (None, None) (flashinfer unavailable or workspace not initialized), the code goes on to call norm_out[0].astype(), raising AttributeError: 'NoneType' object has no attribute 'astype'.
Suggested fix:
```python
# enable trtllm all reduce fusion
elif self.enable_all_reduce_fusion and x.shape[0] <= 2048:
    norm_out = flashinfer_allreduce_residual_rmsnorm(
        fd_config=self.fd_config, input_tensor=x, residual=residual_input, weight=self.weight, eps=self.eps
    )
    # Check if fusion succeeded, fall back to the normal path if not
    if norm_out[0] is None or norm_out[1] is None:
        norm_out = self.norm_func(
            x,
            norm_weight=self.weight,
            norm_bias=None,
            epsilon=self.eps,
            begin_norm_axis=self.begin_norm_axis,
            bias=self.bias,
            residual=residual_input,
            quant_scale=(-1 if self.quant_scale is None else self.quant_scale),
            quant_round_type=self.quant_round_type,
            quant_max_bound=self.quant_max_bound,
            quant_min_bound=self.quant_min_bound,
        )
```

```python
    residual: paddle.Tensor,
    weight: paddle.Tensor,
    eps: float = 1e-6,
    max_token_num: int = 2048,
```
🟡 Suggestion: max_token_num is hardcoded to 2048 in several places, limiting configurability. This parameter should be read from FDConfig instead.
Affected locations:
- linear.py:941 - `out.shape[0] <= 2048`
- normalization.py:249 - `x.shape[0] <= 2048`
- flashinfer_comm_fusion.py:87 - `max_token_num: int = 2048` (default argument)
- flashinfer_comm_fusion.py:118 - `max_token_num: int = 2048` (default argument)
Suggest adding a flashinfer_allreduce_max_token_num field to FDConfig so the limit is configured in one place.
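A possible shape for the config-driven limit is sketched below. The field name `flashinfer_allreduce_max_token_num` is the one proposed above; the `FDConfig` class and the helper function here are simplified illustrations, not the actual FastDeploy definitions.

```python
from dataclasses import dataclass


@dataclass
class FDConfig:
    # Proposed field; 2048 kept as the default to preserve current behavior.
    flashinfer_allreduce_max_token_num: int = 2048


def should_use_allreduce_fusion(fd_config: FDConfig, num_tokens: int, fusion_enabled: bool) -> bool:
    """Gate the fused path on the configured token limit rather than a literal 2048."""
    return fusion_enabled and num_tokens <= fd_config.flashinfer_allreduce_max_token_num


cfg = FDConfig()
print(should_use_allreduce_fusion(cfg, 1024, True))   # small batch: fused path
print(should_use_allreduce_fusion(cfg, 4096, True))   # exceeds limit: fall back
```

The three call sites listed above would then share the same check instead of each repeating the `<= 2048` comparison.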
Motivation
Integrate the trtllm_allreduce_fusion operator into FastDeploy.
Modifications
Usage or Command
Local tests passed on both H-series and B-series GPUs.
python -m fastdeploy.entrypoints.openai.api_server --model /root/paddlejob/workspace/bingoo/model/GLM-4.5-Air --tensor-parallel-size 4 --port 8185 --max-num-batched-tokens 2048 --enable-flashinfer-allreduce-fusion
Accuracy Tests
python -m paddle.distributed.launch --gpus=0,1 ./FastDeploy/tests/layers/test_rms_allreduce_fusion.py
Checklist
- The PR title includes at least one tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a PR targeting a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.