integrate flydsl gdr decode by ganyi1996ppo · Pull Request #568 · ROCm/ATOM

ganyi1996ppo · 2026-04-15T06:22:28Z

Motivation

Before this change:

============ Serving Benchmark Result ============
Successful requests:                     64        
Failed requests:                         0         
Maximum request concurrency:             16        
Benchmark duration (s):                  66.15     
Total input tokens:                      524288    
Total generated tokens:                  65536     
Request throughput (req/s):              0.97      
Output token throughput (tok/s):         990.73    
Peak output token throughput (tok/s):    1200.00   
Peak concurrent requests:                32.00     
Total token throughput (tok/s):          8916.56   
---------------Time to First Token----------------
Mean TTFT (ms):                          1589.77   
Median TTFT (ms):                        1723.01   
P99 TTFT (ms):                           2607.30   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          14.60     
Median TPOT (ms):                        14.42     
P99 TPOT (ms):                           15.93     
---------------Inter-token Latency----------------
Mean ITL (ms):                           14.60     
Median ITL (ms):                         13.69     
P99 ITL (ms):                            14.54     
==================================================

After this change:

============ Serving Benchmark Result ============
Successful requests:                     64        
Failed requests:                         0         
Maximum request concurrency:             16        
Benchmark duration (s):                  63.30     
Total input tokens:                      524288    
Total generated tokens:                  65536     
Request throughput (req/s):              1.01      
Output token throughput (tok/s):         1035.35   
Peak output token throughput (tok/s):    1248.00   
Peak concurrent requests:                32.00     
Total token throughput (tok/s):          9318.17   
---------------Time to First Token----------------
Mean TTFT (ms):                          1579.60   
Median TTFT (ms):                        1747.44   
P99 TTFT (ms):                           2566.66   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          13.91     
Median TPOT (ms):                        13.75     
P99 TPOT (ms):                           15.14     
---------------Inter-token Latency----------------
Mean ITL (ms):                           13.91     
Median ITL (ms):                         12.97     
P99 ITL (ms):                            13.71     
==================================================
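To make the before/after comparison easier to read, the short Python sketch below computes the relative change of a few headline metrics. It uses only the numbers reported in the two runs above. By this arithmetic, output token throughput rises by about 4.5% and mean TPOT drops by about 4.7%. TTFT is left out because its mean and median moved in opposite directions.

```python
# Figures copied from the two serving benchmark results above.
before = {
    "Benchmark duration (s)": 66.15,
    "Output token throughput (tok/s)": 990.73,
    "Total token throughput (tok/s)": 8916.56,
    "Mean TPOT (ms)": 14.60,
    "Mean ITL (ms)": 14.60,
    "P99 ITL (ms)": 14.54,
}
after = {
    "Benchmark duration (s)": 63.30,
    "Output token throughput (tok/s)": 1035.35,
    "Total token throughput (tok/s)": 9318.17,
    "Mean TPOT (ms)": 13.91,
    "Mean ITL (ms)": 13.91,
    "P99 ITL (ms)": 13.71,
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means it went down)."""
    return (new - old) / old * 100.0


for metric, old in before.items():
    new = after[metric]
    print(f"{metric:<34} {old:>9.2f} -> {new:>9.2f}  ({pct_change(old, new):+.1f}%)")
```

For throughput metrics a positive change is an improvement. For latency metrics (TPOT, ITL) and duration, a negative change is the improvement.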

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Integrates an optional FlyDSL-based gated-delta-rule (GDR) decode kernel into the vLLM plugin GDN attention backend, and freezes GDN-specific parameters in the Qwen3 Next model implementation.

Changes:

Add an optional flydsl_gdr_decode path for decode-time recurrent attention, with fallback to the existing fused kernel when unavailable.
Mark dt_bias and A_log as non-trainable (requires_grad=False) parameters in qwen3_next.
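The optional path with import-time feature gating and fallback, described in the first bullet, can be sketched in plain Python. This is a minimal sketch that assumes the kernel is exposed as an ordinary callable. The module path `aiter.ops.flydsl_gdr` and the helpers `try_import` and `gdr_decode` are hypothetical. They are not taken from the PR diff, which this page does not show.

```python
import importlib
from typing import Any, Callable, Optional


def try_import(module: str, attr: str) -> Optional[Callable[..., Any]]:
    """Return module.attr if it can be imported, else None (never raises ImportError)."""
    try:
        return getattr(importlib.import_module(module), attr)
    except (ImportError, AttributeError):
        return None


# Hypothetical module path: the real location of flydsl_gdr_decode inside
# aiter is not shown on this PR page.
flydsl_gdr_decode = try_import("aiter.ops.flydsl_gdr", "flydsl_gdr_decode")
USE_FLYDSL_GDR = flydsl_gdr_decode is not None


def gdr_decode(*args: Any, fused_fallback: Callable[..., Any], **kwargs: Any) -> Any:
    """Run the FlyDSL kernel if it imported cleanly, else the existing fused kernel."""
    if USE_FLYDSL_GDR:
        return flydsl_gdr_decode(*args, **kwargs)
    return fused_fallback(*args, **kwargs)
```

In this sketch the availability check runs once, at import time, so the per-step decode call only pays for a boolean test. That matches the "import-time feature gating" wording in the review summary.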

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`atom/plugin/vllm/attention_backend/attention_gdn.py`	Adds FlyDSL GDR decode integration with import-time feature gating and fallback path.
`atom/models/qwen3_next.py`	Freezes `dt_bias` and `A_log` parameters (no gradients).

Comments suppressed due to low confidence (2)

atom/plugin/vllm/attention_backend/attention_gdn.py:44

The large commented-out maybe_dump_flydsl_gdr_inputs block adds maintenance overhead and makes it harder to review/grep the file. If this debug dump is needed, wire it up behind a real flag/env var and keep the helper active (or move it to a dedicated debug/util module); otherwise, remove the dead commented code.

        super().__init__()

    def forward(
        self,
        q: torch.Tensor,
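The comment above asks for the debug dump to be either removed or placed behind a real environment variable. One way to keep it active but off by default is sketched below. The variable name `ATOM_FLYDSL_GDR_DUMP_DIR` and the helper `maybe_dump_inputs` are hypothetical and do not exist in the PR. The sketch pickles plain Python objects. With real tensors you would more likely move them to CPU and use `torch.save`.

```python
import os
import pickle
from pathlib import Path
from typing import Any, Optional

# Hypothetical environment variable; the PR does not define one.
DUMP_DIR_ENV = "ATOM_FLYDSL_GDR_DUMP_DIR"


def maybe_dump_inputs(step: int, **inputs: Any) -> Optional[Path]:
    """Pickle the decode inputs into $ATOM_FLYDSL_GDR_DUMP_DIR when it is set.

    Returns the path that was written, or None when dumping is disabled.
    """
    dump_dir = os.environ.get(DUMP_DIR_ENV)
    if not dump_dir:
        return None  # Disabled by default: no file I/O on the hot path.
    path = Path(dump_dir) / f"flydsl_gdr_inputs_{step:06d}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(inputs, f)
    return path
```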

atom/plugin/vllm/attention_backend/attention_gdn.py:438

The PR template sections (Motivation/Technical Details/Test Plan/Test Result) are still empty. Since this change alters the decode attention path and may depend on a specific aiter version, please document the required aiter version and how this was tested/benchmarked.

        else:
            core_attn_out[:num_actual_tokens] = core_attn_out_non_spec.squeeze(0)

        return core_attn_out

Copilot · 2026-04-15T06:27:33Z

+    USE_FLYDSL_GDR = False
+    print(
+        "Failed to import flydsl_gdr_decode. Please make sure you have the latest version of aiter installed."
+    )



Avoid printing to stdout when the optional flydsl_gdr_decode import fails; this can spam logs in library/server contexts and in tests. Prefer the project’s logging/warnings mechanism (and ideally emit the message only when the decode path is selected) while falling back to the non-flydsl implementation.
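A sketch of this suggestion under stated assumptions: the import failure is recorded silently, and a single `logging` warning is emitted the first time the decode path actually has to fall back. The names `select_decode_kernel` and `_warn_once` are hypothetical and do not come from the PR.

```python
import functools
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


@functools.lru_cache(maxsize=None)
def _warn_once(message: str) -> None:
    """Log each distinct message at most once per process."""
    logger.warning(message)


def select_decode_kernel(
    flydsl_fn: Optional[Callable[..., Any]],
    fused_fn: Callable[..., Any],
) -> Callable[..., Any]:
    """Return the FlyDSL kernel if it was imported, else the fused fallback.

    The warning fires only when decode actually needs the kernel (not at
    import time), and only once, so server logs are not spammed.
    """
    if flydsl_fn is not None:
        return flydsl_fn
    _warn_once(
        "flydsl_gdr_decode is unavailable; falling back to the fused GDR "
        "decode kernel. Make sure the latest aiter is installed to enable it."
    )
    return fused_fn
```

Going through `logging` lets the hosting server (for example vLLM) decide the level, format, and destination of the message instead of writing to stdout.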

ganyi1996ppo added 3 commits April 14, 2026 09:36

eager maybe right, score regression

59acac9

Signed-off-by: ganyi <ygan@amd.com>

integrate gdr decode kernel

4f38725

Signed-off-by: ganyi <ygan@amd.com>

integrate flydsl gdr

4e2b9cd

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI review requested due to automatic review settings April 15, 2026 06:22

ganyi1996ppo marked this pull request as draft April 15, 2026 06:22

remove redundant code

2322372

Signed-off-by: ganyi <ygan@amd.com>

Copilot started reviewing on behalf of ganyi1996ppo April 15, 2026 06:24

Copilot AI reviewed Apr 15, 2026

xytpai previously approved these changes Apr 15, 2026

ganyi1996ppo marked this pull request as ready for review April 15, 2026 14:37

Merge branch 'main' into ganyi/gdr_linear_decode

36c999d

ganyi1996ppo dismissed xytpai’s stale review via 36c999d April 15, 2026 14:39

valarLip approved these changes Apr 16, 2026

valarLip merged commit dcb9028 into main Apr 16, 2026
24 of 29 checks passed

valarLip deleted the ganyi/gdr_linear_decode branch April 16, 2026 08:07

valarLip merged 5 commits into main from ganyi/gdr_linear_decode

4 participants
