integrate flydsl gdr decode#568

Merged
valarLip merged 5 commits into main from ganyi/gdr_linear_decode
Apr 16, 2026

ganyi1996ppo commented Apr 15, 2026

Motivation

Depends on ROCm/aiter#2746.

before

============ Serving Benchmark Result ============
Successful requests:                     64        
Failed requests:                         0         
Maximum request concurrency:             16        
Benchmark duration (s):                  66.15     
Total input tokens:                      524288    
Total generated tokens:                  65536     
Request throughput (req/s):              0.97      
Output token throughput (tok/s):         990.73    
Peak output token throughput (tok/s):    1200.00   
Peak concurrent requests:                32.00     
Total token throughput (tok/s):          8916.56   
---------------Time to First Token----------------
Mean TTFT (ms):                          1589.77   
Median TTFT (ms):                        1723.01   
P99 TTFT (ms):                           2607.30   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          14.60     
Median TPOT (ms):                        14.42     
P99 TPOT (ms):                           15.93     
---------------Inter-token Latency----------------
Mean ITL (ms):                           14.60     
Median ITL (ms):                         13.69     
P99 ITL (ms):                            14.54     
==================================================

after

============ Serving Benchmark Result ============
Successful requests:                     64        
Failed requests:                         0         
Maximum request concurrency:             16        
Benchmark duration (s):                  63.30     
Total input tokens:                      524288    
Total generated tokens:                  65536     
Request throughput (req/s):              1.01      
Output token throughput (tok/s):         1035.35   
Peak output token throughput (tok/s):    1248.00   
Peak concurrent requests:                32.00     
Total token throughput (tok/s):          9318.17   
---------------Time to First Token----------------
Mean TTFT (ms):                          1579.60   
Median TTFT (ms):                        1747.44   
P99 TTFT (ms):                           2566.66   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          13.91     
Median TPOT (ms):                        13.75     
P99 TPOT (ms):                           15.14     
---------------Inter-token Latency----------------
Mean ITL (ms):                           13.91     
Median ITL (ms):                         12.97     
P99 ITL (ms):                            13.71     
==================================================
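To make the before/after comparison easier to read, the relative change of a few headline metrics can be computed directly from the two tables above. This is only a small sketch: the helper name `pct_change` and the choice of metrics are illustrative, and the numbers are copied verbatim from the benchmark output.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the two benchmark tables above.
metrics = {
    "Output token throughput (tok/s)": (990.73, 1035.35),
    "Total token throughput (tok/s)": (8916.56, 9318.17),
    "Mean TTFT (ms)": (1589.77, 1579.60),
    "Mean TPOT (ms)": (14.60, 13.91),
}

changes = {name: pct_change(b, a) for name, (b, a) in metrics.items()}

for name, delta in changes.items():
    # Positive is better for throughput, negative is better for latency.
    print(f"{name:<34} {delta:+.2f}%")
```

On these figures, output token throughput rises by roughly 4.5% and mean TPOT drops by roughly 4.7%, while mean TTFT is nearly unchanged. Both tables come from a single run each, so run-to-run variance is not captured.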

Technical Details

Test Plan

Test Result

Submission Checklist

Signed-off-by: ganyi <ygan@amd.com>
Signed-off-by: ganyi <ygan@amd.com>
Signed-off-by: ganyi <ygan@amd.com>
Copilot AI review requested due to automatic review settings April 15, 2026 06:22
@ganyi1996ppo ganyi1996ppo marked this pull request as draft April 15, 2026 06:22
Signed-off-by: ganyi <ygan@amd.com>
Copilot AI left a comment

Pull request overview

Integrates an optional FlyDSL-based gated-delta-rule (GDR) decode kernel into the vLLM plugin GDN attention backend, and freezes GDN-specific parameters in the Qwen3 Next model implementation.

Changes:

  • Add an optional flydsl_gdr_decode path for decode-time recurrent attention, with fallback to the existing fused kernel when unavailable.
  • Mark dt_bias and A_log as non-trainable (requires_grad=False) parameters in qwen3_next.
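The optional-kernel pattern described in the first bullet can be sketched as follows. This is a minimal illustration, not the code in `attention_gdn.py`: the module path `hypothetical_aiter_module.gdr`, the helper `load_optional_kernel`, and the stand-in `fused_fallback` are all assumptions made for the example; only the names `flydsl_gdr_decode` and `USE_FLYDSL_GDR` appear in the PR.

```python
import importlib
from typing import Any, Callable, Optional


def load_optional_kernel(module_name: str, attr: str) -> Optional[Callable[..., Any]]:
    """Return `attr` from `module_name`, or None if either is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


# Hypothetical import location; the real path inside aiter is not shown here.
flydsl_gdr_decode = load_optional_kernel(
    "hypothetical_aiter_module.gdr", "flydsl_gdr_decode"
)
USE_FLYDSL_GDR = flydsl_gdr_decode is not None


def fused_fallback(x):
    """Stand-in for the existing fused decode kernel."""
    return ("fallback", x)


def decode(x):
    """Use the optional kernel when it was imported, else the fallback."""
    if USE_FLYDSL_GDR:
        return flydsl_gdr_decode(x)
    return fused_fallback(x)
```

Resolving the kernel once at import time keeps the per-step decode path to a single boolean check, at the cost of requiring a process restart to pick up a newly installed aiter.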

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| atom/plugin/vllm/attention_backend/attention_gdn.py | Adds FlyDSL GDR decode integration with import-time feature gating and fallback path. |
| atom/models/qwen3_next.py | Freezes dt_bias and A_log parameters (no gradients). |
Comments suppressed due to low confidence (2)

atom/plugin/vllm/attention_backend/attention_gdn.py:44

  • The large commented-out maybe_dump_flydsl_gdr_inputs block adds maintenance overhead and makes it harder to review/grep the file. If this debug dump is needed, wire it up behind a real flag/env var and keep the helper active (or move it to a dedicated debug/util module); otherwise, remove the dead commented code.
        super().__init__()

    def forward(
        self,
        q: torch.Tensor,

atom/plugin/vllm/attention_backend/attention_gdn.py:438

  • The PR template sections (Motivation/Technical Details/Test Plan/Test Result) are still empty. Since this change alters the decode attention path and may depend on a specific aiter version, please document the required aiter version and how this was tested/benchmarked.
        else:
            core_attn_out[:num_actual_tokens] = core_attn_out_non_spec.squeeze(0)

        return core_attn_out


Comment on lines +32 to 36
    USE_FLYDSL_GDR = False
    print(
        "Failed to import flydsl_gdr_decode. Please make sure you have the latest version of aiter installed."
    )

Copilot AI Apr 15, 2026

Avoid printing to stdout when the optional flydsl_gdr_decode import fails; this can spam logs in library/server contexts and in tests. Prefer the project’s logging/warnings mechanism (and ideally emit the message only when the decode path is selected) while falling back to the non-flydsl implementation.
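One way to act on this suggestion is to log the fallback once, at the point where the decode path is chosen, instead of printing at import time. The sketch below uses only the standard `logging` module; the logger name `atom.gdn_example` and the helpers `warn_once` and `select_decode_kernel` are hypothetical, and the project's actual logging mechanism may differ.

```python
import functools
import logging

# Hypothetical logger name; a real integration would use the project's logger.
logger = logging.getLogger("atom.gdn_example")


@functools.lru_cache(maxsize=None)
def warn_once(message: str) -> None:
    """Emit each distinct warning message at most once per process."""
    logger.warning(message)


# Assume the optional import failed, as in the `except` branch under review.
USE_FLYDSL_GDR = False


def select_decode_kernel() -> str:
    """Pick the decode kernel, warning only when the fallback is actually used."""
    if not USE_FLYDSL_GDR:
        warn_once(
            "flydsl_gdr_decode is unavailable; "
            "falling back to the fused decode kernel."
        )
        return "fused"
    return "flydsl"
```

Because the warning is tied to kernel selection rather than module import, processes that never reach the decode path stay silent, and repeated decode steps do not repeat the message.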

xytpai previously approved these changes Apr 15, 2026
@ganyi1996ppo ganyi1996ppo marked this pull request as ready for review April 15, 2026 14:37
@valarLip valarLip merged commit dcb9028 into main Apr 16, 2026
24 of 29 checks passed
@valarLip valarLip deleted the ganyi/gdr_linear_decode branch April 16, 2026 08:07
4 participants