[HIP] Optimized fused split GDR decode#2326

Merged
huizzhan merged 3 commits into main from dev/huizzhan/split_gated_gdr_decode_hip_prfix
Mar 24, 2026

@huizzhan
Contributor

huizzhan commented Mar 18, 2026

Motivation

Adds an optimized HIP kernel for fused split GDR decode.

Technical Details

Adds an optimized HIP kernel (using a swizzled ssm_state) for the fused split GDR decode implementation, to support Qwen3Next.
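
For readers unfamiliar with the operation, here is a minimal single-head reference sketch of one decode step. It assumes that GDR refers to the gated delta rule used in Gated DeltaNet-style linear attention; the PR text does not spell this out. The function names, argument names, and the K x V state layout are hypothetical, and the sketch does not model the split inputs, the fusion, or the swizzled ssm_state layout of the actual HIP kernel.

```python
import math


def l2_normalize(x, eps=1e-6):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in x) + eps)
    return [v / norm for v in x]


def gated_delta_decode_step(state, q, k, v, g, beta, use_l2norm=True):
    """One decode step for one head (illustrative reference, not the PR's kernel).

    state: K x V list of lists, updated in place (hypothetical layout).
    q, k:  length-K vectors; v: length-V vector.
    g:     log-space decay gate (decay factor is exp(g)).
    beta:  write strength of the delta-rule update.
    """
    if use_l2norm:
        q = l2_normalize(q)
        k = l2_normalize(k)
    dk, dv = len(k), len(v)
    scale = dk ** -0.5
    decay = math.exp(g)

    # 1) Decay the recurrent state.
    for i in range(dk):
        for j in range(dv):
            state[i][j] *= decay

    # 2) Read the value the decayed state currently associates with k.
    pred = [sum(state[i][j] * k[i] for i in range(dk)) for j in range(dv)]

    # 3) Delta rule: write back only the gated prediction error.
    delta = [beta * (v[j] - pred[j]) for j in range(dv)]
    for i in range(dk):
        for j in range(dv):
            state[i][j] += k[i] * delta[j]

    # 4) Query the updated state to produce the output.
    return [scale * sum(state[i][j] * q[i] for i in range(dk)) for j in range(dv)]
```

A reference like this is mainly useful for checking an optimized kernel's numerics on small shapes. With g = 0 and beta = 1, writing the same (k, v) pair twice leaves the state unchanged on the second call, because the prediction error is already zero.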

Test Plan

Tested the HIP fused split GDR decode with different configs (itype, l2norm, input_shape, ...).

Test Result

Evaluated performance with Qwen3Next: 1.47x and 1.62x speedups over the original Triton implementation in the TP4 and TP8 cases, respectively. Unit tests passed.
[Image: gdn_update]

Submission Checklist

@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label        Tests
ci:sglang    SGLang integration tests
ci:atom      ATOM benchmark (DeepSeek-R1 + GPT-OSS)
ci:vllm      vLLM benchmark
ci:all       All of the above

Add labels via the sidebar or gh pr edit 2326 --add-label <label>

huizzhan force-pushed the dev/huizzhan/split_gated_gdr_decode_hip_prfix branch 4 times, most recently from 8cd03df to 45573b0, March 18, 2026 09:45
huizzhan requested review from valarLip and yiijin and removed request for valarLip, March 23, 2026 02:49
huizzhan changed the title from "[HIP] fused split GDR decode" to "[HIP] Optimized fused split GDR decode", Mar 23, 2026
huizzhan marked this pull request as ready for review, March 23, 2026 02:54
huizzhan requested a review from a team, March 23, 2026 02:54
huizzhan force-pushed the dev/huizzhan/split_gated_gdr_decode_hip_prfix branch 2 times, most recently from 0ea2eec to 2048487, March 23, 2026 03:08
huizzhan force-pushed the dev/huizzhan/split_gated_gdr_decode_hip_prfix branch from 2048487 to d0d9cf1, March 23, 2026 03:12
huizzhan requested a review from valarLip, March 23, 2026 07:31
@huizzhan
Contributor Author

Hi @valarLip, this PR adds an optimized HIP implementation of fused split GDR decode for the Linear Attention project. Please review. Thanks for your support!

@yiijin
Contributor

yiijin commented Mar 23, 2026

please update the hip perf in the Test Result Table

@huizzhan
Contributor Author

> please update the hip perf in the Test Result Table

Updated, please review, thanks~

huizzhan merged commit 45f6d97 into main, Mar 24, 2026
38 of 39 checks passed
huizzhan deleted the dev/huizzhan/split_gated_gdr_decode_hip_prfix branch, March 24, 2026 02:46