[Gluon] [Triton] [MI450] [MI350] Enable Triton/Gluon MLA with block_size 64 preshuffled kv_buffer option for decode by k50112113 · Pull Request #578 · ROCm/ATOM

k50112113 · 2026-04-15T23:07:19Z

This PR enables an MLA option for decode (Triton on MI350 and Gluon on MI450).

On MI350, the Triton implementation is included only to verify correctness. The primary purpose of this PR is to enable Gluon MLA for decode on MI450.

- Added an env var `ATOM_ENABLE_TRITON_MLA_DECODE` that lets the user toggle between Triton/Gluon MLA and ASM MLA.
- Added a switch in `atom/model_ops/attentions/aiter_mla.py` that fixes the block_size at 64.
- Re-enabled updating `slot_mapping` with `block_tables`.
- Added `concat_and_cache_mla` (Triton) and `fused_qk_rope_concat_and_cache_mla` (Triton), both of which support KV buffer pre-shuffling.
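Updating `slot_mapping` from `block_tables` is a paged-KV-cache addressing step. The sketch below assumes the common vLLM-style convention (slot = physical block id × block_size + offset within the block) with block_size fixed at 64. The helper `compute_slot_mapping` is hypothetical, not ATOM's API, and the real code likely operates on GPU tensors rather than Python lists.

```python
from typing import List

BLOCK_SIZE = 64  # block size this PR fixes for the Triton/Gluon MLA decode path


def compute_slot_mapping(
    block_tables: List[List[int]],
    positions: List[List[int]],
    block_size: int = BLOCK_SIZE,
) -> List[int]:
    """Hypothetical helper: map each token position to a flat KV-cache slot.

    block_tables[i] lists the physical block ids owned by sequence i;
    positions[i] lists the token positions of sequence i to be written.
    """
    slots: List[int] = []
    for table, seq_positions in zip(block_tables, positions):
        for pos in seq_positions:
            block_id = table[pos // block_size]  # logical block -> physical block id
            offset = pos % block_size            # position inside that block
            slots.append(block_id * block_size + offset)
    return slots


# Decode step: each sequence writes exactly one new token.
block_tables = [[7, 3], [2]]
positions = [[70], [5]]  # seq 0 is in its second block, seq 1 in its first
print(compute_slot_mapping(block_tables, positions))  # [198, 133]
```

Sequence 0's token at position 70 lands in logical block 1 (physical block 3) at offset 6, giving slot 3 × 64 + 6 = 198.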

This PR depends on ROCm/aiter#2492

Server command:

```bash
model_path="/data/deepseek-ai/DeepSeek-R1-0528"
export ATOM_ENABLE_TRITON_MLA_DECODE=1

python -m atom.entrypoints.openai_server \
  --model $model_path --kv_cache_dtype fp8 -tp 8 --block-size 64
```
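The command above combines the env var toggle with `--block-size 64`. The following is a minimal sketch of how such a backend switch could be expressed. The function names, the accepted truthy values, and the choice to raise on a mismatched block size are all assumptions for illustration. The actual switch in `aiter_mla.py` may instead force the block size to 64, and it may default to a different backend when the variable is unset.

```python
import os
from typing import Mapping, Optional

MLA_DECODE_BLOCK_SIZE = 64  # block size the Triton/Gluon MLA decode path expects


def triton_mla_decode_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check for ATOM_ENABLE_TRITON_MLA_DECODE (truthy spellings assumed)."""
    env = os.environ if env is None else env
    value = env.get("ATOM_ENABLE_TRITON_MLA_DECODE", "0")
    return value.strip().lower() in ("1", "true", "yes")


def select_mla_decode_backend(block_size: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical dispatch between Triton/Gluon MLA and ASM MLA for decode."""
    if not triton_mla_decode_enabled(env):
        return "asm"  # assumed default when the toggle is off
    if block_size != MLA_DECODE_BLOCK_SIZE:
        # Raising is a choice made for this sketch; the real code may override the value.
        raise ValueError(
            f"Triton/Gluon MLA decode expects --block-size {MLA_DECODE_BLOCK_SIZE}, "
            f"got {block_size}"
        )
    return "triton_gluon"


print(select_mla_decode_backend(64, {"ATOM_ENABLE_TRITON_MLA_DECODE": "1"}))  # triton_gluon
```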

lm_eval results:

local-completions ({'model': '/data/deepseek-ai/DeepSeek-R1-0528', 'base_url': 'http://localhost:8000/v1/completions', 'num_concurrent': 64, 'max_retries': 1, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9424|±  |0.0064|
|     |       |strict-match    |     3|exact_match|↑  |0.9378|±  |0.0067|
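The reported Stderr values are consistent with the standard error of a mean accuracy over the GSM8K test split. Assuming the standard 1,319-question test set and a sample-variance denominator of n - 1, which is the convention lm_eval's mean stderr uses, the sketch below reproduces them and adds an approximate normal-theory 95% confidence interval.

```python
import math

N_GSM8K_TEST = 1319  # assumed size of the GSM8K test split


def accuracy_stderr(p: float, n: int = N_GSM8K_TEST) -> float:
    """Standard error of the mean of n Bernoulli outcomes, using sample variance (n - 1)."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


for name, acc in [("flexible-extract", 0.9424), ("strict-match", 0.9378)]:
    se = accuracy_stderr(acc)
    lo, hi = acc - 1.96 * se, acc + 1.96 * se  # normal-approximation 95% CI
    print(f"{name}: stderr={se:.4f}, 95% CI=[{lo:.4f}, {hi:.4f}]")
```

Rounded to four places, the computed stderrs match the table (0.0064 and 0.0067).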

k50112113 added 3 commits April 15, 2026 03:50

- add triton mla (435a7b1)
- add kernel (e5710ea)
- integrate mla decode gluon with shuffling (4b1467f)

k50112113 requested a review from valarLip April 16, 2026 15:22

k50112113 added 5 commits April 16, 2026 15:32

- formatting (0a5f279)
- format (0c611b4)
- merge with main (b56f4c2)
- format (7a90c64)
- guard import (fd128e4)

k50112113 wants to merge 8 commits into main from shaoclee/mla-gfx12