
[Gluon] [Triton] [MI450] [MI350] Enable Triton/Gluon MLA with block_size 64 preshuffled kv_buffer option for decode #578

Open
k50112113 wants to merge 8 commits into main from shaoclee/mla-gfx12
Conversation


@k50112113 k50112113 commented Apr 15, 2026

This PR enables the Triton/Gluon MLA option for decode (Triton for MI350 and Gluon for MI450).

On MI350, the implementation here is only meant to verify correctness. The primary purpose of this PR is to enable Gluon MLA for decode on MI450.

  1. Added an env var ATOM_ENABLE_TRITON_MLA_DECODE so the user can toggle between Triton/Gluon MLA and ASM MLA
  2. Added a switch in atom/model_ops/attentions/aiter_mla.py to fix the block size at 64
  3. Re-enabled updating slot_mapping with block_tables
  4. Added concat_and_cache_mla (Triton) and fused_qk_rope_concat_and_cache_mla (Triton), which support KV buffer pre-shuffling
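The env-var toggle in item 1 can be sketched roughly as follows. This is a minimal illustration only: the backend functions here are placeholders, not the actual atom/aiter kernels, and the real dispatch site lives inside the attention module.

```python
import os

# Placeholder backends -- the real kernels live in atom/aiter; these stubs
# only illustrate which path the env var selects.
def triton_gluon_mla_decode(*args):
    return "triton/gluon"

def asm_mla_decode(*args):
    return "asm"

# ATOM_ENABLE_TRITON_MLA_DECODE=1 opts into the Triton/Gluon MLA decode
# path; anything else keeps the default ASM MLA path.
USE_TRITON_MLA_DECODE = os.environ.get("ATOM_ENABLE_TRITON_MLA_DECODE", "0") == "1"

def mla_decode(q, kv_buffer, block_tables):
    if USE_TRITON_MLA_DECODE:
        return triton_gluon_mla_decode(q, kv_buffer, block_tables)
    return asm_mla_decode(q, kv_buffer, block_tables)
```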

This PR depends on ROCm/aiter#2492
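For item 3 above, deriving slot_mapping from block_tables with the fixed block size of 64 follows the standard paged-KV arithmetic. A minimal sketch (the helper name is hypothetical, and this shows only the plain linear mapping; the pre-shuffled kv_buffer variants additionally permute the within-block layout):

```python
BLOCK_SIZE = 64  # fixed by the switch in aiter_mla.py

def slot_for_position(block_table, pos, block_size=BLOCK_SIZE):
    """Map a token position within a sequence to a flat slot index
    in the KV buffer.

    block_table lists the physical block ids assigned to the
    sequence, in logical order.
    """
    block_id = block_table[pos // block_size]
    offset = pos % block_size
    return block_id * block_size + offset
```

For example, with block_table = [7, 2], position 65 falls in the second logical block (physical block 2) at offset 1, giving slot 2 * 64 + 1 = 129.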

Server command:

```shell
model_path="/data/deepseek-ai/DeepSeek-R1-0528"
export ATOM_ENABLE_TRITON_MLA_DECODE=1

python -m atom.entrypoints.openai_server \
  --model $model_path --kv_cache_dtype fp8 -tp 8 --block-size 64
```

lm_eval results:

local-completions ({'model': '/data/deepseek-ai/DeepSeek-R1-0528', 'base_url': 'http://localhost:8000/v1/completions', 'num_concurrent': 64, 'max_retries': 1, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9424|±  |0.0064|
|     |       |strict-match    |     3|exact_match|↑  |0.9378|±  |0.0067|

@k50112113 k50112113 requested a review from valarLip April 16, 2026 15:22