
Fix SDPA vmap with GQA/MQA shapes (n_heads != n_kv_heads) #3385

Open
Brooooooklyn wants to merge 1 commit into ml-explore:main from mlx-node:fix/sdpa-vmap-gqa

Conversation

Contributor

@Brooooooklyn Brooooooklyn commented Apr 8, 2026

Proposed changes

ScaledDotProductAttention relied on Custom::vmap, which re-vmapped the
fallback lambda. This always took the decomposed matmul-softmax-matmul
path, bypassing the fused Metal/CUDA kernel even when it was available.
On MLX 0.31.1 this also caused a SIGSEGV/hang with GQA shapes, due to a
since-fixed bug in the transforms infrastructure.

Add a dedicated vmap override that merges the vmap axis into the batch
dimension and re-invokes scaled_dot_product_attention directly, so the
fused kernel is dispatched under vmap just as it is without vmap.
Falls back to Custom::vmap when attention sinks are present.
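The merge trick above can be illustrated with a plain NumPy sketch (this is a hand-rolled reference SDPA, not MLX's actual C++ implementation): folding the vmapped axis into the batch dimension, running attention once, and splitting the batch back out gives the same result as mapping over that axis slice by slice.

```python
import numpy as np

def sdpa(q, k, v, scale):
    # Reference scaled dot-product attention.
    # Shapes: q (B, H, L, D), k/v (B, H, S, D).
    scores = (q * scale) @ np.swapaxes(k, -1, -2)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
V, B, H, L, S, D = 3, 2, 4, 5, 6, 8  # V is the vmapped axis
q = rng.standard_normal((V, B, H, L, D))
k = rng.standard_normal((V, B, H, S, D))
v = rng.standard_normal((V, B, H, S, D))
scale = 1.0 / np.sqrt(D)

# "vmap" path: apply sdpa independently to each slice along axis 0.
mapped = np.stack([sdpa(q[i], k[i], v[i], scale) for i in range(V)])

# Merged path: fold the vmap axis into the batch dimension, run once,
# then split the batch dimension back out.
merged = sdpa(
    q.reshape(V * B, H, L, D),
    k.reshape(V * B, H, S, D),
    v.reshape(V * B, H, S, D),
    scale,
).reshape(V, B, H, L, D)

assert np.allclose(mapped, merged)
```

Because attention reduces only over the sequence and head dimensions, the batch axis is embarrassingly parallel, which is what makes the merge safe (and lets the fused kernel run once over the combined batch).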

Close #3383

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

The ScaledDotProductAttention primitive relied on Custom::vmap which
re-vmapped the fallback lambda. That lambda captured n_q_heads and
n_kv_heads at creation time, causing shape mismatches (SIGSEGV/hang)
when vmap changed the array dimensions.

Add a dedicated vmap override that merges the vmap axis into the batch
dimension and re-invokes scaled_dot_product_attention, which recomputes
head counts from actual shapes. Falls back to Custom::vmap for sinks.


Development

Successfully merging this pull request may close these issues.

SIGSEGV / hang in mx.fast.scaled_dot_product_attention under vmap with GQA/MQA shapes (n_heads != n_kv_heads)
