
Fix int16 overflow in SDPA NAX mask indexing for KV sequences > 32K #3361

Merged

angeloskath merged 5 commits into ml-explore:main from Clydingus:fix/sdpa-nax-int16-overflow on Apr 10, 2026
Conversation

@Clydingus
Contributor

Proposed changes

Fixes #3360.
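For context, here is a minimal NumPy sketch of the failure mode (illustrative only; the actual overflow happens in the kernel's 16-bit mask column index, not in NumPy): positions past 32767 wrap negative when narrowed to a signed 16-bit integer, so indexing with them reads the wrong mask elements.

```python
import numpy as np

# Illustration only: mask column positions for a KV length of 36864,
# narrowed to a signed 16-bit integer. Values past 32767 wrap around.
positions = np.arange(36864, dtype=np.int32).astype(np.int16)

print(positions[32767])  # 32767  -> last representable position
print(positions[32768])  # -32768 -> wrapped negative: wrong mask element
```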

Test

import mlx.core as mx
import numpy as np

D_HEAD = 64

def test(cap, active, n_heads=32, n_kv=16, t=512, dtype=mx.bfloat16):
    np.random.seed(0)
    q = mx.array(np.random.randn(1, n_heads, t, D_HEAD).astype(np.float32)).astype(dtype)
    k = mx.array(np.random.randn(1, n_kv, cap, D_HEAD).astype(np.float32)).astype(dtype)
    v = mx.array(np.random.randn(1, n_kv, cap, D_HEAD).astype(np.float32)).astype(dtype)
    # Sparse additive mask: only the last `active` KV positions attend.
    mask = mx.full((1, 1, 1, cap), -1e4).astype(dtype)
    mask[:, :, :, cap - active:] = 0.0
    mx.eval(q, k, v, mask)

    y_fast = mx.fast.scaled_dot_product_attention(q, k, v, scale=D_HEAD**-0.5, mask=mask)
    mx.eval(y_fast)

    # Reference: expand GQA heads and compute attention explicitly.
    kk = mx.repeat(k, n_heads // n_kv, axis=1) if n_heads != n_kv else k
    vv = mx.repeat(v, n_heads // n_kv, axis=1) if n_heads != n_kv else v
    scores = (q @ kk.transpose(0, 1, 3, 2)) * (D_HEAD**-0.5) + mask
    y_ref = mx.softmax(scores.astype(mx.float32), axis=-1).astype(dtype) @ vv
    mx.eval(y_ref)

    # Compare output magnitudes; a large deviation means the fast kernel
    # applied the mask to the wrong positions.
    yf = np.array(y_fast.astype(mx.float32))
    yr = np.array(y_ref.astype(mx.float32))
    ratio = np.nanmax(np.abs(yf)) / (np.max(np.abs(yr)) + 1e-10)
    ok = abs(ratio - 1.0) < 0.05
    status = "PASS" if ok else f"FAIL ({ratio:.3f})"
    print(f"  cap={cap:6d}  active={active:5d}  {status}")
    return ok

print(f"MLX {mx.__version__}\n")
n_fail = 0
for cap in [8192, 32768, 36864, 49152, 66048]:
    if not test(cap, 1024):
        n_fail += 1
print(f"\n{'ALL PASS' if n_fail == 0 else f'{n_fail} FAILED'}")

Expected output:

  cap=  8192  active= 1024  PASS
  cap= 32768  active= 1024  PASS
  cap= 36864  active= 1024  PASS
  cap= 49152  active= 1024  PASS
  cap= 66048  active= 1024  PASS

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@zcbenz (Collaborator) left a comment

Can you add a test?

@Clydingus
Contributor Author

Sorry, missed that 😓. Added now. I tried adding it under the existing test_sdpa function, but the divergence mainly shows up with sparse masks at long sequence lengths. Let me know if any other changes are required.

@Clydingus Clydingus requested a review from zcbenz April 8, 2026 09:19
@angeloskath (Member) left a comment

Thanks for finding that.

There was a pretty big slowdown for masked attention due to constant bounds checking for the mask (which was made worse by the switch to wider index integers). I changed it to only bounds-check at the edges and otherwise load as-is.

Results (lower is better)

Shape (Q K D)      Before   After
4096  5000  64      2.028    1.366
2048 32121  64      9.175    5.664
4096  5000 128      4.538    3.313
2048 32121 128     18.012   11.511
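The edge-only bounds-checking strategy described above can be sketched in NumPy terms. This is a hypothetical illustration of the pattern, not the actual Metal kernel: `load_mask_tile` and the tile layout are invented for the example. Interior tiles are known to be fully in range and load with a plain slice; only the final partial tile pays for per-element padding.

```python
import numpy as np

def load_mask_tile(mask, tile_start, tile_size):
    """Hypothetical sketch of edge-only bounds checking (not the real kernel).

    Interior tiles are provably in range, so they load without any
    per-element checks; only the last, partial tile needs padding.
    """
    n = mask.shape[-1]
    if tile_start + tile_size <= n:
        # Interior tile: direct load, no bounds checks.
        return mask[..., tile_start:tile_start + tile_size]
    # Edge tile: pad the out-of-range tail with -inf (fully masked out).
    out = np.full(mask.shape[:-1] + (tile_size,), -np.inf, dtype=mask.dtype)
    valid = n - tile_start
    out[..., :valid] = mask[..., tile_start:n]
    return out
```

Per-element checking would branch on every load; with this scheme the branch runs once per tile, which matches the direction of the speedups in the table above.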

@zcbenz (Collaborator) left a comment

🚀

@angeloskath angeloskath merged commit a33b791 into ml-explore:main Apr 10, 2026
16 checks passed
@Clydingus Clydingus deleted the fix/sdpa-nax-int16-overflow branch April 10, 2026 08:08


Development

Successfully merging this pull request may close these issues.

[BUG] SDPA NAX kernel: int16 overflow in mask col_pos for KV sequences > 32K

3 participants