fix: Use is_family_of() for SM12x arch guard in MmaSM120BlockScaledOp#3082

Open
blake-snc wants to merge 1 commit into NVIDIA:main from blake-snc:fix/sm12x-family-arch-guard

blake-snc commented Feb 27, 2026

Summary

  • Replace hardcoded arch == Arch.sm_120a equality check with arch.is_family_of(Arch.sm_120a) in MmaSM120BlockScaledOp.__post_init__
  • Add "sm_121a" to admissible_archs list for consistency with the error message

Problem

MmaSM120BlockScaledOp guards its arch check with if not arch == Arch.sm_120a, which rejects sm_121a (DGX Spark / GB10) even though the block-scaled MMA instruction set (mma.sync.aligned.block_scale) is identical across the SM12x family. The error message already references admissible_archs, showing the intent was to support multiple archs, but the guard ignores the list entirely:

admissible_archs = [
    "sm_120a",
]

def __post_init__(self) -> None:
    arch = BaseDSL._get_dsl().get_arch_enum()
    if not arch == Arch.sm_120a:  # ignores admissible_archs
        raise OpError(
            self,
            f"expects arch to be one of {self.admissible_archs}, but got {arch}",
            ...
        )

Fix

Use the existing is_family_of() method, which was designed for exactly this purpose:

if not arch.is_family_of(Arch.sm_120a):

This accepts sm_120a, sm_120f, sm_121a, and sm_121f, all of which are SM12x family members that share the same block-scaled MMA instructions.
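
To illustrate the difference between the two guards, here is a minimal, self-contained sketch. ToyArch and its is_family_of() are hypothetical stand-ins that model a family as "same major compute capability"; they are not the CUTLASS DSL implementation, whose actual rules may differ.

```python
from enum import Enum


class ToyArch(Enum):
    """Hypothetical stand-in for the DSL's Arch enum: (major, minor, suffix)."""

    sm_90a = (9, 0, "a")
    sm_100a = (10, 0, "a")
    sm_120a = (12, 0, "a")
    sm_120f = (12, 0, "f")
    sm_121a = (12, 1, "a")
    sm_121f = (12, 1, "f")

    @property
    def major(self) -> int:
        return self.value[0]

    def is_family_of(self, other: "ToyArch") -> bool:
        # Assumption: a family is every arch sharing the major version.
        return self.major == other.major


def strict_guard(arch: ToyArch) -> bool:
    """Old behavior: only the exact arch passes."""
    return arch == ToyArch.sm_120a


def family_guard(arch: ToyArch) -> bool:
    """New behavior: any member of the same family passes."""
    return arch.is_family_of(ToyArch.sm_120a)


if __name__ == "__main__":
    for arch in ToyArch:
        print(f"{arch.name}: strict={strict_guard(arch)} family={family_guard(arch)}")
```

Under this toy model, the strict guard passes only sm_120a, while the family guard passes every sm_12x entry and still rejects sm_90a and sm_100a.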

Validation (DGX Spark, SM121a)

Tested on NVIDIA GB10 (sm_121a) by patching the installed nvidia-cutlass-dsl package and running each test in a separate process (the DSL caches arch at init):

Before fix — sm_121a rejected:

$ CUTE_DSL_ARCH=sm_121a python3 -c "from cutlass.cute.nvgpu.warp.mma import MmaMXF4Op; ..."
FAILED: OpError: expects arch to be one of ['sm_120a'], but got Arch.sm_121a

After fix — all tests pass:

$ CUTE_DSL_ARCH=sm_121a python3 -c "...MmaMXF4Op..."
PASS: sm_121a accepted

$ CUTE_DSL_ARCH=sm_121a python3 -c "...MmaMXF4NVF4Op..."
PASS: MmaMXF4NVF4Op on sm_121a accepted

$ CUTE_DSL_ARCH=sm_120a python3 -c "...MmaMXF4Op..."
PASS: sm_120a accepted (no regression)

$ CUTE_DSL_ARCH=sm_90a python3 -c "...MmaMXF4Op..."
PASS: sm_90a correctly rejected (non-sm12x guard still works)
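
Because the DSL caches the arch at init, each check above needs its own interpreter. The following is a rough sketch of how such a per-process harness might look. The helper name run_in_fresh_process and the child snippet are hypothetical; the child only echoes CUTE_DSL_ARCH rather than constructing a real MMA op, so the sketch runs without nvidia-cutlass-dsl installed.

```python
import os
import subprocess
import sys

# Stand-in child program: it only echoes the arch it was launched with.
# A real check would import and construct the MMA op here instead.
CHILD_CODE = "import os; print(os.environ['CUTE_DSL_ARCH'])"


def run_in_fresh_process(arch: str, code: str = CHILD_CODE) -> tuple[int, str]:
    """Run `code` in a new interpreter with CUTE_DSL_ARCH set to `arch`."""
    env = dict(os.environ, CUTE_DSL_ARCH=arch)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    for arch in ("sm_121a", "sm_120a", "sm_90a"):
        rc, out = run_in_fresh_process(arch)
        print(f"{arch}: exit={rc} output={out!r}")
```

Starting a new interpreter per arch keeps any state cached by one run from leaking into the next, which is the reason the validation above uses separate invocations.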

Contributed by Second Nature Computing (https://joinsecondnature.com)

The arch check in MmaSM120BlockScaledOp.__post_init__ uses a hardcoded
equality check against Arch.sm_120a, which rejects sm_121a (DGX Spark)
even though the block-scaled MMA instruction set is identical across
the SM12x family. The error message already references admissible_archs,
showing the intent was to support multiple archs.

Replace `arch == Arch.sm_120a` with `arch.is_family_of(Arch.sm_120a)`
so that sm_121a (and any future sm12x variants) are accepted. Also add
"sm_121a" to the admissible_archs list for consistency with the error
message.

Contributed by Second Nature Computing (https://joinsecondnature.com)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

blake-snc changed the title from "fix: Use is_family_of() for SM12x arch guard in MmaSM120BlockScaledOp" to "fix: Use is_family_of() for arch guards in warp and warpgroup MMA ops" on Feb 28, 2026

blake-snc force-pushed the fix/sm12x-family-arch-guard branch from 3047d61 to e85cc91 on February 28, 2026 02:11

blake-snc changed the title from "fix: Use is_family_of() for arch guards in warp and warpgroup MMA ops" to "fix: Use is_family_of() for SM12x arch guard in MmaSM120BlockScaledOp" on Feb 28, 2026

blake-snc added a commit to blake-snc/cutlass that referenced this pull request on Feb 28, 2026:

Replace hardcoded `arch == Arch.sm_90a` with
`arch.is_family_of(Arch.sm_90a)` in warpgroup/mma.py's MmaOp for
consistency with the warp-level MMA fix in NVIDIA#3082.

While functionally equivalent today (sm_90a is the only Hopper
"a"-suffix arch in practice), this makes the arch guard consistent
with the is_family_of() pattern and future-proofs against potential
Hopper variants.

Validated:
- is_family_of(Arch.sm_90a) returns True for sm_90a
- is_family_of(Arch.sm_90a) returns False for sm_120a, sm_100a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

blake-snc (author) commented:

Ping for review — small fix to use is_family_of() for the SM12x arch guard in MmaSM120BlockScaledOp so SM121a works correctly.

johnnynunez commented:

cc @depaulmillz

github-actions commented:

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.
