fix: Use is_family_of() for SM12x arch guard in MmaSM120BlockScaledOp by blake-snc · Pull Request #3082 · NVIDIA/cutlass

blake-snc · 2026-02-27T23:48:17Z

Summary

- Replace hardcoded arch == Arch.sm_120a equality check with arch.is_family_of(Arch.sm_120a) in MmaSM120BlockScaledOp.__post_init__
- Add "sm_121a" to admissible_archs list for consistency with the error message

Problem

MmaSM120BlockScaledOp validates the target arch with `if not arch == Arch.sm_120a`, which rejects sm_121a (DGX Spark / GB10) even though the block-scaled MMA instruction set (mma.sync.aligned.block_scale) is identical across the SM12x family. The error message already references admissible_archs, which shows the intent was to support multiple archs, but the guard ignores the list entirely:

admissible_archs = [
    "sm_120a",
]

def __post_init__(self) -> None:
    arch = BaseDSL._get_dsl().get_arch_enum()
    if not arch == Arch.sm_120a:  # ignores admissible_archs
        raise OpError(
            self,
            f"expects arch to be one of {self.admissible_archs}, but got {arch}",
            ...
        )
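
The mismatch between the guard and the list can be reproduced with a small stand-in. This is only a sketch: ToyOpError, check_arch, and try_arch are hypothetical names, arch values are plain strings rather than the DSL's Arch enum, and nothing here imports cutlass. It assumes the list has been extended while the equality guard is left untouched.

```python
# Hypothetical stand-in for the guard described above; not the cutlass source.
class ToyOpError(Exception):
    """Stand-in for the DSL's OpError."""


# Assumption: the list is extended, but the guard below is left unchanged.
admissible_archs = ["sm_120a", "sm_121a"]


def check_arch(arch: str) -> None:
    # Same shape as the original guard: a hardcoded equality test
    # that never consults admissible_archs.
    if not arch == "sm_120a":
        raise ToyOpError(
            f"expects arch to be one of {admissible_archs}, but got {arch}"
        )


def try_arch(arch: str) -> str:
    """Return a short verdict string instead of raising."""
    try:
        check_arch(arch)
        return "accepted"
    except ToyOpError as err:
        return f"rejected: {err}"


print(try_arch("sm_120a"))
print(try_arch("sm_121a"))
```

In this sketch the second call is rejected with a message that itself lists sm_121a as admissible, which is why the guard has to change and not only the list.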

Fix

Use the existing is_family_of() method, which was designed for exactly this purpose:

if not arch.is_family_of(Arch.sm_120a):

This accepts sm_120a, sm_120f, sm_121a, sm_121f — all SM12x family members that share the same block-scaled MMA instructions.
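
The acceptance set above can be illustrated with a toy model. This is a sketch under a loud assumption: ToyArch is a hypothetical enum, and its is_family_of() simply compares the major compute-capability number. The real Arch.is_family_of() in the DSL may apply different or stricter rules; only the accept/reject outcomes listed in this PR are being mirrored here.

```python
from enum import Enum


class ToyArch(Enum):
    """Hypothetical stand-in for the DSL's Arch enum: (major, minor, suffix)."""

    sm_90a = (9, 0, "a")
    sm_100a = (10, 0, "a")
    sm_120a = (12, 0, "a")
    sm_120f = (12, 0, "f")
    sm_121a = (12, 1, "a")
    sm_121f = (12, 1, "f")

    @property
    def major(self) -> int:
        return self.value[0]

    def is_family_of(self, other: "ToyArch") -> bool:
        # Assumption for illustration: "family" means same major version.
        return self.major == other.major


# Archs the family-based guard would accept for the SM12x block-scaled op.
sm12x_accepted = [a.name for a in ToyArch if a.is_family_of(ToyArch.sm_120a)]

# Archs a family-based guard anchored on sm_90a would accept.
sm90_accepted = [a.name for a in ToyArch if a.is_family_of(ToyArch.sm_90a)]

print(sm12x_accepted)
print(sm90_accepted)
```

Under this assumption, the SM12x guard accepts the four SM12x members and nothing else, while sm_90a and sm_100a stay outside the family.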

Validation (DGX Spark, SM121a)

Tested on NVIDIA GB10 (sm_121a) by patching the installed nvidia-cutlass-dsl package and running each test in a separate process (the DSL caches arch at init):

Before fix — sm_121a rejected:

$ CUTE_DSL_ARCH=sm_121a python3 -c "from cutlass.cute.nvgpu.warp.mma import MmaMXF4Op; ..."
FAILED: OpError: expects arch to be one of ['sm_120a'], but got Arch.sm_121a

After fix — all tests pass:

$ CUTE_DSL_ARCH=sm_121a python3 -c "...MmaMXF4Op..."
PASS: sm_121a accepted

$ CUTE_DSL_ARCH=sm_121a python3 -c "...MmaMXF4NVF4Op..."
PASS: MmaMXF4NVF4Op on sm_121a accepted

$ CUTE_DSL_ARCH=sm_120a python3 -c "...MmaMXF4Op..."
PASS: sm_120a accepted (no regression)

$ CUTE_DSL_ARCH=sm_90a python3 -c "...MmaMXF4Op..."
PASS: sm_90a correctly rejected (non-sm12x guard still works)
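
The one-process-per-arch approach used in the validation can be sketched as a small harness. The CUTE_DSL_ARCH variable comes from the commands above; everything else is an assumption. In particular, CHILD_SNIPPET is a placeholder that only echoes the variable, because the real test bodies are elided in this PR and are not reconstructed here, and run_with_arch is a hypothetical helper name.

```python
import os
import subprocess
import sys

# Placeholder child program. A real check would import and construct the op
# from the DSL here; that code is elided in the PR, so it is not guessed at.
CHILD_SNIPPET = "import os; print(os.environ['CUTE_DSL_ARCH'])"


def run_with_arch(arch: str) -> str:
    """Run the child snippet in a fresh interpreter with CUTE_DSL_ARCH set."""
    env = dict(os.environ, CUTE_DSL_ARCH=arch)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return proc.stdout.strip()


# A fresh process per arch avoids any state cached at init by a previous run.
outputs = {arch: run_with_arch(arch) for arch in ("sm_121a", "sm_120a", "sm_90a")}
for arch, seen in outputs.items():
    print(f"{arch}: child saw {seen}")
```

Each child starts with a clean interpreter, so a value cached at init in one run cannot leak into the next.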

Contributed by Second Nature Computing (https://joinsecondnature.com)

The arch check in MmaSM120BlockScaledOp.__post_init__ uses a hardcoded equality check against Arch.sm_120a, which rejects sm_121a (DGX Spark) even though the block-scaled MMA instruction set is identical across the SM12x family. The error message already references admissible_archs, showing the intent was to support multiple archs.

Replace `arch == Arch.sm_120a` with `arch.is_family_of(Arch.sm_120a)` so that sm_121a (and any future sm12x variants) are accepted. Also add "sm_121a" to the admissible_archs list for consistency with the error message.

Contributed by Second Nature Computing (https://joinsecondnature.com)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hardcoded `arch == Arch.sm_90a` with `arch.is_family_of(Arch.sm_90a)` in warpgroup/mma.py's MmaOp for consistency with the warp-level MMA fix in NVIDIA#3082. While functionally equivalent today (sm_90a is the only Hopper "a"-suffix arch in practice), this makes the arch guard consistent with the is_family_of() pattern and future-proofs against potential Hopper variants.

Validated:
- is_family_of(Arch.sm_90a) returns True for sm_90a
- is_family_of(Arch.sm_90a) returns False for sm_120a, sm_100a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

blake-snc · 2026-03-12T21:33:54Z

Ping for review — small fix to use is_family_of() for the SM12x arch guard in MmaSM120BlockScaledOp so SM121a works correctly.

johnnynunez · 2026-03-20T16:50:08Z

cc @depaulmillz

github-actions · 2026-04-19T17:15:16Z

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

blake-snc changed the title from "fix: Use is_family_of() for SM12x arch guard in MmaSM120BlockScaledOp" to "fix: Use is_family_of() for arch guards in warp and warpgroup MMA ops" on Feb 28, 2026

blake-snc force-pushed the fix/sm12x-family-arch-guard branch from 3047d61 to e85cc91 on February 28, 2026 02:11

blake-snc changed the title from "fix: Use is_family_of() for arch guards in warp and warpgroup MMA ops" to "fix: Use is_family_of() for SM12x arch guard in MmaSM120BlockScaledOp" on Feb 28, 2026

blake-snc mentioned this pull request Feb 28, 2026

fix: Use is_family_of() for SM90 arch guard in warpgroup MmaOp #3084

Open

github-actions bot added the inactive-30d label Apr 19, 2026

blake-snc wants to merge 1 commit into NVIDIA:main from blake-snc:fix/sm12x-family-arch-guard
