fix: Use is_family_of() for SM12x arch guard in MmaSM120BlockScaledOp #3082
Open
blake-snc wants to merge 1 commit into NVIDIA:main
The arch check in MmaSM120BlockScaledOp.__post_init__ uses a hardcoded equality check against Arch.sm_120a, which rejects sm_121a (DGX Spark) even though the block-scaled MMA instruction set is identical across the SM12x family. The error message already references admissible_archs, showing the intent was to support multiple archs.

Replace `arch == Arch.sm_120a` with `arch.is_family_of(Arch.sm_120a)` so that sm_121a (and any future sm12x variants) are accepted. Also add "sm_121a" to the admissible_archs list for consistency with the error message.

Contributed by Second Nature Computing (https://joinsecondnature.com)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
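The difference between the two guards can be illustrated with a small self-contained sketch. The `Arch` enum below is a hypothetical stand-in, not the CUTLASS DSL class: its members, its `is_family_of()` semantics (same major version), the `ADMISSIBLE_ARCHS` contents, and the use of `ValueError` are all assumptions made for illustration.

```python
from enum import Enum


class Arch(Enum):
    """Hypothetical stand-in for the DSL's Arch enum: (major, minor, suffix)."""
    sm_90a = (9, 0, "a")
    sm_100a = (10, 0, "a")
    sm_120a = (12, 0, "a")
    sm_120f = (12, 0, "f")
    sm_121a = (12, 1, "a")
    sm_121f = (12, 1, "f")

    def is_family_of(self, other: "Arch") -> bool:
        # Assumed semantics: two archs are in the same family
        # when their major versions match.
        return self.value[0] == other.value[0]


# Assumed contents; the PR only states that "sm_121a" is added to the list.
ADMISSIBLE_ARCHS = ["sm_120a", "sm_121a"]


def guard_by_equality(arch: Arch) -> None:
    # Old behaviour: only the exact arch sm_120a passes.
    if not arch == Arch.sm_120a:
        raise ValueError(f"expects arch to be one of {ADMISSIBLE_ARCHS}, got {arch.name}")


def guard_by_family(arch: Arch) -> None:
    # New behaviour: any member of the SM12x family passes.
    if not arch.is_family_of(Arch.sm_120a):
        raise ValueError(f"expects arch to be one of {ADMISSIBLE_ARCHS}, got {arch.name}")


def accepts(guard, arch: Arch) -> bool:
    """Return True if the guard lets the arch through, False if it raises."""
    try:
        guard(arch)
        return True
    except ValueError:
        return False
```

Under these assumptions, `accepts(guard_by_equality, Arch.sm_121a)` is `False` while `accepts(guard_by_family, Arch.sm_121a)` is `True`, and both guards still reject `Arch.sm_100a`.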
3047d61 to e85cc91
blake-snc added a commit to blake-snc/cutlass that referenced this pull request Feb 28, 2026
Replace hardcoded `arch == Arch.sm_90a` with `arch.is_family_of(Arch.sm_90a)` in warpgroup/mma.py's MmaOp for consistency with the warp-level MMA fix in NVIDIA#3082.

While functionally equivalent today (sm_90a is the only Hopper "a"-suffix arch in practice), this makes the arch guard consistent with the is_family_of() pattern and future-proofs against potential Hopper variants.

Validated:
- is_family_of(Arch.sm_90a) returns True for sm_90a
- is_family_of(Arch.sm_90a) returns False for sm_120a, sm_100a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Ping for review: small fix to use
cc @depaulmillz
This PR has been labeled
Summary
- Replace the `arch == Arch.sm_120a` equality check with `arch.is_family_of(Arch.sm_120a)` in `MmaSM120BlockScaledOp.__post_init__`
- Add `"sm_121a"` to the `admissible_archs` list for consistency with the error message

Problem
`MmaSM120BlockScaledOp` guards its arch check with `if not arch == Arch.sm_120a`, which rejects `sm_121a` (DGX Spark / GB10) even though the block-scaled MMA instruction set (`mma.sync.aligned.block_scale`) is identical across the SM12x family. The error message already references `admissible_archs`, showing the intent was to support multiple archs, but the guard ignores the list entirely.

Fix
Use the existing `is_family_of()` method, which was designed for exactly this purpose. This accepts `sm_120a`, `sm_120f`, `sm_121a`, `sm_121f`: all SM12x family members that share the same block-scaled MMA instructions.

Validation (DGX Spark, SM121a)
Tested on NVIDIA GB10 (sm_121a) by patching the installed `nvidia-cutlass-dsl` package and running each test in a separate process (the DSL caches arch at init).

Before fix: sm_121a rejected.
After fix: all tests pass.
Contributed by Second Nature Computing (https://joinsecondnature.com)
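The Validation section notes that each test was run in a separate process because the DSL caches the arch at init. A minimal sketch of that isolation pattern follows, using only the Python standard library. The helper name `run_isolated` and the idea of passing tests as source snippets are hypothetical; this is not the harness used for the PR.

```python
import subprocess
import sys


def run_isolated(snippets):
    """Run each Python source snippet in a fresh interpreter.

    Returns the list of exit codes, one per snippet. Because every snippet
    gets its own process, module-level state cached by one run (for example
    a detected arch) cannot leak into the next.
    """
    codes = []
    for src in snippets:
        # sys.executable is the interpreter running this script, so the
        # child sees the same installed packages.
        result = subprocess.run([sys.executable, "-c", src])
        codes.append(result.returncode)
    return codes
```

A caller might pass one snippet per test, such as `"import my_tests; my_tests.test_block_scaled()"` (module and function names hypothetical), and treat any nonzero exit code as a failure.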