feat: primus-turbo attn add sbhd format support #650

Open
RuibinCheung wants to merge 6 commits into main from dev/zhangrb/add_sbhd_format_support

Conversation

@RuibinCheung
Contributor

@RuibinCheung RuibinCheung commented Apr 8, 2026

  • Add sbhd format support to Primus Turbo attention; this eliminates an extra transpose kernel call in Attention.

Copilot AI review requested due to automatic review settings April 8, 2026 11:30
@RuibinCheung RuibinCheung changed the title [No Merge][WIP] feat: add sbhd format support [No Merge][WIP] feat: primus-turbo attn add sbhd format support Apr 8, 2026
@RuibinCheung RuibinCheung marked this pull request as draft April 8, 2026 11:31
Contributor

Copilot AI left a comment


Pull request overview

Adds experimental support for additional QKV tensor layouts (notably sbhd) in the Primus Turbo attention wrapper, while introducing special-casing for sink attention to force a specific layout.

Changes:

  • Removes the previous manual sbhd -> bshd transpose and instead forwards qkv_format into the underlying flash_attn op.
  • Introduces a use_sink_attn flag and forces sink-attention execution to use bshd, including explicit tensor permutations for Q/K/V and the output.
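The layout change at the heart of the PR can be illustrated with a small sketch (numpy here, with shapes chosen for illustration; torch's `q.permute(1, 0, 2, 3).contiguous()` corresponds to `np.ascontiguousarray(np.transpose(...))`):

```python
import numpy as np

S, B, H, D = 4, 2, 3, 8  # illustrative sizes: seq, batch, heads, head_dim

# Query in sbhd layout, as the caller produces it.
q_sbhd = np.arange(S * B * H * D, dtype=np.float32).reshape(S, B, H, D)

# Old path: always transpose to bshd before calling the attention op,
# paying an extra kernel launch per tensor plus the .contiguous() copy.
q_bshd = np.ascontiguousarray(np.transpose(q_sbhd, (1, 0, 2, 3)))
assert q_bshd.shape == (B, S, H, D)

# New path: no transpose; the format string is forwarded to flash_attn.
qkv_format = "sbhd"
```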

Comment on lines +504 to +507

```python
# NOTE: sink attention only support bshd format
query = query.permute(1, 0, 2, 3).contiguous()
key = key.permute(1, 0, 2, 3).contiguous()
value = value.permute(1, 0, 2, 3).contiguous()
```

Copilot AI Apr 8, 2026


When use_sink_attn is enabled, query/key/value are always permuted as if the incoming layout were sbhd (S,B,H,D) -> bshd (B,S,H,D). If qkv_format is already bshd (or any non-sbhd value coming from packed_seq_params), this permutation will corrupt the tensor layout while qkv_format is forced to "bshd", creating a format/tensor mismatch.

Consider either (a) explicitly asserting qkv_format == "sbhd" before permuting in the sink-attention path, or (b) permuting conditionally based on qkv_format and ensuring o is permuted back consistently to preserve the expected output layout.
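Option (b) might be sketched roughly as follows (numpy stands in for torch, `np.transpose` playing the role of `Tensor.permute`; `to_bshd`/`from_bshd` are hypothetical helper names, not part of the PR):

```python
import numpy as np

def to_bshd(t, qkv_format):
    # Permute into bshd only when the input actually arrives as sbhd;
    # a bshd input passes through untouched instead of being corrupted.
    if qkv_format == "sbhd":
        return np.transpose(t, (1, 0, 2, 3))  # (S,B,H,D) -> (B,S,H,D)
    if qkv_format == "bshd":
        return t
    raise ValueError(f"sink attention does not support qkv_format={qkv_format!r}")

def from_bshd(o, qkv_format):
    # Undo the conversion so the caller gets its original layout back.
    if qkv_format == "sbhd":
        return np.transpose(o, (1, 0, 2, 3))  # (B,S,H,D) -> (S,B,H,D)
    return o
```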

Copilot uses AI. Check for mistakes.
```diff
@@ -465,9 +465,6 @@ def forward(
)

qkv_format = packed_seq_kwargs.get("qkv_format", self.qkv_format)
```

Copilot AI Apr 8, 2026


qkv_format is now passed through to self.attn(...) without any local validation. Since this value can come from packed_seq_params, an unexpected value will likely fail deeper in the kernel with a less actionable error (and could also interact badly with the explicit format conversions in the sink-attention path).

Recommend validating qkv_format against the set of supported formats in this backend and raising a clear ValueError (or keeping an assert) before using it.

Suggested change

```diff
 qkv_format = packed_seq_kwargs.get("qkv_format", self.qkv_format)
+supported_qkv_formats = ("sbhd", "bshd", "thd")
+if qkv_format not in supported_qkv_formats:
+    raise ValueError(
+        f"Unsupported qkv_format: {qkv_format}. "
+        f"Supported formats: {supported_qkv_formats}"
+    )
```

```python
else:
    window_size = (self.sink_sliding_window, 0)

# NOTE: sink attention only support bshd format
```

Copilot AI Apr 8, 2026


Typo/grammar in the new comment: "sink attention only support bshd format" → "sink attention only supports bshd format".

Suggested change

```diff
-# NOTE: sink attention only support bshd format
+# NOTE: sink attention only supports bshd format
```

Copilot AI review requested due to automatic review settings April 17, 2026 09:16
@RuibinCheung RuibinCheung marked this pull request as ready for review April 17, 2026 09:17
@RuibinCheung RuibinCheung changed the title [No Merge][WIP] feat: primus-turbo attn add sbhd format support feat: primus-turbo attn add sbhd format support Apr 17, 2026
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

```python
if qkv_format == "sbhd":
    query = query.permute(1, 0, 2, 3)
    key = key.permute(1, 0, 2, 3)
    value = value.permute(1, 0, 2, 3)
```

Copilot AI Apr 17, 2026


qkv_format is now only handled for the "sbhd" case; any other value falls through and is treated as if inputs are already BSHD, which can silently produce wrong results if qkv_format is misspelled/unsupported. Please add explicit validation (e.g., accept only sbhd and bshd, and raise ValueError otherwise) rather than relying on implicit fallthrough or assert.

Suggested change

```diff
     value = value.permute(1, 0, 2, 3)
+elif qkv_format == "bshd":
+    pass
+else:
+    raise ValueError(
+        f"Unsupported qkv_format: {qkv_format!r}. Expected one of ('sbhd', 'bshd')."
+    )
```

Comment on lines +537 to +541

```python
if qkv_format == "sbhd":
    query = query.permute(1, 0, 2, 3)
    key = key.permute(1, 0, 2, 3)
    value = value.permute(1, 0, 2, 3)
```

Copilot AI Apr 17, 2026


This adds conditional layout handling based on qkv_format, but there are no tests covering the new sbhd/bshd behavior. Please add a focused unit test that sets qkv_format to both values and asserts the returned tensor layout/shape matches expectations.
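Such a test might look roughly like this (a sketch only: `fake_attn` is a hypothetical stand-in for the real attention wrapper, which is not reproduced here, and numpy stands in for torch):

```python
import numpy as np

def fake_attn(q, qkv_format):
    # Stand-in mimicking the wrapper's layout contract: compute in bshd
    # internally, return the output in the caller's requested layout.
    if qkv_format == "sbhd":
        q = np.transpose(q, (1, 0, 2, 3))
    elif qkv_format != "bshd":
        raise ValueError(f"Unsupported qkv_format: {qkv_format!r}")
    o = q  # real attention math elided
    if qkv_format == "sbhd":
        o = np.transpose(o, (1, 0, 2, 3))
    return o

def test_qkv_format_layouts():
    S, B, H, D = 4, 2, 3, 8
    # Output layout must match input layout for both supported formats.
    assert fake_attn(np.zeros((S, B, H, D)), "sbhd").shape == (S, B, H, D)
    assert fake_attn(np.zeros((B, S, H, D)), "bshd").shape == (B, S, H, D)
```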
