
[SDPA][HipDNN] ASM kernel loading and dispatch #5686

Merged
AnaghaRaoAMD merged 5 commits into users/dahawkin/hipdnn-aiter-ck-spda-poc from user/anarao/kernelLoadDispatch
Mar 23, 2026

Conversation


@AnaghaRaoAMD AnaghaRaoAMD commented Mar 20, 2026

Motivation

Implements ASM kernel loading and dispatch

Technical Details

Kernel Execution Implementation

Implements the complete kernel loading and dispatch pipeline for the fwd_hd128_bf16_rtne.co ASM kernel:

  1. SdpaKernelPlan - Kernel execution state (26 member variables):

    • Stores kernel module/function handles
    • Tensor UIDs and metadata (dims, strides)
    • Attention scale
    • execute(): Populates fmha_fwd_v3_args (656 bytes) and launches via hipModuleLaunchKernel()
  2. SdpaKernelPlanBuilder - Plan creation:

    • buildPlan(): Loads kernel via hipModuleLoad() and extracts tensor metadata from graph
    • Parses Q/K/V tensor dimensions and strides from SDPA graph node
    • Computes attention scale (default: 1/√D_qk)
  3. Key Implementation Details:

    • Kernel launch: HIP_LAUNCH_PARAM mechanism for large arg structures
    • Grid dimensions: [ceil(S_q/256), H_q, B]
    • Block dimensions: [512, 1, 1] (fixed for this kernel)
    • Strides: Converted from elements to bytes (stride × 2 for BF16)
    • Module lifecycle: Loaded on plan build, unloaded in destructor
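
The launch-geometry and stride arithmetic listed above can be sketched as follows. This is a minimal self-contained illustration; the struct and function names are hypothetical, not taken from the PR.

```cpp
#include <cstdint>

// Illustrative sketch of the launch-geometry arithmetic described above.
struct Dim3 { uint32_t x, y, z; };

// Grid: one workgroup per 256 query rows, one per query head, one per batch,
// i.e. [ceil(S_q/256), H_q, B].
Dim3 makeGrid(uint32_t seqLenQ, uint32_t numHeadsQ, uint32_t batch) {
    return Dim3{(seqLenQ + 255u) / 256u, numHeadsQ, batch};
}

// Strides come from the graph in elements, but the kernel expects bytes;
// a BF16 element is 2 bytes wide, hence the ×2 conversion.
int64_t strideElemsToBytesBf16(int64_t strideElems) {
    return strideElems * 2;
}
```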

Modified Files:

  • src/SdpaKernelPlan.{hpp,cpp} - Execution implementation
  • src/SdpaKernelPlanBuilder.cpp - Plan building with graph parsing
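
The module lifecycle described above (loaded on plan build, unloaded in the destructor) is a classic RAII pattern. A minimal self-contained sketch, with counters standing in for hipModuleLoad()/hipModuleUnload() so it compiles without the HIP runtime:

```cpp
#include <cassert>

// Stand-in for the loaded-module count the HIP runtime would track.
static int loadedModules = 0;

// Illustrative RAII guard; the real plan class holds a hipModule_t instead.
class ModuleGuard {
public:
    ModuleGuard() { ++loadedModules; }    // stands in for hipModuleLoad()
    ~ModuleGuard() { --loadedModules; }   // stands in for hipModuleUnload()

    // Copying is deleted so two guards can never unload the same module
    // twice; a real implementation would add move operations that null out
    // the source handle (the "move constructor" point raised in review).
    ModuleGuard(const ModuleGuard&) = delete;
    ModuleGuard& operator=(const ModuleGuard&) = delete;
};
```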

Test Plan

Verified with existing unit test infrastructure and integration tests

ninja install
 ./bin/sdpa_kernel_plugin_integration_tests
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from IntegrationSdpaKernelNoEngines
[ RUN      ] IntegrationSdpaKernelNoEngines.BatchnormInferenceGraphBuildFails
[       OK ] IntegrationSdpaKernelNoEngines.BatchnormInferenceGraphBuildFails (1476 ms)
[----------] 1 test from IntegrationSdpaKernelNoEngines (1476 ms total)

[----------] 1 test from Smoke/IntegrationGpuSdpaFwdBf16
[ RUN      ] Smoke/IntegrationGpuSdpaFwdBf16.Correctness/0
[       OK ] Smoke/IntegrationGpuSdpaFwdBf16.Correctness/0 (1806 ms)
[----------] 1 test from Smoke/IntegrationGpuSdpaFwdBf16 (1806 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 2 test suites ran. (3283 ms total)
[  PASSED  ] 2 tests.

Submission Checklist

@AnaghaRaoAMD AnaghaRaoAMD changed the title User/anarao/kernel load dispatch [SDPA][HipDNN] ASM kernel loading and dispatch Mar 20, 2026
@AnaghaRaoAMD AnaghaRaoAMD marked this pull request as ready for review March 20, 2026 21:33

@DarylHawkinsAMD DarylHawkinsAMD left a comment

Looks good, just had one question inline

Comment thread dnn-providers/sdpa-kernel-provider/src/SdpaKernelPlan.cpp

@jerehartAMD jerehartAMD left a comment

Looks good! I have a couple of comments, but I think the only one that's especially important to address is the move constructor one

Comment thread dnn-providers/sdpa-kernel-provider/src/SdpaKernelPlan.cpp
Comment thread dnn-providers/sdpa-kernel-provider/src/SdpaKernelPlan.cpp Outdated
Comment thread dnn-providers/sdpa-kernel-provider/src/SdpaKernelPlan.cpp Outdated
@AnaghaRaoAMD AnaghaRaoAMD merged commit 1f85499 into users/dahawkin/hipdnn-aiter-ck-spda-poc Mar 23, 2026
6 checks passed
@AnaghaRaoAMD AnaghaRaoAMD deleted the user/anarao/kernelLoadDispatch branch March 23, 2026 17:32
DarylHawkinsAMD pushed a commit that referenced this pull request Mar 27, 2026
## Motivation

Implements the complete kernel loading and dispatch pipeline for the
fwd_hd128_bf16_rtne.co ASM kernel, including workspace size calculation.

## Technical Details

**Kernel Execution Implementation**

1. **SdpaFwdPlan** - Kernel execution state:
   - Stores kernel module/function handles, tensor UIDs and metadata
     (dims, strides), and attention scale
   - execute(): Populates fmha_fwd_v3_args (656 bytes) and launches via
     hipModuleLaunchKernel()
   - Grid dimensions: [ceil(S_q/256), H_q, B]
   - Block dimensions: [512, 1, 1] (fixed for this kernel)
   - Strides: Converted from elements to bytes (stride x 2 for BF16)
   - Module lifecycle: Loaded on plan build, unloaded in destructor

2. **SdpaFwdPlanBuilder** - Plan creation:
   - buildPlan(): Loads kernel via hipModuleLoad() and extracts tensor
     metadata from graph
   - Parses Q/K/V tensor dimensions and strides from SDPA graph node
   - Computes attention scale (default: 1/sqrt(D_qk))

3. **Workspace**: Forward-only inference kernel uses 64KB LDS internally
   and requires no external workspace allocation; getWorkspaceSize()
   returns 0. LSE (log-sum-exp) buffer is an optional output tensor
   (stats_tensor_uid), not workspace.

Builds on workspace sizing from PR #5626 and cleanup from PR #5632.
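
The default attention scale mentioned above (1/sqrt(D_qk)) is straightforward to compute; a hypothetical helper, not taken from the PR:

```cpp
#include <cmath>

// Default softmax scale applied to Q·K^T when the caller does not supply
// one: 1/sqrt(D_qk), where D_qk is the Q/K head dimension (128 for this
// kernel). Name is illustrative, not from the PR.
float defaultAttnScale(int headDimQK) {
    return 1.0f / std::sqrt(static_cast<float>(headDimQK));
}
```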

3 participants