Skip to content

[libspirv] Convert __assert_fail.ll and image_helpers.ll to CL files#21547

Open
wenju-he wants to merge 8 commits intointel:syclfrom
wenju-he:image_helpers.ll-to-cl
Open

[libspirv] Convert __assert_fail.ll and image_helpers.ll to CL files#21547
wenju-he wants to merge 8 commits intointel:syclfrom
wenju-he:image_helpers.ll-to-cl

Conversation

@wenju-he
Copy link
Contributor

@wenju-he wenju-he commented Mar 18, 2026

LLVM IR (.ll) files doesn't work with cmake add_library change in libclc.

image_helpers.cl PTX output verified against reference PTX generated from image_helpers.ll.
Bindless image tests passed (no regressions) on NV 2060.

LLVM IR (.ll) files doesn't work with cmake add_library change in libclc.

__assert_fail.cl:
- Implemented __strlen_assert helper using pointer arithmetic
- Built error message string using OCKL fprintf functions
- Retrieved work-item IDs using __oclc_get_* builtins
- Used __builtin_trap() to halt execution after assertion

image_helpers.cl:
- Converted ~105 image_helpers functions from LLVM intrinsics to inline PTX asm
- Inline assembly + union-based type conversion gets fully optimized
- Compiler eliminates conversion entirely (identical memory layout)
- Direct register-to-register operation, no function calls
- Eliminates ~105 function call sites and associated memory operations
- PTX output verified against reference PTX generated from image_helpers.ll

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
@wenju-he wenju-he requested review from a team and Maetveis as code owners March 18, 2026 00:21
@wenju-he wenju-he requested a review from kekaczma March 18, 2026 00:21
Copy link
Contributor

@Maetveis Maetveis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to me in general, with some questions.
I'm trusting you verified that the implementations still work, I didn't go over the inline PTX line-by-line to check it against the llvm intrinsic calls.

@Maetveis
Copy link
Contributor

Maetveis commented Mar 18, 2026

__assert_fail.cl:

  • Implemented __strlen_assert helper using pointer arithmetic
  • Built error message string using OCKL fprintf functions
  • Retrieved work-item IDs using __oclc_get_* builtins
  • Used __builtin_trap() to halt execution after assertion

@wenju-he is the PR description AI generated? Have you double-checked it is accurate? The last two bullets are not true AFAICT.

@wenju-he
Copy link
Contributor Author

__assert_fail.cl:

  • Implemented __strlen_assert helper using pointer arithmetic
  • Built error message string using OCKL fprintf functions
  • Retrieved work-item IDs using __oclc_get_* builtins
  • Used __builtin_trap() to halt execution after assertion

@wenju-he is the PR description AI generated? Have you double-checked it is accurate? The last two bullets are not true AFAICT.

yes, it is AI generated.

@wenju-he
Copy link
Contributor Author

I'm trusting you verified that the implementations still work

Only verified on CUDA bindless image. We don't have AMDGPU to verify with.

@Maetveis
Copy link
Contributor

I'm trusting you verified that the implementations still work

Only verified on CUDA bindless image. We don't have AMDGPU to verify with.

Okay, I went ahead and reviewed the AMDGPU change against the llvm-ir file. I have some comments and suggestions, see them above.

@Maetveis
Copy link
Contributor

Maetveis commented Mar 18, 2026

__assert_fail.cl:

  • Implemented __strlen_assert helper using pointer arithmetic
  • Built error message string using OCKL fprintf functions
  • Retrieved work-item IDs using __oclc_get_* builtins
  • Used __builtin_trap() to halt execution after assertion

@wenju-he is the PR description AI generated? Have you double-checked it is accurate? The last two bullets are not true AFAICT.

yes, it is AI generated.

Could you please update it to be accurately reflect the changes done after applying suggestions?

wenju-he and others added 2 commits March 18, 2026 17:42
Co-authored-by: Mészáros Gergely <maetveis@gmail.com>
Co-authored-by: Wenju He <wenju.he@intel.com>
@wenju-he
Copy link
Contributor Author

__assert_fail.cl:

  • Implemented __strlen_assert helper using pointer arithmetic
  • Built error message string using OCKL fprintf functions
  • Retrieved work-item IDs using __oclc_get_* builtins
  • Used __builtin_trap() to halt execution after assertion

@wenju-he is the PR description AI generated? Have you double-checked it is accurate? The last two bullets are not true AFAICT.

yes, it is AI generated.

Could you please update it to be accurately reflect the changes done after applying suggestions?

Done

@Maetveis
Copy link
Contributor

Could you please update it to be accurately reflect the changes done after applying suggestions?
Done

I would remove these too as some of them are inacurrate and the others not relevant IMO.

  • Inline assembly + union-based type conversion gets fully optimized
  • Compiler eliminates conversion entirely (identical memory layout)
  • Direct register-to-register operation, no function calls
  • Eliminates ~105 function call sites and associated memory operations

Copy link
Contributor

@Maetveis Maetveis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Co-authored-by: Mészáros Gergely <maetveis@gmail.com>
Co-authored-by: Wenju He <wenju.he@intel.com>
@wenju-he
Copy link
Contributor Author

I would remove these too as some of them are inacurrate and the others not relevant IMO.

  • Inline assembly + union-based type conversion gets fully optimized
  • Compiler eliminates conversion entirely (identical memory layout)
  • Direct register-to-register operation, no function calls
  • Eliminates ~105 function call sites and associated memory operations

done, removed.

…l_id.h

And remove extra declaration from __assert_fail.cl.
@Maetveis
Copy link
Contributor

@wenju-he I've pushed a commit to fix needing to declare __spirv_BuiltInGlobalInvocationId in __assert_fail.cl. I thought it will be quicker and easier than giving you a patch. That was the last thing I had in here, the rest looks all good.
Thank you for the work, and for sticking through the review, I realize it took a few iterations :).

@wenju-he
Copy link
Contributor Author

@wenju-he I've pushed a commit to fix needing to declare __spirv_BuiltInGlobalInvocationId in __assert_fail.cl. I thought it will be quicker and easier than giving you a patch. That was the last thing I had in here, the rest looks all good. Thank you for the work, and for sticking through the review, I realize it took a few iterations :).

thanks you for the detailed review

@wenju-he
Copy link
Contributor Author

@intel/llvm-gatekeepers please merge, thanks. GEN12 fail is unrelated because this PR only affect AMDGPU and CUDA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants