Skip to content

[Driver][SYCL][NewOffloadModel] Pull in device libraries at device compile#21396

Open
mdtoguchi wants to merge 30 commits intointel:syclfrom
mdtoguchi:link-bitcode-compile
Open

[Driver][SYCL][NewOffloadModel] Pull in device libraries at device compile#21396
mdtoguchi wants to merge 30 commits intointel:syclfrom
mdtoguchi:link-bitcode-compile

Conversation

@mdtoguchi
Copy link
Contributor

@mdtoguchi mdtoguchi commented Feb 27, 2026

Pull in the device libraries during compile time when using the new offload model. This is done by using the -mlink-builtin-bitcode option, passing in each individual device library as needed.

This frees up the responsibility of linking in the device libraries during link time, and requiring the driver to pass any libraries needed to the clang-linker-wrapper or having the clang-linker-wrapper figure out what device libraries are needed at link.

With this move, the new.o device libraries are no longer used during the device linking stage using the clang-linker-wrapper. Made updates to the build to stop creating these binaries.

…mpile time

Pull in the device libraries during compile time when using the new offload
model.  This is done by using the `-mlink-builtin-bitcode` option, passing
in each individual device library as needed.

This frees up the responsibility of linking in the device libraries during
link time, and requiring the driver to pass any libraries needed to the
clang-linker-wrapper or having the clang-linker-wrapper figure out what
device libraries are needed at link.
@mdtoguchi mdtoguchi changed the title [Driver][SYCL][NewOffloadModel] Pull in device libraries at device co… [Driver][SYCL][NewOffloadModel] Pull in device libraries at device compile Feb 27, 2026
@mdtoguchi mdtoguchi added the new-offload-model Enables testing with NewOffloadModel. label Feb 27, 2026
@jinge90
Copy link
Contributor

jinge90 commented Mar 12, 2026

Hi, @mdtoguchi @bader
When I tried this PR to reproduce asan issue, I encountered following warning:
clang-linker-wrapper: warning: SYCL device library file list is empty
It seems that we should remove this warning.
And I see the failures for msan/tsan aot, let me reproduce in my side and take a look.
Thanks very much.

jinge90 added 3 commits March 11, 2026 23:16
Signed-off-by: jinge90 <ge.jin@intel.com>
Signed-off-by: jinge90 <ge.jin@intel.com>
Signed-off-by: jinge90 <ge.jin@intel.com>
@jinge90
Copy link
Contributor

jinge90 commented Mar 12, 2026

Hi, @mdtoguchi @bader When I tried this PR to reproduce asan issue, I encountered following warning: clang-linker-wrapper: warning: SYCL device library file list is empty It seems that we should remove this warning. And I see the failures for msan/tsan aot, let me reproduce in my side and take a look. Thanks very much.

The root cause is msan and tsan includes 'device globals', the workflow of mlink-builtin-bitcode in compilation phase is different from current devicelib linking. Each device image will be linked with required device library bitcode and those linked device images will be linked finally to get final device image. If msan/tsan is applied, all device image will be linked with msan/asan device library bitcode and all include the same device global, this is why we see multiple symbol definition in pre-ci failures.
I just push a little fix to bypass these failures to unblock this PR, just make all device globals to be weak. Let us wait for the pre-ci results.

Thanks very much.

 - Uses BitCodeLibraryInfo for determining how to link in the device libs
 - Removes warning from clang-linker-wrapper about no device libs passed
@mdtoguchi mdtoguchi marked this pull request as ready for review March 16, 2026 15:56
@mdtoguchi mdtoguchi requested a review from a team as a code owner March 16, 2026 15:56
@srividya-sundaram
Copy link
Contributor

Pull in the device libraries during compile time when using the new offload model. This is done by using the -mlink-builtin-bitcode option, passing in each individual device library as needed.

Will this increase the build time? Especially when compiling for multiple targets or when there are multiple source files?

@@ -189,6 +188,10 @@ class LLVM_LIBRARY_VISIBILITY SYCLToolChain : public ToolChain {
const llvm::opt::ArgList &Args,
llvm::opt::ArgStringList &CC1Args) const override;

llvm::SmallVector<BitCodeLibraryInfo, 12>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add documentation comments for all newly added functions?

@mdtoguchi
Copy link
Contributor Author

Pull in the device libraries during compile time when using the new offload model. This is done by using the -mlink-builtin-bitcode option, passing in each individual device library as needed.

Will this increase the build time? Especially when compiling for multiple targets or when there are multiple source files?

I did a quick -ftime-trace check against the device compilation and the LinkInModulesPass adds about 3ms-4ms of time to the overall compilation of that source file. Compared to the overall compilation time, this should be negligible.

@bader
Copy link
Contributor

bader commented Mar 18, 2026

This is done by using the -mlink-builtin-bitcode option, passing in each individual device library as needed.

This doesn't sound right. There must be just one library of builtins. We need to extend this pattern to SPIR-V target now -

if (TargetTriple.isNVPTX()) {
if (!NoOffloadLib)
LibraryList.push_back(
Args.MakeArgString("devicelib-nvptx64-nvidia-cuda.bc"));
return LibraryList;
}
if (TargetTriple.isAMDGCN()) {
if (!NoOffloadLib)
LibraryList.push_back(
Args.MakeArgString("devicelib-amdgcn-amd-amdhsa.bc"));
return LibraryList;
}
// Ignore no-offloadlib for NativeCPU device library, it provides some
// critical builtins which must be linked with user's device image.
if (TargetTriple.isNativeCPU()) {
LibraryList.push_back(Args.MakeArgString("libsycl-nativecpu_utils.bc"));
return LibraryList;
}

@mdtoguchi
Copy link
Contributor Author

This is done by using the -mlink-builtin-bitcode option, passing in each individual device library as needed.

This doesn't sound right. There must be just one library of builtins. We need to extend this pattern to SPIR-V target now -

if (TargetTriple.isNVPTX()) {
if (!NoOffloadLib)
LibraryList.push_back(
Args.MakeArgString("devicelib-nvptx64-nvidia-cuda.bc"));
return LibraryList;
}
if (TargetTriple.isAMDGCN()) {
if (!NoOffloadLib)
LibraryList.push_back(
Args.MakeArgString("devicelib-amdgcn-amd-amdhsa.bc"));
return LibraryList;
}
// Ignore no-offloadlib for NativeCPU device library, it provides some
// critical builtins which must be linked with user's device image.
if (TargetTriple.isNativeCPU()) {
LibraryList.push_back(Args.MakeArgString("libsycl-nativecpu_utils.bc"));
return LibraryList;
}

We can create a single .bc file and link it in similar to the other targets. @bader Is there an issue with pulling in multiple bitcode files?

@bader
Copy link
Contributor

bader commented Mar 18, 2026

@bader Is there an issue with pulling in multiple bitcode files?

Basically, using a single file simplifies compiler implementation, debugging, distribution, etc.
Technically, multiple bitcode files works as well, but using multiple files complicates the work with the built-ins library without other benefits. Here is GitHub issue related to this topic - #21512.

@mdtoguchi
Copy link
Contributor Author

@bader Is there an issue with pulling in multiple bitcode files?

Basically, using a single file simplifies compiler implementation, debugging, distribution, etc. Technically, multiple bitcode files works as well, but using multiple files complicates the work with the built-ins library without other benefits. Here is GitHub issue related to this topic - #21512.

As there is no current functional problem with multiple bitcode files - I would like to move forward with this PR and work on providing a single devicelib in a separate change. Is that OK?

@bader
Copy link
Contributor

bader commented Mar 18, 2026

@bader Is there an issue with pulling in multiple bitcode files?

Basically, using a single file simplifies compiler implementation, debugging, distribution, etc. Technically, multiple bitcode files works as well, but using multiple files complicates the work with the built-ins library without other benefits. Here is GitHub issue related to this topic - #21512.

As there is no current functional problem with multiple bitcode files - I would like to move forward with this PR and work on providing a single devicelib in a separate change. Is that OK?

Yes. Thanks!

@jinge90
Copy link
Contributor

jinge90 commented Mar 19, 2026

@bader Is there an issue with pulling in multiple bitcode files?

Basically, using a single file simplifies compiler implementation, debugging, distribution, etc. Technically, multiple bitcode files works as well, but using multiple files complicates the work with the built-ins library without other benefits. Here is GitHub issue related to this topic - #21512.

As there is no current functional problem with multiple bitcode files - I would like to move forward with this PR and work on providing a single devicelib in a separate change. Is that OK?

Yes. Thanks!

Hi, @bader @mdtoguchi
I suggest to have follwoing device lib files:
for cmath related devicelib files--->libm.bc
for other such as string, assert, random--->libc.bc
For intel math function---->libimf.bc
For device sanitzer libs: not changed
This can align with community gpu libc better, libm.bc/a and libc.bc/a are provided in LLVM libc for GPU target. By doing so, we don't need to change driver code when switching to LLVM libc for libdevice functionalities.
For intel math function, we can align with intel compiler style for normal CPU target, we have libimf.a/so in intel compiler package for CPU platform.
For sanitizer libs, I suggest to keep separate lib files since they are not linked by default but controlled by option "-fsanitize=xxx".

Thanks very much.

@bader
Copy link
Contributor

bader commented Mar 19, 2026

@bader Is there an issue with pulling in multiple bitcode files?

Basically, using a single file simplifies compiler implementation, debugging, distribution, etc. Technically, multiple bitcode files works as well, but using multiple files complicates the work with the built-ins library without other benefits. Here is GitHub issue related to this topic - #21512.

As there is no current functional problem with multiple bitcode files - I would like to move forward with this PR and work on providing a single devicelib in a separate change. Is that OK?

Yes. Thanks!

Hi, @bader @mdtoguchi I suggest to have follwoing device lib files: for cmath related devicelib files--->libm.bc for other such as string, assert, random--->libc.bc For intel math function---->libimf.bc For device sanitzer libs: not changed This can align with community gpu libc better, libm.bc/a and libc.bc/a are provided in LLVM libc for GPU target. By doing so, we don't need to change driver code when switching to LLVM libc for libdevice functionalities. For intel math function, we can align with intel compiler style for normal CPU target, we have libimf.a/so in intel compiler package for CPU platform. For sanitizer libs, I suggest to keep separate lib files since they are not linked by default but controlled by option "-fsanitize=xxx".

Thanks very much.

sounds good to me. @jinge90, I expect you to implement this proposal not @mdtoguchi. Right?

@mdtoguchi
Copy link
Contributor Author

Hi, @bader @mdtoguchi I suggest to have follwoing device lib files: for cmath related devicelib files--->libm.bc for other such as string, assert, random--->libc.bc For intel math function---->libimf.bc For device sanitzer libs: not changed This can align with community gpu libc better, libm.bc/a and libc.bc/a are provided in LLVM libc for GPU target. By doing so, we don't need to change driver code when switching to LLVM libc for libdevice functionalities. For intel math function, we can align with intel compiler style for normal CPU target, we have libimf.a/so in intel compiler package for CPU platform. For sanitizer libs, I suggest to keep separate lib files since they are not linked by default but controlled by option "-fsanitize=xxx".
Thanks very much.

sounds good to me. @jinge90, I expect you to implement this proposal not @mdtoguchi. Right?

@bader, I will follow this plan - thanks!

The device libs are expected and internal to the compiler, emit an error
if any are not found.  Update lit testing to check if the device libs have
been added to the targets, so we can better manipulate the testing.
@mdtoguchi mdtoguchi requested a review from a team as a code owner March 19, 2026 20:32
@jinge90
Copy link
Contributor

jinge90 commented Mar 20, 2026

@bader Is there an issue with pulling in multiple bitcode files?

Basically, using a single file simplifies compiler implementation, debugging, distribution, etc. Technically, multiple bitcode files works as well, but using multiple files complicates the work with the built-ins library without other benefits. Here is GitHub issue related to this topic - #21512.

As there is no current functional problem with multiple bitcode files - I would like to move forward with this PR and work on providing a single devicelib in a separate change. Is that OK?

Yes. Thanks!

Hi, @bader @mdtoguchi I suggest to have follwoing device lib files: for cmath related devicelib files--->libm.bc for other such as string, assert, random--->libc.bc For intel math function---->libimf.bc For device sanitzer libs: not changed This can align with community gpu libc better, libm.bc/a and libc.bc/a are provided in LLVM libc for GPU target. By doing so, we don't need to change driver code when switching to LLVM libc for libdevice functionalities. For intel math function, we can align with intel compiler style for normal CPU target, we have libimf.a/so in intel compiler package for CPU platform. For sanitizer libs, I suggest to keep separate lib files since they are not linked by default but controlled by option "-fsanitize=xxx".
Thanks very much.

sounds good to me. @jinge90, I expect you to implement this proposal not @mdtoguchi. Right?

Hi, @bader
Yes, I will do this once this PR is merged.
Thanks very much.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the SYCL “new offload model” flow so that SYCL device libraries are pulled in during the device compile step (via -mlink-builtin-bitcode / -mlink-bitcode-file) rather than being handled later during link time (e.g., by clang-linker-wrapper). It also removes generation/usage of the *.new.o/*.new.obj device library artifacts and updates driver tests and diagnostics accordingly.

Changes:

  • Link SYCL device libraries into SPIR/SPIR-V device compilation with -mlink-builtin-bitcode (and post-opt linking), and add a diagnostic when expected device libraries are missing.
  • Remove build + usage paths for *.new.o/*.new.obj SYCL device libraries; adjust wrapper/linker behavior and tests to match the new model.
  • Add weak definitions for certain DeviceGlobal symbols to avoid multiple-definition issues when libraries are pulled in earlier.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
libdevice/sanitizer/tsan_rtl.cpp Marks __TsanLaunchInfo as weak to avoid duplicate definitions.
libdevice/sanitizer/msan_rtl.cpp Marks __MsanLaunchInfo as weak to avoid duplicate definitions.
libdevice/crt_wrapper.cpp Marks RandNext DeviceGlobal as weak to avoid duplicate definitions.
libdevice/cmake/modules/SYCLLibdevice.cmake Stops producing/installing “new offload” (*.new.o/*.new.obj) device lib variants; simplifies IMF host library build.
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp Skips the “second device linking step” when no extracted device libs are provided.
clang/test/lit.site.cfg.py.in Exposes LLVM_ENABLE_PROJECTS to lit as config.enable_projects.
clang/test/lit.cfg.py Adds a libdevice feature when libdevice is enabled in projects.
clang/test/Driver/sycl-spirv-ext.cpp Restricts test to Linux and fixes line continuation formatting.
clang/test/Driver/sycl-spirv-default-options.cpp Restricts test to Linux.
clang/test/Driver/sycl-offload-new-driver.cpp Removes wrapper option checks that referenced *.new.o device libs.
clang/test/Driver/sycl-instrumentation.cpp Updates checks to expect -mlink-builtin-bitcode usage instead of wrapper-linked *.new.o.
clang/test/Driver/sycl-device-lib.cpp Updates device library expectations from *.new.o to *.bc compile-time linking flags.
clang/test/Driver/sycl-device-lib-diag.cpp New test verifying error diagnostic when expected SYCL device libs are missing.
clang/lib/Driver/ToolChains/SYCL.h Refactors device-lib listing APIs: adds getDeviceLibNames/getDeviceLibs returning BitCodeLibraryInfo.
clang/lib/Driver/ToolChains/SYCL.cpp Implements compile-time device-lib injection for SPIR/SPIR-V in the new model; adds missing-lib diagnostic path.
clang/lib/Driver/ToolChains/Clang.cpp Stops passing SPIR/SPIR-V SYCL device libraries to clang-linker-wrapper via -sycl-device-libraries; uses --bitcode-library for non-SPIR targets.
clang/lib/Driver/Driver.cpp Updates device-lib selection to use SYCLToolChain::getDeviceLibNames (now .bc names) during device-link construction.
clang/include/clang/Basic/DiagnosticDriverKinds.td Adds err_drv_no_sycl_device_lib diagnostic used when device libraries can’t be found.
Comments suppressed due to low confidence (1)

clang/lib/Driver/ToolChains/Clang.cpp:11535

  • appendToList is declared but never used after the switch to --bitcode-library= arguments, which will trigger an unused-variable warning (and can break builds that treat warnings as errors). Remove this lambda or reintroduce its usage if still needed.
    auto appendToList = [](SmallString<256> &List, const Twine &Arg) {
      if (List.size() > 0)
        List += ",";
      List += Arg.str();
    };

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see FE changes other than the diagnostic which LGTM

@mdtoguchi
Copy link
Contributor Author

@intel/llvm-reviewers-runtime and @intel/dpcpp-clang-driver-reviewers, please take a look. Thanks!

Copy link
Contributor

@hchilama hchilama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@srividya-sundaram
Copy link
Contributor

Pull in the device libraries during compile time when using the new offload model. This is done by using the -mlink-builtin-bitcode option, passing in each individual device library as needed.

This frees up the responsibility of linking in the device libraries during link time, and requiring the driver to pass any libraries needed to the clang-linker-wrapper or having the clang-linker-wrapper figure out what device libraries are needed at link.

With this move, the new.o device libraries are no longer used during the device linking stage using the clang-linker-wrapper. Made updates to the build to stop creating these binaries.

Can an end-to-end test/integration test be added for the full compile-link pipeline?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-offload-model Enables testing with NewOffloadModel.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants