Skip to content

Fix CUDA tests#866

Merged
vgvassilev merged 2 commits intocompiler-research:mainfrom
aaronj0:fix-cuda-tests
Mar 25, 2026
Merged

Fix CUDA tests#866
vgvassilev merged 2 commits intocompiler-research:mainfrom
aaronj0:fix-cuda-tests

Conversation

@aaronj0
Copy link
Copy Markdown
Collaborator

@aaronj0 aaronj0 commented Mar 23, 2026

Drop hardcoded CUDA arch provided to the IncrementalCompilerBuilder in createClangInterpreter. While debugging issues with the CUDA support that seems broken on master, after fixing the interpreter creation, I found that the second interpreter creation held an error in cudaGetLastError. CUDA and HIP reuse that option so it would not indicate whether the interpreter was CUDA enabled. All 4 CUDA tests pass locally with an sm_89 device, LLVM 21

@aaronj0 aaronj0 requested a review from vgvassilev March 23, 2026 16:05
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h Outdated
auto sym = (ret(*)(__VA_ARGS__))Lib.getAddressOfSymbol(#sym); \
if (!sym) \
return "";
LOAD(cuInit, int, unsigned)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use C-style cast to convert between unrelated types [cppcoreguidelines-pro-type-cstyle-cast]

  LOAD(cuInit, int, unsigned)
  ^
Additional context

lib/CppInterOp/Compatibility.h:222: expanded from macro 'LOAD'

  auto sym = (ret(*)(__VA_ARGS__))Lib.getAddressOfSymbol(#sym);                \
             ^

Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h Outdated
if (!sym) \
return "";
LOAD(cuInit, int, unsigned)
LOAD(cuDeviceGet, int, uint32_t*, int)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "uint32_t" is directly included [misc-include-cleaner]

lib/CppInterOp/Compatibility.h:14:

+ #include <cstdint>

Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h Outdated
cuDeviceGetAttribute(&maj, /*MAJOR*/ 75, dev) ||
cuDeviceGetAttribute(&min, /*MINOR*/ 76, dev))
return "";
return "sm_" + std::to_string(maj) + std::to_string(min);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::to_string" is directly included [misc-include-cleaner]

  return "sm_" + std::to_string(maj) + std::to_string(min);
                      ^

Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread unittests/CppInterOp/CUDATest.cpp Outdated
static bool HasCudaSDK() {
auto supportsCudaSDK = []() {
// perform SDK and runtime checks on a single CUDA interpreter
static std::pair<bool, bool> CheckCudaSupport() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::pair" is directly included [misc-include-cleaner]

unittests/CppInterOp/CUDATest.cpp:9:

+ #include <utility>

Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h Outdated
CB.SetOffloadArch("sm_35");
for (const auto* arg : args)
if (llvm::StringRef(arg).starts_with("--offload-arch="))
OffloadArch = llvm::StringRef(arg).substr(strlen("--offload-arch="));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a stupid question. Do we have any logic to check that the user provided gpu architechture is a valid one? I'm saying this assuming the answer is yes.

compat::Interpreter* interp = &getInterp(I);
const auto& LO = interp->getCI()->getLangOpts();

// CUDA and HIP reuse C++ language standards, so LangStd alone reports CXX.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't specific to this PR, but given we are looking at gpu related stuff, can the interpreter handle, c++ + openmp (gpu offloading)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you need to pass -fopenmp to the interpreter, but so far only CPU has been tested and not GPU offloading

Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread unittests/CppInterOp/CUDATest.cpp Outdated
Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment on lines +263 to +264
if (llvm::StringRef(arg).starts_with("--offload-arch="))
OffloadArch = llvm::StringRef(arg).substr(strlen("--offload-arch="));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we will need some generic way to rely on what's already in clang. Here is what gemini proposes and I think it is worth investigating:

#include "clang/Driver/CudaInstallationDetector.h"
#include "clang/Driver/Options.h"
#include "llvm/Option/ArgList.h"
#include "llvm/Option/OptTable.h"
#include "llvm/TargetParser/Host.h"

// ... inside createClangInterpreter ...

// 1. Parse the raw args into a format the Detector understands
const auto &Opts = clang::driver::getDriverOptTable();
unsigned MissingArgIndex, MissingArgCount;
// Use 0 as the included/excluded flags for a general parse
auto InputArgs = Opts.ParseArgs(args, MissingArgIndex, MissingArgCount);

// 2. Identify the Triple (Host is usually the intended target for JIT)
llvm::Triple HostTriple(llvm::sys::getDefaultTargetTriple());

// 3. Initialize the Detector (No CI needed!)
// The detector will look for --cuda-path in InputArgs automatically
clang::driver::CudaInstallationDetector Detector(
    /*Driver=*/ {}, // We can pass an empty driver object or a dummy
    HostTriple, 
    InputArgs);

if (CudaEnabled && Detector.isValid()) {
    // 4. Resolve the Architecture
    std::string SelectedArch;
    if (auto *Arg = InputArgs.getLastArg(clang::driver::options::OPT_offload_arch_EQ)) {
        SelectedArch = Arg->getValue();
    } else {
        SelectedArch = detectNVIDIAGPUArch();
    }

    // 5. Validation Logic
    if (!SelectedArch.empty()) {
        clang::CudaArch CArch = clang::StringToCudaArch(SelectedArch);
        // If the hardware is newer than the SDK, downgrade to the SDK's max supported arch
        if (Detector.version() < clang::MinVersionForCudaArch(CArch)) {
            CArch = clang::MaxVersionForCudaArch(Detector.version());
            SelectedArch = clang::CudaArchToString(CArch);
        }
        CB.SetOffloadArch(SelectedArch);
    }
}

The advantage is that now you know where the sdk is and you also can pass the relevant include paths automatically:

if (CudaEnabled && Detector.isValid()) {
    // 1. Add the CUDA include directory (e.g., /usr/local/cuda/include)
    // We use -isystem to treat them as system headers (suppressing warnings)
    std::string IncludeFlag = "-isystem" + Detector.getIncludePath();
    
    // Note: Ensure the string's lifetime matches the 'args' usage. 
    // If 'args' is a vector of const char*, you may need to manage the allocation.
    args.push_back(strdup(IncludeFlag.c_str()));

    // 2. Add the path to the CUDA builtins (libdevice)
    // This is required for math functions like __sinf, __expf, etc.
    std::string LibDevicePath = Detector.getLibDevicePath();
    if (!LibDevicePath.empty()) {
        std::string LinkBitcode = "--cuda-device-lib=" + LibDevicePath;
        args.push_back(strdup(LinkBitcode.c_str()));
    }
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat what the current implementation does, we use clang::Driver to create the clang::driver::CudaInstallationDetector and iterate over the paths it inserts based on the CUDA directory

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done by creating a compilation object with the driver and fetching the arg with this clang::driver::options::OPT_cuda_path_EQ enum value

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 91.78082% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.86%. Comparing base (8c06fd6) to head (3587c83).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
lib/CppInterOp/Compatibility.h 91.78% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #866      +/-   ##
==========================================
+ Coverage   78.67%   78.86%   +0.19%     
==========================================
  Files          11       11              
  Lines        4014     4079      +65     
==========================================
+ Hits         3158     3217      +59     
- Misses        856      862       +6     
Files with missing lines Coverage Δ
lib/CppInterOp/Compatibility.h 87.50% <91.78%> (+2.06%) ⬆️
Files with missing lines Coverage Δ
lib/CppInterOp/Compatibility.h 87.50% <91.78%> (+2.06%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@aaronj0 aaronj0 force-pushed the fix-cuda-tests branch 2 times, most recently from 75923f6 to d5fea35 Compare March 25, 2026 08:55
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 10 out of 17. Check the log or trigger a new build to see more.

Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h
Comment thread unittests/CppInterOp/CUDATest.cpp
Comment thread unittests/CppInterOp/CUDATest.cpp Outdated
Comment thread unittests/CppInterOp/CUDATest.cpp
Comment thread unittests/CppInterOp/CUDATest.cpp
Comment thread unittests/CppInterOp/CUDATest.cpp
Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread lib/CppInterOp/Compatibility.h Outdated
Comment thread unittests/CppInterOp/CUDATest.cpp Outdated
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment thread lib/CppInterOp/Compatibility.h
Comment thread lib/CppInterOp/Compatibility.h
Comment thread unittests/CppInterOp/CUDATest.cpp Outdated
Comment thread unittests/CppInterOp/CUDATest.cpp
@mcbarton
Copy link
Copy Markdown
Collaborator

This PR needs rebasing to get rid of the Emscripten llvm 20 jobs.

@aaronj0
Copy link
Copy Markdown
Collaborator Author

aaronj0 commented Mar 25, 2026

I have some fixups to increase code coverage..

@aaronj0 aaronj0 force-pushed the fix-cuda-tests branch 2 times, most recently from 41bdb5f to fdc6461 Compare March 25, 2026 10:53
Copy link
Copy Markdown
Contributor

@vgvassilev vgvassilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment thread unittests/CppInterOp/CUDATest.cpp Outdated
Comment thread unittests/CppInterOp/CUDATest.cpp Outdated
Comment thread unittests/CppInterOp/CUDATest.cpp
@aaronj0 aaronj0 force-pushed the fix-cuda-tests branch 2 times, most recently from cfaaf7d to d94e0ec Compare March 25, 2026 12:19
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment thread unittests/CppInterOp/CUDATest.cpp
Comment thread unittests/CppInterOp/CUDATest.cpp
aaronj0 added 2 commits March 25, 2026 16:53
`compat::createClangInterpreter` now auto detects the CUDA SDK path via CudaInstallationDetector,
and the GPU compute capability (sm_xx) using the CUDA Driver API (cuDeviceGetAttribute).
User provided flags still get first priority. This makes CUDA support with CppInterOp much more robust.
Consolidate SDK an dRuntime checks into a fixture that only runs once. Adds tests for `compat::detectCudaInstallPath` and `compat::detectNVPTXArch`
While debugging issues with the CUDA support, after fixing the interpreter creation, I found that the second interpreter creation held an error in cudaGetLastError because we perform the check sequentially for HasCudaRuntime. Now, we have a single `CheckCudaSupport` that performs both checks at once. Passes locally
@vgvassilev vgvassilev merged commit 80d9655 into compiler-research:main Mar 25, 2026
31 of 33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants