Conversation
```cpp
auto sym = (ret(*)(__VA_ARGS__))Lib.getAddressOfSymbol(#sym); \
if (!sym)                                                     \
  return "";
LOAD(cuInit, int, unsigned)
```
```
warning: do not use C-style cast to convert between unrelated types [cppcoreguidelines-pro-type-cstyle-cast]
LOAD(cuInit, int, unsigned)
^
Additional context
lib/CppInterOp/Compatibility.h:222: expanded from macro 'LOAD'
auto sym = (ret(*)(__VA_ARGS__))Lib.getAddressOfSymbol(#sym); \
^
```

```cpp
if (!sym) \
  return "";
LOAD(cuInit, int, unsigned)
LOAD(cuDeviceGet, int, uint32_t*, int)
```
```
warning: no header providing "uint32_t" is directly included [misc-include-cleaner]
lib/CppInterOp/Compatibility.h:14:
+ #include <cstdint>
```

```cpp
cuDeviceGetAttribute(&maj, /*MAJOR*/ 75, dev) ||
    cuDeviceGetAttribute(&min, /*MINOR*/ 76, dev))
  return "";
return "sm_" + std::to_string(maj) + std::to_string(min);
```
```
warning: no header providing "std::to_string" is directly included [misc-include-cleaner]
return "sm_" + std::to_string(maj) + std::to_string(min);
^
```

```cpp
static bool HasCudaSDK() {
  auto supportsCudaSDK = []() {
// perform SDK and runtime checks on a single CUDA interpreter
static std::pair<bool, bool> CheckCudaSupport() {
```
```
warning: no header providing "std::pair" is directly included [misc-include-cleaner]
unittests/CppInterOp/CUDATest.cpp:9:
+ #include <utility>
```

```cpp
CB.SetOffloadArch("sm_35");
for (const auto* arg : args)
  if (llvm::StringRef(arg).starts_with("--offload-arch="))
    OffloadArch = llvm::StringRef(arg).substr(strlen("--offload-arch="));
```
This is probably a stupid question. Do we have any logic to check that the user-provided GPU architecture is a valid one? I'm asking this assuming the answer is yes.
```cpp
compat::Interpreter* interp = &getInterp(I);
const auto& LO = interp->getCI()->getLangOpts();

// CUDA and HIP reuse C++ language standards, so LangStd alone reports CXX.
```
This comment isn't specific to this PR, but given we are looking at GPU-related stuff: can the interpreter handle C++ plus OpenMP (GPU offloading)?
I believe you need to pass -fopenmp to the interpreter, but so far only CPU has been tested, not GPU offloading.
```cpp
if (llvm::StringRef(arg).starts_with("--offload-arch="))
  OffloadArch = llvm::StringRef(arg).substr(strlen("--offload-arch="));
```
I guess we will need some generic way to rely on what's already in clang. Here is what Gemini proposes, and I think it is worth investigating:
```cpp
#include "clang/Driver/CudaInstallationDetector.h"
#include "clang/Driver/Options.h"
#include "llvm/Option/ArgList.h"
#include "llvm/Option/OptTable.h"
#include "llvm/TargetParser/Host.h"

// ... inside createClangInterpreter ...

// 1. Parse the raw args into a format the Detector understands
const auto &Opts = clang::driver::getDriverOptTable();
unsigned MissingArgIndex, MissingArgCount;
// Use 0 as the included/excluded flags for a general parse
auto InputArgs = Opts.ParseArgs(args, MissingArgIndex, MissingArgCount);

// 2. Identify the Triple (Host is usually the intended target for JIT)
llvm::Triple HostTriple(llvm::sys::getDefaultTargetTriple());

// 3. Initialize the Detector (No CI needed!)
// The detector will look for --cuda-path in InputArgs automatically
clang::driver::CudaInstallationDetector Detector(
    /*Driver=*/{}, // We can pass an empty driver object or a dummy
    HostTriple, InputArgs);

if (CudaEnabled && Detector.isValid()) {
  // 4. Resolve the Architecture
  std::string SelectedArch;
  if (auto *Arg = InputArgs.getLastArg(clang::driver::options::OPT_offload_arch_EQ)) {
    SelectedArch = Arg->getValue();
  } else {
    SelectedArch = detectNVIDIAGPUArch();
  }
  // 5. Validation Logic
  if (!SelectedArch.empty()) {
    clang::CudaArch CArch = clang::StringToCudaArch(SelectedArch);
    // If the hardware is newer than the SDK, downgrade to the SDK's max supported arch
    if (Detector.version() < clang::MinVersionForCudaArch(CArch)) {
      CArch = clang::MaxVersionForCudaArch(Detector.version());
      SelectedArch = clang::CudaArchToString(CArch);
    }
    CB.SetOffloadArch(SelectedArch);
  }
}
```

The advantage is that now you know where the SDK is, and you can also pass the relevant include paths automatically:

```cpp
if (CudaEnabled && Detector.isValid()) {
  // 1. Add the CUDA include directory (e.g., /usr/local/cuda/include)
  // We use -isystem to treat them as system headers (suppressing warnings)
  std::string IncludeFlag = "-isystem" + Detector.getIncludePath();
  // Note: Ensure the string's lifetime matches the 'args' usage.
  // If 'args' is a vector of const char*, you may need to manage the allocation.
  args.push_back(strdup(IncludeFlag.c_str()));

  // 2. Add the path to the CUDA builtins (libdevice)
  // This is required for math functions like __sinf, __expf, etc.
  std::string LibDevicePath = Detector.getLibDevicePath();
  if (!LibDevicePath.empty()) {
    std::string LinkBitcode = "--cuda-device-lib=" + LibDevicePath;
    args.push_back(strdup(LinkBitcode.c_str()));
  }
}
```
This is somewhat what the current implementation does: we use clang::Driver to create the clang::driver::CudaInstallationDetector and iterate over the paths it inserts based on the CUDA directory.
This is done by creating a compilation object with the driver and fetching the arg via the clang::driver::options::OPT_cuda_path_EQ enum value.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #866      +/-   ##
==========================================
+ Coverage   78.67%   78.86%   +0.19%
==========================================
  Files          11       11
  Lines        4014     4079      +65
==========================================
+ Hits         3158     3217      +59
- Misses        856      862       +6
```
This PR needs rebasing to get rid of the Emscripten llvm 20 jobs.
I have some fixups to increase code coverage.
`compat::createClangInterpreter` now auto-detects the CUDA SDK path via `CudaInstallationDetector`, and the GPU compute capability (sm_xx) using the CUDA Driver API (`cuDeviceGetAttribute`). User-provided flags still take priority. This makes CUDA support in CppInterOp much more robust.
Consolidate SDK and Runtime checks into a fixture that only runs once. Adds tests for `compat::detectCudaInstallPath` and `compat::detectNVPTXArch`. While debugging issues with the CUDA support, after fixing the interpreter creation, I found that the second interpreter creation hit an error in cudaGetLastError because we performed the checks sequentially for HasCudaRuntime. Now we have a single `CheckCudaSupport` that performs both checks at once. Passes locally.
Drop the hardcoded CUDA arch provided to the `IncrementalCompilerBuilder` in `createClangInterpreter`. While debugging issues with the CUDA support that seems broken on master, after fixing the interpreter creation, I found that the second interpreter creation hit an error in cudaGetLastError. CUDA and HIP reuse that option, so it would not indicate whether the interpreter was CUDA-enabled. All 4 CUDA tests pass locally with an sm_89 device, LLVM 21.