
feat: Implement persistent CUBIN caching, multi-threaded context safety, and cross-platform NVRTC support (#175)#193

Open
Franklalalala wants to merge 3 commits into PASSIONLab:main from Franklalalala:main

Conversation

@Franklalalala

📌 Content

This PR introduces a robust disk-based caching mechanism for NVRTC JIT compilation, ensures CUDA context safety across multiple threads, and provides cross-platform compatibility (Windows/Linux). It significantly optimizes the startup time of models using JIT-compiled kernels by avoiding redundant compilation.

🎯 Motivation

  1. Addressing Issue #175 (Compilation cache fails to persist on L40s/RTX 6000): Users on high-end hardware (e.g., L40s, RTX 6000) reported long JIT compilation times because kernels are recompiled every time the process starts. Persistent caching solves this.
  2. Multi-threaded Reliability: In environments like PyTorch DataLoader workers, CUDA contexts aren't always automatically initialized. This PR ensures a valid context is bound before any Driver API calls.
  3. Deployment Flexibility: Adds support for custom cache paths and provides environment variables to toggle caching.
  4. Bug Fixes: Fixed a memory leak where the compilation log was not freed during exception throws, and improved error reporting.

🛠️ Core Differences & Implementation Details

The new implementation introduces several critical enhancements over the previous version:

  1. Deterministic Persistent Caching

    • Old: Used an in-memory compiled flag; compilation was lost on process exit.
    • New: Generates a unique FNV-1a 64-bit hash based on: NVRTC version, GPU architecture (sm_xx), compilation options, source code, and kernel name expressions.
    • Storage: Saves both .cubin (binary) and .names (lowered name mappings) to ~/.cache/openequivariance or a custom path via OEQ_CACHE_PATH.
  2. Atomic & Safe File I/O

    • Race Condition Prevention: To prevent corrupted cache files when multiple processes/threads compile simultaneously, writes are performed to a temporary file (uniquely identified by PID and ThreadID) and then atomically moved to the final destination using std::rename.
  3. Automatic CUDA Context Management

    • Added ensure_cuda_context(). This function detects if the calling thread has an active CUDA context. If not, it retains and sets the Primary Context for the device. This is crucial for stability in multi-threaded C++ or Python environments.
  4. Cross-Platform Portability

    • Replaced Linux-specific logic with a unified macro system (_WIN32) for directory creation (mkdir vs _mkdir) and process identification (getpid vs _getpid).
    • Utilizes <cinttypes> and PRIx64 for consistent 64-bit hex formatting across different compilers/architectures.
  5. Robust Error Handling

    • Ensures delete[] log; is called even when a std::logic_error is thrown during NVRTC failures, preventing memory leaks during iterative debugging.
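The FNV-1a key derivation in item 1 (combined with the `PRIx64` formatting from item 4) could be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper names, field order, and separator byte are assumptions.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// FNV-1a 64-bit constants: standard offset basis and prime.
constexpr uint64_t FNV_OFFSET = 0xcbf29ce484222325ULL;
constexpr uint64_t FNV_PRIME  = 0x100000001b3ULL;

uint64_t fnv1a_64(const std::string& data, uint64_t hash = FNV_OFFSET) {
    for (unsigned char c : data) {
        hash ^= c;          // XOR before multiply distinguishes FNV-1a from FNV-1
        hash *= FNV_PRIME;
    }
    return hash;
}

// Hypothetical cache-key builder: chain the hash over every input that
// affects the compiled binary, so any change yields a new cache entry.
std::string cache_key(const std::string& nvrtc_version,
                      const std::string& arch,          // e.g. "sm_89"
                      const std::string& options,
                      const std::string& source,
                      const std::string& kernel_names) {
    uint64_t h = FNV_OFFSET;
    for (const std::string* part : {&nvrtc_version, &arch, &options,
                                    &source, &kernel_names}) {
        h = fnv1a_64(*part, h);
        h = fnv1a_64("\x1f", h);  // field separator guards against concatenation collisions
    }
    char buf[17];
    std::snprintf(buf, sizeof(buf), "%016" PRIx64, h);  // portable 64-bit hex via <cinttypes>
    return std::string(buf);
}
```

The resulting 16-character hex string can name the `.cubin` and `.names` files directly.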
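The write-to-temp-then-rename pattern from item 2 could look roughly like this sketch (function names assumed; the real code also embeds the PID in the temp name and switches on `_WIN32`). On POSIX, `std::rename` within one filesystem is atomic, so readers never observe a half-written cache file:

```cpp
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>
#include <thread>

// Write `contents` to a uniquely named temp file, then atomically move it
// into place. Concurrent writers each use their own temp file, so they can
// never corrupt each other; the last rename simply wins.
bool atomic_write(const std::string& final_path, const std::string& contents) {
    std::ostringstream tid;
    tid << std::this_thread::get_id();
    std::string tmp_path = final_path + ".tmp." + tid.str();
    {
        std::ofstream out(tmp_path, std::ios::binary);
        if (!out) return false;
        out.write(contents.data(), static_cast<std::streamsize>(contents.size()));
        if (!out) { std::remove(tmp_path.c_str()); return false; }
    }   // stream flushed and closed before the rename
    if (std::rename(tmp_path.c_str(), final_path.c_str()) != 0) {
        std::remove(tmp_path.c_str());  // clean up on failure (e.g. cross-device move)
        return false;
    }
    return true;
}
```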
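The context check described in item 3 could be sketched as below. The exact signature of `ensure_cuda_context()` is an assumption; the driver-API calls themselves (`cuCtxGetCurrent`, `cuDevicePrimaryCtxRetain`, `cuCtxSetCurrent`) are the standard way to bind a primary context:

```cuda
#include <cuda.h>
#include <stdexcept>

// Driver-API calls fail with CUDA_ERROR_INVALID_CONTEXT when the calling
// thread (e.g. a PyTorch DataLoader worker) has no current context. Binding
// the device's primary context fixes that without creating extra contexts,
// since the primary context is shared with the runtime API.
void ensure_cuda_context(CUdevice device) {
    CUcontext ctx = nullptr;
    if (cuCtxGetCurrent(&ctx) == CUDA_SUCCESS && ctx != nullptr)
        return;  // this thread already has an active context
    if (cuDevicePrimaryCtxRetain(&ctx, device) != CUDA_SUCCESS)
        throw std::runtime_error("cuDevicePrimaryCtxRetain failed");
    // Each retain must eventually be balanced by cuDevicePrimaryCtxRelease
    // at shutdown, or the context leaks until process exit.
    if (cuCtxSetCurrent(ctx) != CUDA_SUCCESS)
        throw std::runtime_error("cuCtxSetCurrent failed");
}
```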
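The cross-platform path handling in items 1 and 4 could be sketched as follows. The env-var name `OEQ_CACHE_PATH` and the default `~/.cache/openequivariance` come from the PR description; the helper names and fallbacks are assumptions:

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#ifdef _WIN32
  #include <direct.h>   // _mkdir
#endif

// Resolve the cache directory: an explicit OEQ_CACHE_PATH wins, otherwise
// fall back to the user's home directory.
std::string resolve_cache_dir() {
    if (const char* custom = std::getenv("OEQ_CACHE_PATH"))
        return custom;
    const char* home = std::getenv("HOME");          // POSIX
#ifdef _WIN32
    if (!home) home = std::getenv("USERPROFILE");    // Windows fallback
#endif
    return std::string(home ? home : ".") + "/.cache/openequivariance";
}

// Create a directory, treating "already exists" as success so the call
// is idempotent across processes racing to create the cache dir.
bool make_dir(const std::string& path) {
#ifdef _WIN32
    return _mkdir(path.c_str()) == 0 || errno == EEXIST;
#else
    return mkdir(path.c_str(), 0755) == 0 || errno == EEXIST;
#endif
}
```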
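The leak class fixed in item 5 disappears entirely if the log buffer is owned by a smart pointer rather than a raw `new[]`. A minimal sketch, with hypothetical stand-ins for the NVRTC log-retrieval calls:

```cpp
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for nvrtcGetProgramLogSize / nvrtcGetProgramLog.
size_t fake_log_size() { return 16; }
void fake_get_log(char* buf) { std::strcpy(buf, "error: demo log"); }

// Instead of `char* log = new char[n]; ... throw;` (which leaks unless every
// throw site remembers `delete[] log;`), let unique_ptr free the buffer on
// every exit path, including exceptions thrown during stack unwinding.
void fetch_compile_log_and_throw() {
    size_t n = fake_log_size();
    std::unique_ptr<char[]> log(new char[n]);
    fake_get_log(log.get());
    std::string msg(log.get());  // copy the log before the buffer dies
    throw std::logic_error("NVRTC compilation failed: " + msg);
}
```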

🤖 Contribution Note

This PR was developed through a collaborative effort between the contributor and multiple AI systems (Gemini 3.1 Pro Preview, GPT-5.4 High & Claude 4.6 Thinking models). We have worked together to address complex technical edge cases (such as atomic renames and primary context retention) to ensure the long-term stability and performance of the OpenEquivariance library.

I have verified these changes on my local environment, and they successfully resolve the redundant JIT overhead described in Issue #175.

@Franklalalala
Author

Personally, I don't fully understand the underlying mechanism, though it does indeed fix the issue.

