Skip to content

feat(pathfinder): centralize CUDA env var handling, prioritize CUDA_PATH over CUDA_HOME#1801

Draft
rwgk wants to merge 12 commits intoNVIDIA:mainfrom
rwgk:CUDA_PATH_CUDA_HOME_cleanup
Draft

feat(pathfinder): centralize CUDA env var handling, prioritize CUDA_PATH over CUDA_HOME#1801
rwgk wants to merge 12 commits intoNVIDIA:mainfrom
rwgk:CUDA_PATH_CUDA_HOME_cleanup

Conversation

@rwgk
Copy link
Collaborator

@rwgk rwgk commented Mar 20, 2026

Closes #1433

WIP-WIP-WIP

Current state: Squash-merge of PR #1519 rebased onto main. Untested (not even locally).

…d onto main.

Adds cuda.pathfinder._utils.env_var_constants with canonical search order,
enhances get_cuda_home_or_path() with robust path comparison and caching,
and updates documentation across all packages to reflect the new priority.

Co-authored-by: Rob Parolin <rparolin@nvidia.com>
Made-with: Cursor
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Mar 20, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Copy link
Collaborator Author

rwgk commented Mar 20, 2026

Decision: drop os.pathsep splitting of CUDA_HOME/CUDA_PATH

Both cuda_bindings/build_hooks.py and cuda_core/build_hooks.py currently split the env var value on os.pathsep and iterate over a list of CUDA roots. After reviewing the codebase and the broader ecosystem, I'm removing this pattern. Rationale:

  • No real-world usage. The NVIDIA CTK installer (Windows), conda environments, pixi configs, and CMake's FindCUDAToolkit all set CUDA_PATH/CUDA_HOME to a single directory. There is no documented convention for pathsep-separated multi-path values.
  • Never tested. The two build_hooks.py files even disagreed on env var priority (CUDA_HOME > CUDA_PATH in bindings, CUDA_PATH > CUDA_HOME in core) — if anyone had used multi-path, that inconsistency would have surfaced.
  • Avoids premature complexity. Adding multi-path support to the new centralized get_cuda_home_or_path() (especially combined with the conflict-warning logic) would add design surface for a feature nobody uses today.

Both _get_cuda_paths() functions will be replaced with a plain get_cuda_home_or_path() call returning a single path. If a real multi-path use case surfaces later, we can add it with proper design at that point.


Related commit: 2164c33

Drop os.pathsep splitting of CUDA_PATH/CUDA_HOME in both build_hooks.py files.
Both functions now delegate to get_cuda_home_or_path() from cuda.pathfinder,
returning a single path.

See NVIDIA#1801 (comment)

Made-with: Cursor
@rwgk
Copy link
Collaborator Author

rwgk commented Mar 20, 2026

Decision: treat empty CUDA_PATH/CUDA_HOME as undefined

Rob's original implementation preserved empty strings (CUDA_PATH="") as valid return values from get_cuda_home_or_path(). I'm changing this so that empty strings are equivalent to the variable not being set. Rationale:

  • No valid use case. An empty string is not a valid filesystem path. Nobody intentionally sets CUDA_PATH="" to mean "the CUDA path is empty."
  • Fixes a subtle shadowing bug. With the old behavior, CUDA_PATH="" + CUDA_HOME=/usr/local/cuda would return "" (CUDA_PATH wins by priority), silently hiding the valid CUDA_HOME. Treating empty as undefined lets it correctly fall through.
  • Simplifies callers. All downstream consumers already do if not cuda_path: raise .... Returning "" just pushes the falsy check to every call site.
  • Matches convention. The standard os.environ.get("CUDA_PATH") idiom throughout the codebase has always treated empty and unset interchangeably (via if not value). This aligns with that expectation.

This addresses my review comment on Rob's PR: #1519 (comment)

rwgk added 10 commits March 20, 2026 12:49
Safe: currently an internal-only API (not yet public).
Made-with: Cursor
Export get_cuda_path_or_home from cuda.pathfinder.__init__. External
consumers now import from cuda.pathfinder directly. Rename constant
to _CUDA_PATH_ENV_VARS_ORDERED and remove all public references to it.

Made-with: Cursor
Pathfinder 1.5.0 release notes no longer claim cross-package consistency
(that depends on future bindings/core releases). cuda_bindings env var
docs now defer to pathfinder release notes for migration guidance.

Made-with: Cursor
@rwgk
Copy link
Collaborator Author

rwgk commented Mar 21, 2026

Dump of findings from independent review with Cursor GPT-5.4 Extra High:

  Findings
  • High cuda_core and cuda_bindings now import get_cuda_path_or_home() in their build hooks (cuda_core/build_hooks.py:33, cuda_bindings/build_hooks.py:38), but their metadata still allows
     older cuda-pathfinder releases in both build and runtime deps (cuda_core/pyproject.toml:10, cuda_core/pyproject.toml:51, cuda_bindings/pyproject.toml:9,
    cuda_bindings/pyproject.toml:35). That means a resolver can legally choose a 1.4.x release that does not export this symbol, and isolated source builds will fail with ImportError
    before the actual CUDA-path logic runs.
  • Medium The new CUDA_PATH-first behavior is not reflected in public provenance fields. Winner selection now goes through get_cuda_path_or_home()
    (cuda_pathfinder/cuda/pathfinder/_utils/env_vars.py:77, cuda_pathfinder/cuda/pathfinder/_utils/env_vars.py:105), but env-based results still hard-code found_via="CUDA_HOME" in
    headers/libs lookup (cuda_pathfinder/cuda/pathfinder/_headers/find_nvidia_headers.py:110, cuda_pathfinder/cuda/pathfinder/_dynamic_libs/search_steps.py:198,
    cuda_pathfinder/cuda/pathfinder/_static_libs/find_static_lib.py:152, cuda_pathfinder/cuda/pathfinder/_static_libs/find_bitcode_lib.py:148). So callers can be told the path came from
    CUDA_HOME even when CUDA_PATH won.
  • Medium The header-dependent core tests now skip based on “an env var resolved” rather than “headers actually exist.” skipif_need_cuda_headers only checks the helper result in
    cuda_core/tests/conftest.py:257, while the shared helpers still require a real <cuda_root>/include directory in cuda_core/tests/helpers/__init__.py:13 and
    cuda_core/tests/helpers/__init__.py:17. With CUDA_PATH=/bad/path, these tests will run and fail for the wrong reason instead of skipping cleanly.
  • Low The docs version switcher points multiple releases at the wrong docs set. cuda_pathfinder/docs/nv-versions.json:7, cuda_pathfinder/docs/nv-versions.json:8,
    cuda_pathfinder/docs/nv-versions.json:11, cuda_pathfinder/docs/nv-versions.json:12, cuda_pathfinder/docs/nv-versions.json:15, and cuda_pathfinder/docs/nv-versions.json:16 map 1.5.0,
    1.4.3, and 1.4.2 to the 1.4.1 URL.

  Open questions / assumptions
  • I did not count the removed os.pathsep multi-root handling in the build hooks as a formal finding. It looks intentional, but it is still a compatibility change worth documenting if
    anyone relies on path-list values in CUDA_PATH or CUDA_HOME.
  • I could not run trustworthy local commands/tests in this environment because command execution was returning empty success responses, so this is a static review against main plus the
    PR/issue context rather than an executed test pass.

  Compared with main, the overall CUDA_PATH precedence cleanup looks directionally right, but I would not merge without fixing the dependency bounds first; that one is a real build-breaker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA]: Improve handling of CUDA_HOME and CUDA_PATH in build across projects

1 participant