ais-check: Add better HIP detection #216

Open
derobins wants to merge 5 commits into develop from derobins/ais-check_better_hip_detection

derobins (Collaborator) commented Mar 13, 2026

Adds better HIP detection and output

AIHIPFILE-156

Copilot AI review requested due to automatic review settings March 13, 2026 20:02
@derobins derobins marked this pull request as draft March 13, 2026 20:02
Copilot AI (Contributor) left a comment

Pull request overview

Improves ais-check’s HIP runtime detection by scanning common ROCm/HIP installation locations and reporting which discovered HIP runtimes support AIS-related symbols.

Changes:

  • Add filesystem-based discovery for libamdhip64.so across common ROCm/HIP paths.
  • Probe each discovered HIP runtime for AIS symbol availability and store per-library support state.
  • Print a per-HIP-library AIS support summary in non-quiet mode.

@derobins derobins marked this pull request as ready for review March 13, 2026 21:26
Copilot AI review requested due to automatic review settings March 13, 2026 21:26
Copilot AI (Contributor) left a comment

Pull request overview

Enhances tools/ais-check to improve HIP runtime detection for AMD Infinity Storage (AIS) and to provide more detailed output about which HIP runtime libraries were found and whether each supports AIS.

Changes:

  • Add filesystem/environment-based discovery of libamdhip64.so across multiple common ROCm install locations.
  • Probe AIS symbol availability across all discovered HIP runtime libraries and aggregate results.
  • Print a per-library AIS support summary in the non-quiet output.
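The probing step in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: the diff excerpt in this conversation goes through hipGetProcAddress, whereas this sketch swaps in a plain dlsym-style existence check via ctypes. The helper names `has_symbols` and `summarize` are hypothetical, and the list of AIS symbol names is left to the caller because the conversation does not spell it out.

```python
import ctypes


def has_symbols(library_path, symbol_names):
    """Return True if every name in symbol_names resolves in the library."""
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        # Missing file, wrong architecture, unresolved dependencies, etc.
        return False
    # ctypes raises AttributeError when a symbol cannot be resolved,
    # so hasattr() acts as a dlsym()-style existence check.
    return all(hasattr(lib, name) for name in symbol_names)


def summarize(library_paths, symbol_names):
    """Map each library path to whether it exports all requested symbols."""
    return {path: has_symbols(path, symbol_names) for path in library_paths}
```

The resulting dictionary is the kind of per-library state that a summary printer could iterate over in non-quiet mode.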

Comment on lines +164 to +169
if err != 0:
    if symbol_status.value != 1:
        symbol = symbol.decode("utf-8")
        err_str = hip.hipGetErrorString(err).decode("utf-8")
        print(
            f"hipGetProcAddress({symbol}) failed with err code"
Comment on lines +64 to +75
# 1. Respect runtime linker paths
for p in os.environ.get("LD_LIBRARY_PATH", "").split(":"):
    if p:
        # Clean up the path by removing `..`, etc. and getting
        # an absolute path.
        safe_p = os.path.abspath(os.path.normpath(p))

        candidates.append(os.path.join(safe_p, "libamdhip64.so"))

# 2. Environment variables commonly set by ROCm or modules
for var in ("ROCM_HOME", "ROCM_PATH", "HIP_PATH"):
    base = os.environ.get(var)
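Since the excerpt above is cut off, here is a self-contained sketch of the same discovery idea. It is an approximation under stated assumptions: the function name `candidate_hip_paths` is hypothetical, and looking under a `lib` subdirectory of each ROCm-style variable is a guess, since the excerpt ends before showing how `base` is used. The environment is passed in as a mapping so the logic can be exercised without touching the real process environment.

```python
import os


def candidate_hip_paths(environ=None, libname="libamdhip64.so"):
    """List candidate HIP runtime paths derived from environment variables."""
    environ = os.environ if environ is None else environ
    candidates = []

    # 1. Directories from the runtime linker search path.
    #    os.path.abspath() also collapses `..` and redundant separators.
    for p in environ.get("LD_LIBRARY_PATH", "").split(":"):
        if p:
            candidates.append(os.path.join(os.path.abspath(p), libname))

    # 2. ROCm-style variables. ASSUMPTION: the library sits in <base>/lib.
    for var in ("ROCM_HOME", "ROCM_PATH", "HIP_PATH"):
        base = environ.get(var)
        if base:
            candidates.append(os.path.join(os.path.abspath(base), "lib", libname))

    # Drop exact duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))
```

The function only builds candidate paths; a caller would still need to filter them with something like `os.path.exists()` before probing.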
@kurtmcmillan (Collaborator)

I think that ais-check should return success if the current environment supports the fastpath. I think this change breaks that. If the current environment is using ROCm 7.0, but ROCm 7.2 is also installed, would this script return success? If so, I don't think it should.
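One way to express the behavior described in this comment is to let only the runtime that the current environment resolves decide the exit status, and to treat every other discovered install as informational. The sketch below is hypothetical: `exit_status`, `active_library`, and `support_by_library` are illustrative names, not identifiers from the PR.

```python
import os


def exit_status(active_library, support_by_library):
    """Return 0 only when the runtime the environment resolves supports AIS.

    active_library: path of the HIP runtime the loader would pick, or None.
    support_by_library: mapping of discovered library path -> bool.
    """
    if active_library is None:
        return 1
    active = os.path.realpath(active_library)
    for path, supported in support_by_library.items():
        # Compare resolved paths so a symlinked install matches its target.
        if os.path.realpath(path) == active:
            return 0 if supported else 1
    # The active runtime was never probed, so do not claim success.
    return 1
```

Under this rule, an environment that resolves a runtime without AIS support reports failure even when another install on the same machine does support it.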

print(u.sysname, u.nodename, u.release, u.version, u.machine)

print()
print("Found these HIP libraries (some may be symlinks):")
Collaborator

Might be worth adding the word "redundant" (e.g. redundant symlinks) since /opt/rocm will likely be a duplicate entry (tbh I can't think of typical scenarios where it's not). To fix entirely, we could track ROCm installs by the realpath to get rid of symlinks.
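The realpath suggestion could look roughly like this. It is a small sketch with a hypothetical helper name, `dedupe_by_realpath`, that groups every discovered path under the real file it resolves to, so a symlinked entry such as /opt/rocm would collapse into the install it points at.

```python
import os


def dedupe_by_realpath(paths):
    """Group discovered paths by the real file they resolve to."""
    groups = {}
    for path in paths:
        # realpath() follows symlinks in every path component.
        groups.setdefault(os.path.realpath(path), []).append(path)
    return groups
```

Each key is then one physical library, and the list of values records which aliases led to it, which could still be printed for reference.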

@jordan-turbofish (Collaborator)

Played around with getting the absolute path when using find_library:

import ctypes
import ctypes.util
import os

RTLD_DI_LINKMAP = 2  # dlinfo() request code from glibc's <dlfcn.h>

def dlinfo_path(library_name):
    # Only the leading fields of glibc's struct link_map are declared;
    # l_name holds the absolute path of the loaded object.
    class LinkMap(ctypes.Structure):
        _fields_ = [
            ("l_addr", ctypes.c_void_p),
            ("l_name", ctypes.c_char_p),
        ]

    dl_path = ctypes.util.find_library("dl")
    library_path = ctypes.util.find_library(library_name)
    if dl_path is None or library_path is None:
        return None

    try:
        libdl = ctypes.CDLL(dl_path)
        libwanted = ctypes.CDLL(library_path)
    except OSError:
        return None

    dlinfo = libdl.dlinfo
    dlinfo.argtypes = ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p
    dlinfo.restype = ctypes.c_int

    # dlinfo() stores a pointer to the library's link_map in lmptr.
    lmptr = ctypes.POINTER(LinkMap)()
    err = dlinfo(libwanted._handle, RTLD_DI_LINKMAP, ctypes.byref(lmptr))
    if err != 0:
        return None

    # Decode with the filesystem encoding; paths need not be ASCII.
    return os.fsdecode(lmptr.contents.l_name)

print(f"amdhip64 path:{dlinfo_path(library_name='amdhip64')}")

Testing:

> python3 test.py
amdhip64 path:None
> LD_LIBRARY_PATH=/home/jpatters/hip-7.2.0-dbg/lib:/opt/rocm/lib python3 test.py
amdhip64 path:/home/jpatters/hip-7.2.0-dbg/lib/libamdhip64.so.7
