Improves support for out of tree Ops.#968

Open
tehbone wants to merge 2 commits into explosion:v8.3.x from tehbone:out_of_tree_support
Conversation


@tehbone tehbone commented Mar 27, 2026

Uses entry-point registration to discover Ops that have been defined out of tree. Additionally refactors some of the GPU configuration and selection logic so that it no longer assumes CUDA devices, and instead supports the possibility that the device is non-CUDA.

Description

PyTorch is encouraging vendors to add support for their devices out of tree, by way of the PrivateUseOne device type. A side effect is that a properly configured device using that strategy does not present itself as PrivateUseOne, but under the vendor's own name. In our case, the device is the nnpa device. The support provided here allows one to extend the Ops interface and then choose one of these devices.

The main usage update is as follows: require_gpu and prefer_gpu now take a keyword parameter "name", which can refer to any of the registered ops, and the functions behave as expected: the prefer form uses the ops if it is found and the newly added Ops.has_gpu_support is True, while the require form errors if the ops cannot be used. The default form, where the name parameter is not specified, searches the entry points for out-of-tree definitions before falling back on the old method of autodetection.
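The prefer/require behaviour described above can be sketched in plain Python. The registry, class names, and function bodies here are illustrative stand-ins under the assumptions stated, not the PR's actual implementation:

```python
class Ops:
    """Stand-in for thinc's Ops base class."""
    has_gpu_support = False  # overridden by device-capable subclasses


class NnpaOps(Ops):
    """Pretend vendor ops whose device is available."""
    has_gpu_support = True


# Hypothetical registry populated from entry points and built-ins.
REGISTRY = {"numpy": Ops(), "nnpa": NnpaOps()}


def prefer_gpu(name=None):
    """Use the named ops if registered and GPU-capable; else fall back."""
    ops = REGISTRY.get(name) if name else None
    if ops is not None and ops.has_gpu_support:
        return True   # switched to the requested device
    return False      # silently fall back to the default (CPU) ops


def require_gpu(name=None):
    """Like prefer_gpu, but raise instead of falling back."""
    if not prefer_gpu(name):
        raise ValueError(f"No GPU support available for ops {name!r}")
    return True
```

The key design point is that both functions share one lookup path, so the only difference between "prefer" and "require" is whether a failed lookup falls back or raises.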

A minimal example of an out of tree op that has been tested with this is here: (note - it only works on s390x machines)
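Since the example link did not survive, here is a hedged sketch of what a minimal out-of-tree Ops subclass might look like. The class names and the has_gpu_support attribute mirror the PR description; everything else is an assumption, and a real implementation would dispatch tensor operations to the vendor device rather than delegating to the CPU path:

```python
class NumpyOps:
    """Stand-in for thinc's CPU Ops base class."""
    name = "numpy"
    has_gpu_support = False

    def asarray(self, data):
        return list(data)


class NnpaOps(NumpyOps):
    """Hypothetical out-of-tree ops for the IBM nnpa device."""
    name = "nnpa"
    has_gpu_support = True  # lets prefer_gpu/require_gpu select this device

    def asarray(self, data):
        # A real implementation would move the data to the nnpa device;
        # this sketch just delegates to the CPU path.
        return super().asarray(data)
```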

Types of change

Enhancement/new feature, addresses #966.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.


tehbone commented Mar 27, 2026

This is a stab at addressing #966. Please let me know if this approach would work.
