Model Package Support#27786

Open
chilo-ms wants to merge 25 commits into main from chi/model_package_1

@chilo-ms (Contributor) commented Mar 20, 2026

Description

To support the model package design, one of the goals for ORT is to automatically select the most suitable compiled EPContext binary from a collection of precompiled variants based on the EP, provider options, metadata, and available devices.

This PR adds first-phase model package support to ORT. Follow-up PRs may come later.

A model package is a collection of models, binaries, and metadata files organized in a hierarchically structured directory.
The directory structure is not yet finalized, so the following is just a simple example of a model package directory:

<model>.ortpackage/
├── manifest.json
├── pipeline.json
└── models/
    └── model_name/
        ├── metadata.json
        │   └── Contains general information on the component model,
        │       and specific information about each model variant
        │       such as data types, quantization algo, EP, etc. that
        │       is updated on add/remove of model variant
        ├── shared_weights/ (shared weights from all variants)
        │   ├── <checksum of weights file A>/
        │   │   └── model.data
        │   ├── <checksum of weights file B>/
        │   │   └── model.data
        │   └── ...
        ├── base model/
        │   ├── model.onnx
        │   └── [GenAI config and data files]
        ├── variant A/
        │   ├── optimized model.onnx (contains EPContext nodes)
        │   ├── [GenAI config and data files]
        │   └── [Compilation artifacts]
        └── variant B/
            ├── optimized model.onnx (contains EPContext nodes)
            ├── [GenAI config and data files]
            └── [Compilation artifacts]

Definitions:

  • Model Package

    • A model package defines the overall logical ‘model’
    • A model package contains one or more ‘component models’
  • Component Model

    • A component model comprises one or more ‘model variants’
  • Model Variant

    • A ‘model variant’ is typically a single ONNX format model, however we allow some flexibility here
      • An ORT GenAI ‘model variant’ is the collection of files required by ORT GenAI such as one or more ONNX models and related configuration files.

manifest.json and metadata.json

Read the spec here
A manifest.json may look like:

{ 
    "model_name":  <logical_model_name>,
    "component_models": { // optional, if missing, ORT will discover component models by looking for folders with metadata.json under model_package_root/models
        <model_name_1>: {
           …  // could be empty.
        },
    } 
}

or 

{ 
    "model_name":  <my_model_name>,
    "component_models": {
        <model_name_1>: {
            …
            "model_variants": {
                <variant_name_1>: {
                    "file": <ep_context_model_1 onnx file>,
                    "constraints": {
                        "ep": <ep_name>,
                        "device": <device_type>,
                        "architecture": <hardware_architecture>
                    }
                }
            }
        }
    }  
}

A metadata.json for a component model may look like:

{ 
    "component_model_name":  <my_model_name>,
    "model_variants": {
         <variant_name_1>:  {
             "file": <ep_context_model_1 onnx file>,
             "constraints": {
                 "ep": <ep_name>,
                 "device": <device_type>,
                 "architecture": <hardware_architecture>
             }
         },
         <variant_name_2>:  {
             "file": <ep_context_model_2 onnx file>,
             "constraints": {
                 "ep": <ep_name>,
                 "device": <device_type>,
                 "architecture": <hardware_architecture>
             }
         }   
    }
}

Model Selection

The selection logic is implemented in MatchesVariant(), which evaluates the following constraints:
(Note: A constraint refers to a value under the "constraints" field in either manifest.json or metadata.json.)

  • Check ep constraint
  • Check device constraint
    • Some provider-bridge EPs may not implement OrtEpFactory::GetSupportedDevices, in which case ORT has no supported-device
      information for them and skips device constraint validation for those EPs.
    • If the provider options contain a key related to device type, its value must match the device constraint, if one is present.
  • Check ep_compatibility_info constraint
    • ORT does not directly evaluate the architecture constraint. Instead, it relies on the ep_compatibility_info constraint, which may encode architecture information if needed.
    • The ep_compatibility_info value is expected to match the EP compatibility string stored in the EPContext model metadata. (See OrtEp::GetCompiledModelCompatibilityInfo() for how this string is generated.)
    • The EP implementation of EpFactory::ValidateCompiledModelCompatibilityInfo() is responsible for validating the compatibility string against the target device (i.e. OrtHardwareDevice) and returning the compatibility result.
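
The three checks above can be sketched as follows. This is not the PR's MatchesVariant() implementation; it is a simplified stand-in under stated assumptions. All type and function names are hypothetical, the "device_type" provider option key is an assumed example of a "key related to device type", string comparison is assumed to be exact, and the EP-side compatibility validation is modelled as a plain callback.

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not ORT's real types.
struct VariantConstraints {
  std::string ep;                                    // "ep" constraint
  std::optional<std::string> device;                 // "device" constraint, if any
  std::optional<std::string> ep_compatibility_info;  // compatibility string, if any
};

struct EpSelectionInfo {
  std::string ep_name;
  std::vector<std::string> supported_devices;  // Empty if the EP cannot report devices.
  std::map<std::string, std::string> provider_options;
  // Stand-in for the EP's compatibility validation against the target device.
  std::function<bool(const std::string&)> validate_compatibility;
};

bool MatchesVariantSketch(const VariantConstraints& c, const EpSelectionInfo& ep) {
  // 1. ep constraint: the variant must target this EP.
  if (c.ep != ep.ep_name) return false;

  // 2. device constraint.
  if (c.device) {
    // Assumed key name: a device-type provider option must agree with the constraint.
    const auto opt = ep.provider_options.find("device_type");
    if (opt != ep.provider_options.end() && opt->second != *c.device) return false;

    // Skip the supported-device check when the EP reports no devices.
    if (!ep.supported_devices.empty() &&
        std::find(ep.supported_devices.begin(), ep.supported_devices.end(), *c.device) ==
            ep.supported_devices.end()) {
      return false;
    }
  }

  // 3. ep_compatibility_info constraint: delegated to the EP.
  if (c.ep_compatibility_info && ep.validate_compatibility &&
      !ep.validate_compatibility(*c.ep_compatibility_info)) {
    return false;
  }
  return true;
}
```

Note that the architecture constraint does not appear in the sketch, which mirrors the description: architecture is only considered indirectly, through whatever the EP encodes in the compatibility string.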

Note

See the unit test here for an example of how to use a model package.

Code Change

This pull request introduces significant enhancements to the execution provider (EP) selection and management infrastructure in ONNX Runtime. The main focus is on supporting more sophisticated device selection and manifest-based model packaging, as well as refactoring provider selection logic for modularity and future extensibility.

Key changes include:

  • Introduction of model package context and manifest parsing to support selecting model components based on device and EP constraints.
  • Refactoring of the execution provider interface and related classes to support multiple devices per provider.
  • Modularization of EP/device selection, creation, and registration logic in the provider policy context.

The most important changes are:

Model Package Context and Manifest Support

  • Added new files model_package_context.h and model_package_context.cc to implement manifest parsing, device/EP constraint matching, and component selection logic for model packages. This enables ONNX Runtime to select the most appropriate model variant based on available hardware and EP configuration. [1] [2]

Execution Provider Interface Enhancements

  • Updated the IExecutionProvider class to support construction with a list of OrtEpDevice pointers, and added a GetEpDevices() method to retrieve the supported devices. This allows plugin and bridge EPs to expose multiple devices. [1] [2]
  • Updated plugin EP construction to pass the list of supported devices to the base class.
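
To illustrate the shape of this change, here is a reduced sketch of a base class that is constructed with a list of device pointers and exposes them through GetEpDevices(). It is not the real IExecutionProvider, which has a much larger interface; EpDeviceStub, ExecutionProviderSketch and PluginEpSketch are hypothetical names, and EpDeviceStub only stands in for the opaque OrtEpDevice type.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the opaque OrtEpDevice type.
struct EpDeviceStub {
  std::string device_type;
};

// Simplified stand-in for the EP base class.
class ExecutionProviderSketch {
 public:
  ExecutionProviderSketch(std::string type, std::vector<const EpDeviceStub*> ep_devices)
      : type_(std::move(type)), ep_devices_(std::move(ep_devices)) {}
  virtual ~ExecutionProviderSketch() = default;

  const std::string& Type() const { return type_; }

  // Devices this provider instance was created for (non-owning pointers).
  const std::vector<const EpDeviceStub*>& GetEpDevices() const { return ep_devices_; }

 private:
  std::string type_;
  std::vector<const EpDeviceStub*> ep_devices_;
};

// A plugin-style EP forwards its supported devices to the base class.
class PluginEpSketch : public ExecutionProviderSketch {
 public:
  explicit PluginEpSketch(std::vector<const EpDeviceStub*> ep_devices)
      : ExecutionProviderSketch("plugin_ep_sketch", std::move(ep_devices)) {}
};
```

Storing non-owning pointers is an assumption of this sketch; it presumes the device objects outlive the provider instance.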

Provider Policy Context Refactoring

  • Refactored provider policy context logic to modularize device ordering, device selection, telemetry logging, EP creation, and registration. This includes splitting the monolithic SelectEpsForSession into smaller methods: OrderDevices, SelectEpDevices, LogTelemetry, CreateExecutionProviders, RegisterExecutionProviders, and a new flow for model package-based EP selection. [1] [2] [3] [4]

These changes collectively lay the groundwork for more flexible, robust, and extensible device and EP selection in ONNX Runtime, especially in scenarios involving packaged models with multiple variants and complex hardware environments.

Motivation and Context

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@skottmckay
Contributor

Top-level manifest.json should define the overall inputs/outputs so a user of the package knows what it does. They shouldn't have to trawl through the information to find the first and last things that will be run to infer this info.

Comment on lines +114 to +122
{
    "variant_name": "variant_1",
    "file": "mul_1.onnx",
    "constraints": {
        "ep": "example_ep",
        "device": "cpu",
        "architecture": "arch1"
    }
},
Contributor

This seems like lower level per-variant info I would have expected to be in the component model's metadata.json not the top level manifest.

Contributor Author

I think where to store this lower-level, per-variant metadata is still open for discussion. It can either reside in the top-level manifest or within each component model’s metadata.json.

Option 1: Top-level manifest:

  • Pros:

    • Provides a single source of truth for the entire model package.
    • Simplifies parsing for ORT, as it only needs to read one manifest file to obtain a complete view.
  • Cons:

    • The manifest may become overly detailed, containing extensive information about each precompiled variant.
    • Requires synchronization with the component model directories: if directories are added or removed, the manifest must be updated accordingly.

Option 2: Component model's metadata.json:

  • Pros:

    • Better modularity and separation of concerns.
    • Changes to component model directories typically do not require updates to the top-level manifest.
  • Cons:

    • ORT must scan and parse all component model directories to collect per-variant metadata.
    • May introduce additional runtime overhead during model package loading.

@devang-ml

Contributor

Either way the same amount of info needs to be parsed. Should be trivial to scan the component model directories as the metadata should be in the top-level directory for each component model.

I'm wary about the top-level manifest getting overwhelmingly large, especially if we have multiple component models all with multiple variants. Harder to see issues/inconsistencies and easier to get mega-config files wrong. But maybe humans will never read this stuff and it doesn't matter anymore.

Option 2 slightly simplifies add/remove variant from package as you should only need to update the component model metadata file when doing so. For Foundry Local we will need to add/remove frequently as we will have to publish per-variant packages due to catalog being immutable and to keep the versioning specific to the variant, and merge those on-device post-download. e.g. user downloads TRT-RTX variant and CPU variant separately as they are different entries in the catalog.

I don't quite understand how Option 2 adds runtime overhead. I would have expected we have a general model package helper class. Create it by pointing it at the package directory and it parses everything into in-memory info at that point and checks the package is valid. Via that class instance I should be able to easily get things like the ordered list of component models, the available variants for each component model and things like the EP they require, and a way to get the directory of a variant to load it.

Contributor Author

While we are still discussing this internally, I made ModelPackageManifestParser::ParseManifest able to parse both manifest.json and metadata.json for all component models, as well as their associated model variants.

If a variant appears in both, it chooses metadata.json as the source of truth, but falls back to manifest.json if metadata.json is missing required fields.
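
The precedence rule described here could be sketched like this. It is an illustration under assumptions, not the parser's actual code: VariantEntry, HasRequiredFields and ResolveVariant are hypothetical names, "file" and "ep" are assumed to be the required fields, and the fallback is assumed to swap in the whole manifest.json entry rather than merge field by field.

```cpp
#include <optional>
#include <string>

// Hypothetical parsed form of one variant entry from either JSON file.
struct VariantEntry {
  std::optional<std::string> file;
  std::optional<std::string> ep;
  std::optional<std::string> device;
};

// Assumption: "file" and "ep" are the fields treated as required.
bool HasRequiredFields(const VariantEntry& v) {
  return v.file.has_value() && v.ep.has_value();
}

// Prefer the metadata.json entry; fall back to the manifest.json entry only
// when the metadata.json entry is absent or incomplete. Returns std::nullopt
// if neither source provides a usable entry.
std::optional<VariantEntry> ResolveVariant(const std::optional<VariantEntry>& from_metadata,
                                           const std::optional<VariantEntry>& from_manifest) {
  if (from_metadata && HasRequiredFields(*from_metadata)) return from_metadata;
  if (from_manifest && HasRequiredFields(*from_manifest)) return from_manifest;
  return std::nullopt;
}
```

A field-by-field merge would be an equally plausible reading of "falls back"; the whole-entry version is shown only because it is the simplest to reason about.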

@skottmckay
Contributor

Would be good to add some details about how things are done, as the PR description says what has changed but doesn't cover things like how selection is implemented.


Status RegisterExecutionProviders(InferenceSession& sess,
                                  std::vector<std::unique_ptr<IExecutionProvider>>& providers);

Contributor

Does anything call these?

Contributor Author

Good catch. That is no longer necessary; I will remove it.

};

class ModelPackageManifestParser {
public:
Contributor

Would be good to keep the model package handling (parsing manifest and metadata files, iterating directories etc. to create a user-friendly in-memory representation) standalone as it will be needed in other places like Foundry Local

Comment on lines +447 to +450
// Parse manifest and gather components.
ModelPackageManifestParser parser(logging::LoggingManager::DefaultLogger());
std::vector<EpContextVariantInfo> components;
ORT_API_RETURN_IF_STATUS_NOT_OK(parser.ParseManifest(package_root, components));
Contributor

nit: would suggest having a ModelPackage class that owns all this info instead of doing things piecemeal in other places. Construct the ModelPackage from the package_root. It parses and validates the info and provides getters for the code to read that. That way all the parsing and processing of the model package is in one class so if there are any issues there's one place to fix them.

This also feels like it's missing a layer. The package has one or more component models (if multiple there's a specific order they're executed in). A component model has one or more variants. But this code is reading a collection of EpContextVariantInfo so the required grouping by component model seems to be lost.

Contributor Author

A ModelPackage class that owns all the info is a good suggestion; I will change the code here accordingly.
Also, I did miss the "component model" layer and will add it back.
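
For reference, the layering discussed in this thread (a package owns an ordered list of component models, and each component model owns its variants) might be modelled as below. This is only a sketch of the idea, not the planned ModelPackage class: every name is hypothetical, parsing is left out, and the only validation shown is the rule from the definitions that a component model has one or more variants.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory representation; names are illustrative only.
struct ModelVariantInfo {
  std::string name;
  std::string ep;
  std::string file;
};

struct ComponentModelInfo {
  std::string name;
  std::vector<ModelVariantInfo> variants;
};

class ModelPackageSketch {
 public:
  // Component models are kept in the order given (assumed to be execution order).
  explicit ModelPackageSketch(std::vector<ComponentModelInfo> components)
      : components_(std::move(components)) {
    for (const auto& c : components_) {
      if (c.variants.empty()) {
        throw std::invalid_argument("component model '" + c.name + "' has no variants");
      }
    }
  }

  const std::vector<ComponentModelInfo>& ComponentModels() const { return components_; }

  // Variants of one component model that target the given EP.
  std::vector<ModelVariantInfo> VariantsForEp(const std::string& component,
                                              const std::string& ep) const {
    std::vector<ModelVariantInfo> result;
    for (const auto& c : components_) {
      if (c.name != component) continue;
      for (const auto& v : c.variants) {
        if (v.ep == ep) result.push_back(v);
      }
    }
    return result;
  }

 private:
  std::vector<ComponentModelInfo> components_;
};
```

Keeping the variants nested under their component model preserves the grouping that a flat collection of variant records loses.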

Comment on lines +69 to +71
Status SelectComponent(gsl::span<EpContextVariantInfo> components,
                       gsl::span<SelectionEpInfo> ep_infos,
                       std::optional<std::filesystem::path>& selected_component_path) const;
Contributor

nit: A 'context' owning the selection logic feels slightly off. Maybe that's just a naming thing as this seems more like it's implementing a selection policy for a model variant (which != component model).

Contributor Author

I was mixing up component model and model variant.
You are right, this is implementing a selection policy for a model variant. I will change the naming.

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@@ -0,0 +1,380 @@
// Copyright (c) Microsoft Corporation.

Check warning

Code scanning / lintrunner

CLANGFORMAT/format Warning

See https://clang.llvm.org/docs/ClangFormat.html.
Run lintrunner -a to apply this patch.
@chilo-ms chilo-ms marked this pull request as ready for review March 27, 2026 17:32