Model Package Support by chilo-ms · Pull Request #27786 · microsoft/onnxruntime

chilo-ms · 2026-03-20T18:05:36Z

Description

To support the model package design, one of the goals for ORT is to automatically select the most suitable compiled EPContext binary from a collection of precompiled variants based on the EP, provider options, metadata, and available devices.

This PR adds first-phase model package support to ORT. Follow-up PRs may extend it.

A model package is a collection of models, binaries, and metadata files organized in a hierarchically structured directory.
The directory structure is not yet finalized, so the following is just a simple example of a model package directory:

<model>.ortpackage/
├── manifest.json
├── pipeline.json
└── models/
    └── model_name/
        ├── metadata.json
        │   └── Contains general information on the component model,
        │       and specific information about each model variant
        │       such as data types, quantization algo, EP, etc. that
        │       is updated on add/remove of model variant
        ├── shared_weights/ (shared weights from all variants)
        │   ├── <checksum of weights file A>/
        │   │   └── model.data
        │   ├── <checksum of weights file B>/
        │   │   └── model.data
        │   └── ...
        ├── base model/
        │   ├── model.onnx
        │   └── [GenAI config and data files]
        ├── variant A/
        │   ├── optimized model.onnx (contains EPContext nodes)
        │   ├── [GenAI config and data files]
        │   └── [Compilation artifacts]
        └── variant B/
            ├── optimized model.onnx (contains EPContext nodes)
            ├── [GenAI config and data files]
            └── [Compilation artifacts]

Definitions:

Model Package
- A model package defines the overall logical ‘model’
- A model package contains one or more ‘component models’
Component Model
- A component model comprises one or more ‘model variants’
Model Variant
- A ‘model variant’ is typically a single ONNX format model, however we allow some flexibility here
  - An ORT GenAI ‘model variant’ is the collection of files required by ORT GenAI such as one or more ONNX models and related configuration files.

manifest.json and metadata.json

Read the spec here
A manifest.json may look like:

{
    "model_name": <logical_model_name>,
    "component_models": { // optional; if missing, ORT will discover component models by looking for folders with metadata.json under model_package_root/models
        <model_name_1>: {
            …  // could be empty.
        }
    }
}

or 

{
    "model_name": <my_model_name>,
    "component_models": {
        <model_name_1>: {
            …
            "model_variants": {
                <variant_name_1>: {
                    "file": <ep_context_model_1 onnx file>,
                    "constraints": {
                        "ep": <ep_name>,
                        "device": <device_type>,
                        "architecture": <hardware_architecture>
                    }
                }
            }
        }
    }
}

A metadata.json for a component model may look like:

{ 
    "component_model_name":  <my_model_name>,
    "model_variants": {
         <variant_name_1>:  {
             "file": <ep_context_model_1 onnx file>,
             "constraints": {
                 "ep": <ep_name>,
                 "device": <device_type>,
                 "architecture": <hardware_architecture>
             }
         },
         <variant_name_2>:  {
             "file": <ep_context_model_2 onnx file>,
             "constraints": {
                 "ep": <ep_name>,
                 "device": <device_type>,
                 "architecture": <hardware_architecture>
             }
         }   
    }
}

Model Selection

The selection logic is implemented in MatchesVariant(), which evaluates the following constraints:
(Note: A constraint refers to a value under the "constraints" field in either manifest.json or metadata.json.)

- Check the ep constraint.
- Check the device constraint.
  - Some provider-bridge EPs may not implement OrtEpFactory::GetSupportedDevices, so ORT has no supported-device information for them. In that case, ORT skips device constraint validation for those EPs.
  - If the provider options contain a key related to device type, its value must match the device constraint, if one is present.
- Check the ep_compatibility_info constraint.
  - ORT does not directly evaluate the architecture constraint. Instead, it relies on the ep_compatibility_info constraint, which may encode architecture information if needed.
  - The ep_compatibility_info value is expected to match the EP compatibility string stored in the EPContext model metadata. (See OrtEp::GetCompiledModelCompatibilityInfo() for how this string is generated.)
  - The EP implementation of EpFactory::ValidateCompiledModelCompatibilityInfo() is responsible for validating the compatibility string against the target device (i.e. OrtHardwareDevice) and returning the compatibility result.
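
The three checks described above can be pictured with a small, self-contained sketch. It is not the actual MatchesVariant() code: the types VariantConstraints and EpInfoSketch, the function MatchesVariantSketch, and the callback standing in for the EP's compatibility validation are all hypothetical. It also assumes that an absent constraint means "no restriction" and that strings are compared exactly, neither of which is stated in this PR.

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the "constraints" object of one variant.
struct VariantConstraints {
  std::optional<std::string> ep;
  std::optional<std::string> device;
  std::optional<std::string> ep_compatibility_info;
};

// Hypothetical view of one candidate EP during selection.
struct EpInfoSketch {
  std::string ep_name;
  // Empty when the EP reports no supported devices (e.g. some provider-bridge EPs).
  std::vector<std::string> supported_device_types;
  // Device type requested through provider options, if any.
  std::optional<std::string> provider_option_device;
  // Stand-in for the EP-side validation of the compatibility string.
  std::function<bool(const std::string&)> validate_compatibility;
};

bool MatchesVariantSketch(const VariantConstraints& c, const EpInfoSketch& ep) {
  // 1. ep constraint.
  if (c.ep && *c.ep != ep.ep_name) return false;

  // 2. device constraint.
  if (c.device) {
    // A device type given in provider options must agree with the constraint.
    if (ep.provider_option_device && *ep.provider_option_device != *c.device) {
      return false;
    }
    // The supported-device check is skipped when the EP reports no devices.
    const auto& devices = ep.supported_device_types;
    if (!devices.empty() &&
        std::find(devices.begin(), devices.end(), *c.device) == devices.end()) {
      return false;
    }
  }

  // 3. ep_compatibility_info constraint, delegated to the EP's validator.
  if (c.ep_compatibility_info && ep.validate_compatibility &&
      !ep.validate_compatibility(*c.ep_compatibility_info)) {
    return false;
  }
  return true;
}
```

Note that the architecture constraint never appears in the sketch, mirroring the statement that ORT relies on ep_compatibility_info instead of evaluating architecture directly.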

Note

See the unit test here for an example of how to use a model package.

Code Change

This pull request introduces significant enhancements to the execution provider (EP) selection and management infrastructure in ONNX Runtime. The main focus is on supporting more sophisticated device selection and manifest-based model packaging, as well as refactoring provider selection logic for modularity and future extensibility.

Key changes include:

- Introduction of model package context and manifest parsing to support selecting model components based on device and EP constraints.
- Refactoring of the execution provider interface and related classes to support multiple devices per provider.
- Modularization of EP/device selection, creation, and registration logic in the provider policy context.

The most important changes are:

Model Package Context and Manifest Support

Added new files model_package_context.h and model_package_context.cc to implement manifest parsing, device/EP constraint matching, and component selection logic for model packages. This enables ONNX Runtime to select the most appropriate model variant based on available hardware and EP configuration.

Execution Provider Interface Enhancements

- Updated the IExecutionProvider class to support construction with a list of OrtEpDevice pointers, and added a GetEpDevices() method to retrieve the supported devices. This allows plugin and bridge EPs to expose multiple devices.
- Updated plugin EP construction to pass the list of supported devices to the base class.
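
As a rough picture of the interface change described above, the sketch below shows a base class that accepts a list of device pointers at construction and exposes them through a getter, with a derived plugin EP forwarding its devices to the base. Every type here is a hypothetical stand-in: EpDeviceStub replaces the opaque OrtEpDevice, and ExecutionProviderSketch and PluginEpSketch are not the real IExecutionProvider or plugin EP classes, whose actual signatures are in the PR diff.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for OrtEpDevice, which is an opaque ORT type.
struct EpDeviceStub {
  std::string ep_name;
  std::string device_type;
};

// Hypothetical stand-in for IExecutionProvider, reduced to the device list.
class ExecutionProviderSketch {
 public:
  explicit ExecutionProviderSketch(std::string type,
                                   std::vector<const EpDeviceStub*> devices = {})
      : type_(std::move(type)), devices_(std::move(devices)) {}
  virtual ~ExecutionProviderSketch() = default;

  const std::string& Type() const { return type_; }

  // Non-owning pointers: the devices must outlive the provider.
  const std::vector<const EpDeviceStub*>& GetEpDevices() const { return devices_; }

 private:
  std::string type_;
  std::vector<const EpDeviceStub*> devices_;
};

// A plugin-style EP passes its supported devices to the base class.
class PluginEpSketch : public ExecutionProviderSketch {
 public:
  explicit PluginEpSketch(std::vector<const EpDeviceStub*> devices)
      : ExecutionProviderSketch("example_ep", std::move(devices)) {}
};
```

Defaulting the device list to empty keeps providers that report no devices constructible unchanged, which is one plausible way to stay compatible with existing EPs.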

Provider Policy Context Refactoring

Refactored provider policy context logic to modularize device ordering, device selection, telemetry logging, EP creation, and registration. This includes splitting the monolithic SelectEpsForSession into smaller methods: OrderDevices, SelectEpDevices, LogTelemetry, CreateExecutionProviders, RegisterExecutionProviders, and a new flow for model package-based EP selection.

These changes collectively lay the groundwork for more flexible, robust, and extensible device and EP selection in ONNX Runtime, especially in scenarios involving packaged models with multiple variants and complex hardware environments.

Motivation and Context

…used by model package workflow

github-actions

You can commit the suggested changes from lintrunner.

onnxruntime/core/session/utils.cc

skottmckay · 2026-03-23T04:16:38Z

Top-level manifest.json should define the overall inputs/outputs so a user of the package knows what it does. They shouldn't have to trawl through the information to find the first and last things that will be run to infer this info.

skottmckay · 2026-03-23T04:19:26Z

onnxruntime/test/autoep/test_model_package.cc

+      {
+        "variant_name": "variant_1",
+        "file": "mul_1.onnx",
+        "constraints": {
+          "ep": "example_ep",
+          "device": "cpu",
+          "architecture": "arch1"
+        }
+      },


This seems like lower level per-variant info I would have expected to be in the component model's metadata.json not the top level manifest.

I think where to store this lower-level, per-variant metadata is still open for discussion. It can either reside in the top-level manifest or within each component model’s metadata.json.

Option 1: Top-level manifest:

Pros:
- Provides a single source of truth for the entire model package.
- Simplifies parsing for ORT, as it only needs to read one manifest file to obtain a complete view.

Cons:
- The manifest may become overly detailed, containing extensive information about each precompiled variant.
- Requires synchronization with the component model directories. If component model directories are added or removed, the manifest must be updated accordingly.

Option 2: Component model's metadata.json:

Pros:
- Better modularity and separation of concerns.
- Changes to component model directories typically do not require updates to the top-level manifest.

Cons:
- ORT must scan and parse all component model directories to collect per-variant metadata.
- May introduce additional runtime overhead during model package loading.

@devang-ml

Either way the same amount of info needs to be parsed. Should be trivial to scan the component model directories as the metadata should be in the top-level directory for each component model.

I'm wary about the top-level manifest getting overwhelmingly large, especially if we have multiple component models all with multiple variants. Harder to see issues/inconsistencies and easier to get mega-config files wrong. But maybe humans will never read this stuff and it doesn't matter anymore.

Option 2 slightly simplifies add/remove variant from package as you should only need to update the component model metadata file when doing so. For Foundry Local we will need to add/remove frequently as we will have to publish per-variant packages due to catalog being immutable and to keep the versioning specific to the variant, and merge those on-device post-download. e.g. user downloads TRT-RTX variant and CPU variant separately as they are different entries in the catalog.

I don't quite understand how Option 2 adds runtime overhead. I would have expected we have a general model package helper class. Create it by pointing it at the package directory and it parses everything into in-memory info at that point and checks the package is valid. Via that class instance I should be able to easily get things like the ordered list of component models, the available variants for each component model and things like the EP they require, and a way to get the directory of a variant to load it.

While we are still discussing this internally, I made ModelPackageManifestParser::ParseManifest able to parse both manifest.json and metadata.json for all component models and their associated model variants.

If a variant appears in both, it chooses metadata.json as the source of truth, but falls back to manifest.json if metadata.json is missing required fields.
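
One way to read the precedence rule above is as a per-field merge: take each field from metadata.json when present, otherwise from manifest.json. The sketch below illustrates that reading only. VariantFieldsSketch, PreferFirst, and MergeVariantSketch are hypothetical names, the choice of fields is illustrative, and the comment does not say whether the real parser falls back per field or per variant.

```cpp
#include <optional>
#include <string>

// Hypothetical subset of the fields a variant entry may carry.
struct VariantFieldsSketch {
  std::optional<std::string> file;
  std::optional<std::string> ep;
  std::optional<std::string> device;
};

// Returns the primary value when present, otherwise the fallback.
std::optional<std::string> PreferFirst(const std::optional<std::string>& primary,
                                       const std::optional<std::string>& fallback) {
  return primary.has_value() ? primary : fallback;
}

// Assumption: fallback happens field by field. metadata.json is the source of
// truth, and manifest.json only fills in fields that metadata.json lacks.
VariantFieldsSketch MergeVariantSketch(const VariantFieldsSketch& from_metadata,
                                       const VariantFieldsSketch& from_manifest) {
  VariantFieldsSketch merged;
  merged.file = PreferFirst(from_metadata.file, from_manifest.file);
  merged.ep = PreferFirst(from_metadata.ep, from_manifest.ep);
  merged.device = PreferFirst(from_metadata.device, from_manifest.device);
  return merged;
}
```

With this reading, a field that is missing in both files stays empty after the merge, and the caller decides whether that makes the variant invalid.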

skottmckay · 2026-03-23T08:20:08Z

Would be good to add some details about the 'how' things are done as the PR description says what has changed but doesn't cover things like how selection is being implemented.

skottmckay · 2026-03-23T08:27:52Z

onnxruntime/core/session/provider_policy_context.h

+
+  Status RegisterExecutionProviders(InferenceSession& sess,
+                                    std::vector<std::unique_ptr<IExecutionProvider>>& providers);
+


Does anything call these?

Good catch. That is no longer necessary and I will remove it.

skottmckay · 2026-03-23T08:29:39Z

onnxruntime/core/session/model_package_context.h

+};
+
+class ModelPackageManifestParser {
+ public:


Would be good to keep the model package handling (parsing manifest and metadata files, iterating directories etc. to create a user-friendly in-memory representation) standalone as it will be needed in other places like Foundry Local

skottmckay · 2026-03-24T05:54:29Z

onnxruntime/core/session/utils.cc

+    // Parse manifest and gather components.
+    ModelPackageManifestParser parser(logging::LoggingManager::DefaultLogger());
+    std::vector<EpContextVariantInfo> components;
+    ORT_API_RETURN_IF_STATUS_NOT_OK(parser.ParseManifest(package_root, components));


nit: would suggest having a ModelPackage class that owns all this info instead of doing things piecemeal in other places. Construct the ModelPackage from the package_root. It parses and validates the info and provides getters for the code to read that. That way all the parsing and processing of the model package is in one class so if there are any issues there's one place to fix them.

This also feels like it's missing a layer. The package has one or more component models (if multiple there's a specific order they're executed in). A component model has one or more variants. But this code is reading a collection of EpContextVariantInfo so the required grouping by component model seems to be lost.

A ModelPackage class that owns all the info is a good suggestion, and I will change the code here accordingly.
Also, I did miss the "component model" layer and will add it back.
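
The layering discussed in this thread (a package owns ordered component models, and each component model owns its variants) might look roughly like the sketch below. It is only an illustration of the suggested shape, not the class that was eventually written: ModelPackageSketch and its members are hypothetical, and to stay self-contained it is constructed from already-parsed data instead of a package_root path.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory representation of one model variant.
struct ModelVariantSketch {
  std::string name;
  std::string file;
  std::string ep;
};

// Hypothetical component model: a named group of one or more variants.
struct ComponentModelSketch {
  std::string name;
  std::vector<ModelVariantSketch> variants;
};

class ModelPackageSketch {
 public:
  // Validates once at construction so readers can rely on the invariants.
  explicit ModelPackageSketch(std::vector<ComponentModelSketch> components)
      : components_(std::move(components)) {
    for (const auto& c : components_) {
      if (c.variants.empty()) {
        throw std::invalid_argument("component model '" + c.name + "' has no variants");
      }
    }
  }

  // Component models in the order they were provided.
  const std::vector<ComponentModelSketch>& Components() const { return components_; }

  // Returns nullptr when no component model has the given name.
  const ComponentModelSketch* FindComponent(const std::string& name) const {
    auto it = std::find_if(components_.begin(), components_.end(),
                           [&](const ComponentModelSketch& c) { return c.name == name; });
    return it == components_.end() ? nullptr : &*it;
  }

  // Variants of one component model that target the given EP.
  std::vector<const ModelVariantSketch*> VariantsForEp(const std::string& component,
                                                       const std::string& ep) const {
    std::vector<const ModelVariantSketch*> result;
    if (const ComponentModelSketch* c = FindComponent(component)) {
      for (const auto& v : c->variants) {
        if (v.ep == ep) result.push_back(&v);
      }
    }
    return result;
  }

 private:
  std::vector<ComponentModelSketch> components_;
};
```

Keeping the grouping by component model in one owning class means selection code asks for the variants of a specific component instead of scanning a flat list of variants.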

skottmckay · 2026-03-24T05:58:31Z

onnxruntime/core/session/model_package_context.h

+  Status SelectComponent(gsl::span<EpContextVariantInfo> components,
+                         gsl::span<SelectionEpInfo> ep_infos,
+                         std::optional<std::filesystem::path>& selected_component_path) const;


nit: A 'context' owning the selection logic feels slightly off. Maybe that's just a naming thing as this seems more like it's implementing a selection policy for a model variant (which != component model).

I was mixing up component model and model variant.
You are right: this implements a selection policy for a model variant. I will change the naming.

github-actions

You can commit the suggested changes from lintrunner.

onnxruntime/core/session/model_package_descriptor_parser.cc

onnxruntime/core/session/model_package/model_package_descriptor_parser.cc

@@ -0,0 +1,380 @@
+// Copyright (c) Microsoft Corporation.


chilo-ms added 9 commits March 17, 2026 11:40

separate get model metadata from auto ep selection function

7d27bc2

break down to more functions for provider policy context that can be …

e283312

…used by model package workflow

update

3cd5424

add basic logic for component selection

e5e2796

integrate component selection into the ORT workflow

b8b2c78

add GetOrtEpDevices in IExecutionProvider

72e3b00

add a simple unit test

c1a88c5

add scoring function is there are multiple components selected

a8f985d

refactor

54f612f

github-actions bot reviewed Mar 20, 2026

onnxruntime/core/session/utils.cc

update and test

cbf062c

skottmckay reviewed Mar 23, 2026

chilo-ms added 3 commits March 23, 2026 12:58

address reviewer's comment

d4710aa

add ep compatibility to component selection logic

a42d22a

revert some changes

728e15b

skottmckay reviewed Mar 24, 2026

chilo-ms added 7 commits March 24, 2026 17:25

address reviewer's comments

3005fc8

address reviewer comment

3e3bbb4

small update

f7e47d6

add comments

f2765df

add more details into model selection logic

fdcfa3a

fix dangling pointer issue

45d4178

make model package parser be more smart for many cases

e27e5a4

github-actions bot reviewed Mar 26, 2026

onnxruntime/core/session/model_package_descriptor_parser.cc

github-advanced-security bot found potential problems Mar 26, 2026

onnxruntime/core/session/model_package/model_package_descriptor_parser.cc

@@ -0,0 +1,380 @@

// Copyright (c) Microsoft Corporation.

Check warning

Code scanning / lintrunner

CLANGFORMAT/format Warning

See https://clang.llvm.org/docs/ClangFormat.html.
Run lintrunner -a to apply this patch.

move code under model_package folder

2161423

update comment

ace5147

chilo-ms marked this pull request as ready for review March 27, 2026 17:32

chilo-ms added 3 commits March 27, 2026 12:07

add a README for model package

07e5562

update field name in menifest and metadata.json

d6ad996

update README

1de7011

