Annotation based partitioning along with resource accounting by yuslepukhin · Pull Request #27595 · microsoft/onnxruntime

yuslepukhin · 2026-03-09T18:43:27Z

This pull request introduces support for node "layering annotations" and improves resource accounting and memory management during graph partitioning in ONNX Runtime. The changes add new mechanisms for annotating nodes, filtering nodes by annotation during partitioning, and efficiently accounting for resources in fused nodes. Several APIs are extended to support these features, and new configuration options are introduced to guide layer assignment.

Layering annotations & partitioning:

Added layering_annotation_ member and associated getter/setter/clear methods to the Node class, allowing nodes to be annotated for layer assignment. Also added a method to clear these annotations after partitioning to save memory. (include/onnxruntime/core/graph/graph.h) [1] [2] [3]
Extended the graph partitioning logic to support filtering nodes by their layering annotation using a LayeringIndex, ensuring only nodes matching the current execution provider's assignment are considered during partitioning. (onnxruntime/core/framework/graph_partitioner.cc) [1] [2] [3] [4] [5] [6]
Added a new session option kOrtSessionOptionsLayerAssignmentSettings to configure layer assignment using annotation prefixes per device. (include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h)
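The filtering idea above can be sketched roughly as follows. This is a toy model, not the PR's code: the `LayerRule` type, the rule format, and the longest-prefix tie-break are hypothetical, since the actual syntax of the session option value is not shown in this description. The sketch also simply leaves unannotated nodes out; how ONNX Runtime treats them is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerRule:
    """Hypothetical rule: annotations starting with `prefix` go to `device`."""
    prefix: str
    device: str


def resolve_device(annotation: Optional[str], rules: list[LayerRule]) -> Optional[str]:
    """Return the device for an annotation, or None if unannotated or unmatched.

    Assumption: when several prefixes match, the longest prefix wins.
    """
    if not annotation:
        return None
    matches = [r for r in rules if annotation.startswith(r.prefix)]
    if not matches:
        return None
    return max(matches, key=lambda r: len(r.prefix)).device


def nodes_for_device(node_annotations: dict[str, Optional[str]],
                     rules: list[LayerRule], device: str) -> list[str]:
    """Keep only the nodes whose annotation resolves to `device`."""
    return [name for name, ann in node_annotations.items()
            if resolve_device(ann, rules) == device]


rules = [LayerRule("layer_0", "gpu"), LayerRule("layer_1", "cpu")]
node_annotations = {
    "MatMul_0": "layer_0/attn",
    "Add_0": "layer_0/attn",
    "MatMul_1": "layer_1/mlp",
    "Cast_0": None,  # unannotated node, matched by no rule
}
gpu_nodes = nodes_for_device(node_annotations, rules, "gpu")
cpu_nodes = nodes_for_device(node_annotations, rules, "cpu")
```

In this model each execution provider would only be asked about the nodes returned for its device, which is the role the description gives to the LayeringIndex.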

Resource accounting improvements:

Improved the IResourceAccountant interface to allow resetting and committing pending weights per node, and updated resource accounting logic to correctly sum and commit costs for all constituent nodes in fused nodes, preventing double-counting or undercounting. (include/onnxruntime/core/framework/resource_accountant.h, include/onnxruntime/core/graph/indexed_sub_graph.h, onnxruntime/core/framework/graph_partitioner.cc) [1] [2] [3]
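To illustrate why pending weights are committed per node, here is a small sketch of the accounting for a fused node. It is a hypothetical model and not the IResourceAccountant interface: the class name, method names, and the use of weight names as keys are all assumptions made for the example.

```python
class PendingWeightAccountant:
    """Toy accountant that charges each weight at most once (hypothetical)."""

    def __init__(self, weight_sizes: dict[str, int]):
        self.weight_sizes = weight_sizes          # weight name -> size in bytes
        self.committed: set[str] = set()          # weights already charged
        self.pending: dict[str, set[str]] = {}    # node -> weights awaiting commit
        self.consumed = 0                         # total committed bytes

    def compute_cost(self, node: str, weights: list[str]) -> int:
        """Cost of a node: bytes of its weights that are not committed yet."""
        new = {w for w in weights if w not in self.committed}
        self.pending[node] = new
        return sum(self.weight_sizes[w] for w in new)

    def reset_pending(self, node: str) -> None:
        """Drop pending weights for a node that ended up unassigned."""
        self.pending.pop(node, None)

    def commit(self, node: str) -> None:
        """Commit a node's pending weights, skipping ones committed meanwhile."""
        for w in self.pending.pop(node, set()):
            if w not in self.committed:
                self.committed.add(w)
                self.consumed += self.weight_sizes[w]


def commit_fused_node(acc: PendingWeightAccountant,
                      constituents: dict[str, list[str]]) -> int:
    """Account for every constituent node and return the bytes actually added."""
    before = acc.consumed
    for node, weights in constituents.items():
        acc.compute_cost(node, weights)
    for node in constituents:
        acc.commit(node)
    return acc.consumed - before


acc = PendingWeightAccountant({"W": 100, "B": 10})
# n1 and n2 share weight W; a naive per-node sum would give 100 + 110 = 210.
fused_cost = commit_fused_node(acc, {"n1": ["W"], "n2": ["W", "B"]})
```

Here the shared weight `W` is charged once, so the fused node costs 110 bytes rather than 210, and no constituent node is skipped.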

API and code organization:

Updated the Graph class and related APIs to propagate layering annotations during function inlining and to provide a method for removing all layering annotations after partitioning. (include/onnxruntime/core/graph/graph.h) [1] [2]
Moved the CreateAccountants function out of the NodeStatsRecorder class to the namespace level for clarity. (include/onnxruntime/core/framework/resource_accountant.h)

These changes enable more flexible and memory-efficient graph partitioning, particularly for scenarios involving hardware-specific layer assignments and dynamic resource constraints.

Add session options string and parsing code along with the unit test
Introduce layering configuration
Refine LayeringRuleMatcher and add tests
Add OrtEpDevice matching logic and tests
Change the Matcher interface to match one rule against potentially many devices
Add matching against traditional EPs
Create LayeringIndex
Add LayeringIndex and tests
Adjust config parsing to detect errors
Adjust Create sig
Implement WeightsSizeBasedAccountant

Duplicate layering annotations for AddNode in L1 transformers.

since the Update after layout transformation may rely on them. Also:
[ RUN ] AttentionTest.Attention3DDefault
GPU Compute Capability: SM 6.1 (value: 610)
Assertion failed: data.IsUnfused(), file D:\dev\ort_trans\onnxruntime\contrib_ops\cuda\bert\attention_prepare_qkv.cu, line 318
This may be related to uninitialized memory.

and add SessionState partitioning test for layered execution. Add layering configuration file for tiny_gpt2_beamsearch and a script to annotate the model by layers.

instance based on a set of nodes. This is used by the graph partitioner to create a filtered graph viewer. Adjust implementation of the Graph_GetViewer.

Add a no-threshold and no-stat option for the accountant.

Copilot

Pull request overview

This PR adds layering annotations to ONNX Runtime graphs and uses them to guide graph partitioning across execution providers, alongside enhancements to resource accounting (including an initializer-based fallback when pre-recorded stats aren’t provided).

Changes:

Introduces node-level layering annotations (loaded from NodeProto metadata "layer_ann") and a LayeringIndex to map annotations/rules to EP assignments during partitioning.
Extends the graph partitioner APIs to accept an optional LayeringIndex and filters EP capability queries accordingly, with logic to “unassign” nodes not claimed.
Improves resource accounting to support threshold updates and an initializer-based counting fallback; adds tests and a Python tool for annotating models.
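The initializer-based fallback can be pictured with the sketch below. It is only an illustration under stated assumptions: the function name, the greedy "stop at the first node that would exceed the budget" policy, and the use of `None` to mean "no threshold" are hypothetical and are not taken from the PR's code.

```python
from typing import Optional


def select_nodes_within_budget(nodes: list[tuple[str, list[str]]],
                               initializer_sizes: dict[str, int],
                               threshold_bytes: Optional[int]):
    """Accept nodes in order until the initializer byte budget would be exceeded.

    A node's cost is the size of the initializers it consumes that have not
    been counted for an earlier node. `threshold_bytes=None` means no limit.
    """
    counted: set[str] = set()
    used = 0
    accepted: list[str] = []
    for name, inputs in nodes:
        # dict.fromkeys de-duplicates inputs while keeping their order.
        new_inits = [i for i in dict.fromkeys(inputs)
                     if i in initializer_sizes and i not in counted]
        cost = sum(initializer_sizes[i] for i in new_inits)
        if threshold_bytes is not None and used + cost > threshold_bytes:
            break  # assumption: stop assigning once the budget is exhausted
        counted.update(new_inits)
        used += cost
        accepted.append(name)
    return accepted, used


nodes = [
    ("MatMul_0", ["x", "W0"]),
    ("Add_0", ["t0", "B0"]),
    ("MatMul_1", ["t1", "W0"]),  # reuses W0, so it adds no new bytes
    ("MatMul_2", ["t2", "W1"]),
]
sizes = {"W0": 400, "B0": 16, "W1": 400}
limited = select_nodes_within_budget(nodes, sizes, threshold_bytes=500)
unlimited = select_nodes_within_budget(nodes, sizes, threshold_bytes=None)
```

With a 500-byte budget the first three nodes fit (416 bytes) and `MatMul_2` is left out; without a threshold all four nodes are accepted.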

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
onnxruntime/test/testdata/layering/tiny_gpt2_beamsearch_layering.txt	Adds test data for layering/annotation scenarios.
onnxruntime/test/framework/tensorutils_test.cc	Adds unit tests for extracting `layer_ann` from NodeProto metadata.
onnxruntime/test/framework/session_state_test.cc	Updates partitioning test helper to pass `LayeringIndex`; adds layering partitioning test.
onnxruntime/test/framework/layering_annotations_test.cc	Adds comprehensive unit tests for rule parsing/matching and LayeringIndex behavior.
onnxruntime/python/tools/layering/layer_annotate.py	Adds a Python tool to apply `layer_ann` metadata to ONNX nodes (recurses into subgraphs).
onnxruntime/core/session/onnxruntime_c_api.cc	Refactors Graph_GetGraphView subgraph IO detection and node handling.
onnxruntime/core/session/inference_session.cc	Builds and passes `LayeringIndex` from session options; clears annotations post-partitioning to save memory.
onnxruntime/core/providers/cuda/cuda_execution_provider.cc	Improves threshold handling/logging for resource-aware CUDA capability selection.
onnxruntime/core/optimizer/utils.h	Declares `DuplicateNodeAnnotation` helper for propagating annotations in transforms/fusions.
onnxruntime/core/optimizer/utils.cc	Implements `DuplicateNodeAnnotation`.
onnxruntime/core/optimizer/transpose_optimization/ort_optimizer_api_impl.cc	Exposes layering annotation get/set in optimizer API; copies annotation when copying nodes.
onnxruntime/core/optimizer/transpose_optimization/optimizer_api.h	Extends NodeRef API with layering annotation get/set.
onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc	Propagates annotations to newly created nodes during transpose optimization rewrites.
onnxruntime/core/optimizer/reshape_fusion.cc	Copies annotation onto fused reshape node.
onnxruntime/core/optimizer/qdq_transformer/where_dummy_dq.cc	Copies annotation to inserted dummy DQ node.
onnxruntime/core/optimizer/qdq_transformer/weight_bias_quantization.cc	Copies annotation to inserted Q/DQ and helper nodes.
onnxruntime/core/optimizer/qdq_transformer/qdq_propagation.cc	Copies annotation to inserted Q/DQ nodes.
onnxruntime/core/optimizer/qdq_transformer/ensure_unique_dq_for_node_unit.cc	Copies annotation when duplicating DQ nodes.
onnxruntime/core/optimizer/matmul_add_fusion.cc	Copies annotation to inserted reshape/gemm fusion nodes.
onnxruntime/core/optimizer/embed_layer_norm_fusion.cc	Copies annotation to inserted Cast and EmbedLayerNorm fusion node.
onnxruntime/core/graph/graph_utils.h	Adds `CreateFilteredIndexedGraph` helper for building filtered GraphViewer inputs/outputs.
onnxruntime/core/graph/graph_utils.cc	Implements `CreateFilteredIndexedGraph`.
onnxruntime/core/graph/graph.cc	Adds `Graph::RemoveAllLayeringAnnotations` and loads node annotations from NodeProto metadata.
onnxruntime/core/framework/tensorprotoutils.h	Adds `kNodeProtoLayerAnnotation` constant and annotation extraction helper declaration.
onnxruntime/core/framework/tensorprotoutils.cc	Implements `GetNodeProtoLayeringAnnotation`.
onnxruntime/core/framework/resource_accountant.cc	Refactors accountant creation; adds initializer-based fallback resource counting.
onnxruntime/core/framework/layering_annotations.h	Adds layering rule parsing/matching and `LayeringIndex` API.
onnxruntime/core/framework/layering_annotations.cc	Implements rule parsing, EP matching heuristics, graph indexing, and update/unassign logic.
onnxruntime/core/framework/graph_partitioner.h	Extends `GraphPartitioner::Partition` signature to accept `LayeringIndex*`.
onnxruntime/core/framework/graph_partitioner.cc	Integrates layering-aware filtering into EP capability queries and assignment reset.
include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h	Documents new session config `session.layer_assignment_settings` and updates resource partitioning docs.
include/onnxruntime/core/graph/graph.h	Adds Node layering annotation storage/accessors and `Graph::RemoveAllLayeringAnnotations` declaration.
include/onnxruntime/core/framework/resource_accountant.h	Adds `SetThreshold`, makes `ComputeResourceCount` non-const, and moves `CreateAccountants` to a free function.
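
The recursive annotation step described for `layer_annotate.py` can be sketched with plain dictionaries standing in for ONNX protobufs. Only the metadata key `layer_ann` comes from the summary above; the dict layout, the substring-based rule format, and the "first matching rule wins" policy are assumptions made for this example and may differ from the actual tool.

```python
LAYER_ANN_KEY = "layer_ann"  # metadata key named in the review summary


def annotate(graph: dict, rules: dict[str, str]) -> int:
    """Annotate nodes in `graph` and in nested subgraphs; return how many matched.

    `graph` is a stand-in of the form {"nodes": [{"name": ..., "subgraphs": [...]}]}.
    `rules` maps a node-name substring to an annotation (hypothetical format).
    """
    count = 0
    for node in graph["nodes"]:
        for needle, annotation in rules.items():
            if needle in node["name"]:
                node.setdefault("metadata", {})[LAYER_ANN_KEY] = annotation
                count += 1
                break  # first matching rule wins
        # Control-flow nodes may carry subgraphs; recurse into each of them.
        for sub in node.get("subgraphs", []):
            count += annotate(sub, rules)
    return count


model = {"nodes": [
    {"name": "/h.0/attn/MatMul"},
    {"name": "If_0", "subgraphs": [{"nodes": [{"name": "/h.1/mlp/Gemm"}]}]},
]}
rules = {"/h.0/": "layer_0", "/h.1/": "layer_1"}
annotated = annotate(model, rules)
```

The node nested inside the `If_0` subgraph receives its annotation even though its parent matches no rule, which is the point of recursing.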

onnxruntime/core/session/onnxruntime_c_api.cc

onnxruntime/core/framework/layering_annotations.cc

onnxruntime/core/graph/graph.cc

onnxruntime/core/framework/graph_partitioner.cc

onnxruntime/core/framework/layering_annotations.h

onnxruntime/core/graph/graph_utils.h

onnxruntime/test/framework/session_state_test.cc

include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 4 comments.

include/onnxruntime/core/graph/graph.h

onnxruntime/core/framework/graph_partitioner.cc

onnxruntime/core/framework/tensorprotoutils.cc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 6 comments.

onnxruntime/core/framework/resource_accountant.cc

onnxruntime/core/graph/graph_utils.cc

onnxruntime/core/session/onnxruntime_c_api.cc

onnxruntime/test/framework/session_state_test.cc

onnxruntime/core/providers/cuda/cuda_execution_provider.cc

Adjust warning Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Adjust ordering Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 3 comments.

onnxruntime/test/framework/session_state_test.cc

onnxruntime/python/tools/layering/layer_annotate.py

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 3 comments.

onnxruntime/test/framework/session_state_test.cc

onnxruntime/core/framework/layering_annotations.h

tianleiwu

Incomplete optimizer coverage: While many optimizers are updated, the codebase has dozens of optimizers under onnxruntime/core/optimizer/. A grep for AddNode or graph.AddNode patterns not covered by this PR would be prudent to ensure no optimizer is creating nodes without propagating annotations. Missing even one optimizer would cause annotation loss for affected nodes, leading to incorrect partitioning in layered mode.
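
A rough way to run the suggested check is sketched below. It is a heuristic only: it flags `.cc` files that call `AddNode(` but never mention `DuplicateNodeAnnotation` (the helper named in the file summary), so a hit means "look at this file", not "this file is wrong". Files that propagate annotations through some other path would be false positives, and the function names here are hypothetical.

```python
import re
from pathlib import Path

ADD_NODE = re.compile(r"\bAddNode\s*\(")
PROPAGATE = "DuplicateNodeAnnotation"  # helper declared in core/optimizer/utils.h


def needs_review(source: str) -> bool:
    """True if the text creates nodes but never calls the propagation helper."""
    return bool(ADD_NODE.search(source)) and PROPAGATE not in source


def find_suspects(root: str) -> list[str]:
    """Return sorted paths of .cc files under `root` that need a manual look."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*.cc")
        if needs_review(path.read_text(encoding="utf-8", errors="ignore"))
    )


# Small in-memory check of the heuristic itself.
flagged = needs_review("Node& n = graph.AddNode(name, op, desc, ins, outs);")
covered = needs_review("graph.AddNode(name); DuplicateNodeAnnotation(src, dst);")
```

Running `find_suspects("onnxruntime/core/optimizer")` from a checkout would then list candidate files to inspect by hand.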

…new AddNode(), fix session state test

yuslepukhin added 17 commits January 28, 2026 11:30

Wire annotations to partitioning interface.

1eeba1a

Duplicate layering annotations for AddNode in L1 transformers.

Fix up annotations with Transpose Optimizer

debc8dd

Add ORT_EXTENDED_MINIMAL build

8f0ff86

Move rules and matcher inside the index

f7a422e

Add Update with tests

1ef4078

Clear annotations after partitioning

1626d5c

Merge branch 'main' into yuslepukhin/layering

f33bab1

Address accountant bug

b3ecb39

Annotate tiny_gpt2_beamsearch by layers

b5ea1c6

and add SessionState partitioning test for layered execution. Add layering configuration file for tiny_gpt2_beamsearch and a script to annotate the model by layers.

Refactor Graph_GetGraphView to make it a utility

9a422ba

Introduce a graph utility to create an IndexedSubgraph

a1caf93

instance based on a set of nodes. This is used by the graph partitioner to create a filtered graph viewer. Adjust implementation of the Graph_GetViewer.

Merge branch 'main' into yuslepukhin/layering

e1b1c4f

Fix lint in python script.

acec402

Add a no-threshold and no-stat option for the accountant.

Merge branch 'main' into yuslepukhin/layering

31dd7a8

Merge branch 'main' into yuslepukhin/layering

9fa4849

yuslepukhin requested a review from Copilot March 9, 2026 18:43

Copilot started reviewing on behalf of yuslepukhin March 9, 2026 18:44

Merge branch 'main' into yuslepukhin/layering

50c58c9

Copilot AI reviewed Mar 9, 2026

yuslepukhin added 9 commits March 9, 2026 16:29

Fix build errors and address Copilot comments

e445b60

Reject duplicate rules

358f7df

Move methods to .cc

653fb8b

Remove code duplication

23a8ecf

Add missing include

ef1227e

Fix matching bug

b0b2396

Change index parsing

b9e13cf

Remove wrong comment

add0227

Address minimal build issues

17e3525

Add copyright header

09967c3

yuslepukhin requested a review from Copilot March 27, 2026 00:18

Copilot started reviewing on behalf of yuslepukhin March 27, 2026 00:20

Copilot AI reviewed Mar 27, 2026

yuslepukhin and others added 5 commits March 27, 2026 10:20

Update onnxruntime/core/framework/graph_partitioner.cc

d23ee08

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Adjust doc and implementation for fetching layering ann

9e8be7a

Make GetContainingGraph public

f636138

Adjust accounting for fused node and remove stray local var

fcda524

Address flaky test

2da1394

yuslepukhin requested a review from Copilot March 27, 2026 18:14

yuslepukhin mentioned this pull request Mar 27, 2026

Fix flaky ResourceAwarePartitioning tests by generating node stats dynamically #27877

Closed

Copilot started reviewing on behalf of yuslepukhin March 27, 2026 18:20

Copilot AI reviewed Mar 27, 2026

yuslepukhin and others added 4 commits March 27, 2026 11:32

Update onnxruntime/core/providers/cuda/cuda_execution_provider.cc

c0c5e51

Adjust warning Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update onnxruntime/core/graph/graph_utils.cc

c55bb6e

Adjust ordering Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Address review issues

cdd9faa

Fix potential perf issue

67a947b

yuslepukhin requested a review from Copilot March 27, 2026 21:13

Copilot started reviewing on behalf of yuslepukhin March 27, 2026 21:18

Copilot AI reviewed Mar 27, 2026

onnxruntime/test/framework/session_state_test.cc (outdated, resolved)

onnxruntime/test/framework/session_state_test.cc (resolved)

onnxruntime/python/tools/layering/layer_annotate.py (resolved)

Address review comments

44c6904

yuslepukhin requested a review from Copilot March 27, 2026 21:37

Copilot started reviewing on behalf of yuslepukhin March 27, 2026 21:41

Copilot AI reviewed Mar 27, 2026

onnxruntime/test/framework/session_state_test.cc (outdated, resolved)

onnxruntime/test/framework/session_state_test.cc (outdated, resolved)

onnxruntime/test/framework/session_state_test.cc (outdated, resolved)

tianleiwu reviewed Mar 27, 2026

onnxruntime/core/framework/layering_annotations.h (resolved)

tianleiwu reviewed Mar 27, 2026

Add documentation for ann and ep propagation. Fix L1 optimizers, add …

927a0ef

…new AddNode(), fix session state test

tianleiwu approved these changes Mar 30, 2026

tianleiwu merged commit f4bdbb8 into main Mar 30, 2026
104 of 110 checks passed

tianleiwu deleted the yuslepukhin/layering branch March 30, 2026 08:19

4 participants
