
Fix run-level profiling for subgraph operators #27870

Open
adrianlizarraga wants to merge 7 commits into main from adrianl/RunProfiler_SubgraphBugFix


@adrianlizarraga commented Mar 26, 2026

Description

Run-level profiling (introduced in PR #26846) does not currently capture profiling events for operators inside subgraphs. This PR fixes that by threading the run_profiler pointer through OpKernelContextInternal to subgraph execution, following the same pattern as terminate_flag.

Root Cause

utils::ExecuteSubgraph() had no run_profiler parameter and always passed nullptr to ExecuteGraphImpl, so nested operators (inside If, Loop, Scan, BeamSearch, GreedySearch) were never profiled at the run level.
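The root cause can be illustrated with a minimal, self-contained sketch. This is not the ONNX Runtime code: `Profiler`, `ExecuteGraphImpl`, and `ExecuteSubgraphBeforeFix` below are simplified stand-ins that model nodes as plain strings, and the node name `if_0` is a hypothetical label for the top-level If node (`mul_0` is the node name used in the PR's test).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for profiling::Profiler: it only stores event names.
struct Profiler {
  std::vector<std::string> events;
  void Record(const std::string& name) { events.push_back(name); }
};

// Stand-in for ExecuteGraphImpl: a node is recorded only when a profiler is supplied.
void ExecuteGraphImpl(const std::vector<std::string>& nodes, Profiler* run_profiler) {
  for (const std::string& node : nodes) {
    if (run_profiler != nullptr) {
      run_profiler->Record(node);
    }
  }
}

// Pre-fix shape: there is no run_profiler parameter, so nullptr is always forwarded.
void ExecuteSubgraphBeforeFix(const std::vector<std::string>& subgraph_nodes) {
  ExecuteGraphImpl(subgraph_nodes, nullptr);
}

// Simulates one run: the top-level If node is profiled, its then-branch is not.
std::vector<std::string> RunBeforeFix() {
  Profiler run_profiler;
  ExecuteGraphImpl(std::vector<std::string>{"if_0"}, &run_profiler);
  ExecuteSubgraphBeforeFix(std::vector<std::string>{"mul_0"});
  return run_profiler.events;
}

// Demonstrates the symptom: mul_0 never reaches the run-level profile.
void DemoMissingSubgraphEvents() {
  const std::vector<std::string> events = RunBeforeFix();
  assert(events.size() == 1);
  assert(events[0] == "if_0");
}
```

Because the subgraph call has no way to receive the profiler, the nested node is silently dropped rather than causing an error, which is likely why the gap went unnoticed.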

Fix

  1. OpKernelContextInternal — Added run_profiler_ member and GetRunProfiler() accessor.
  2. SessionScope / ExecuteKernel() — Pass the run profiler into OpKernelContextInternal.
  3. ExecuteSubgraph() — Added profiling::Profiler* run_profiler = nullptr parameter, forwarded to ExecuteGraphImpl().
  4. Control flow ops (if.cc, loop.cc, scan_utils.cc) — Pass context_.GetRunProfiler() to ExecuteSubgraph().
  5. Contrib transformer ops (beam_search_impl_gpt.h, beam_search_impl_t5.h, beam_search_impl_whisper.h, greedy_search_impl_gpt.h) — All 8 ExecuteSubgraph() call sites updated to pass this->context_.GetRunProfiler().

Plugin EP control flow kernels (PluginEpIfKernelImpl, etc.) delegate to the same internal kernels, so the fix propagates automatically.
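The steps above can be sketched as follows. This is a simplified model under stated assumptions, not the actual diff: `KernelContext` stands in for OpKernelContextInternal, `ComputeIf` stands in for an If kernel's compute path, nodes are modeled as strings, and `if_0` is a hypothetical node name. Only the shape of the plumbing (context accessor, defaulted `run_profiler` parameter, forwarding at the call site) mirrors the description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for profiling::Profiler: it only stores event names.
struct Profiler {
  std::vector<std::string> events;
  void Record(const std::string& name) { events.push_back(name); }
};

// Simplified stand-in for OpKernelContextInternal (step 1): holds the
// run profiler pointer and exposes it through GetRunProfiler().
class KernelContext {
 public:
  explicit KernelContext(Profiler* run_profiler) : run_profiler_(run_profiler) {}
  Profiler* GetRunProfiler() const { return run_profiler_; }

 private:
  Profiler* run_profiler_;  // Not owned; may be nullptr when run profiling is off.
};

// Stand-in for ExecuteGraphImpl: records each node when a profiler is supplied.
void ExecuteGraphImpl(const std::vector<std::string>& nodes, Profiler* run_profiler) {
  for (const std::string& node : nodes) {
    if (run_profiler != nullptr) {
      run_profiler->Record(node);
    }
  }
}

// Step 3: a defaulted run_profiler parameter keeps existing callers compiling.
void ExecuteSubgraph(const std::vector<std::string>& nodes,
                     Profiler* run_profiler = nullptr) {
  ExecuteGraphImpl(nodes, run_profiler);
}

// Step 4: the control-flow kernel forwards the profiler it got from its context.
void ComputeIf(const KernelContext& context,
               const std::vector<std::string>& then_branch) {
  ExecuteSubgraph(then_branch, context.GetRunProfiler());
}

// Simulates one run with the fix in place.
std::vector<std::string> RunAfterFix() {
  Profiler run_profiler;
  run_profiler.Record("if_0");           // Top-level If node.
  KernelContext context(&run_profiler);  // Step 2: executor passes the profiler in.
  ComputeIf(context, std::vector<std::string>{"mul_0"});
  return run_profiler.events;
}

// With the profiler threaded through, the nested node is recorded too.
void DemoSubgraphEventsRecorded() {
  const std::vector<std::string> events = RunAfterFix();
  assert(events.size() == 2);
  assert(events[1] == "mul_0");
}
```

The defaulted parameter is the reason call sites that are not updated still compile; they simply keep the pre-fix behavior until they forward the profiler explicitly.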

Tests

  • CheckRunProfilerWithSubgraph (inference_session_test.cc) — Runs if_mul.onnx, enables run profiling, asserts mul_0 (inside If's then-branch) appears in the profile JSON.
  • CheckRunProfilerWithBeamSearch (beam_search_test.cc) — Runs tiny_gpt2_beamsearch.onnx, enables run profiling, asserts decoder subgraph Node entries (beyond the top-level BeamSearch op) appear in the profile JSON.
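The kind of assertion these tests make can be approximated with plain substring checks. The sketch below is an assumption-laden illustration, not the test code from the PR: `kSampleProfile` is a made-up two-event profile, the `"cat":"Node"` and `_kernel_time` fields are assumed for illustration only, and the helper names are hypothetical. The real tests parse the profile JSON that the session writes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Made-up profile content for illustration; the field layout is an assumption.
const std::string kSampleProfile =
    R"([{"cat":"Node","name":"if_0_kernel_time"},{"cat":"Node","name":"mul_0_kernel_time"}])";

// Counts non-overlapping occurrences of needle in text.
std::size_t CountOccurrences(const std::string& text, const std::string& needle) {
  if (needle.empty()) {
    return 0;
  }
  std::size_t count = 0;
  for (std::size_t pos = text.find(needle); pos != std::string::npos;
       pos = text.find(needle, pos + needle.size())) {
    ++count;
  }
  return count;
}

// Returns true when the profile text mentions the given node name anywhere.
bool ProfileMentionsNode(const std::string& profile, const std::string& node_name) {
  return profile.find(node_name) != std::string::npos;
}

// Mirrors the two checks described above on the sample profile.
void DemoProfileChecks() {
  // Subgraph check: the nested node appears in the profile.
  assert(ProfileMentionsNode(kSampleProfile, "mul_0"));
  // Beam-search-style check: more Node entries than the single top-level op.
  assert(CountOccurrences(kSampleProfile, "\"cat\":\"Node\"") > 1);
}
```

A substring search is weaker than parsing the JSON, since it cannot tell which field a match came from, so it is only suitable as a quick sanity check.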

Files Changed (12 files)

| File | Change |
| --- | --- |
| core/framework/op_kernel_context_internal.h | Added run_profiler_ member, GetRunProfiler(), constructor param |
| core/framework/sequential_executor.cc | SessionScope::GetRunProfiler(), pass to OpKernelContextInternal |
| core/framework/utils.h / utils.cc | run_profiler param on ExecuteSubgraph() |
| core/providers/cpu/controlflow/if.cc | Forward GetRunProfiler() |
| core/providers/cpu/controlflow/loop.cc | Forward GetRunProfiler() |
| core/providers/cpu/controlflow/scan_utils.cc | Forward GetRunProfiler() |
| contrib_ops/cpu/transformers/beam_search_impl_gpt.h | 2 call sites |
| contrib_ops/cpu/transformers/beam_search_impl_t5.h | 2 call sites |
| contrib_ops/cpu/transformers/beam_search_impl_whisper.h | 2 call sites |
| contrib_ops/cpu/transformers/greedy_search_impl_gpt.h | 2 call sites |
| test/framework/inference_session_test.cc | CheckRunProfilerWithSubgraph test |
| test/contrib_ops/beam_search_test.cc | CheckRunProfilerWithBeamSearch test |


Copilot AI left a comment


Pull request overview

Fixes run-level profiling so it also records operator events that occur inside subgraphs executed by control-flow operators (e.g., If/Loop/Scan bodies), by propagating the per-run profiler pointer through subgraph execution and kernel contexts.

Changes:

  • Thread profiling::Profiler* run_profiler through utils::ExecuteSubgraph into ExecuteGraphImpl so subgraph execution can record run-level events.
  • Plumb the run profiler into OpKernelContextInternal (from SessionScope) and expose it via GetRunProfiler().
  • Update CPU control-flow kernels (If, Loop, Scan) to pass context.GetRunProfiler() when executing subgraphs; add a regression test that validates profiling includes a nested subgraph op.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| onnxruntime/test/framework/inference_session_test.cc | Adds a regression test that run-level profiling includes events for ops inside an If subgraph, and parses the generated JSON profile. |
| onnxruntime/core/providers/cpu/controlflow/scan_utils.cc | Passes the run profiler through to subgraph execution in Scan iteration. |
| onnxruntime/core/providers/cpu/controlflow/loop.cc | Passes the run profiler through to subgraph execution in Loop iterations. |
| onnxruntime/core/providers/cpu/controlflow/if.cc | Passes the run profiler through to subgraph execution for the selected If branch. |
| onnxruntime/core/framework/utils.h | Extends ExecuteSubgraph API to accept an optional run_profiler pointer. |
| onnxruntime/core/framework/utils.cc | Forwards run_profiler from ExecuteSubgraph into ExecuteGraphImpl. |
| onnxruntime/core/framework/sequential_executor.cc | Exposes the run profiler from SessionScope and passes it into OpKernelContextInternal during kernel execution. |
| onnxruntime/core/framework/op_kernel_context_internal.h | Stores the run profiler pointer in the kernel context and adds a GetRunProfiler() accessor for kernels. |


@xiaofeihan1

Thanks @adrianlizarraga for looking into this. I have two questions:

  1. How did you find this issue?
  2. I found some other code that calls ExecuteSubgraph. Do we need to handle those call sites as well?
image


Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.



@adrianlizarraga adrianlizarraga marked this pull request as ready for review March 27, 2026 18:12
@adrianlizarraga

Hi @xiaofeihan1, thanks for taking a look.

How did you find this issue?

I found it while working on a separate PR that ports the C++ EP profiling interface to the public C API for plugin EPs. I noticed that the run profilers did not profile nodes within subgraphs like Loop or If.

I found some other code that calls ExecuteSubgraph. Do we need to handle those call sites as well?

Yep, thanks for catching those. I have now updated the PR to capture all call sites.
