Fix run-level profiling for subgraph operators #27870
adrianlizarraga wants to merge 7 commits into main
…kernels can pass it to their subgraphs
…se the public C++ API to load and run the test model.
Pull request overview
Fixes run-level profiling so it also records operator events that occur inside subgraphs executed by control-flow operators (e.g., If/Loop/Scan bodies), by propagating the per-run profiler pointer through subgraph execution and kernel contexts.
Changes:
- Thread `profiling::Profiler* run_profiler` through `utils::ExecuteSubgraph` into `ExecuteGraphImpl` so subgraph execution can record run-level events.
- Plumb the run profiler into `OpKernelContextInternal` (from `SessionScope`) and expose it via `GetRunProfiler()`.
- Update CPU control-flow kernels (`If`, `Loop`, `Scan`) to pass `context.GetRunProfiler()` when executing subgraphs; add a regression test that validates profiling includes a nested subgraph op.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/test/framework/inference_session_test.cc | Adds a regression test that run-level profiling includes events for ops inside an If subgraph, and parses the generated JSON profile. |
| onnxruntime/core/providers/cpu/controlflow/scan_utils.cc | Passes the run profiler through to subgraph execution in Scan iteration. |
| onnxruntime/core/providers/cpu/controlflow/loop.cc | Passes the run profiler through to subgraph execution in Loop iterations. |
| onnxruntime/core/providers/cpu/controlflow/if.cc | Passes the run profiler through to subgraph execution for the selected If branch. |
| onnxruntime/core/framework/utils.h | Extends ExecuteSubgraph API to accept an optional run_profiler pointer. |
| onnxruntime/core/framework/utils.cc | Forwards run_profiler from ExecuteSubgraph into ExecuteGraphImpl. |
| onnxruntime/core/framework/sequential_executor.cc | Exposes the run profiler from SessionScope and passes it into OpKernelContextInternal during kernel execution. |
| onnxruntime/core/framework/op_kernel_context_internal.h | Stores the run profiler pointer in the kernel context and adds a GetRunProfiler() accessor for kernels. |
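The propagation summarized in the table can be sketched with stand-in types. This is a minimal illustration, not the ONNX Runtime code: `Profiler`, `Node`, `KernelContext`, `ExecuteGraph`, and `ProfileIfWithMul` are hypothetical names invented here, and only the shape of the change (a per-run profiler pointer stored in the kernel context and forwarded into subgraph execution, with a `nullptr` default) follows the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the real ONNX Runtime types.
struct Profiler {
  std::vector<std::string> events;
  void Record(const std::string& name) { events.push_back(name); }
};

struct Node {
  std::string name;
  std::vector<Node> subgraph;  // non-empty for control-flow style nodes
};

// Kernel context carrying the per-run profiler, playing the role that
// GetRunProfiler() plays in the PR.
class KernelContext {
 public:
  explicit KernelContext(Profiler* run_profiler) : run_profiler_(run_profiler) {}
  Profiler* GetRunProfiler() const { return run_profiler_; }

 private:
  Profiler* run_profiler_;
};

void ExecuteGraph(const std::vector<Node>& nodes, Profiler* run_profiler);

// The profiler parameter defaults to nullptr so callers that do not pass it
// still compile; they simply record nothing for the subgraph.
void ExecuteSubgraph(const std::vector<Node>& nodes, Profiler* run_profiler = nullptr) {
  ExecuteGraph(nodes, run_profiler);
}

void ExecuteGraph(const std::vector<Node>& nodes, Profiler* run_profiler) {
  for (const Node& node : nodes) {
    KernelContext context(run_profiler);
    if (run_profiler != nullptr) run_profiler->Record(node.name);
    if (!node.subgraph.empty()) {
      // Forward the run profiler from the kernel context into the subgraph.
      ExecuteSubgraph(node.subgraph, context.GetRunProfiler());
    }
  }
}

// Runs a toy graph with one control-flow style node wrapping a single node
// and returns the recorded event names.
std::vector<std::string> ProfileIfWithMul() {
  Profiler profiler;
  std::vector<Node> graph{Node{"if_0", {Node{"mul_0", {}}}}};
  ExecuteGraph(graph, &profiler);
  assert(!profiler.events.empty());
  return profiler.events;
}
```

In this sketch, calling `ExecuteSubgraph(node.subgraph)` without the second argument would drop the nested `mul_0` event, which mirrors the behavior the PR describes before the fix.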
Thanks @adrianlizarraga for looking into this. I have two questions
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
Hi @xiaofeihan1, Thanks for taking a look.
I found it while working on a separate PR that ports the C++ EP profiling interface to the public C API for plugin EPs. I noticed that the run profilers did not profile nodes within subgraphs like
Yep, thanks for catching those. I have now updated the PR to capture all call sites.

Description
Run-level profiling (introduced in PR #26846) does not currently capture profiling events for operators inside subgraphs. This PR fixes that by threading the `run_profiler` pointer through `OpKernelContextInternal` to subgraph execution, following the same pattern as `terminate_flag`.

Root Cause
`utils::ExecuteSubgraph()` had no `run_profiler` parameter and always passed `nullptr` to `ExecuteGraphImpl`, so nested operators (inside If, Loop, Scan, BeamSearch, GreedySearch) were never profiled at the run level.

Fix
- `OpKernelContextInternal`: Added `run_profiler_` member and `GetRunProfiler()` accessor.
- `SessionScope` / `ExecuteKernel()`: Pass the run profiler into `OpKernelContextInternal`.
- `ExecuteSubgraph()`: Added `profiling::Profiler* run_profiler = nullptr` parameter, forwarded to `ExecuteGraphImpl()`.
- `if.cc`, `loop.cc`, `scan_utils.cc`: Pass `context_.GetRunProfiler()` to `ExecuteSubgraph()`.
- `beam_search_impl_gpt.h`, `beam_search_impl_t5.h`, `beam_search_impl_whisper.h`, `greedy_search_impl_gpt.h`: All 8 `ExecuteSubgraph()` call sites updated to pass `this->context_.GetRunProfiler()`.

Plugin EP control flow kernels (`PluginEpIfKernelImpl`, etc.) delegate to the same internal kernels, so the fix propagates automatically.

Tests
- `CheckRunProfilerWithSubgraph` (`inference_session_test.cc`): Runs `if_mul.onnx`, enables run profiling, asserts `mul_0` (inside If's then-branch) appears in the profile JSON.
- `CheckRunProfilerWithBeamSearch` (`beam_search_test.cc`): Runs `tiny_gpt2_beamsearch.onnx`, enables run profiling, asserts decoder subgraph Node entries (beyond the top-level BeamSearch op) appear in the profile JSON.

Files Changed (12 files)
| File | Change |
|---|---|
| `core/framework/op_kernel_context_internal.h` | `run_profiler_` member, `GetRunProfiler()`, constructor param |
| `core/framework/sequential_executor.cc` | `SessionScope::GetRunProfiler()`, pass to `OpKernelContextInternal` |
| `core/framework/utils.h` / `utils.cc` | `run_profiler` param on `ExecuteSubgraph()` |
| `core/providers/cpu/controlflow/if.cc` | `GetRunProfiler()` |
| `core/providers/cpu/controlflow/loop.cc` | `GetRunProfiler()` |
| `core/providers/cpu/controlflow/scan_utils.cc` | `GetRunProfiler()` |
| `contrib_ops/cpu/transformers/beam_search_impl_gpt.h` | |
| `contrib_ops/cpu/transformers/beam_search_impl_t5.h` | |
| `contrib_ops/cpu/transformers/beam_search_impl_whisper.h` | |
| `contrib_ops/cpu/transformers/greedy_search_impl_gpt.h` | |
| `test/framework/inference_session_test.cc` | `CheckRunProfilerWithSubgraph` test |
| `test/contrib_ops/beam_search_test.cc` | `CheckRunProfilerWithBeamSearch` test |
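The kind of assertion the tests make (a nested node such as `mul_0` shows up in the profile JSON) can be approximated with a plain string scan. The sketch below is only an assumption about how such a check might look: `ProfileHasEventFor`, `kSampleProfile`, and `CheckSampleProfile` are hypothetical names, the sample text is hand-written for illustration and is not real profiler output, and the actual tests in this PR may parse the JSON differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if any "name" value in the profile text contains node_name.
// Simplification: this is a substring scan, not a JSON parser, and it does
// not handle escaped quotes inside string values.
bool ProfileHasEventFor(const std::string& profile_json, const std::string& node_name) {
  const std::string key = "\"name\"";
  std::size_t pos = 0;
  while ((pos = profile_json.find(key, pos)) != std::string::npos) {
    pos += key.size();
    // Skip whitespace and the ':' separator to reach the opening quote.
    std::size_t open = profile_json.find_first_not_of(" \t\r\n:", pos);
    if (open == std::string::npos || profile_json[open] != '"') continue;
    std::size_t close = profile_json.find('"', open + 1);
    if (close == std::string::npos) break;
    const std::string value = profile_json.substr(open + 1, close - open - 1);
    if (value.find(node_name) != std::string::npos) return true;
    pos = close + 1;
  }
  return false;
}

// Hand-written illustrative profile text (assumed shape, not real output).
const char kSampleProfile[] = R"([
  {"cat": "Session", "name": "model_run", "dur": 120},
  {"cat": "Node", "name": "if_0_kernel_time", "dur": 90},
  {"cat": "Node", "name": "mul_0_kernel_time", "dur": 12}
])";

// Usage example: the nested node is expected to be present in the sample.
void CheckSampleProfile() {
  assert(ProfileHasEventFor(kSampleProfile, "mul_0"));
}
```

A real test would more likely parse the file with a JSON library and also filter on the event category, so that a match in an unrelated field cannot satisfy the check.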