Plugin EP event profiling APIs #27649
Pull request overview
Adds draft profiling support for plugin Execution Providers (EPs) by extending the EP C API with a profiler interface and an event-ingestion mechanism, and wiring this into the plugin EP host/provider implementation plus an example/test EP.
Changes:
- Introduces `OrtEpProfilerImpl`/`OrtEpProfilingEvent` APIs and extends `OrtEp`/`OrtEpApi` for plugin EP profiling.
- Implements host-side wrapping (`profiling::EpProfiler`) for plugin-provided profilers and adds an ORT-side events container (`OrtEpProfilingEventsContainer`) with an `AddEvents` API.
- Updates the example kernel-registry plugin EP to emit profiling events and adds an autoep test validating the profile output.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/test/autoep/test_execution.cc | Adds a new test that enables profiling and checks plugin EP events appear in the profile output. |
| onnxruntime/test/autoep/library/example_plugin_ep_kernel_registry/ep.h | Adds GetProfiler hook to the example plugin EP. |
| onnxruntime/test/autoep/library/example_plugin_ep_kernel_registry/ep.cc | Implements an example OrtEpProfilerImpl that reports events via the new container API. |
| onnxruntime/core/session/plugin_ep/ep_profiling_events_container.h | Adds the concrete definition for the previously-opaque profiling events container. |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.h | Adds GetProfiler() override for plugin EP provider. |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc | Wraps plugin OrtEpProfilerImpl into profiling::EpProfiler and integrates with ORT profiling. |
| onnxruntime/core/session/plugin_ep/ep_api.h | Adds EpProfilingEventsContainer_AddEvents to the plugin EP API surface. |
| onnxruntime/core/session/plugin_ep/ep_api.cc | Implements EpProfilingEventsContainer_AddEvents and appends it to the versioned OrtEpApi table. |
| include/onnxruntime/core/session/onnxruntime_ep_c_api.h | Defines public C API types for profiling events and the plugin EP profiler interface; adds OrtEp::GetProfiler and OrtEpApi::EpProfilingEventsContainer_AddEvents. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
This PR drafts profiling support for plugin Execution Providers (EPs) by adding a public EP profiling interface (C API + C++ wrapper) and wiring it into the plugin EP execution provider path, along with an example plugin implementation and a validation test.
Changes:
- Introduces public plugin EP profiling types/callbacks (`OrtEpProfilerImpl`) and EP-side event reporting helpers (`CreateEpProfilingEvent`, `EpProfilingEventsContainer_AddEvents`).
- Adds ORT-side wrapper (`PluginEpProfiler`) to invoke plugin profiler callbacks and merge EP events into ORT's profiling timeline.
- Implements and tests profiling in the example plugin EP kernel registry, validating that EP kernel events show up in the profiling output.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/test/autoep/test_execution.cc | Adds a unit test that enables profiling and checks the output contains plugin EP events. |
| onnxruntime/test/autoep/library/example_plugin_ep_kernel_registry/kernels/binary_op.cc | Records per-kernel timing events and reports them through the example EP profiling manager. |
| onnxruntime/test/autoep/library/example_plugin_ep_kernel_registry/ep_profiling.h | Declares example EP profiler implementation and the singleton event manager used by kernels. |
| onnxruntime/test/autoep/library/example_plugin_ep_kernel_registry/ep_profiling.cc | Implements event tracking, correlation stack handling, and event conversion into ORT profiling events. |
| onnxruntime/test/autoep/library/example_plugin_ep_kernel_registry/ep.h | Extends the example EP to expose profiler state (active profiler client id). |
| onnxruntime/test/autoep/library/example_plugin_ep_kernel_registry/ep.cc | Implements OrtEp::GetProfiler for the example EP. |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.h | Extends PluginExecutionProvider to provide an EP profiler instance. |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc | Wires GetProfiler() for plugin EPs and refactors logger fallback handling. |
| onnxruntime/core/session/plugin_ep/ep_event_profiling.h | Defines ORT-side opaque EP profiling event/container and PluginEpProfiler wrapper interface. |
| onnxruntime/core/session/plugin_ep/ep_event_profiling.cc | Implements event merging and the PluginEpProfiler wrapper calling plugin callbacks. |
| onnxruntime/core/session/plugin_ep/ep_api.h | Exposes new EP API entrypoints for creating/releasing EP profiling events and adding them to a container. |
| onnxruntime/core/session/plugin_ep/ep_api.cc | Implements the new EP profiling event/container API functions and appends them to the EP API table. |
| onnxruntime/core/framework/error_code_helper.h | Adds ORT_API_RETURN_IF helper macro for concise C API argument validation. |
| onnxruntime/core/common/profiler.cc | Sorts ORT events by start timestamp before merging EP events. |
| include/onnxruntime/core/session/onnxruntime_ep_c_api.h | Adds public EP profiling C API declarations and documentation. |
| include/onnxruntime/core/common/profiler_common.h | Expands documentation for the C++ profiling::EpProfiler interface. |
| cmake/onnxruntime_unittests.cmake | Adds the example EP profiling sources to unit test build. |
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
…l::EndProfiling() function
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated no new comments.
tianleiwu left a comment
Please resolve merge conflicts.
Summary
This PR ports the C++ EpProfiler interface to a binary-stable C API for plugin EPs, with well-considered design improvements (epoch-independent offsets, absolute correlation IDs, append-only event containers). The architecture is clean with proper layering: C API → C++ wrappers → bridge layer. The code is overall high quality with thorough validation, ABI slot assertions, and comprehensive tests covering both session and run profiling modes. A few concerns around enum synchronization guarantees and a minor UB-by-the-letter in the C++ wrapper reinterpret_cast are noted below.
Review
1. C API Surface (onnxruntime_ep_c_api.h)
Positive:
- The `OrtEpProfilerImpl` struct documentation is exceptionally thorough: it distinguishes session vs. run profiling concurrency semantics, threading guarantees for `StartEvent`/`StopEvent`, and the meaning of the absolute correlation ID. This level of documentation is rare and invaluable for EP authors.
- The `OrtProfilingEventsContainer` being append-only is a sound design choice that prevents one EP from corrupting another EP's events or ORT's own events.
- `StartEvent`/`StopEvent` being optional (NULL-able) is good for EPs that don't need correlation tracking.
Concern:
⚠️ Enum synchronization is enforced only by comment: The C++ `EventCategory` enum in `profiler_common.h` carries a "keep in sync with `OrtProfilingEventCategory`" comment, but there is no compile-time enforcement. A `static_assert` verifying value equality would prevent silent drift:

    static_assert(static_cast<int>(OrtProfilingEventCategory_SESSION) == static_cast<int>(profiling::EventCategory::SESSION_EVENT));
    static_assert(static_cast<int>(OrtProfilingEventCategory_NODE) == static_cast<int>(profiling::EventCategory::NODE_EVENT));
    static_assert(static_cast<int>(OrtProfilingEventCategory_KERNEL) == static_cast<int>(profiling::EventCategory::KERNEL_EVENT));
    static_assert(static_cast<int>(OrtProfilingEventCategory_API) == static_cast<int>(profiling::EventCategory::API_EVENT));
2. C++ Wrappers (onnxruntime_cxx_api.h, onnxruntime_cxx_inline.h)
Positive:
- The `ConstProfilingEvent`/`ProfilingEvent`/`UnownedProfilingEventsContainer` trio follows the established ORT wrapper patterns (Const*, owning, unowned). Consistent with the rest of the API.
- The `ProfilingEvent` constructor overload accepting `std::unordered_map<std::string, std::string>` provides a natural C++ interface while the raw `const char**` overload supports zero-copy scenarios.
Concern:
⚠️ `reinterpret_cast` for vector of wrappers is formally UB: The `AddEvents(const std::vector<ProfilingEvent>&)` overload casts the vector data pointer to `OrtProfilingEvent**`:

    static_assert(sizeof(ProfilingEvent) == sizeof(OrtProfilingEvent*) && ...);
    const auto* event_ptrs = reinterpret_cast<const OrtProfilingEvent* const*>(events.data());

While the `static_assert` guards the layout assumption and this pattern is used elsewhere in ORT, it is technically a strict-aliasing violation under C++ rules. A safe alternative would be to build a temporary pointer array:

    std::vector<const OrtProfilingEvent*> ptrs(events.size());
    for (size_t i = 0; i < events.size(); ++i) ptrs[i] = events[i];
    return AddEvents(ptrs.data(), ptrs.size());

This is a Suggestion rather than a blocker, as the existing pattern is well-established in ORT.
3. C API Implementation (ep_api.cc)
Positive:
- Thorough input validation in `CreateProfilingEvent`: validates output pointer, event name, category range, arg array consistency (non-null keys/values when `num_args > 0`), and per-element null checks. This is exemplary defensive coding for a C API boundary.
- `ProfilingEventsContainer_AddEvents` validates all events for null BEFORE modifying the container, ensuring atomicity on error.
- ABI slot assertions are properly maintained (`static_assert(offsetof(...) == 57, ...)`).
Concern:
⚠️ `ORT_API_RETURN_IF` macro placement: The new `ORT_API_RETURN_IF` macro in `error_code_helper.h` is a useful utility, but it includes `core/common/make_string.h` at header level. This adds `make_string.h` as a transitive dependency for all consumers of `error_code_helper.h`. If `make_string.h` is already widely included this is fine; otherwise consider a forward-declared helper or noting the include impact. (Nitpick)
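The validate-before-modify behaviour noted for `ProfilingEventsContainer_AddEvents` can be sketched roughly as below. This is only an illustration of the two-pass idea: `Event`, `EventsContainer`, and this `AddEvents` signature are hypothetical stand-ins, not the ORT types or the code in `ep_api.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the ORT types.
struct Event {
  std::string name;
};

struct EventsContainer {
  std::vector<Event> events;
};

// Returns false and leaves `container` untouched if any input pointer is null.
// All inputs are checked in a first pass, so a bad entry can never leave the
// container half-updated.
bool AddEvents(EventsContainer& container, const Event* const* to_add, std::size_t count) {
  if (count > 0 && to_add == nullptr) return false;
  for (std::size_t i = 0; i < count; ++i) {
    if (to_add[i] == nullptr) return false;  // reject before any mutation
  }
  const std::size_t old_size = container.events.size();
  container.events.reserve(old_size + count);
  for (std::size_t i = 0; i < count; ++i) {
    container.events.push_back(*to_add[i]);  // copy; the caller keeps ownership
  }
  assert(container.events.size() == old_size + count);
  return true;
}
```

With this shape, a call that passes one null pointer among several valid events adds none of them.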
4. Bridge Layer (ep_event_profiling.cc, ep_event_profiling.h)
Positive:
- The `PluginEpProfiler::Create()` factory with `PrivateTag` is a clean pattern that ensures validation always runs before construction, which prevents partially constructed objects.
- The epoch-independent timestamp computation (`offset_ns = now() - profiling_start_time`) is an elegant solution that eliminates clock epoch mismatches between ORT and the EP.
- Proper error handling throughout: all profiling callback errors are logged but do not throw, preserving inference execution.
- The `ToOpaqueProfilingEvent`/`FromOpaqueProfilingEvent` helpers are clean type-puns that centralize the cast in one place.
Concern:
⚠️ Version-check leak path: In `PluginEpProfiler::Create()`, if `ort_version_supported < 25`, the profiler returned by the EP is not released because `Release` itself may not exist at that version. A comment documents this ambiguity, which is good. However, the caller `PluginExecutionProvider::GetProfiler()` already gates on `ort_ep_->ort_version_supported < 25` before calling `CreateProfiler`, making this dead code in practice. Consider either removing the redundant check or adding a brief comment to `Create()` noting the caller's pre-check makes this unreachable. (Nitpick)
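For readers unfamiliar with the `PrivateTag` factory idiom mentioned above, here is a minimal sketch. The `Profiler` class, its `version` argument, and the error string are hypothetical; only the threshold of 25 is taken from the review text, and none of this is the actual `PluginEpProfiler` code.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical class for illustration; not ORT's PluginEpProfiler.
class Profiler {
  // Private nested type: code outside Profiler cannot name it. The explicit
  // constructor also stops callers from passing `{}` in its place (C++17 and later).
  struct PrivateTag {
    explicit PrivateTag() = default;
  };

 public:
  // Public so std::make_unique can call it, but only Profiler can supply the tag.
  Profiler(PrivateTag, int version) : version_(version) {
    assert(version >= kMinVersion);  // Create() has already validated this
  }

  // Validation runs before construction; on failure no object is ever created.
  static std::unique_ptr<Profiler> Create(int version, std::string& error) {
    if (version < kMinVersion) {
      error = "unsupported version";
      return nullptr;
    }
    return std::make_unique<Profiler>(PrivateTag{}, version);
  }

  int version() const { return version_; }

 private:
  static constexpr int kMinVersion = 25;  // assumed threshold, taken from the review text
  int version_;
};
```

The design choice is that the only way to obtain a `Profiler` is through `Create()`, so every instance has passed validation.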
5. Profiler Base Class Changes (profiler_common.h, profiler.cc)
Positive:
- Enriching `Stop()` with `const EventRecord&` is well-motivated: it lets EP profilers annotate their events with ORT event metadata (`op_name`, `event_name`) at stop time, eliminating the need for the post-hoc merge algorithm used by CUDA EP.
- The detailed doc comments on `Start()`/`Stop()` explaining relative vs. absolute IDs and the conversion formula are excellent. These will save plugin EP authors significant debugging time.
- Moving the `Stop()` call above event storage in `profiler.cc` is correct because the event hasn't been moved yet at that point. The `const EventRecord&` parameter ensures the data is available during the callback.
Concern:
- None for this component.
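The relative-to-absolute ID conversion discussed in this section follows the formula given in the PR description, `ort_event_correlation_id = relative_ort_event_id + profiling_start_time_epoch_us_`. A minimal sketch, assuming microsecond units as the description states; the helper names `ToEpochUs` and `ToCorrelationId` are made up for illustration:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::high_resolution_clock;

// Microseconds since the clock's platform-defined epoch.
inline std::uint64_t ToEpochUs(Clock::time_point tp) {
  const auto us =
      std::chrono::duration_cast<std::chrono::microseconds>(tp.time_since_epoch()).count();
  assert(us >= 0);  // the unsigned conversion below assumes a non-negative reading
  return static_cast<std::uint64_t>(us);
}

// Hypothetical helper mirroring the documented formula:
// absolute correlation ID = relative ORT event ID + profiling start time (epoch, us).
inline std::uint64_t ToCorrelationId(std::uint64_t relative_ort_event_id,
                                     std::uint64_t profiling_start_time_epoch_us) {
  return relative_ort_event_id + profiling_start_time_epoch_us;
}
```

Because the profiling start time differs between sessions, two sessions that both emit relative ID 250 still end up with different absolute IDs.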
6. Existing EP Profiler Updates (gpu_profiler_common.h, cuda_profiler.h, vitisai_profiler.h, webgpu_profiler.h)
Positive:
- The signature changes are mechanical and correct: `void Stop(uint64_t)` → `void Stop(uint64_t, const EventRecord&)` with the `EventRecord` parameter unused in these existing implementations to preserve current behavior.
- The new TODO comment in `gpu_profiler_common.h` documenting the incorrect sort-order assumption in the merge algorithm is a helpful breadcrumb for future work.
Concern:
- None. Clean signature adaptation.
7. Example EP Profiler (ep_profiling.cc, ep_profiling.h)
Positive:
- The `EpEventManager` singleton with per-profiler state and thread-local correlation tracking is a clear demonstration of the intended usage pattern. It correctly handles:
  - Multiple concurrent profiling sessions (each with its own `profiler_id`)
  - Inter-op parallelism (thread-local ORT event stacks)
  - Event annotation by filtering on `thread_id` during `PopOrtEvent`
- Event timestamp computation in `EndProfilingImpl` correctly combines the ORT-provided offset with the EP's local elapsed time, a faithful implementation of the documented formula.
Concern:
⚠️ Mutex contention in hot path: `EpEventManager` holds a global mutex for `AddEpEvent`, `PushOrtEvent`, and `PopOrtEvent`. In `Compute()` (`binary_op.cc`), `AddEpEvent` is called under this mutex on every kernel invocation when profiling is active. For a real EP with high throughput, this serialized access to the global events vector could become a bottleneck. For test/example code this is completely fine, but the example might benefit from a brief comment noting that a production EP should use a more scalable approach (e.g., per-thread event collection). (Suggestion)
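One possible shape for the per-thread collection suggested above is sketched here. `EpEvent` and `EventCollector` are hypothetical names, not part of the example EP, and the sketch keeps a single buffer per thread; a real EP with several concurrent profilers would likely need one buffer per profiler id.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record for illustration.
struct EpEvent {
  std::string name;
  std::int64_t start_us;
  std::int64_t duration_us;
};

class EventCollector {
 public:
  // Hot path: appends to a buffer owned by the calling thread; no lock is taken.
  void Add(EpEvent event) {
    assert(event.duration_us >= 0);
    LocalBuffer().push_back(std::move(event));
  }

  // Cold path: moves the calling thread's buffered events into the shared list.
  void FlushCurrentThread() {
    auto& local = LocalBuffer();
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& e : local) merged_.push_back(std::move(e));
    local.clear();
  }

  // Hands the merged events to the caller and leaves the shared list empty.
  std::vector<EpEvent> TakeMerged() {
    std::lock_guard<std::mutex> lock(mutex_);
    return std::exchange(merged_, {});
  }

 private:
  // One buffer per thread, shared by all collectors in this sketch.
  static std::vector<EpEvent>& LocalBuffer() {
    thread_local std::vector<EpEvent> buffer;
    return buffer;
  }

  std::mutex mutex_;
  std::vector<EpEvent> merged_;
};
```

The mutex is then taken once per flush rather than once per kernel invocation; each worker thread has to call `FlushCurrentThread()` itself before the events are handed over.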
8. Kernel Integration (binary_op.cc)
Positive:
- Conditional profiling (only activating when `GetActiveProfilerId()` returns a value) ensures zero overhead on the hot path when profiling is disabled.
- Using `std::chrono::high_resolution_clock` consistently with the profiler start point eliminates clock-source mismatches.
Concern:
- None.
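The conditional-profiling pattern praised in this section might look roughly like the sketch below. `TimedEvent` and `ComputeWithOptionalProfiling` are hypothetical, and the optional argument stands in for whatever `GetActiveProfilerId()` returns in the example EP. The sketch uses `std::chrono::steady_clock` so the elapsed time can never be negative, whereas the example EP is described as using `high_resolution_clock`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record of one timed kernel invocation.
struct TimedEvent {
  std::uint64_t profiler_id;
  std::int64_t duration_ns;
};

// Runs `work`, recording a timing event only when a profiler is active.
template <typename Work>
void ComputeWithOptionalProfiling(std::optional<std::uint64_t> active_profiler_id,
                                  std::vector<TimedEvent>& sink, Work&& work) {
  if (!active_profiler_id) {  // profiling off: no clock reads, no event storage
    work();
    return;
  }
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start);
  assert(elapsed.count() >= 0);  // steady_clock is monotonic
  sink.push_back(TimedEvent{*active_profiler_id, static_cast<std::int64_t>(elapsed.count())});
}
```

When no profiler is active the only added cost is the check on the optional.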
9. Tests (test_execution.cc, ep_plugin_provider_test.cc, inference_session_test.cc)
Positive:
- The end-to-end profiling tests (`KernelPluginEp_SessionProfiling`, `KernelPluginEp_RunProfiling`) validate the full pipeline: run model → parse profile JSON → find EP event → verify category/timestamps/duration/args → verify parent ORT event containment. This is thorough coverage.
- The unit tests for C API (`CreateProfilingEvent_AllCategories`, `*_NullOutput`, `*_NullKey`) and C++ wrappers validate all error paths and both constructor overloads.
- The `inference_session_test.cc` fix to `CheckRunProfilerWithStartProfile` removes the fragile line-count-dependent assertions and replaces them with a proper search. Good defensive test improvement.
- The run-profiling test honestly acknowledges the subgraph limitation as a `TODO` with commented-out assertions rather than asserting wrong behavior.
Concern:
⚠️ Run profiling file discovery is fragile: The run profiling test finds the profile file by scanning the current directory for the newest file matching the prefix. In a parallel test execution environment, two separate test processes could produce files with near-identical timestamps, causing the wrong file to be selected. The cleanup `gsl::finally` mitigates stale file risk, but the window exists. Consider using a unique subdirectory per test invocation to isolate profile output. (Suggestion; may be an ORT test infra limitation)

⚠️ `profile_file_prefix` type change: The change from `"ort_run_profile_test"` to `ORT_TSTR("ort_run_profile_test")` in `inference_session_test.cc` is a type fix. Verify that `RunOptions::profile_file_prefix` was recently changed from `std::string` to `std::basic_string<ORTCHAR_T>` or similar; otherwise this would be a compile error. This appears to be correct but is worth confirming. (Nitpick)
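The "unique subdirectory per test invocation" idea from the first concern could be sketched with `std::filesystem` as below. `MakeUniqueProfileDir` is a hypothetical helper, not something in the ORT test utilities, and how the profile file prefix would be pointed at the returned directory is left open.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: creates and returns a fresh directory under the system temp dir.
// fs::create_directory returns false when the directory already exists, so the loop
// retries until this call is the one that actually created it.
inline fs::path MakeUniqueProfileDir(const std::string& test_name) {
  assert(!test_name.empty());
  static std::atomic<unsigned> counter{0};
  const auto ticks = std::chrono::steady_clock::now().time_since_epoch().count();
  for (;;) {
    const fs::path dir = fs::temp_directory_path() /
                         (test_name + "_" + std::to_string(ticks) + "_" + std::to_string(counter++));
    if (fs::create_directory(dir)) return dir;
  }
}
```

A test would write its profile output under the returned path and remove the directory during cleanup, so another process's files can never be picked up by the "newest file" scan.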
10. Build System (onnxruntime_unittests.cmake)
Positive:
- New source files (`ep_profiling.h`, `ep_profiling.cc`) are correctly added to the example plugin EP's build target, keeping them adjacent to the other EP source files.
Concern:
- None.
Summary of Concerns
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | Suggestion | C API (`onnxruntime_ep_c_api.h`) | Add `static_assert` to enforce enum value equality between C++ `EventCategory` and C `OrtProfilingEventCategory` |
| 2 | Suggestion | C++ wrappers (`onnxruntime_cxx_inline.h`) | `reinterpret_cast` in `AddEvents` is technically strict-aliasing UB; consider a temporary pointer array |
| 3 | Nitpick | C API impl (`error_code_helper.h`) | New `#include "make_string.h"` adds a transitive dependency via `ORT_API_RETURN_IF` |
| 4 | Nitpick | Bridge layer (`ep_event_profiling.cc`) | Version-check code path in `Create()` is unreachable given caller's pre-check; add comment or remove |
| 5 | Suggestion | Example EP (`ep_profiling.cc`) | Global mutex in `EpEventManager` could bottleneck real EPs; add comment noting this is example-only |
| 6 | Suggestion | Tests (`test_execution.cc`) | Run profiling file discovery by timestamp is fragile under parallel test execution |
| 7 | Nitpick | Tests (`inference_session_test.cc`) | Confirm `profile_file_prefix` type change to `ORT_TSTR()` matches upstream `RunOptions` field type |
Verdict
APPROVE — This is a well-designed, thoroughly documented, and carefully validated API extension. The concerns are suggestions and nitpicks; none represent correctness bugs or security issues. The code correctly maintains ABI stability, validates all inputs at the C API boundary, handles error paths gracefully, and provides comprehensive test coverage.
Description
TLDR
This PR ports the existing C++ `EpProfiler` interfaces used by provider-bridge EPs to the binary-stable C APIs for plugin EPs. It introduces C/C++ APIs for creating/querying profiling events, a container for appending EP events, and callback hooks (`StartEvent`/`StopEvent`) that give EPs access to ORT event metadata in real-time.
Changes to the original C++ API
The original `EpProfiler` C++ interface was adapted for the C API with the following intentional changes:
- `StartProfiling` now receives an offset indicating the elapsed time since profiling started, as opposed to receiving an absolute/epoch-dependent profiling start time. This prevents EPs from having to do epoch conversions. Credit to @edgchen1 for the idea.
- `StartEvent`/`StopEvent` receive an absolute, epoch-based correlation ID (`ort_event_correlation_id`) instead of a relative ORT event ID. The `PluginEpProfiler` bridge layer automatically converts the C++ `relative_ort_event_id` (microseconds since profiling start) to an absolute `ort_event_correlation_id` by adding the epoch-based profiling start time. This means plugin EPs can use the correlation ID directly with profiling utilities like CUPTI or ROCTracer without computing the conversion themselves.
- `StopEvent` now receives the completed ORT event as a parameter. This allows EPs to optionally inspect ORT event metadata (e.g., `op_name`, `event_name`) at the time the event ends, facilitating annotation of correlated EP events.
- `EndProfiling` only allows EPs to append events (via `OrtProfilingEventsContainer`), not read or modify the full events array. This is motivated by:
  - `parent_name`/`op_name` metadata. However:
  - `StopEvent` parameter that provides the EP with the full correlated ORT event.
Naming conventions for ORT event IDs
- `EpProfiler` interface (existing): Uses `relative_ort_event_id`, a timestamp offset in microseconds relative to profiling start.
- `OrtEpProfilerImpl` (new in this PR): Uses `ort_event_correlation_id`, an absolute, epoch-based timestamp in microseconds computed from `std::chrono::high_resolution_clock` (platform-defined epoch). Unique across concurrent profiling sessions within the same process.
- `PluginEpProfiler` bridge class (in `ep_event_profiling.cc`) performs `ort_event_correlation_id = relative_ort_event_id + profiling_start_time_epoch_us_`, mirroring the pattern in `GPUTracerManager::PushCorrelation`.
New C APIs
- `CreateProfilingEvent`
- `ReleaseProfilingEvent`
- `ProfilingEvent_GetCategory` (`SESSION`, `NODE`, `KERNEL`, `API`)
- `ProfilingEvent_GetName`
- `ProfilingEvent_GetTimestampUs`
- `ProfilingEvent_GetDurationUs`
- `ProfilingEvent_GetArgValue`
- `ProfilingEventsContainer_AddEvents`
- `OrtEp::CreateProfiler`
- `OrtEpProfilerImpl::StartProfiling`
- `OrtEpProfilerImpl::StartEvent` (`ort_event_correlation_id`)
- `OrtEpProfilerImpl::StopEvent` (`ort_event_correlation_id` and ORT event metadata)
- `OrtEpProfilerImpl::EndProfiling`
- `OrtEpProfilerImpl::Release`
New C++ wrapper classes
- `Ort::ConstProfilingEvent`: `OrtProfilingEvent` (e.g., in `StopEvent`)
- `Ort::ProfilingEvent`: `OrtProfilingEvent` (e.g., for `EndProfiling`)
- `Ort::UnownedProfilingEventsContainer`: `OrtProfilingEventsContainer` during `EndProfiling`
Example EP profiling implementation
This PR updates an example plugin EP to use the new profiling APIs:
- `OrtEpProfilerImpl` implementation: ep_profiling.h / ep_profiling.cc
- `OrtEp::CreateProfiler()` implementation: ep.cc
Existing bugs found
Not fixed in this PR.
Motivation and Context
Allows plugin EPs to generate profiling events, further closing the functionality gap between provider-bridge EPs and plugin EPs.