[Bench] Add PTI XPTI instrumentation overhead benchmark#21558
[Bench] Add PTI XPTI instrumentation overhead benchmark#21558againull wants to merge 6 commits intointel:syclfrom
Conversation
Add a new benchmark suite to measure XPTI instrumentation overhead using the pti-gpu SDK's perf-profiling-overhead test. The benchmark: - Clones and builds the pti-gpu repository SDK - Runs the perf-profiling-overhead ctest - Parses and reports the overhead percentage The benchmark is included in the "Full" and "SYCL" presets. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
| # See LICENSE.TXT | ||
| # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
|
|
||
| import os |
| performs multiple internal iterations and reports a median value. The framework | ||
| then computes the median of all these median values. | ||
|
|
||
| For stable results, use --iterations=20 or higher. |
There was a problem hiding this comment.
Looks like we need iterations variable in dispatched runs now? Could you add it, please?
There was a problem hiding this comment.
hmm... I'm not sure if we can add many new variables in .github/workflows/sycl-ur-perf-benchmarking.yml workflow. Perhaps if we want to add iterations we could not add verbose...?
Also, alternatively we could print a warning or fail if iterations are below 20...?
|
|
||
| def git_hash(self) -> str: | ||
| # Latest master branch - can be updated as needed | ||
| return "master" |
There was a problem hiding this comment.
hmm.. we rather use tags/commits, as moving target "master" may lead to unexpected issues.
| use_installdir=False, | ||
| ) | ||
|
|
||
| # Patch CMakeLists.txt to increase threshold from 60 to 70 |
There was a problem hiding this comment.
just a note: if this threshold is important variable, perhaps we should add support in xpti project to manipulate this value via e.g. env var or cmake option...?
| extra_args = [ | ||
| f"-DCMAKE_C_COMPILER={options.sycl}/bin/clang", | ||
| f"-DCMAKE_CXX_COMPILER={options.sycl}/bin/clang++", | ||
| "-DCMAKE_CXX_FLAGS=-Wall -Wextra -Wextra-semi -pedantic -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -D_FORTIFY_SOURCE=2", |
There was a problem hiding this comment.
do we need -Wall etc. in benchmarking...?
| f"-S{self.project.src_dir}/sdk", | ||
| f"-B{build_dir}", | ||
| ] | ||
| + extra_args, |
There was a problem hiding this comment.
these extra_agrs are only used here, why not merge all options to a single list? unless you wanted to control wheter they are enabled or not (in some cases)
|
|
||
| For stable results, use --iterations=20 or higher. | ||
| """ | ||
| build_dir = self.suite.project.src_dir / "sdk" / "build" |
There was a problem hiding this comment.
can't we reuse build_dir from the suite?
| performs multiple internal iterations and reports a median value. The framework | ||
| then computes the median of all these median values. | ||
|
|
||
| For stable results, use --iterations=20 or higher. |
There was a problem hiding this comment.
hmm... I'm not sure if we can add many new variables in .github/workflows/sycl-ur-perf-benchmarking.yml workflow. Perhaps if we want to add iterations we could not add verbose...?
Also, alternatively we could print a warning or fail if iterations are below 20...?
|
|
||
| if not results: | ||
| # Fallback: look for simpler overhead pattern | ||
| fallback_pattern = r"(?:Overhead|overhead).*:\s*([0-9.]+)\s*%" |
There was a problem hiding this comment.
perhaps we could use a fallback_pattern in an else to the if match:. This way we could get rid of the extra for and make it a little simpler
Add a new benchmark suite to measure XPTI instrumentation overhead
using the pti-gpu SDK's perf-profiling-overhead test. The benchmark:
The benchmark is included in the "Full" preset.
Additionally add option to verbose output (false by default).
Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com