Replicate generic hardware events on all CPU PMUs #2123
cc @captain5050
Does this need any kind of guarding or is this always the right thing to do?
This is always correct as far as I'm aware. I've tested that it does the right thing on an M2 Ultra Mac Studio, a Pixel 10 and a regular x86 PC (where it finds the single PMU, which returns the same results as not specifying a PMU).
LebedevRI
left a comment
Didn't test, but seems fine.
Can we add Google Test unit tests for this, please? I don't know how tricky it is to test, but I'm a little concerned about having no tests at all.
We can test it by verifying that counters are non-zero when the process is pinned to each CPU in the system. We can adapt the test
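A rough sketch of the pinning idea, for illustration only: the helper name `ForEachAllowedCpu` is hypothetical and is not taken from this PR. It assumes Linux and uses `sched_getaffinity`/`sched_setaffinity` to pin the calling thread to each allowed CPU in turn. A test would start counters, run a workload inside the callback, and check that the readings are non-zero.

```cpp
#include <sched.h>

#include <functional>

// Hypothetical helper (not part of the PR): pins the calling thread to each
// CPU in its current affinity mask, one at a time, and runs `body(cpu)` while
// pinned. Returns the number of CPUs on which `body` actually ran.
int ForEachAllowedCpu(const std::function<void(int)>& body) {
  cpu_set_t original;
  CPU_ZERO(&original);
  if (sched_getaffinity(0, sizeof(original), &original) != 0) return 0;

  int visited = 0;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (!CPU_ISSET(cpu, &original)) continue;
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    // Skip CPUs we are not allowed to pin to (e.g. taken offline meanwhile).
    if (sched_setaffinity(0, sizeof(one), &one) != 0) continue;
    body(cpu);  // A real test would measure a workload here.
    ++visited;
  }

  // Best-effort restore of the original affinity mask.
  sched_setaffinity(0, sizeof(original), &original);
  return visited;
}
```

Restoring the original mask at the end keeps later tests in the same process from being stuck on the last CPU visited.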
Done
looks like we need to update this from head and do some clang-format fun.
On systems with more than one PMU for the CPUs (e.g. Apple M series SOCs), generic hardware events are only created for an arbitrary PMU. Usually this is the big cluster's PMU, which can cause inaccuracies when the process is scheduled onto a little core. To fix this, teach PerfCounters to register generic hardware events on all CPU PMUs. CPU PMUs are identified using the same method as perf.
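As an illustration of the approach described above, here is a minimal sketch, not the code from this PR. It assumes that a CPU PMU is either named `cpu` or has a `cpus` file in its sysfs directory, which is my understanding of the check perf uses. It also assumes the kernel's extended hardware event encoding, where the PMU type goes in the upper 32 bits of `perf_event_attr.config`. The names `CpuPmu`, `FindCpuPmus` and `ExtendedHwConfig` are hypothetical.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical record for one CPU PMU discovered in sysfs.
struct CpuPmu {
  std::string name;
  std::uint32_t type;  // Value read from <pmu>/type.
};

// Scans the event_source devices directory for CPU PMUs.
// Assumption: a PMU is a CPU PMU if it is named "cpu" or has a "cpus" file.
std::vector<CpuPmu> FindCpuPmus(
    const fs::path& root = "/sys/bus/event_source/devices") {
  std::vector<CpuPmu> pmus;
  std::error_code ec;
  fs::directory_iterator it(root, ec);
  if (ec) return pmus;  // Directory missing or unreadable: no PMUs found.
  for (const auto& entry : it) {
    const fs::path dir = entry.path();
    const std::string name = dir.filename().string();
    std::error_code exists_ec;
    if (name != "cpu" && !fs::exists(dir / "cpus", exists_ec)) continue;
    std::ifstream type_file(dir / "type");
    std::uint32_t type = 0;
    if (type_file >> type) pmus.push_back({name, type});
  }
  return pmus;
}

// Builds a config value for a generic hardware event on a specific PMU.
// Assumption: PMU type in bits 63..32 (PERF_PMU_TYPE_SHIFT == 32), generic
// event id (e.g. PERF_COUNT_HW_CPU_CYCLES) in the low 32 bits.
constexpr std::uint64_t ExtendedHwConfig(std::uint32_t pmu_type,
                                         std::uint64_t hw_id) {
  return (static_cast<std::uint64_t>(pmu_type) << 32) |
         (hw_id & 0xffffffffULL);
}
```

One event per PMU returned by `FindCpuPmus` would then be opened with `attr.type = PERF_TYPE_HARDWARE` and `attr.config = ExtendedHwConfig(pmu.type, id)`. On a system with a single CPU PMU this reduces to one event, consistent with the x86 result reported earlier in the thread.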
Done
thank you so much :)