
Benchmark Runner QoL Improvements #3004

Merged
lerno merged 6 commits into c3lang:master from ManuLinares:benchmark-qol-clean
Mar 7, 2026

Conversation

@ManuLinares
Member

Fixed merge conflicts of #2672


From original PR by @NotsoanoNimus :

I work with compile-benchmark quite often - only fair I help with its
upkeep. Slapped this together in a couple hours, because I really need
more consistent results and a way to visualize them, and the benchmark
runner is a bit of a rat's nest.

Changes, in no particular order:

  • Add a MEDIAN metric to the results
    • The mean gives us throughput information, but it's too
      heavily skewed by a set's outliers.
    • The median gives us an idea of performance expectations with
      a 50% probability. Put another way, the tested function has performed at
      or better than the median in 50% of samples taken.
    • Caveat: a sorted set is required. For high-volume iterations,
      this can get expensive.
  • Improve the output units and refactor that into NanoDuration,
    so it can be used elsewhere as desired
  • Provide some pretty colors, oooooo
  • Add a CSV reporting option to get a resultant data set
    • Not the most useful thing, but makes it easy to plug benchmark
      outputs into other software. Could probably be improved upon. And
      yeah, I'm a boomer who likes CSV, sue me
  • Restructure the benchmark runtime to be more like the tester
    runtime
  • Rudimentary command-line options to adjust benchmark options
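The MEDIAN bullet above can be sketched as follows. This is an illustrative Python sketch, not the actual C3 implementation in the PR; it shows both the sorted-set requirement the caveat mentions and why the mean is skewed by outliers while the median is not.

```python
def median(samples):
    """Return the value that 50% of samples perform at or better than."""
    s = sorted(samples)               # sorting is the O(n log n) cost noted above
    n = len(s)
    mid = n // 2
    if n % 2:                         # odd count: the middle element
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the two middle elements

# One outlier dominates the mean but barely moves the median:
timings_ns = [100, 102, 98, 101, 99, 5000]
print(sum(timings_ns) / len(timings_ns))  # mean ~916.7, skewed by the outlier
print(median(timings_ns))                 # median 100.5, robust to it
```

This is why the PR reports both: the mean still carries throughput information, while the median describes the typical sample.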

Fixed merge conflicts of c3lang#2672

The original PR description (quoted above) also listed one fix not repeated there:

  • Fixed a divide-by-zero crash when the benchmark iteration count
    was <100
@lerno
Collaborator

lerno commented Mar 6, 2026

I would drop the median in favour of standard deviation.

Differentiate measure units by color (us, ms, s).
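The unit differentiation suggested here could look like the sketch below. The unit thresholds and ANSI color choices are illustrative assumptions, not the runner's actual scheme: pick the largest unit that fits the magnitude, then wrap the formatted value in a color that is distinct per unit.

```python
# Hypothetical per-unit ANSI colors (illustrative only).
COLORS = {"ns": "\033[90m", "us": "\033[36m", "ms": "\033[33m", "s": "\033[32m"}
RESET = "\033[0m"

def format_duration(ns):
    """Scale a nanosecond count to the largest fitting unit and colorize it."""
    for unit, divisor in (("s", 1e9), ("ms", 1e6), ("us", 1e3)):
        if ns >= divisor:
            return f"{COLORS[unit]}{ns / divisor:.2f} {unit}{RESET}"
    return f"{COLORS['ns']}{ns} ns{RESET}"

print(format_duration(5_437_982))      # falls in the milliseconds range
print(format_duration(500))            # stays in nanoseconds
```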
@ManuLinares
Member Author

> I would drop the median in favour of standard deviation

I added standard deviation by default (no flag needed), plus a few tweaks here and there on colors and output.
The progress bar doesn't affect bench results.
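A hedged sketch of computing the standard deviation, again in Python for illustration rather than the PR's C3 code. Unlike the median, it needs no sorted copy of the samples; Welford's online algorithm (an assumption here, not necessarily what the runner uses) accumulates it in one pass.

```python
import math

def stddev(samples):
    """Sample standard deviation via Welford's one-pass online algorithm."""
    mean = 0.0
    m2 = 0.0                       # running sum of squared deviations
    for n, x in enumerate(samples, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # note: uses the *updated* mean
    return math.sqrt(m2 / (len(samples) - 1))

timings_ns = [100, 102, 98, 101, 99]
print(stddev(timings_ns))          # sqrt(10 / 4) ~ 1.581
```

One pass and O(1) extra memory is why it can be on by default without the sorting cost the median caveat mentions.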

@ManuLinares
Member Author

I tweaked all stdlib benchmarks to target ~300ms execution time

[image: benchmark results after the stdlib tweaks]

@lerno
Collaborator

lerno commented Mar 7, 2026

I just fixed this. You should revert your changes.

@lerno
Collaborator

lerno commented Mar 7, 2026

(And merge with the latest)

@ManuLinares ManuLinares force-pushed the benchmark-qol-clean branch from 38e9c80 to 4f00ea5 Compare March 7, 2026 00:17
@lerno
Collaborator

lerno commented Mar 7, 2026

Out of scope for this one, but the benchmarking progress indicator is an old thing that works poorly, and I think it should be removed anyway:

```
Benchmarking crypto_hash_benchmarks::streebog_512_1mib .............. [####################] 980 / 1024 (96%)
Benchmarking crypto_hash_benchmarks::streebog_512_1mib .............. [####################] 990 / 1024 (97%)
Benchmarking crypto_hash_benchmarks::streebog_512_1mib .............. [####################] 1000 / 1024 (98%)
Benchmarking crypto_hash_benchmarks::streebog_512_1mib .............. [####################] 1010 / 1024 (99%)
Benchmarking crypto_hash_benchmarks::streebog_512_1mib .............. [####################] 1020 / 1024 (100%)
Benchmarking crypto_hash_benchmarks::streebog_512_1mib .............. [COMPLETE] 5.44 milliseconds, 5437982.00 CPU clocks, 1024 iterations (runtime 5.57 seconds)
```

@lerno
Collaborator

lerno commented Mar 7, 2026

Another point of improvement would be to add a symbol for standard deviation instead.

@lerno lerno merged commit 9e2fea9 into c3lang:master Mar 7, 2026
21 checks passed
@ManuLinares ManuLinares deleted the benchmark-qol-clean branch March 7, 2026 16:30