runtime-benchmarks

Benchmarks to compare the performance of async runtimes / executors.

An interactive view of the full results dataset is available at: https://fleetcode.com/runtime-benchmarks/

Summary of results for a single machine configuration:

Runtime	libfork	TooManyCooks	tbb	taskflow	cppcoro	coros	HPX	concurrencpp	libcoro
Mean Ratio to Best (lower is better)	1.00x	1.11x	2.82x	2.98x	3.53x	4.45x	160.93x	170.80x	2238.69x
skynet	39639 us	42512 us	146884 us	200196 us	156739 us	110734 us	14654199 us	12085877 us	153184034 us
nqueens	78579 us	83539 us	161880 us	183805 us	186797 us	883579 us	4498900 us	8252158 us	43830994 us
fib(39)	67668 us	84565 us	272178 us	203514 us	438185 us	171781 us	14550913 us	18381070 us	305949459 us
matmul(2048)	41733 us	43626 us	62264 us	62783 us	54275 us	50580 us	72222 us	68116 us	465260 us

Runtime	TooManyCooks_st_asio	TooManyCooks_mt	libcoro_mt	cobalt_st_asio
Mean Ratio to Best (lower is better)	1.00x	1.02x	1.55x	3.77x
channel	365842 us	374115 us	565826 us	1379967 us

Runtime	TooManyCooks	cobalt	cppcoro	libcoro
Mean Ratio to Best (lower is better)	1.00x	1.12x	1.45x	1.48x
io_socket_st	393705 us	441244 us	569703 us	582490 us
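The "Mean Ratio to Best" row is consistent with an arithmetic mean, taken over the benchmarks, of each runtime's time divided by the fastest time for that benchmark. The sketch below reproduces that row for three runtimes from the first table. The function name is hypothetical, and this is an illustration of the arithmetic, not necessarily how the repository's scripts compute it.

```python
# Times in microseconds, copied from the first summary table above.
times_us = {
    "libfork":      {"skynet": 39639,  "nqueens": 78579,  "fib(39)": 67668,  "matmul(2048)": 41733},
    "TooManyCooks": {"skynet": 42512,  "nqueens": 83539,  "fib(39)": 84565,  "matmul(2048)": 43626},
    "tbb":          {"skynet": 146884, "nqueens": 161880, "fib(39)": 272178, "matmul(2048)": 62264},
}


def mean_ratio_to_best(times):
    """Average, over benchmarks, of (runtime's time / best time for that benchmark)."""
    benchmarks = next(iter(times.values())).keys()
    # Fastest time per benchmark across all runtimes in the input.
    best = {b: min(per_runtime[b] for per_runtime in times.values()) for b in benchmarks}
    return {
        runtime: sum(results[b] / best[b] for b in benchmarks) / len(best)
        for runtime, results in times.items()
    }


for runtime, ratio in mean_ratio_to_best(times_us).items():
    print(f"{runtime}: {ratio:.2f}x")
# libfork: 1.00x
# TooManyCooks: 1.11x
# tbb: 2.82x
```

Because the ratios are averaged without weighting, a benchmark where runtimes differ widely (such as fib) moves the mean more than one where they are close (such as matmul).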

Machine configuration used in the summary table:

Processor: EPYC 7742 64-core processor
Worker Thread Count: 64 (no SMT)
OS: Debian 13 Server
Compiler: Clang 21.1.7 Release (-O3 -march=native)
CPU boost enabled / schedutil governor
Linked against libtcmalloc_minimal.so.4

What's covered?

Currently, only C++ frameworks are included. The suite covers several recursive fork-join benchmarks:

recursive fibonacci (forks x2)
skynet, as in the original benchmark but increased to 100M tasks (forks x10)
nqueens (forks up to x14)
matmul (forks x4)
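The "forks xN" figures describe how many child tasks each non-leaf task spawns. A sequential sketch of the recursion shape for two of the benchmarks follows. It is not the benchmark code: the real implementations are C++ and run the children in parallel on each runtime. It assumes the usual skynet formulation in which each leaf returns its index and parents sum their children's results, and uses much smaller problem sizes.

```python
def fib(n):
    """Recursive Fibonacci: each non-leaf call forks 2 child tasks."""
    if n < 2:
        return n
    # In a fork-join runtime, these two calls would be forked, then joined.
    return fib(n - 1) + fib(n - 2)


def skynet(base, size, fanout=10):
    """Skynet: each non-leaf task forks `fanout` children; leaves return their index.

    `size` is the number of leaves and must be a power of `fanout`.
    """
    if size == 1:
        return base
    child_size = size // fanout
    # In a fork-join runtime, the `fanout` children would run as parallel tasks.
    return sum(skynet(base + i * child_size, child_size, fanout) for i in range(fanout))


print(fib(20))            # 6765
print(skynet(0, 10_000))  # 49995000, the sum of 0..9999
```

Each task does almost no work of its own, so the measured time is dominated by the cost of spawning, scheduling, and joining tasks.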

It also includes some miscellaneous benchmarks:

channel - tests the performance of the library's async MPMC queue
io_socket_st - tests TCP ping-pong between a single-threaded client and single-threaded server
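For readers unfamiliar with the term, an async MPMC (multi-producer, multi-consumer) queue lets several tasks push and several tasks pop concurrently, suspending instead of blocking when the queue is full or empty. The sketch below shows the pattern with Python's asyncio.Queue. The function names, task counts, queue capacity, and sentinel-based shutdown are all assumptions made for illustration and do not reflect the parameters of the channel benchmark.

```python
import asyncio


async def producer(queue, start, count):
    """Push `count` consecutive integers, suspending whenever the queue is full."""
    for value in range(start, start + count):
        await queue.put(value)


async def consumer(queue, results):
    """Pop items until a None sentinel arrives, suspending whenever the queue is empty."""
    while True:
        item = await queue.get()
        if item is None:
            break
        results.append(item)


async def run_channel(num_producers=4, num_consumers=4, items_per_producer=1000):
    queue = asyncio.Queue(maxsize=64)  # bounded, so producers see backpressure
    results = []
    consumers = [asyncio.create_task(consumer(queue, results)) for _ in range(num_consumers)]
    await asyncio.gather(
        *(producer(queue, p * items_per_producer, items_per_producer) for p in range(num_producers))
    )
    # One sentinel per consumer, sent only after all real items were queued.
    for _ in range(num_consumers):
        await queue.put(None)
    await asyncio.gather(*consumers)
    return results


results = asyncio.run(run_channel())
print(len(results))  # 4000
```

asyncio runs all of these tasks on a single thread, so this shows only the shape of the workload and says nothing about multi-threaded performance.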

Benchmark problem sizes were chosen to balance two goals: keeping the total runtime of a full sweep tolerable (especially on weaker hardware with slower runtimes), and being large enough to show meaningful differences between the faster runtimes.

How to build and run the benchmarks yourself

Install Dependencies:

The build+bench script uses python3. The only Python dependency is libyaml.
CMake + Clang 18 or newer
libfork and TooManyCooks depend on the hwloc library.
TBB benchmarks depend on a system-installed TBB. See the TBB installation guide for the newest version; alternatively, you may be able to find the older 'libtbb-dev' package in your system package manager.
boost::cobalt requires Boost 1.82 or newer. You may need to build Boost from source, since cobalt is currently not included in distro packages.
A high performance allocator (tcmalloc, jemalloc, or mimalloc) is also recommended. The build script will dynamically link to any of these if they are available.

On Debian/Ubuntu: sudo apt-get install cmake hwloc libhwloc-dev intel-oneapi-tbb-devel libtcmalloc-minimal4

On macOS: brew install cmake gperftools hwloc libyaml tbb
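To check whether one of the recommended allocators is visible on your system before building, a quick sketch using the Python standard library is shown below. The candidate names and the function name are assumptions, and this is not the detection logic used by the build script; a library reported as "not found" here may still be usable if it is installed in a non-standard location.

```python
from ctypes.util import find_library

# Library names as passed to the linker (without the "lib" prefix).
CANDIDATES = ["tcmalloc_minimal", "tcmalloc", "jemalloc", "mimalloc"]


def find_allocators(candidates=CANDIDATES):
    """Return {name: resolved library name or path, or None if not found}."""
    return {name: find_library(name) for name in candidates}


for name, location in find_allocators().items():
    print(f"{name}: {location or 'not found'}")
```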

Get Quick Results (uses threads = #CPUs):

NOTE: If a particular library or benchmark fails to build or run, don't worry - its output will simply be ignored.

python3 ./build_and_bench_all.py

Results will appear in the RESULTS.md and RESULTS.csv files.

Get Full Results (sweeps threads from 1 to #CPUs):

python3 ./build_and_bench_all.py full

Results will also appear in a RESULTS.json file, which can be parsed by the interactive benchmarks site. A locally viewable HTML version of the chart will be generated as well.

Future Plans

Frameworks to come:

(C#) .NET thread pool
(Rust) tokio
(Golang) goroutines
Facebook Folly
PhotonLibOS https://github.com/alibaba/PhotonLibOS

Benchmarks to come:

Some inspiration here
