Benchmarks to compare the performance of async runtimes / executors.
An interactive view of the full results dataset is available at: https://fleetcode.com/runtime-benchmarks/
Summary of results for a single configuration:
| Runtime | libfork | TooManyCooks | tbb | taskflow | cppcoro | coros | HPX | concurrencpp | libcoro |
|---|---|---|---|---|---|---|---|---|---|
| Mean Ratio to Best (lower is better) | 1.00x | 1.11x | 2.82x | 2.98x | 3.53x | 4.45x | 160.93x | 170.80x | 2238.69x |
| skynet | 39639 us | 42512 us | 146884 us | 200196 us | 156739 us | 110734 us | 14654199 us | 12085877 us | 153184034 us |
| nqueens | 78579 us | 83539 us | 161880 us | 183805 us | 186797 us | 883579 us | 4498900 us | 8252158 us | 43830994 us |
| fib(39) | 67668 us | 84565 us | 272178 us | 203514 us | 438185 us | 171781 us | 14550913 us | 18381070 us | 305949459 us |
| matmul(2048) | 41733 us | 43626 us | 62264 us | 62783 us | 54275 us | 50580 us | 72222 us | 68116 us | 465260 us |

| Runtime | TooManyCooks_st_asio | TooManyCooks_mt | libcoro_mt | cobalt_st_asio |
|---|---|---|---|---|
| Mean Ratio to Best (lower is better) | 1.00x | 1.02x | 1.55x | 3.77x |
| channel | 365842 us | 374115 us | 565826 us | 1379967 us |

| Runtime | TooManyCooks | cobalt | cppcoro | libcoro |
|---|---|---|---|---|
| Mean Ratio to Best (lower is better) | 1.00x | 1.12x | 1.45x | 1.48x |
| io_socket_st | 393705 us | 441244 us | 569703 us | 582490 us |
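
The "Mean Ratio to Best" row in the fork-join table is consistent with an arithmetic mean of each runtime's per-benchmark time divided by the fastest time for that benchmark. That reading is inferred from the published numbers, not taken from the benchmark script, so treat the following as a minimal sketch. The function name `mean_ratio_to_best` is hypothetical, and only three of the runtimes are included to keep it short.

```python
from statistics import mean

# Timings in microseconds, copied from the fork-join summary table.
timings_us = {
    "skynet":       {"libfork": 39639, "TooManyCooks": 42512, "tbb": 146884},
    "nqueens":      {"libfork": 78579, "TooManyCooks": 83539, "tbb": 161880},
    "fib(39)":      {"libfork": 67668, "TooManyCooks": 84565, "tbb": 272178},
    "matmul(2048)": {"libfork": 41733, "TooManyCooks": 43626, "tbb": 62264},
}


def mean_ratio_to_best(timings):
    """Average, per runtime, of (time / fastest time) across benchmarks.

    Assumption: an unweighted arithmetic mean, inferred from the table values.
    """
    ratios = {}
    for by_runtime in timings.values():
        best = min(by_runtime.values())  # fastest runtime for this benchmark
        for runtime, elapsed in by_runtime.items():
            ratios.setdefault(runtime, []).append(elapsed / best)
    return {runtime: mean(values) for runtime, values in ratios.items()}


summary = mean_ratio_to_best(timings_us)
for runtime, ratio in summary.items():
    print(f"{runtime}: {ratio:.2f}x")
```

Because libfork is the fastest runtime in all four benchmarks, dropping the other runtimes does not change the baseline, and the three ratios computed here match the corresponding table entries (1.00x, 1.11x, 2.82x).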
Machine configuration used in the summary table:
- Processor: EPYC 7742 64-core processor
- Worker Thread Count: 64 (no SMT)
- OS: Debian 13 Server
- Compiler: Clang 21.1.7 Release (-O3 -march=native)
- CPU boost enabled / schedutil governor
- Linked against libtcmalloc_minimal.so.4
The suite currently includes only C++ frameworks, with several recursive fork-join benchmarks:
- recursive fibonacci (forks x2)
- skynet (original link) but increased to 100M tasks (forks x10)
- nqueens (forks up to x14)
- matmul (forks x4)
It also includes some miscellaneous benchmarks:
- channel - tests the performance of the library's async MPMC queue
- io_socket_st - tests TCP ping-pong between a single-threaded client and single-threaded server
Benchmark problem sizes were chosen to balance two goals: keeping the total runtime of a full sweep tolerable (especially on weaker hardware with slower runtimes), and being large enough to show meaningful differences between the faster runtimes.
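
To get a rough feel for the problem sizes, the number of tasks in a recursive fork-join benchmark can be estimated from its fan-out. The sketch below rests on two assumptions that are not confirmed by the benchmark sources: that fib is the naive recursion in which `fib(0)` and `fib(1)` return without forking, and that "100M tasks" in skynet refers to the leaf level of a uniform tree with fan-out 10. Both helper names are hypothetical.

```python
def fib_call_count(n):
    """Calls made by naive recursive fib(n).

    Assumption: fib(0) and fib(1) are base cases that do not fork, so
    calls(n) = 1 + calls(n - 1) + calls(n - 2) with calls(0) = calls(1) = 1.
    """
    prev, curr = 1, 1  # calls(0), calls(1)
    for _ in range(n - 1):
        prev, curr = curr, 1 + prev + curr
    return curr if n >= 1 else prev


def uniform_tree_nodes(fan_out, leaves):
    """Total nodes in a tree where every internal node forks `fan_out`
    children until a level holds at least `leaves` nodes."""
    total, level = 0, 1
    while level < leaves:
        total += level      # count this internal level
        level *= fan_out    # descend one level
    return total + level    # add the leaf level


print("fib(39) calls:", fib_call_count(39))
print("skynet nodes:", uniform_tree_nodes(10, 100_000_000))
```

Under these assumptions, both benchmarks create on the order of hundreds of millions of very small tasks, which is why per-task scheduling overhead dominates the measured time.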
- The build+bench script uses python3. The only Python dependency is libyaml.
- CMake + Clang 18 or newer
- libfork and TooManyCooks depend on the hwloc library.
- TBB benchmarks depend on a system-installed TBB. See the installation guide here for the newest version, or you may be able to find the older 'libtbb-dev' package in your system package manager.
- boost::cobalt requires Boost 1.82 or newer. You may need to build Boost from source, since cobalt is currently not included in distro packages.
- A high performance allocator (tcmalloc, jemalloc, or mimalloc) is also recommended. The build script will dynamically link to any of these if they are available.
On Debian/Ubuntu:
sudo apt-get install cmake hwloc libhwloc-dev intel-oneapi-tbb-devel libtcmalloc-minimal4
On macOS:
brew install cmake gperftools hwloc libyaml tbb
NOTE: If a particular library or benchmark fails to build or run, don't worry - its output will simply be ignored.
python3 ./build_and_bench_all.py
Results will appear in RESULTS.md and RESULTS.csv files.
python3 ./build_and_bench_all.py full
Results will also appear in a RESULTS.json file, which can be parsed by the interactive benchmarks site. A locally viewable version of this HTML chart will be generated as well.
Frameworks to come:
- (C#) .NET thread pool
- (Rust) tokio
- (Golang) goroutines
- Facebook Folly
- PhotonLibOS https://github.com/alibaba/PhotonLibOS
Benchmarks to come:
- Some inspiration here
