A multiplatform benchmark designed to provide a holistic, detailed, and close-to-hardware view of memory system performance through bandwidth-latency curves.
This is an update to the original Mess Benchmark (now deprecated), focused on improved usability and portability.
- Documentation
- Motivation
- Tools Included
- Architecture Support
- Installation
- Quick Start
- Common Workflows
- Mess Profiler
- Plotter-Parser
- Learning Resources
- Troubleshooting
- Contributors
- Citation
- References
Project documentation is available in the GitHub wiki:
- Wiki Home
- Installation
- Understanding CLI Arguments
- Mess Benchmark
- Mess Profiler
- Plotter-Parser
- Architecture Support
- FAQ
Traditional memory benchmarks report isolated metrics such as peak bandwidth or idle latency, which often fail to capture how memory systems behave under realistic workloads. Mess (Memory Stress) addresses this limitation by characterizing memory performance through bandwidth-latency curves that cover the full range of memory traffic intensity, from unloaded to fully saturated.
This approach reveals critical insights:
- Memory writes degrade performance significantly compared to reads
- Systems typically saturate at 70-90% of theoretical maximum bandwidth
- Latency ranges from 85-130ns when idle to 200-600ns+ under saturation
Mess provides a holistic, close-to-hardware view of memory system behavior, enabling researchers and engineers to understand real-world performance characteristics that standard benchmarks miss.
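As a worked example of the 70-90% figure above, the sketch below computes a theoretical peak and the corresponding saturation range. The system is an assumption chosen for illustration (8 channels of DDR4-3200 with a 64-bit data bus per channel), not a measured Mess result, and `peak_bandwidth_gbs` is a hypothetical helper name.

```python
def peak_bandwidth_gbs(transfers_mts, channels, bus_bytes=8):
    """Theoretical peak in GB/s: MT/s x bytes per transfer x channels."""
    return transfers_mts * bus_bytes * channels / 1000.0


# Assumed example system (not a measured Mess result): 8 channels of DDR4-3200.
peak = peak_bandwidth_gbs(3200, 8)

# Apply the 70-90% saturation range quoted in the list above.
low, high = 0.70 * peak, 0.90 * peak

print(f"theoretical peak: {peak:.1f} GB/s")
print(f"expected saturation range: {low:.1f}-{high:.1f} GB/s")
```

For this assumed configuration the peak is 204.8 GB/s, so saturation would be expected roughly between 143.4 and 184.3 GB/s; the measured curve from Mess is what shows where a real system actually lands.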
MICRO 2024 Best Paper Runner-Up: The Mess methodology was published at the 57th IEEE/ACM International Symposium on Microarchitecture.
For a detailed explanation of the benchmark methodology, see the Memory BSC Tools page.
Mess 2.0 provides an integrated workflow for memory system characterization, from benchmarking to application profiling:
- Mess Benchmark: Characterizes your memory system by generating bandwidth-latency curves that reveal how it behaves under varying load.
- Mess Profiler: Automates counter discovery and runs profiling tools (perf, likwid, etc.) with the correct configuration, ensuring application measurements align with benchmark data.
- Plotter-Parser: Generates publication-quality plots as well as CSV and JSON files containing the parsed bandwidth-latency curves.
- Traffic Generator: The low-level engine that generates precise memory traffic patterns at the assembly level. Can also be used independently for custom microbenchmarks.
Support status follows the wiki (Architecture-Support):
| Architecture | Status | SIMD | Notes |
|---|---|---|---|
| x86-64 CPUs | Supported | AVX2, AVX-512 | Intel and AMD processors |
| ARM CPUs | Supported | NEON, SVE | Includes Neoverse, Graviton, Apple Silicon |
| Power CPUs | Supported | VSX | Power8 and newer |
| RISC-V CPUs | WIP | RVV 1.0 | Assembly + latency + counter detection available; bandwidth measurement pending |
| GPUs | Pending | — | Under active development |
See full instructions in the Installation wiki page.
git clone --recursive https://github.com/bsc-mem/Mess-2.0.git
cd Mess-2.0
Important: The --recursive flag is required to download submodules that Mess depends on.
If you forget --recursive, initialize submodules later:
git submodule update --init --recursive
Core requirements:
- C++17 compiler: GCC 9+ (recommended), Clang 10+, Intel OneAPI (ICX), AOCC
- numactl: NUMA memory binding (required)
- taskset: Core pinning (preferred, part of util-linux)
- perf: Recommended counter backend (linux-tools-common)
- Python 3: Plotting utilities
On Linux, ensure perf access:
cat /proc/sys/kernel/perf_event_paranoid
echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
Huge pages are recommended for more accurate measurements:
echo 1024 | sudo tee /proc/sys/vm/nr_hugepages
Even without huge pages, Mess automatically compensates for page walk latency. See Huge-Memory-Pages for details.
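The two kernel settings above can also be checked before a run. The following is a minimal sketch, not part of Mess: `read_int` and `preflight` are hypothetical helper names, and the script only reads the two /proc files mentioned above without changing them.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a one-value /proc file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def preflight(paranoid_path="/proc/sys/kernel/perf_event_paranoid",
              hugepages_path="/proc/sys/vm/nr_hugepages"):
    """Collect warnings about perf access and huge pages (hypothetical helper)."""
    warnings = []
    paranoid = read_int(paranoid_path)
    if paranoid is None:
        warnings.append("could not read perf_event_paranoid")
    elif paranoid > 0:
        warnings.append(f"perf_event_paranoid is {paranoid}; this README sets it to 0")
    hugepages = read_int(hugepages_path)
    if hugepages is None:
        warnings.append("could not read nr_hugepages")
    elif hugepages == 0:
        warnings.append("no huge pages reserved; they are recommended, not required")
    return warnings


if __name__ == "__main__":
    for line in preflight() or ["perf access and huge pages look fine"]:
        print(line)
```

The paths are parameters so the check can be pointed at test files; on a system without these /proc entries it reports that the values could not be read rather than failing.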
make
make install
Binaries are generated in build/bin/:
- mess: Core benchmark
- mess-profiler: Memory bandwidth profiler
- traffic_generator: Standalone traffic generation tool
Optional PATH setup:
export PATH=$PATH:$(pwd)/build/bin

./build/bin/mess --version
./build/bin/mess --dry-run --verbose=2

./build/bin/mess --dry-run --verbose=2
./build/bin/mess
./build/bin/mess --profile

Common options:
| Option | Description | Example |
|---|---|---|
| --ratio=N[,N...] | Issued load ratio(s) in % | --ratio=100,75,50 |
| --pause=N[,N...] | Pause bubble values | --pause=0,10,100,1000 |
| --profile | Save measurement files | --profile |
| --verbose=N | Verbosity level 0-4 | --verbose=3 |
| --measurer=TYPE | Counter backend (auto/perf/likwid/pcm) | --measurer=perf |
| --bind=LIST | NUMA memory-node binding | --bind=0 |
| --cores=LIST | Explicit traffic-generator cores | --cores=0-15 |
| --total-cores=N | Number of traffic-generator cores | --total-cores=16 |
For the complete option set, use ./build/bin/mess --help or see Understanding-CLI-Arguments.
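The options above compose into a single command line. As a minimal sketch, the hypothetical helper `mess_command` below (not part of Mess) assembles an argument list using only flags listed in the table; it assumes the `./build/bin/mess` path used throughout this README and joins list values with commas, as in the Example column.

```python
import shlex


def mess_command(binary="./build/bin/mess", profile=False, **options):
    """Build an argv list for mess from keyword options (hypothetical helper).

    Keyword names map to flags by replacing '_' with '-', for example
    total_cores=16 becomes --total-cores=16. Lists and tuples are comma-joined.
    """
    argv = [binary]
    if profile:
        argv.append("--profile")
    for name, value in options.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        argv.append(f"--{name.replace('_', '-')}={value}")
    return argv


cmd = mess_command(profile=True, ratio=[100, 75, 50], pause=[0, 10, 100, 1000],
                   bind=0, verbose=3)
# Prints: ./build/bin/mess --profile --ratio=100,75,50 --pause=0,10,100,1000 --bind=0 --verbose=3
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`. The helper does not validate flag names or values, so `./build/bin/mess --help` remains the reference for what the benchmark accepts.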
./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 --repetitions=1
./build/bin/mess --profile --ratio=0 --pause=0 --verbose=3 --repetitions=1

./build/bin/mess --profile --bind=0 --folder=numa0
./build/bin/mess --profile --bind=1 --folder=numa1

for c in 2 4 8 16; do
  ./build/bin/mess --profile --total-cores=$c --folder=cores_$c
done

mess-profiler reuses Mess counter discovery to profile applications with consistent output.
./build/bin/mess-profiler --dry-run
./build/bin/mess-profiler -s 100ms -o app_profile.csv ./my_app
Profiler docs: Mess-Profiler
Visualization utilities in utils/:
- plotter.py: generates memory-curve plots and processed CSV/JSON
- app_plotter.py: overlays application profile points on curves
- parse_runtimes.py: summarizes run times
cd utils
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Plotter docs: Plotter-Parser
- Tutorials and Slides: mess.bsc.es/tutorials
- Detailed Methodology: memory.bsc.es/tools/mess-benchmark
- Permission errors on counters: set perf_event_paranoid to 0
- Missing counters/backend mismatch: check ./build/bin/mess-profiler --dry-run
- Unstable measurements: increase --repetitions and use --verbose=3
More: FAQ and Iterative-Debugging
Found a bug? Open an issue on GitHub or email mess@bsc.es.
Mess is developed by the Memory Systems Team at the Barcelona Supercomputing Center (BSC).
| Name | Role | Email |
|---|---|---|
| Victor Xirau Guardans | Main Mess 2.0 developer | victor.xirau@bsc.es |
| Mariana Carmin | Mess 2.0 developer | mcarmin@bsc.es |
| Pau Diaz | Mess 2.0 developer | pau.diazcuesta@bsc.es |
| Pouya Esmaili Dokht | Mess Paper author | pouya.esmaili@bsc.es |
Or email: mess@bsc.es
If you use Mess in research, please cite:
@inproceedings{esmaili2024mess,
title = {A Mess of Memory System Benchmarking, Simulation and Application Profiling},
author = {Esmaili-Dokht, Pouya and Sgherzi, Francesco and Girelli, Valeria Soldera
and Boixaderas, Isaac and Carmin, Mariana and Monemi, Alireza
and Armejach, Adria and Mercadal, Estanislao and Llort, German
and Radojkovi{\'c}, Petar and Moreto, Miquel and Gim{\'e}nez, Judit
and Martorell, Xavier and Ayguad{\'e}, Eduard and Labarta, Jesus
and Confalonieri, Emanuele and Dubey, Rishabh and Adlard, Joshua},
booktitle = {Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)},
pages = {136--152},
year = {2024},
publisher = {IEEE}
}
- Mess Benchmark: The original implementation of the Mess benchmark.
- Mess Simulator: Analytical memory model using bandwidth-latency curves.
- Mess-Paraver: Integration with Paraver for visualization.
- Mess Paper: Esmaili-Dokht, P., Sgherzi, F., Girelli, V. S., Boixaderas, I., Carmin, M., Monemi, A., Armejach, A., Mercadal, E., Llort, G., Radojković, P., Moreto, M., Giménez, J., Martorell, X., Ayguadé, E., Labarta, J., Confalonieri, E., Dubey, R., & Adlard, J. (2024). A mess of memory system benchmarking, simulation and application profiling. In Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (pp. 136-152). IEEE.
Mess Benchmark is released under the BSD 3-Clause License.