Mess Benchmark 2.0

A multiplatform benchmark designed to provide a holistic, detailed, and close-to-hardware view of memory system performance through bandwidth-latency curves.

This is an update to the original Mess Benchmark (now deprecated), focused on improved usability and portability.


Documentation

Project documentation is available in the GitHub wiki.


Motivation

Traditional memory benchmarks report isolated metrics such as peak bandwidth or idle latency, which often fail to capture how memory systems behave under realistic workloads. Mess (Memory Stress) addresses this limitation by characterizing memory performance through bandwidth-latency curves that cover the full range of memory traffic intensity, from unloaded to fully saturated.

This approach reveals critical insights:

  • Memory writes degrade performance significantly compared to reads
  • Systems typically saturate at 70-90% of their theoretical maximum bandwidth
  • Latency ranges from 85-130 ns when idle to 200-600 ns or more under saturation

Mess provides a holistic, close-to-hardware view of memory system behavior, enabling researchers and engineers to understand real-world performance characteristics that standard benchmarks miss.
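
To make the idea of a bandwidth-latency curve concrete, the sketch below takes a few (bandwidth, latency) points and reports the unloaded latency, the highest bandwidth reached, and that bandwidth as a fraction of a theoretical peak. The sample points and the 300 GB/s peak are invented for illustration and are not Mess measurements. The function and field names are hypothetical and do not reflect the Mess output format.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (bandwidth in GB/s, latency in ns)


def summarize_curve(points: List[Point], theoretical_peak_gbs: float) -> Dict[str, float]:
    """Summarize one bandwidth-latency curve."""
    if not points:
        raise ValueError("curve has no points")
    if theoretical_peak_gbs <= 0:
        raise ValueError("theoretical peak must be positive")
    # The lowest-bandwidth point approximates the unloaded (idle) latency.
    unloaded_latency = min(points)[1]
    # The highest-bandwidth point approximates the saturated region.
    max_bw, latency_at_max = max(points)
    return {
        "unloaded_latency_ns": unloaded_latency,
        "max_bandwidth_gbs": max_bw,
        "latency_at_max_bw_ns": latency_at_max,
        "peak_fraction": max_bw / theoretical_peak_gbs,
    }


# Illustrative values only (NOT measured data).
example_curve = [(10.0, 90.0), (100.0, 105.0), (200.0, 160.0), (240.0, 420.0)]
summary = summarize_curve(example_curve, theoretical_peak_gbs=300.0)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

With these made-up points, the highest bandwidth is 80% of the assumed peak, and latency at that point is several times the unloaded value, which is the shape the bullets above describe.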

MICRO 2024 Best Paper Runner-Up: The Mess methodology was published at the 57th IEEE/ACM International Symposium on Microarchitecture.

For a detailed explanation of the benchmark methodology, see the Memory BSC Tools page.


Tools Included

Mess 2.0 provides an integrated workflow for memory system characterization, from benchmarking to application profiling:

  • Mess Benchmark: Characterizes your memory system by generating bandwidth-latency curves that reveal how it behaves under varying load.
  • Mess Profiler: Automates counter discovery and runs profiling tools (perf, likwid, etc.) with the correct configuration, ensuring application measurements align with benchmark data.
  • Plotter-Parser: Generates publication-quality plots as well as CSV and JSON files containing the parsed bandwidth-latency curves.
  • Traffic Generator: The low-level engine that generates precise memory traffic patterns at the assembly level. Can also be used independently for custom microbenchmarks.

Architecture Support

Support status follows the wiki (Architecture-Support):

| Architecture | Status | SIMD | Notes |
|---|---|---|---|
| x86-64 CPUs | Supported | AVX2, AVX-512 | Intel and AMD processors |
| ARM CPUs | Supported | NEON, SVE | Includes Neoverse, Graviton, Apple Silicon |
| Power CPUs | Supported | VSX | Power8 and newer |
| RISC-V CPUs | WIP | RVV 1.0 | Assembly + latency + counter detection available; bandwidth measurement pending |
| GPUs | Pending | | Under active development |

Installation

See full instructions in the Installation wiki page.

Clone

git clone --recursive https://github.com/bsc-mem/Mess-2.0.git
cd Mess-2.0

Important: The --recursive flag is required to download submodules that Mess depends on.

If you forget --recursive, initialize submodules later:

git submodule update --init --recursive

Dependencies

Core requirements:

  • C++17 compiler: GCC 9+ (recommended), Clang 10+, Intel OneAPI (ICX), AOCC
  • numactl: NUMA memory binding (required)
  • taskset: Core pinning (preferred, part of util-linux)
  • perf: Recommended counter backend (linux-tools-common)
  • Python 3: Plotting utilities
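
Before building, it can help to confirm that the command-line tools in the list above are on the PATH. The following is a minimal sketch using only the Python standard library. The helper name `missing_tools` is hypothetical and not part of Mess, and the check does not cover the C++ compiler or tool versions.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executable names taken from the dependency list above.
REQUIRED_TOOLS = ("numactl", "taskset", "perf", "python3")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All listed tools were found on the PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default the function uses `shutil.which`.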

On Linux, ensure perf access:

# Show the current setting
cat /proc/sys/kernel/perf_event_paranoid
# Set it to 0 (the change lasts until the next reboot)
echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
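
The same check can be scripted. The sketch below reads the current `perf_event_paranoid` level and compares it with the value of 0 used above. Treating "0 or lower" as sufficient is an assumption that mirrors this README's command, not a statement about every counter or backend. All function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def parse_paranoid(text: str) -> int:
    """Parse the file content, e.g. '2\\n' -> 2."""
    return int(text.strip())


def counters_accessible(level: int, wanted: int = 0) -> bool:
    """True when the level is at or below the value recommended above."""
    return level <= wanted


def read_paranoid(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the current level, or None if the file is missing or unreadable."""
    try:
        return parse_paranoid(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid()
    if level is None:
        print("perf_event_paranoid could not be read (not Linux, or no access).")
    elif counters_accessible(level):
        print(f"perf_event_paranoid = {level}: OK")
    else:
        print(f"perf_event_paranoid = {level}: consider lowering it to 0")
```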

Huge pages are recommended for more accurate measurements:

echo 1024 | sudo tee /proc/sys/vm/nr_hugepages

Even without huge pages, Mess automatically compensates for page walk latency. See Huge-Memory-Pages for details.
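
To see how much memory a huge-page reservation covers, the page count can be multiplied by the huge-page size reported in `/proc/meminfo`. The sketch below does that arithmetic. The sample text assumes a 2048 kB default huge-page size, which is common on x86-64 but not universal, so 1024 pages come to 2048 MiB in this example. The helper names are hypothetical.

```python
import re
from typing import Dict

_FIELD = re.compile(r"^(HugePages_Total|HugePages_Free|Hugepagesize):\s+(\d+)")


def parse_hugepages(meminfo_text: str) -> Dict[str, int]:
    """Extract the huge-page fields from /proc/meminfo-style text."""
    fields: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def reserved_mib(fields: Dict[str, int]) -> float:
    """Reserved huge-page memory in MiB (Hugepagesize is reported in kB)."""
    return fields.get("HugePages_Total", 0) * fields.get("Hugepagesize", 0) / 1024


# Sample text with an ASSUMED 2048 kB huge-page size.
SAMPLE = "HugePages_Total:    1024\nHugePages_Free:     1024\nHugepagesize:       2048 kB\n"
print(reserved_mib(parse_hugepages(SAMPLE)), "MiB in the sample")

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            print(reserved_mib(parse_hugepages(handle.read())), "MiB on this system")
    except OSError:
        print("/proc/meminfo is not available on this system.")
```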

Build

make
make install

Binaries are generated in build/bin/:

  • mess — Core benchmark
  • mess-profiler — Memory bandwidth profiler
  • traffic_generator — Standalone traffic generation tool

Optional PATH setup:

export PATH=$PATH:$(pwd)/build/bin

Verification

./build/bin/mess --version
./build/bin/mess --dry-run --verbose=2

Quick Start

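# Dry run with verbose output
./build/bin/mess --dry-run --verbose=2

# Run the benchmark with default options
./build/bin/mess

# Run and save measurement files
./build/bin/mess --profile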

Common options:

| Option | Description | Example |
|---|---|---|
| --ratio=N[,N...] | Issued load ratio(s) in % | --ratio=100,75,50 |
| --pause=N[,N...] | Pause bubble values | --pause=0,10,100,1000 |
| --profile | Save measurement files | --profile |
| --verbose=N | Verbosity level 0-4 | --verbose=3 |
| --measurer=TYPE | Counter backend (auto/perf/likwid/pcm) | --measurer=perf |
| --bind=LIST | NUMA memory-node binding | --bind=0 |
| --cores=LIST | Explicit traffic-generator cores | --cores=0-15 |
| --total-cores=N | Number of traffic-generator cores | --total-cores=16 |

For the complete option set, use ./build/bin/mess --help or see Understanding-CLI-Arguments.
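
When runs are scripted, the options above can be assembled programmatically. The following sketch builds an argument list from Python values using only flags listed in the table. The helper `build_mess_command` is hypothetical, and the 0-100 range check on ratios is an assumption based on the table stating that ratios are percentages.

```python
import shlex
from typing import List, Optional, Sequence


def build_mess_command(
    binary: str = "./build/bin/mess",
    ratios: Optional[Sequence[int]] = None,
    pauses: Optional[Sequence[int]] = None,
    bind: Optional[str] = None,
    total_cores: Optional[int] = None,
    profile: bool = False,
) -> List[str]:
    """Build a mess command line from the options listed in the table."""
    cmd = [binary]
    if profile:
        cmd.append("--profile")
    if ratios:
        # Assumption: ratios are percentages, so they must lie in 0..100.
        if any(r < 0 or r > 100 for r in ratios):
            raise ValueError("ratios are percentages and must be within 0..100")
        cmd.append("--ratio=" + ",".join(str(r) for r in ratios))
    if pauses:
        cmd.append("--pause=" + ",".join(str(p) for p in pauses))
    if bind is not None:
        cmd.append(f"--bind={bind}")
    if total_cores is not None:
        cmd.append(f"--total-cores={total_cores}")
    return cmd


command = build_mess_command(ratios=[100, 75, 50], pauses=[0, 10, 100, 1000], profile=True)
print(shlex.join(command))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.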


Common Workflows

Single-Point Sanity Check

./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 --repetitions=1
./build/bin/mess --profile --ratio=0 --pause=0 --verbose=3 --repetitions=1

NUMA Comparison

./build/bin/mess --profile --bind=0 --folder=numa0
./build/bin/mess --profile --bind=1 --folder=numa1

Core-Scaling Sweep

for c in 2 4 8 16; do
  ./build/bin/mess --profile --total-cores=$c --folder=cores_$c
done
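
For longer sweeps, the same loop can be driven from Python. The sketch below generates the commands shown above and, by default, only prints them. Passing `execute=True` runs them and, unlike the shell loop, stops at the first failure. The function names are hypothetical; the flags are the ones used in the loop above.

```python
import subprocess
from typing import Iterable, List


def core_sweep_commands(core_counts: Iterable[int],
                        binary: str = "./build/bin/mess") -> List[List[str]]:
    """One command per core count, each writing to its own folder."""
    return [
        [binary, "--profile", f"--total-cores={count}", f"--folder=cores_{count}"]
        for count in core_counts
    ]


def run_sweep(commands: List[List[str]], execute: bool = False) -> None:
    """Print each command; run it only when execute is True."""
    for cmd in commands:
        print(" ".join(cmd))
        if execute:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(cmd, check=True)


sweep = core_sweep_commands([2, 4, 8, 16])
run_sweep(sweep)  # printing only; nothing is executed here
```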

Mess Profiler

mess-profiler reuses Mess counter discovery to profile applications with consistent output.

./build/bin/mess-profiler --dry-run

./build/bin/mess-profiler -s 100ms -o app_profile.csv ./my_app

Profiler docs: Mess-Profiler


Plotter-Parser

Visualization utilities in utils/:

  • plotter.py: generates memory-curve plots and processed CSV/JSON
  • app_plotter.py: overlays application profile points on curves
  • parse_runtimes.py: summarizes run times

cd utils
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Plotter docs: Plotter-Parser


Learning Resources


Troubleshooting

  • Permission errors on counters: set perf_event_paranoid to 0
  • Missing counters/backend mismatch: check ./build/bin/mess-profiler --dry-run
  • Unstable measurements: increase --repetitions and use --verbose=3
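
One way to judge whether measurements are stable is the coefficient of variation (sample standard deviation divided by the mean) across repetitions of the same point. The sketch below computes it for a list of values. The latencies are invented for illustration, and the 5% threshold is an arbitrary assumption, not a Mess recommendation. How the values are extracted from Mess output is deliberately left out.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    if len(samples) < 2:
        raise ValueError("need at least two repetitions")
    average = mean(samples)
    if average == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(samples) / average


# Illustrative values only (NOT measured data): latency in ns per repetition.
latencies_ns = [101.0, 99.0, 100.0, 100.0]
cv = coefficient_of_variation(latencies_ns)

THRESHOLD = 0.05  # arbitrary example threshold (5%)
verdict = "stable" if cv <= THRESHOLD else "unstable, consider more repetitions"
print(f"CV = {cv:.4f} ({verdict})")
```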

More: FAQ and Iterative-Debugging

Found a bug? Open an issue on GitHub or email mess@bsc.es.


Contributors

Mess is developed by the Memory Systems Team at the Barcelona Supercomputing Center (BSC).

  • Victor Xirau Guardans: Main Mess 2.0 developer (victor.xirau@bsc.es)
  • Mariana Carmin: Mess 2.0 developer (mcarmin@bsc.es)
  • Pau Diaz: Mess 2.0 developer (pau.diazcuesta@bsc.es)
  • Pouya Esmaili Dokht: Mess Paper author (pouya.esmaili@bsc.es)

Or email: mess@bsc.es


Citation

If you use Mess in research, please cite:

@inproceedings{esmaili2024mess,
  title     = {A Mess of Memory System Benchmarking, Simulation and Application Profiling},
  author    = {Esmaili-Dokht, Pouya and Sgherzi, Francesco and Girelli, Valeria Soldera
               and Boixaderas, Isaac and Carmin, Mariana and Monemi, Alireza
               and Armejach, Adria and Mercadal, Estanislao and Llort, German
               and Radojkovi{\'c}, Petar and Moreto, Miquel and Gim{\'e}nez, Judit
               and Martorell, Xavier and Ayguad{\'e}, Eduard and Labarta, Jesus
               and Confalonieri, Emanuele and Dubey, Rishabh and Adlard, Joshua},
  booktitle = {Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)},
  pages     = {136--152},
  year      = {2024},
  publisher = {IEEE}
}

References

  1. Mess Benchmark — The original implementation of the Mess benchmark.
  2. Mess Simulator — Analytical memory model using bandwidth-latency curves.
  3. Mess-Paraver — Integration with Paraver for visualization.
  4. Mess Paper — Esmaili-Dokht, P., Sgherzi, F., Girelli, V. S., Boixaderas, I., Carmin, M., Monemi, A., Armejach, A., Mercadal, E., Llort, G., Radojković, P., Moreto, M., Giménez, J., Martorell, X., Ayguadé, E., Labarta, J., Confalonieri, E., Dubey, R., & Adlard, J. (2024). A mess of memory system benchmarking, simulation and application profiling. In Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (pp. 136-152). IEEE.

Mess Benchmark is released under the BSD 3-Clause License.
