Deduplicate walltime perf symbols, unwind data and debug info across pids #240

GuillaumeLagrange · 2026-02-11T14:44:04Z

Since the real deduplicator source of truth is actually the path of the elf file on disk, I settled on

{file_name}-{rest_of_path_hash}.unwind_data
{file_name}-{rest_of_path_hash}.symbols.map

it allows for the filename to be determined by the actual path, without having to either nest all the files, or have to sanitize paths to file name and end up in awkward file-name length limitations

Here's what it looks like on a 1000 iterations report of one simple program

$ ls
bash-86c9948d.symbols.map                  libattr.so.1.1.2502-a75bfa71.symbols.map  libc.so.6-1af6be6d.unwind_data          libm.so.6-59456f02.symbols.map           libreadline.so.8.3-1656ff94.unwind_data
bash-86c9948d.unwind_data                  libattr.so.1.1.2502-a75bfa71.unwind_data  libc.so.6-4302bc2f.symbols.map          libm.so.6-59456f02.unwind_data           libseccomp.so.2.6.0-01555ea6.symbols.map
bash-fffb755d.symbols.map                  libaudit.so.1.0.0-c27122eb.symbols.map    libc.so.6-4302bc2f.unwind_data          libm.so.6-c9163c27.symbols.map           libseccomp.so.2.6.0-01555ea6.unwind_data
bash-fffb755d.unwind_data                  libaudit.so.1.0.0-c27122eb.unwind_data    libdl.so.2-58b68575.symbols.map         libm.so.6-c9163c27.unwind_data           libsystemd-shared-258.so-d9b4b43b.symbols.map
exec-harness-cc46e1df.symbols.map          libblkid.so.1.1.0-68b14485.symbols.map    libdl.so.2-58b68575.unwind_data         libncursesw.so.6.5-7479753c.symbols.map  libsystemd-shared-258.so-d9b4b43b.unwind_data
exec-harness-cc46e1df.unwind_data          libblkid.so.1.1.0-68b14485.unwind_data    libdl.so.2-dd4b2dac.symbols.map         libncursesw.so.6.5-7479753c.unwind_data  perf.metadata
ExecutionTimestamps.msgpack                libcap-ng.so.0.0.0-2c59cd5f.symbols.map   libdl.so.2-dd4b2dac.unwind_data         libncursesw.so.6.5-c089dd4d.symbols.map  perf.pipedata
graph-benchmark-4d4467b7.symbols.map       libcap-ng.so.0.0.0-2c59cd5f.unwind_data   libgcc_s.so.1-a02333af.symbols.map      libncursesw.so.6.5-c089dd4d.unwind_data  results
graph-benchmark-4d4467b7.unwind_data       libcap.so.2.77-7a805723.symbols.map       libgcc_s.so.1-a02333af.unwind_data      libpam.so.0.85.1-417d7e35.symbols.map    runner.log
ld-linux-x86-64.so.2-83640422.symbols.map  libcap.so.2.77-7a805723.unwind_data       libhistory.so.8.3-b5123faa.symbols.map  libpam.so.0.85.1-417d7e35.unwind_data    systemd-run-0333c053.symbols.map
ld-linux-x86-64.so.2-83640422.unwind_data  libcrypto.so.3-06c897d6.symbols.map       libhistory.so.8.3-b5123faa.unwind_data  libpthread.so.0-8e8dd99c.symbols.map     systemd-run-0333c053.unwind_data
ld-linux-x86-64.so.2-beb2b705.symbols.map  libcrypto.so.3-06c897d6.unwind_data       libhistory.so.8.3-bbaab728.symbols.map  libpthread.so.0-8e8dd99c.unwind_data     systemd-tty-ask-password-agent-e7893e06.symbols.map
ld-linux-x86-64.so.2-beb2b705.unwind_data  libcrypt.so.2.0.0-09d9cd41.symbols.map    libhistory.so.8.3-bbaab728.unwind_data  libreadline.so.8.3-0dfde283.symbols.map  systemd-tty-ask-password-agent-e7893e06.unwind_data
libacl.so.1.1.2302-b7a36e2c.symbols.map    libcrypt.so.2.0.0-09d9cd41.unwind_data    libmount.so.1.1.0-f87ad571.symbols.map  libreadline.so.8.3-0dfde283.unwind_data
libacl.so.1.1.2302-b7a36e2c.unwind_data    libc.so.6-1af6be6d.symbols.map

codspeed-hq · 2026-02-11T14:48:06Z

Merging this PR will not alter performance

✅ 4 untouched benchmarks

_{Comparing cod-2138-deduplicate-perf-maps-and-unwind_data (7f65690) with main (9409359)}

This fixes a regression introduced in 8b37208. We now filter pids using `bench_pids`, except for `exec-harness` integrations, where we take all pids.

- Store pid-agnostic data in a file or json map under the key `{file_name}-{rest_of_path_hash}.{extension}` for each elf - For each pid, store pid specific data, mostly the computed load_bias from where each module was loaded into memory at runtime, alongside a key to retrieve the pid-agnostic data This way, we only write to disk relevant parts of the information.

…refactor The use case for this is quite scoped, and it will most likely get removed later, but for now, it helps compute two consecutive runs, one with the current version of the runner, and one post refactor, and highlight the differences in the report data. It is expected to be used with ASLR disabled, otherwise all load_bias will be randomized across runs.

Also add a rebuild trigger to make it easier to run GITHUB_ACTIONS=1 cargo test` locally. We could have a better trigger, but this will do for now.

not-matthias

I think the struct naming can be clarified a bit, I was struggling quite a lot to understand how they map together. Maybe also using custom types for the key that's used in the hashmaps.

The overall logic looks good! Next round should be quick

not-matthias · 2026-02-12T15:12:40Z

crates/runner-shared/src/metadata.rs

+
+    /// Per-pid debug info references, mapping PID to list of debug info index + load bias
+    #[serde(default, skip_serializing_if = "HashMap::is_empty")]
+    pub debug_info_pid_mappings_by_pid: HashMap<i32, Vec<DebugInfoPidMapping>>,


let's use libc::pid_t here

Le'ts also do it for the two other fields below

not-matthias · 2026-02-12T15:29:01Z

...apshots/codspeed_runner__executor__wall_time__perf__unwind_data__tests__cpp_unwind_data.snap

+            avma_range: 4194304..4558848,
+            base_avma: 4194304,


nit: can we have a custom Debug impl to always have hex values here?

not-matthias · 2026-02-12T15:46:13Z

src/executor/wall_time/perf/parse_perf_file.rs

+
+/// Per-pid mounting info for symbols, referencing the deduplicated symbols by path.
+#[derive(Debug)]
+pub struct SymbolPidMappingWithFullPath {


This is quite a long/complicated name 😅 Do we need the WithFullPath?

Maybe ModuleSymbols for the static structs that are shared, then maybe ProcessModuleSymbols for the in-process loaded symbols with the load bias + path?

Same for unwind data: We have static data which is shared across different processes (UnwindDataV3) and then specific data that depends on the process (just the load bias) which could be called ProcessUnwindData or LoadedUnwindData

not-matthias · 2026-02-12T15:50:11Z

src/executor/wall_time/perf/parse_perf_file.rs

+    // NOTE: We still read the ELF file for every mapping even if we've already seen this path.
+    // The V3 data will be deduplicated below, but there's a possible optimization to skip
+    // the ELF read entirely when the path is already known, by caching the necessary data to
+    // compute load_bias.


side note: Since we're using memmap2 it might already be fast enough (except the copying of the DWARF data). but let's keep it in mind for the future

not-matthias · 2026-02-12T15:51:05Z

src/executor/wall_time/perf/perf_map.rs


-        let load_bias = elf_helper::compute_load_bias(
+        Ok(Self {
+            load_bias: 0,


wait, shouldn't we remove this then?

not-matthias · 2026-02-12T16:16:11Z

src/bin/compare_walltime_output.rs

@@ -0,0 +1,900 @@
+/// Compares walltime profiling output between old and new runner formats.


I'm not sure if it make sense to keep this, since we won't need it anymore after this PR has been merged, right?

not-matthias · 2026-02-12T16:26:17Z

build.rs

+    // Force a rebuild of the test target to be able to run the full test suite locally just by
+    // setting GITHUB_ACTIONS=1 in the environment.
+    // This is because `test_with` is evaluated at build time
+    println!("cargo::rerun-if-env-changed=GITHUB_ACTIONS");


Do we really need this? I thought test_with has a runtime check

GuillaumeLagrange changed the title ~~Cod 2138 deduplicate perf maps and unwind data~~ Deduplicate walltime perf symbols, unwind data and debug info across pids Feb 11, 2026

GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch from 99e0ea4 to 1a32faa Compare February 11, 2026 14:45

GuillaumeLagrange added 2 commits February 12, 2026 03:58

feat: use bench_pids filters when harvesting symbols from perf file

fb3c796

This fixes a regression introduced in 8b37208. We now filter pids using `bench_pids`, except for `exec-harness` integrations, where we take all pids.

GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch from ecc5dde to 784aa11 Compare February 12, 2026 02:58

GuillaumeLagrange added 5 commits February 12, 2026 04:37

chore: make old unwinddata parser available

66b838d

feat: add unit test for breaking change detection in unwind_data

b7d8b5e

feat: re-add old debug_info_by_pid field for old report backend parsing

61b29d1

refactor: add helper functions for artifacts saving

9305a03

GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch from 15316f4 to 9305a03 Compare February 12, 2026 03:37

GuillaumeLagrange marked this pull request as ready for review February 12, 2026 08:49

GuillaumeLagrange requested review from art049 and not-matthias February 12, 2026 08:49

GuillaumeLagrange added 2 commits February 12, 2026 11:32

feat: skip tests requiring sudo if GITHUB_ACTIONS is not set

aed5b4e

Also add a rebuild trigger to make it easier to run GITHUB_ACTIONS=1 cargo test` locally. We could have a better trigger, but this will do for now.

feat: get rid of path hashing in favor of an index-based shared map

7f65690

GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch from a3617af to 7f65690 Compare February 12, 2026 11:14

not-matthias requested changes Feb 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deduplicate walltime perf symbols, unwind data and debug info across pids #240

Deduplicate walltime perf symbols, unwind data and debug info across pids #240

GuillaumeLagrange commented Feb 11, 2026 •

edited

Loading

Uh oh!

codspeed-hq bot commented Feb 11, 2026 •

edited

Loading

Uh oh!

not-matthias left a comment

Uh oh!

not-matthias Feb 12, 2026

Uh oh!

not-matthias Feb 12, 2026

Uh oh!

not-matthias Feb 12, 2026

Uh oh!

not-matthias Feb 12, 2026

Uh oh!

not-matthias Feb 12, 2026

Uh oh!

not-matthias Feb 12, 2026

Uh oh!

not-matthias Feb 12, 2026

Uh oh!

not-matthias Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -0,0 +1,900 @@
		/// Compares walltime profiling output between old and new runner formats.

Deduplicate walltime perf symbols, unwind data and debug info across pids #240

Are you sure you want to change the base?

Deduplicate walltime perf symbols, unwind data and debug info across pids #240

Conversation

GuillaumeLagrange commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

not-matthias left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

GuillaumeLagrange commented Feb 11, 2026 •

edited

Loading

codspeed-hq bot commented Feb 11, 2026 •

edited

Loading