feat: add the tracegrind tool #8
Draft
art049 wants to merge 26 commits into master from tracegrind-tool
Pure copy of callgrind/ to tracegrind/ with symbol prefix rename CLG_ → TG_ (expanding to vgTracegrind_), header guards updated, public header renamed to tracegrind.h with TRACEGRIND_* macros. No behavioral changes — output is still identical to callgrind. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace callgrind's accumulated callgraph output with streaming CSV trace data emitted at function ENTER/EXIT boundaries. Each row contains delta counters since the last sample, enabling per-call cost attribution.
Key changes:
- dump.c: Replace callgraph output with CSV trace (trace_open/emit/close)
- callstack.c: Hook push/pop_call_stack to emit ENTER/EXIT samples
- threads.c: Add per-thread last_sample_cost for delta tracking
- global.h: Add trace_output struct and per-thread sample state
- main.c: Open trace at init, close at fini, update copyright
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
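The delta tracking described above can be sketched in a few lines. This is an illustrative model only, not the code in threads.c: the `ThreadState` class, the `emit_sample` helper, and the single `Ir` counter are assumptions made for the example; only the name `last_sample_cost` comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadState:
    # Running totals for this thread (hypothetical: one counter, "Ir").
    cost: dict = field(default_factory=lambda: {"Ir": 0})
    # Snapshot taken at the previous sample, mirroring last_sample_cost.
    last_sample_cost: dict = field(default_factory=lambda: {"Ir": 0})


def emit_sample(state, event, fn, out):
    """Append one trace row holding the counter deltas since the last sample."""
    delta = {k: state.cost[k] - state.last_sample_cost[k] for k in state.cost}
    out.append((event, fn, delta))
    # The next row will be measured relative to this point.
    state.last_sample_cost = dict(state.cost)


trace = []
t = ThreadState()
t.cost["Ir"] += 5
emit_sample(t, "ENTER", "main", trace)
t.cost["Ir"] += 12
emit_sample(t, "ENTER", "work", trace)
t.cost["Ir"] += 30
emit_sample(t, "EXIT", "work", trace)
```

Because every row carries only the cost accrued since the previous row, summing the deltas between a function's ENTER and EXIT rows gives the inclusive cost of that call.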
Add --output-format=csv|msgpack option. MsgPack format uses LZ4 block compression achieving ~12x compression vs CSV.
New files:
- tg_msgpack.c/h: MsgPack encoder (write-only)
- tg_lz4.c/h: LZ4 compression wrapper with VG_() adaptations
- lz4.c/h: Vendored LZ4 library (BSD-2-Clause)
- docs/tracegrind-msgpack-format.md: Format specification
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
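To give a feel for what a write-only MsgPack encoder involves, here is a minimal sketch covering unsigned integers, strings, and arrays. It follows the public MessagePack specification, but it is not a port of tg_msgpack.c; the `pack` function is a name invented for this example, and the LZ4 chunk framing described in docs/tracegrind-msgpack-format.md is not shown.

```python
import struct


def pack(obj):
    """Encode unsigned ints, str, and list/tuple as MessagePack bytes."""
    if isinstance(obj, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(obj, int):
        if obj < 0:
            raise ValueError("negative ints are not handled in this sketch")
        if obj <= 0x7F:
            return bytes([obj])                      # positive fixint
        if obj <= 0xFF:
            return b"\xcc" + bytes([obj])            # uint 8
        if obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)  # uint 16
        if obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)  # uint 32
        if obj <= 0xFFFFFFFFFFFFFFFF:
            return b"\xcf" + struct.pack(">Q", obj)  # uint 64
        raise ValueError("integer too large")
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n <= 31:
            return bytes([0xA0 | n]) + data          # fixstr
        if n <= 0xFF:
            return b"\xd9" + bytes([n]) + data       # str 8
        if n <= 0xFFFF:
            return b"\xda" + struct.pack(">H", n) + data  # str 16
        raise ValueError("string too long for this sketch")
    if isinstance(obj, (list, tuple)):
        n = len(obj)
        if n <= 15:
            head = bytes([0x90 | n])                 # fixarray
        elif n <= 0xFFFF:
            head = b"\xdc" + struct.pack(">H", n)    # array 16
        else:
            raise ValueError("array too long for this sketch")
        return head + b"".join(pack(item) for item in obj)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


encoded_row = pack([1, 2, "ab"])
```

An encoder of this kind needs no decoder counterpart inside the tool, which is presumably why the commit describes it as write-only.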
- Update msgpack format to version 2 with event_schemas
- Each event type (ENTER, EXIT, FORK) has its own column schema
- FORK events use minimal 4-element format: [seq, tid, event, child_pid]
- Remove CSV output format entirely (msgpack-only now)
- Add decode-trace.py script for debugging trace files
- Add fork detection via post-syscall handler for fork/clone/vfork
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
4ccc4f9 to
864a751
Add tracegrind configurations to the benchmark suite:
- tracegrind/default: basic tracing
- tracegrind/cache-sim: with cache simulation
- tracegrind/cache-sim+systime: with cache sim and syscall timing
This allows direct performance comparison between callgrind and tracegrind.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Detect available tools at startup and only run benchmarks for tools that are present. This fixes CI failures when running against upstream valgrind which doesn't have tracegrind. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
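One way such startup detection could look is sketched below. It is not the benchmark suite's actual code: the function names, the `callgrind/default` configuration name, and the assumption that `valgrind --tool=<name> --version` exits nonzero when the tool is not installed are all illustrative and should be checked against the real runner.

```python
import subprocess


def valgrind_has_tool(tool, valgrind="valgrind"):
    """Return True if `valgrind --tool=<tool>` can start (assumed probe)."""
    try:
        result = subprocess.run(
            [valgrind, f"--tool={tool}", "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The valgrind binary itself is missing or not executable.
        return False
    return result.returncode == 0


def select_configs(configs, has_tool=valgrind_has_tool):
    """Keep only configurations whose tool (the part before '/') is present."""
    available = {}
    selected = []
    for name in configs:
        tool = name.split("/", 1)[0]
        if tool not in available:
            available[tool] = has_tool(tool)  # probe each tool only once
        if available[tool]:
            selected.append(name)
    return selected


# Example with a stub probe that mimics upstream valgrind (no tracegrind).
runnable = select_configs(
    ["callgrind/default", "tracegrind/default", "tracegrind/cache-sim"],
    has_tool=lambda tool: tool == "callgrind",
)
```

Passing the probe as a parameter keeps the selection logic testable on machines where valgrind is not installed at all.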
Add TRACEGRIND_ADD_MARKER client request that emits named marker events (event=0) into the trace stream, renumbering ENTER=1, EXIT=2, FORK=3. Remove the legacy dump_profile/zero_all_cost/dump_every_bb machinery inherited from callgrind, replacing it with the simpler compute_total_cost. Update the analyzer script (renamed from decode-trace.py) to match the new event numbering. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove ~240 lines of unused code inherited from callgrind:
- Dead CLI options (combine-dumps, compress-*, dump-*, collect-alloc, etc.)
- Dead struct fields (jCC.creation_seq, BBCC.ret_counter, fn_node.is_malloc/is_realloc/is_free, etc.)
- Dead functions (forall_bbccs, zero_bbcc, cachesim_dump_desc, cachesim_add_icost)
- Dead types and typedefs (OutputFormat, fCC, SimCost, UserCost, AddrPos, AddrCost, FnPos)
- Dead EG_ALLOC event group and its registration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These callgrind-inherited options are unnecessary for tracegrind's streaming trace model. Simplifies recursion depth tracking to always increment/decrement unconditionally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add vg_regtest-based regression tests covering basic tracing, markers, instrumentation toggle, toggle collect, call chains, inlining behavior, and schema validation. Extend CI matrix to run tracegrind tests alongside callgrind on both Ubuntu 22.04 and 24.04. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nction tracking Track inlined function transitions at the BB level using Valgrind's debug info API. This bumps the trace format to v3 with two new event types (4=ENTER_INLINED, 5=EXIT_INLINED), updates the analyzer script to handle them, and adds regression tests for enter and nested inlined scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… diffing Replace flat single-pointer inline tracking with a per-BB inline call stack built via Valgrind's InlIPCursor API. BB-to-BB transitions now diff the old and new inline stacks to emit the minimal EXIT/ENTER sequence, producing correct containment (ENTER outer → ENTER inner → EXIT inner → EXIT outer) instead of flat transitions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
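The stack diffing idea can be illustrated independently of Valgrind. In the sketch below, each inline stack is a list of function names ordered from outermost to innermost; the function name `diff_inline_stacks` and the tuple representation of events are assumptions for the example, not the tool's data structures.

```python
def diff_inline_stacks(old, new):
    """Return the minimal EXIT/ENTER sequence that turns `old` into `new`.

    Both stacks are lists ordered outermost-first.
    """
    # Frames shared at the bottom of both stacks stay open.
    common = 0
    for a, b in zip(old, new):
        if a != b:
            break
        common += 1
    # Close the frames that disappeared, innermost first...
    events = [("EXIT_INLINED", fn) for fn in reversed(old[common:])]
    # ...then open the new frames, outermost first.
    events += [("ENTER_INLINED", fn) for fn in new[common:]]
    return events


entering = diff_inline_stacks([], ["outer", "inner"])
leaving = diff_inline_stacks(["outer", "inner"], [])
sibling = diff_inline_stacks(["outer", "a"], ["outer", "b"])
```

Applied to consecutive basic blocks, this yields the nested ENTER outer, ENTER inner, EXIT inner, EXIT outer ordering the commit describes, and a move between two functions inlined into the same parent touches only the innermost frame.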
Add tests for signal handling, C++ exceptions, longjmp, tail calls, and deep recursion (100 levels) to verify call stack correctness across non-trivial control flow. Also fix missing -I include path for tracegrind.h in test Makefile. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Emit THREAD_CREATE (type 6) when new threads are spawned, using VG_(track_pre_thread_ll_create). Suppress spurious FORK events for pthread_create by checking CLONE_THREAD flag in clone/clone3 syscalls. Rename events for consistency: ENTER→ENTER_FN, EXIT→EXIT_FN, ENTER_INLINED→ENTER_INLINED_FN, EXIT_INLINED→EXIT_INLINED_FN. Reorder: ENTER_INLINED_FN=3, EXIT_INLINED_FN=4, FORK=5. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
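For readers writing a trace consumer, the numbering after this commit can be captured as an enum. The values below are pieced together from the commit messages (marker events are 0, ENTER and EXIT were renumbered to 1 and 2 earlier, and this commit sets ENTER_INLINED_FN=3, EXIT_INLINED_FN=4, FORK=5, THREAD_CREATE=6); the member name `MARKER` and the helper `clone_creates_thread` are assumptions. `CLONE_THREAD` uses the value from the Linux headers.

```python
from enum import IntEnum


class Event(IntEnum):
    MARKER = 0            # name assumed; the commits only say "event=0"
    ENTER_FN = 1
    EXIT_FN = 2
    ENTER_INLINED_FN = 3
    EXIT_INLINED_FN = 4
    FORK = 5
    THREAD_CREATE = 6


# Linux clone(2) flag marking the child as a thread of the caller's group.
CLONE_THREAD = 0x00010000


def clone_creates_thread(flags):
    """True when a clone/clone3 call creates a thread rather than a process."""
    return bool(flags & CLONE_THREAD)


fork_like_flags = 17                         # SIGCHLD only, as a plain fork would pass
thread_like_flags = CLONE_THREAD | 0x00000100  # CLONE_VM plus CLONE_THREAD
```

A check of this shape is what lets a post-syscall handler skip FORK events for `pthread_create`, which goes through clone with `CLONE_THREAD` set.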
Verify that syscall instruction counts and timing (sysCount, sysTime, sysCpuTime) are properly attributed to libc wrapper functions (getpid, write) when --collect-systime=nsec is enabled. Nonzero timing values on EXIT_FN events are normalized to T to assert measurement occurred. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
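The normalization step can be pictured as a small filter over decoded counters. This is a sketch of the idea rather than the test suite's actual filter: the `normalize_timing` name and the dictionary representation are assumptions, and only `sysTime` and `sysCpuTime` are treated as timing values here.

```python
# Counters whose values vary from run to run and so cannot be compared exactly.
TIMING_COUNTERS = ("sysTime", "sysCpuTime")


def normalize_timing(counters):
    """Replace nonzero timing values with "T"; leave zeros and counts alone."""
    return {
        name: "T" if name in TIMING_COUNTERS and value != 0 else value
        for name, value in counters.items()
    }


normalized = normalize_timing({"sysCount": 1, "sysTime": 1834, "sysCpuTime": 0})
```

Keeping zeros as zeros is what makes the expected output meaningful: a `T` asserts that a measurement happened, while a `0` in its place would expose missing attribution.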
Include creator ("valgrind-tracegrind") and creator_version fields in
the schema chunk so consumers can identify which tool and version
produced the trace file.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spawns 3 threads with distinct noinline call chains at different depths (work_a->depth_a1->depth_a2, work_b->depth_b1, work_c->depth_c1->depth_c2) to verify tracegrind correctly tracks per-thread ENTER_FN/EXIT_FN stacks. Output is sorted by tid for deterministic comparison. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
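Sorting by tid works for this purpose because a stable sort keeps each thread's events in their original order while removing the scheduler-dependent interleaving. A minimal sketch, with a made-up event representation (the dictionary keys are assumptions; the function names come from the test description):

```python
def sort_by_tid(events):
    """Group events by thread; sorted() is stable, so per-thread order holds."""
    return sorted(events, key=lambda e: e["tid"])


# Interleaved as a scheduler might produce them.
events = [
    {"seq": 1, "tid": 2, "event": "ENTER_FN", "fn": "work_b"},
    {"seq": 2, "tid": 1, "event": "ENTER_FN", "fn": "work_a"},
    {"seq": 3, "tid": 2, "event": "ENTER_FN", "fn": "depth_b1"},
    {"seq": 4, "tid": 1, "event": "ENTER_FN", "fn": "depth_a1"},
]
ordered = sort_by_tid(events)
```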
…ed file path The expected output had file=??? but debug info now correctly resolves the source file to test_thread_create.c. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Record the unit of time-based event counters (sysTime, sysCpuTime) in the schema chunk so consumers can interpret values without out-of-band knowledge of the --collect-systime setting. The map is extensible for future counters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… file The output file path is now used exactly as specified by --tracegrind-out-file. The default format includes the extension so the default behavior is unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nters array Extract counter column names from inline event schemas into a separate top-level `counters` field and nest counter deltas as a sub-array within fn-call data rows. This makes the schema self-describing for counter layout without repeating counter names in every event type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
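A consumer can then label counter deltas by pairing the nested sub-array with the top-level `counters` list. The sketch below assumes the counter sub-array is the last element of a fn-call row and invents the other row fields and the sample values; the authoritative layout is the one in docs/tracegrind-msgpack-format.md.

```python
def counter_deltas(schema, row):
    """Map counter names from the schema onto a row's nested delta array."""
    names = schema["counters"]
    deltas = row[-1]  # assumed position of the counter sub-array
    if len(deltas) != len(names):
        raise ValueError("row has a different number of counters than the schema")
    return dict(zip(names, deltas))


# Hypothetical schema chunk and fn-call row, for illustration only.
schema = {"counters": ["Ir", "sysCount", "sysTime"]}
row = [7, 1, 2, "work", [120, 1, 3500]]
labelled = counter_deltas(schema, row)
```

Since the names live in one place, adding a counter changes the schema chunk and the length of the sub-array, but not the shape of the surrounding row.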
Adds a directory for pre-generated tracegrind output files that serve as reference material for trace parser implementations, along with a script to regenerate them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Perf profiling revealed LZ4 compression (11.4% of runtime) and per-event strlen calls (4.6%) as the top two optimization targets. Switch from LZ4_compress_default to LZ4_compress_fast with acceleration=2 for faster compression at marginal ratio cost. Cache name_len in fn_node, file_node, and obj_node structs so msgpack_write_str receives pre-computed lengths instead of calling VG_(strlen) on every trace event. This eliminates strlen from the perf profile entirely. Benchmarked improvement: 55-78ms saved (10-13% of the TG-CG gap) on ls -lR /usr/share/doc workload. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add .pre-commit-config.yaml with clang-format scoped to tracegrind/ only. Reformat all tracegrind source files to match the repo's .clang-format style. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No description provided.