ARM coresight support #354

Draft
cgaebel wants to merge 13 commits into janestreet:master from cgaebel:coresight-aarch64-support

cgaebel commented Mar 16, 2026

Just a draft for now, to figure out what needs to be done.

Sample usage in its current state:

cd /tmp
mkdir -p trace-dir
perf record -e cs_etm//u -o ./trace-dir/perf.data -- ./a.out
echo "()" > ./trace-dir/hits.sexp
MAGIC_TRACE_NO_DLFILTER=1 ./magic-trace decode -working-directory ./trace-dir -executable ./a.out

cgaebel marked this pull request as draft March 16, 2026 20:05
cgaebel marked this pull request as ready for review March 16, 2026 20:05
cgaebel marked this pull request as draft March 16, 2026 20:05
cgaebel force-pushed the coresight-aarch64-support branch from b0a327d to af6aa65 March 17, 2026 13:55
cgaebel and others added 12 commits March 17, 2026 11:24
When running an x86_64 magic-trace binary under binfmt_misc/qemu-user
on an aarch64 host, uname -m returns x86_64 but the host's cs_etm
device is available in sysfs. Switch all detection (OCaml and C) to
probe for device existence at runtime instead of checking architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
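
The runtime probe described in this commit could look roughly like the sketch below. It is not the code from this PR: the `tracer` type, the `detect_tracer` function, and its `sysfs_root` parameter are hypothetical names chosen for illustration. The only fact it relies on is that perf PMUs such as `cs_etm` and `intel_pt` show up as entries under `/sys/bus/event_source/devices`.

```ocaml
(* Hypothetical helper, not magic-trace's actual code. *)
type tracer = Intel_pt | Coresight_etm

(* [sysfs_root] exists only so the probe can be pointed at a fake
   directory in tests; on a real system it would be "/sys". *)
let detect_tracer ?(sysfs_root = "/sys") () =
  let has_pmu name =
    Sys.file_exists
      (Filename.concat sysfs_root
         (Filename.concat "bus/event_source/devices" name))
  in
  (* Probe for the device itself rather than trusting `uname -m`, which
     reports the emulated architecture under binfmt_misc/qemu-user. *)
  if has_pmu "cs_etm" then Some Coresight_etm
  else if has_pmu "intel_pt" then Some Intel_pt
  else None
```

Checking `cs_etm` first is an arbitrary choice in this sketch; a host is not expected to expose both devices.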
CoreSight ETM produces "tr end  jcc" and "tr end  jmp" branch types
that Intel PT doesn't. Add these to the branches regex.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CoreSight ETM can end a trace at any branch type (e.g. "tr end  jcc",
"tr end  jmp"). These were previously considered impossible events.
Treat them like other trace ends by transitioning to untraced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CoreSight ETM produces "tr end  jcc" / "tr end  jmp" for brief decoder
sync gaps that resume immediately with "tr strt". Pushing an [untraced]
call frame for these caused a staircase effect since the subsequent
"tr strt" never pops that frame. Instead, treat these as no-ops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
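
As a rough illustration of the three commits above, the sketch below parses the branch-type column and decides whether a trace end should open an [untraced] frame. All type and function names here are hypothetical, and the parser is deliberately simplified (it uses plain string splitting rather than the regex that perf_decode.ml uses, and it ignores other compound branch types).

```ocaml
(* Hypothetical, simplified model of the branch-type column. *)
type kind = Call | Return | Jump | Other of string

type branch =
  | Branch of kind
  | Trace_start
  | Trace_end of kind option (* "tr end", optionally followed by a branch type *)

(* Split on spaces and drop empties, so "tr end  jcc" (two spaces) works. *)
let words s = String.split_on_char ' ' s |> List.filter (fun w -> w <> "")

let kind_of_string = function
  | "call" -> Call
  | "return" -> Return
  | "jmp" | "jcc" -> Jump
  | s -> Other s

let parse s =
  match words s with
  | ["tr"; "strt"] -> Some Trace_start
  | ["tr"; "end"] -> Some (Trace_end None)
  | ["tr"; "end"; k] -> Some (Trace_end (Some (kind_of_string k)))
  | [k] -> Some (Branch (kind_of_string k))
  | _ -> None

(* "tr end  jcc" / "tr end  jmp" are brief sync gaps that resume with
   "tr strt", so they are no-ops; other trace ends go to untraced. *)
let pushes_untraced_frame = function
  | Trace_end (Some Jump) -> false
  | Trace_end _ -> true
  | Branch _ | Trace_start -> false
```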
On aarch64, CoreSight ETM classifies PLT stub indirect branches (br xN)
as returns. A "return" from foo@plt to foo is really a jump into the
resolved function. Misclassifying it as a return causes the trace writer
to pop a frame and push a duplicate, creating a staircase effect.

Detect this case by checking if the src symbol ends with @plt and the
dst symbol matches after stripping that suffix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
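
The @plt check described above amounts to a suffix comparison. A minimal sketch, assuming symbols are plain strings; `strip_plt` and `is_plt_jump` are hypothetical names, not the functions in perf_decode.ml:

```ocaml
let plt_suffix = "@plt"

(* Return the symbol without its "@plt" suffix, or None if it has none. *)
let strip_plt sym =
  let n = String.length sym and k = String.length plt_suffix in
  if n >= k && String.sub sym (n - k) k = plt_suffix
  then Some (String.sub sym 0 (n - k))
  else None

(* A "return" from foo@plt to foo is really a jump into the resolved
   function, so it should not be handled as a return. *)
let is_plt_jump ~src ~dst =
  match strip_plt src with
  | Some base -> base = dst
  | None -> false
```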
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CoreSight ETM classifies indirect branches (br xN) as returns, but
many are actually tail calls or vtable dispatches (e.g. __overflow
tail-calling _IO_file_overflow). In check_current_symbol, after
popping the misclassified frame, re-check whether the new stack top
already matches the destination before pushing. This prevents
duplicate frames that caused staircase artifacts in traces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
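
The re-check described above can be sketched on a toy call stack. This models the stack as a list of symbol names with the top at the head; the real `check_current_symbol` in magic-trace works on the trace writer's own state, so treat this only as an illustration of the control flow.

```ocaml
(* Toy model: the call stack is a list of symbol names, top first. *)
let check_current_symbol stack ~dst =
  match stack with
  (* Already executing in the destination: nothing to do. *)
  | top :: _ when top = dst -> stack
  (* Popping the misclassified frame exposes the destination, so stop
     there instead of pushing a duplicate frame. *)
  | _ :: (next :: _ as rest) when next = dst -> rest
  (* Otherwise treat it as a tail call: replace the top frame. *)
  | _ :: rest -> dst :: rest
  | [] -> [dst]
```

For example, with the stack `["__overflow"; "putchar"; "main"]` and destination `"_IO_file_overflow"`, the top frame is replaced; with `["foo@plt"; "main"]` and destination `"main"`, the frame is popped and nothing is pushed.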
The more general fix in check_current_symbol (re-checking the stack
top after popping) already handles the PLT case, so the @plt-specific
hack in perf_decode.ml is unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After a `ret` has already popped the returning function, if the new stack
top doesn't match the return destination, the old code would pop the
caller (treating it as a tail call) and then push the destination. This
incorrectly destroyed the caller's stack frame on every PLT/vtable
trampoline return (e.g. `call @plt; return @plt -> real_func`).

Add `check_current_symbol_after_ret` which only pushes on mismatch
(never pops), since after a return the destination is a callee resolved
through a trampoline, not a tail-call replacement. The existing
`check_current_symbol` (pop-then-push) remains for Jump events where
tail-call semantics are correct.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pure push-only approach was too aggressive — it accumulated orphaned
frames when a return skipped over intermediate "ghost" frames (e.g.
same-function different-label like __entry_text_start vs
entry_SYSCALL_64_after_hwframe).

New approach: after ret pops one frame, search the callstack for the
destination symbol. If found, pop down to it (normal return skipping
ghost frames). If not found, push (trampoline target like PLT/vtable).

This correctly handles both cases:
- PLT/vtable trampolines: destination not on stack → push as callee
- Ghost frame returns: destination on stack → pop to it
- Normal returns: destination at top → no-op

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
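
The search-then-pop-or-push rule above can be sketched as follows. The function name `check_current_symbol_after_ret` comes from the earlier commit message, but the body is an assumption: it models the call stack as a list of symbol names (top first), and `drop_until` is a hypothetical helper.

```ocaml
(* Drop frames until [dst] is on top; None if [dst] is not on the stack. *)
let rec drop_until dst = function
  | [] -> None
  | top :: _ as stack when top = dst -> Some stack
  | _ :: rest -> drop_until dst rest

(* [stack] is the call stack after `ret` has already popped the
   returning function. *)
let check_current_symbol_after_ret stack ~dst =
  match drop_until dst stack with
  | Some stack' -> stack' (* at top: no-op; deeper: pop ghost frames *)
  | None -> dst :: stack  (* trampoline target (PLT/vtable): push as callee *)
```

In this model, a return to a symbol already at the top leaves the stack unchanged, a return that skips ghost frames pops down to the destination, and a "return" into a function that is not on the stack pushes it without touching the caller's frame.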
…terns

Tests based on real CoreSight ETM perf traces that exercise:
- PLT trampoline: call @plt, return @plt → real_func, return to caller
- Vtable dispatch: call wrapper, return wrapper → impl
- Tail call via jump: _setjmp → __sigsetjmp
- Repeated PLT calls: verifies no staircase (caller stays open)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>