ZJIT: Drop capstone, use built-in disassemblers#954

Draft
tekknolagi wants to merge 12 commits into Shopify:master from tekknolagi:mb-integrate-dart-disassembler

Summary

  • Replace capstone (C library) dependency with pure-Rust x86_64 and aarch64 disassemblers ported from the Dart VM
  • The disasm feature flag is kept but no longer pulls in any external dependency
  • Immediate formatting tuned to match capstone conventions (decimal for small values, hex for larger)
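
The immediate-formatting rule can be sketched as follows. This is a minimal illustration based on the examples in the commit notes (#7, #0x20, #-8, #-0x10), not the code in this PR; `format_imm` is a hypothetical name, and the sketch assumes the "small" threshold applies to the magnitude of the value.

```rust
/// Hypothetical helper (not the function used in this PR).
/// Formats an aarch64 immediate in the capstone-like style described above:
/// magnitudes below 10 print in decimal, larger magnitudes print in hex,
/// negatives keep a leading minus sign, and every immediate gets a `#` prefix.
fn format_imm(value: i64) -> String {
    let sign = if value < 0 { "-" } else { "" };
    // unsigned_abs avoids overflow when value == i64::MIN.
    let magnitude = value.unsigned_abs();
    if magnitude < 10 {
        format!("#{sign}{magnitude}")
    } else {
        format!("#{sign}{magnitude:#x}")
    }
}
```

Splitting the sign from the magnitude keeps negative values readable (#-0x10) rather than printing a two's-complement hex pattern.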

Known differences from capstone

  • 32-bit ops use mnemonic suffix (addw x0) instead of w-prefix registers (add w0)
  • ldur/stur shown as ldr/str with explicit #0 offset
  • A few instructions not yet supported: adrp, atomics (ldaddal, ldaxr, stlxr), system registers (mrs, msr)
  • No alias resolution for movn/movz → mov

Test plan

  • make zjit-test — 2156 tests pass (ARM64)
  • x86_64 CI

Pure-Rust x86_64 and aarch64 disassemblers ported from the Dart SDK,
to replace the capstone dependency. Copied verbatim from
ares-6-air-rs/src/backend/{disasm_x86_64.rs,disasm_aarch64.rs}.

Change from Dart VM naming convention (r0-r28, tmp, tmp2, pp, fp, lr,
zr, csp) to standard ARM64 names (x0-x30, xzr, sp).

Replace capstone with pure-Rust disassemblers for x86_64 and aarch64.
The disasm feature flag is kept to control compilation in release builds
but no longer pulls in any external dependency.
Format differences from capstone:
- movz/movn shown instead of mov alias
- Immediates in hex (#0x7 vs #7)
- Branch targets as relative decimal (+8 vs #0x10)
- Condition on mnemonic (bne vs b.ne)
- ldr/str with explicit #0 offset instead of ldur/stur
- 32-bit ops use mnemonic suffix (addw) instead of w-prefix registers
- Embedded data bytes show as "unknown" instead of fake instructions

Output formatting tuned to match capstone conventions:
- Small values (< 10) use decimal: #7, #0
- Larger values use hex: #0x20, #0x1000
- Signed negatives: #-8, #-0x10
- Branch conditions use b.cond format: b.ne, b.eq
- Branch targets as absolute hex: #0x400
- Memory offsets use same decimal/hex convention
- movk shift uses comma separator: , lsl #16
- All immediates have # prefix
- Replace capstone test in asm/x86_64/tests.rs with built-in disassembler
- Add clippy allows for vendored disassembler code (erasing_op,
  manual_range_contains, manual_range_patterns, etc.)
- Add space after comma between operands: addq rax, rcx
- Branch/call targets as absolute addresses: call 0xf, jmp 0xd
- Fix all internal disasm test assertions to match
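
Printing branch targets as absolute addresses means adding the encoded displacement to a base address. The sketch below shows that arithmetic under standard ISA rules: aarch64 branch displacements are relative to the branch instruction itself, while x86_64 rel8/rel32 displacements are relative to the end of the instruction. All function names here are hypothetical and do not come from the PR.

```rust
/// Hypothetical helper: resolve a PC-relative displacement to an absolute address.
fn absolute_target(base: u64, displacement: i64) -> u64 {
    base.wrapping_add_signed(displacement)
}

/// aarch64 style: displacement is measured from the branch instruction's address,
/// and the operand is printed with a `#` prefix, e.g. `#0x400`.
fn format_aarch64_target(insn_addr: u64, displacement: i64) -> String {
    format!("#{:#x}", absolute_target(insn_addr, displacement))
}

/// x86_64 style: displacement is measured from the address of the next instruction,
/// and the operand is printed as bare hex, e.g. `call 0xf`.
fn format_x86_64_target(next_insn_addr: u64, displacement: i64) -> String {
    format!("{:#x}", absolute_target(next_insn_addr, displacement))
}
```

For example, an aarch64 branch at address 8 with displacement +8 prints as #0x10, and a 5-byte x86_64 call at address 0 with rel32 = 10 prints as 0xf.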

Generated by decoding hex dumps through the built-in x86_64
disassembler and computing instruction addresses.

Prevent panics when disassembling truncated or short byte sequences.
Return 0 for out-of-bounds reads instead of panicking.
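
A minimal sketch of panic-free reads, assuming the decoder works over a byte slice. The helper names are hypothetical. How the PR treats a multi-byte read that is only partly in bounds is not stated, so zero-padding the missing bytes is an assumption of this sketch.

```rust
/// Hypothetical helper: read one byte, yielding 0 when `offset` is out of bounds.
fn read_u8(code: &[u8], offset: usize) -> u8 {
    code.get(offset).copied().unwrap_or(0)
}

/// Hypothetical helper: read a little-endian u32 without panicking.
/// Assumption: bytes past the end of `code` are treated as 0 (zero-padding).
fn read_u32_le(code: &[u8], offset: usize) -> u32 {
    let mut bytes = [0u8; 4];
    for (i, b) in bytes.iter_mut().enumerate() {
        // saturating_add keeps the index computation itself from overflowing.
        *b = read_u8(code, offset.saturating_add(i));
    }
    u32::from_le_bytes(bytes)
}
```

Using `slice::get` instead of indexing turns a truncated instruction stream into zero bytes that decode as some (possibly "unknown") instruction, rather than aborting the whole disassembly.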

Multi-byte NOPs (>9 bytes) are split into multiple NOP instructions
by the assembler, producing multiple disasm lines.
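
To illustrate why one padding request can yield several disassembly lines, here is a sketch of splitting a NOP run into instructions of at most 9 bytes. The function name and the greedy strategy (emit 9-byte NOPs first, then the remainder) are assumptions for illustration; the assembler's actual split may differ.

```rust
/// Longest single NOP encoding assumed by this sketch.
const MAX_NOP_LEN: usize = 9;

/// Hypothetical helper: split `remaining` bytes of padding into NOP lengths,
/// greedily taking the longest encoding first. Each returned length would
/// appear as one line in the disassembly.
fn nop_chunks(mut remaining: usize) -> Vec<usize> {
    let mut chunks = Vec::new();
    while remaining > 0 {
        let len = remaining.min(MAX_NOP_LEN);
        chunks.push(len);
        remaining -= len;
    }
    chunks
}
```

Under this assumption, a 10-byte pad produces two lines (a 9-byte NOP and a 1-byte NOP), which is why snapshot tests for long NOP runs list more than one instruction.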

The disassembler doesn't yet handle movsx with a 66 prefix or the
not [mem] encodings; it emits 'unknown' for them, and the instructions
that follow are decoded at the wrong offsets.
Update snapshots to match actual output.