ZJIT: Drop capstone, use built-in disassemblers#954
Draft
tekknolagi wants to merge 12 commits intoShopify:masterfrom
Draft
ZJIT: Drop capstone, use built-in disassemblers#954tekknolagi wants to merge 12 commits intoShopify:masterfrom
tekknolagi wants to merge 12 commits intoShopify:masterfrom
Conversation
Pure-Rust x86_64 and aarch64 disassemblers ported from the Dart SDK,
to replace the capstone dependency. Copied verbatim from
ares-6-air-rs/src/backend/{disasm_x86_64.rs,disasm_aarch64.rs}.
Change from Dart VM naming convention (r0-r28, tmp, tmp2, pp, fp, lr, zr, csp) to standard ARM64 names (x0-x30, xzr, sp).
Replace capstone with pure-Rust disassemblers for x86_64 and aarch64. The disasm feature flag is kept to control compilation in release builds but no longer pulls in any external dependency.
Format differences from capstone: - movz/movn shown instead of mov alias - Immediates in hex (#0x7 vs #7) - Branch targets as relative decimal (+8 vs #0x10) - Condition on mnemonic (bne vs b.ne) - ldr/str with explicit #0 offset instead of ldur/stur - 32-bit ops use mnemonic suffix (addw) instead of w-prefix registers - Embedded data bytes show as "unknown" instead of fake instructions
- Small values (< 10) use decimal: #7, #0 - Larger values use hex: #0x20, #0x1000 - Signed negatives: #-8, #-0x10 - Branch conditions use b.cond format: b.ne, b.eq - Branch targets as absolute hex: #0x400 - Memory offsets use same decimal/hex convention - movk shift uses comma separator: , lsl Shopify#16 - All immediates have # prefix
- Replace capstone test in asm/x86_64/tests.rs with built-in disassembler - Add clippy allows for vendored disassembler code (erasing_op, manual_range_contains, manual_range_patterns, etc.)
- Add space after comma between operands: addq rax, rcx - Branch/call targets as absolute addresses: call 0xf, jmp 0xd - Fix all internal disasm test assertions to match
Generated by decoding hex dumps through the built-in x86_64 disassembler and computing instruction addresses.
Prevent panics when disassembling truncated or short byte sequences. Return 0 for out-of-bounds reads instead of panicking.
Multi-byte NOPs (>9 bytes) are split into multiple NOP instructions by the assembler, producing multiple disasm lines.
The disassembler doesn't handle movsx with 66 prefix or not [mem] encodings, producing 'unknown' and misaligned subsequent decodes. Update snapshots to match actual output.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
disasmfeature flag is kept but no longer pulls in any external dependencyKnown differences from capstone
addw x0) instead of w-prefix registers (add w0)ldur/sturshown asldr/strwith explicit#0offsetadrp, atomics (ldaddal,ldaxr,stlxr), system registers (mrs,msr)movn/movz→movTest plan
make zjit-test— 2156 tests pass (ARM64)