Add selective X-propagation #21
Open
robtaylor wants to merge 12 commits into feature/multi-clock-domains from
An error occurred while trying to automatically change base from feature/multi-clock-domains to main (February 24, 2026 01:38)
The prefix-based `is_sequential_cell()` matched `starts_with("dl")`
which incorrectly classified `dlygate4sd3` (a combinational delay
buffer used for hold-time fixing) as a sequential element. This
inserted a phantom DFF in the logic path, breaking simulation.
Replace prefix matching with an exhaustive table of 32 sequential
cells derived from the PDK by grepping for `udp_dff`/`udp_dlatch`
primitives in behavioral Verilog models.
Regression introduced in d11e914.
Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
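The difference between the two classification strategies can be sketched as follows. This is a minimal illustration, not the PR's code: the function names, the `"df"` prefix, and the five-entry table are assumptions chosen for the example (the real table lists 32 cells), and cell names are shown without library prefix or drive-strength suffix.

```rust
/// Prefix-based classification (sketch of the old behaviour).
/// Only the "dl" prefix is mentioned in the commit message; "df" is assumed.
fn is_sequential_by_prefix(cell: &str) -> bool {
    cell.starts_with("df") || cell.starts_with("dl")
}

/// Illustrative subset of an exhaustive table. The entries are assumed
/// examples, not the PR's actual list of 32 cells.
const SEQUENTIAL_CELLS_SKETCH: &[&str] = &["dfrtp", "dfstp", "dfxtp", "dlrtp", "dlxtp"];

/// Table-based classification: only exact matches count as sequential.
fn is_sequential_by_table(cell: &str) -> bool {
    SEQUENTIAL_CELLS_SKETCH.iter().any(|&known| known == cell)
}
```

With the prefix check, `dlygate4sd3` is reported as sequential because it starts with `dl`; with the table it is not, since only names listed explicitly match.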
Covers the full process from library detection through cell classification, behavioral model parsing, AIG decomposition, and testing. Based on the SKY130 enablement experience, including the dlygate4sd3 misclassification pitfall.
Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
An error occurred while trying to automatically change base from feature/multi-clock-domains to main (February 24, 2026 17:31)
Proposes a compile-time static analysis approach to identify X-capable signals in the AIG, enabling mixed two-state/four-state simulation. Only partitions with genuinely unknown signals pay the ~2.3x ALU overhead; the rest continue at full two-state speed.
Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
Add compute_x_sources() and compute_x_capable_pins() to AIG for identifying signals that can carry unknown (X) values during simulation.
Algorithm: mark DFF Q outputs and SRAM read data ports as X sources, forward-propagate through AND gates (leveraging topological order), then fixpoint-iterate through DFF feedback loops until convergence.
This is Stage 1 of the selective X-propagation feature: compile-time analysis that identifies the ~5% of signals needing X-aware simulation.
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
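The seed, forward-propagate, and fixpoint steps could look roughly like the sketch below. The `Node` type, the `uninitialised` flag, and `x_capable_sketch` are hypothetical and do not reflect the real AIG API. The commit message marks DFF Q outputs as sources without such a flag; the flag exists here only so that the feedback iteration has visible work to do.

```rust
/// Hypothetical, simplified AIG node. Indices refer to positions in the slice.
#[derive(Clone, Copy)]
enum Node {
    Input,                                  // primary input, always known
    And(usize, usize),                      // AND of two earlier nodes
    DffQ { d: usize, uninitialised: bool }, // Q output; `d` may be a later node
    SramRead,                               // SRAM read data port
}

/// Returns, per node, whether it may carry X (assumed sketch of the analysis).
fn x_capable_sketch(nodes: &[Node]) -> Vec<bool> {
    // Seed: SRAM read ports and DFFs flagged as uninitialised are X sources.
    let mut x: Vec<bool> = nodes
        .iter()
        .map(|n| matches!(n, Node::SramRead | Node::DffQ { uninitialised: true, .. }))
        .collect();
    loop {
        let mut changed = false;
        // One forward pass in topological order covers all AND gates;
        // DFF feedback edges may need further passes.
        for (i, n) in nodes.iter().enumerate() {
            let now = match *n {
                Node::And(a, b) => x[a] || x[b],
                Node::DffQ { d, .. } => x[i] || x[d],
                Node::Input | Node::SramRead => x[i],
            };
            if now != x[i] {
                x[i] = now;
                changed = true;
            }
        }
        if !changed {
            return x; // fixpoint reached
        }
    }
}
```

Because a node's flag only ever changes from `false` to `true`, the loop terminates after at most one pass per node.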
Add xprop_enabled, partition_x_capable, and xprop_state_offset fields to FlattenedScriptV1. Add from_with_xprop() constructor that accepts x_capable_pins from AIG analysis and classifies partitions.
Classification: a partition is X-capable if any of its AIG pins are X-capable, with fixpoint propagation for inter-partition reads. Metadata words 8 (is_x_capable) and 9 (xmask_state_offset) are patched into each partition's script for GPU kernel consumption.
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
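As a rough sketch of the classification rule and the metadata patch described above: `PartitionSketch`, `classify_partitions_sketch`, and `patch_metadata_sketch` are hypothetical names, and the script is modelled as a plain `u32` slice, which is an assumption about the real layout beyond the word indices 8 and 9.

```rust
/// Hypothetical view of a partition for classification purposes.
struct PartitionSketch {
    pins: Vec<usize>,       // AIG pin indices evaluated by this partition
    reads_from: Vec<usize>, // partitions whose outputs this partition reads
}

/// A partition is X-capable if any of its pins is, or (by fixpoint) if it
/// reads from an X-capable partition.
fn classify_partitions_sketch(parts: &[PartitionSketch], x_capable_pins: &[bool]) -> Vec<bool> {
    let mut is_x: Vec<bool> = parts
        .iter()
        .map(|p| p.pins.iter().any(|&pin| x_capable_pins[pin]))
        .collect();
    let mut changed = true;
    while changed {
        changed = false;
        for (i, p) in parts.iter().enumerate() {
            if !is_x[i] && p.reads_from.iter().any(|&src| is_x[src]) {
                is_x[i] = true;
                changed = true;
            }
        }
    }
    is_x
}

/// Patch metadata word 8 (is_x_capable) and word 9 (xmask_state_offset)
/// into the start of a partition's script.
fn patch_metadata_sketch(script: &mut [u32], is_x_capable: bool, xmask_state_offset: u32) {
    script[8] = is_x_capable as u32;
    script[9] = xmask_state_offset;
}
```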
Implement simulate_block_v1_xprop() that mirrors the standard kernel but tracks a parallel X-mask sideband through global reads, boomerang stages, SRAM, clock enable gating, and DFF writeout. Non-X-capable partitions delegate to the standard kernel with zero overhead.
Also adds sanity_check_cpu_xprop() for GPU validation and 13 unit tests covering the AND gate X-prop formula, DFF clock-enable behavior, SRAM X-mask semantics, and end-to-end kernel execution.
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
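The AND gate X-prop formula itself is not spelled out in this conversation. A common bit-parallel formulation, assumed here, treats a known 0 on either input as dominating: the output is X only where at least one input is X and neither input is a known 0. The function name and the convention of clearing value bits under X are assumptions of this sketch.

```rust
/// Bit-parallel four-state AND over 32 lanes (assumed formulation).
/// `v*` holds values, `x*` holds the X-mask (1 = unknown).
/// Returns (value, xmask); value bits are forced to 0 where the result is X.
fn and_xprop_sketch(va: u32, xa: u32, vb: u32, xb: u32) -> (u32, u32) {
    // (va | xa) is 0 only where a is a known 0; likewise for b.
    let x = (xa | xb) & (va | xa) & (vb | xb);
    let v = va & vb & !x;
    (v, x)
}
```

For example, with lane 0 as `0 & X`, lane 1 as `1 & X`, lane 2 as `X & X`, and lane 3 as `1 & 1`, only lanes 1 and 2 come out as X, and lane 3 evaluates to a known 1.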
Wire selective X-propagation through the CLI and simulation pipeline:
- Add --xprop flag to `loom map` (informational X-analysis report)
- Add --xprop flag to `loom sim` (enables X-prop in FlattenedScript)
- Add xprop field to DesignArgs; load_design runs compute_x_capable_pins and builds script via from_with_xprop when enabled
- Add write_output_vcd_xprop to emit Value::X for X-masked output signals
- Log X-capable partition count and warn about X transitions in VCD
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
Dual-lane X-mask tracking through all kernel phases: global read, boomerang shuffle/reduction (hier[0]-[12]), writeout hooks, SRAM/duplicate permutation, SRAM commit with X-mask shadow, clock enable permutation, and DFF writeout with X-aware gating.
X-free partitions (metadata word 8 == 0) execute the unchanged code path with zero overhead: the is_x_capable branch is uniform per threadgroup, so there is no warp/SIMD divergence.
Shared memory additions: shared_state_x[256] and shared_writeouts_x[256] (+2KB, well within the 32KB threadgroup memory limit).
Updates all Rust dispatch code (loom.rs, metal_test.rs, cuda_test.rs, cuda_dummy_test.rs) to allocate and pass sram_xmask buffers.
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
Fix critical state buffer sizing: when xprop is enabled, the GPU kernel expects a doubled state buffer (values + X-mask) per cycle. Add `effective_state_size()` method and `expand_states_for_xprop()` / `split_xprop_states()` helpers for correct buffer management.
State buffer layout: [values (reg_io_state_size) | xmask (reg_io_state_size)] per cycle. X-mask initialized to 0xFFFFFFFF for DFF positions (unknown) and 0 for primary input positions (known from VCD).
Post-simulation diagnostics:
- First-cycle-X-free detection with log message
- Warning when X values persist at primary outputs at final cycle
- CPU sanity check now uses xprop variant when enabled
Wire xprop through all dispatch paths (loom sim Metal/CUDA, metal_test, cuda_test, cuda_dummy_test) with effective_state_size.
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
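The per-cycle `[values | xmask]` layout can be illustrated with a small roundtrip sketch. The signatures of the real `expand_states_for_xprop()` and `split_xprop_states()` helpers are not shown here, so the functions below use hypothetical names and parameters, with `state_size` standing in for `reg_io_state_size` and the X-mask template supplied by the caller.

```rust
/// Interleave an X-mask block after each cycle's values:
/// [values (state_size) | xmask (state_size)] per cycle.
/// `xmask_template` would hold 0xFFFFFFFF at DFF positions and 0 at inputs.
fn expand_states_sketch(values: &[u32], state_size: usize, xmask_template: &[u32]) -> Vec<u32> {
    assert_eq!(xmask_template.len(), state_size);
    let mut out = Vec::with_capacity(values.len() * 2);
    for cycle in values.chunks_exact(state_size) {
        out.extend_from_slice(cycle);
        out.extend_from_slice(xmask_template);
    }
    out
}

/// Inverse of the expansion: separate values and X-masks again.
fn split_states_sketch(expanded: &[u32], state_size: usize) -> (Vec<u32>, Vec<u32>) {
    let mut values = Vec::new();
    let mut xmask = Vec::new();
    for cycle in expanded.chunks_exact(2 * state_size) {
        values.extend_from_slice(&cycle[..state_size]);
        xmask.extend_from_slice(&cycle[state_size..]);
    }
    (values, xmask)
}
```

Splitting an expanded buffer returns the original values unchanged, which is the roundtrip property that tests for such helpers would typically check.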
Measures three scenarios across varying IO counts and stage depths:
- two_state: baseline kernel (no xprop)
- xprop_xfree: X-aware kernel on non-X-capable partition (zero overhead)
- xprop_xcapable: X-aware kernel on X-capable partition (~1.5-1.6x)
Results confirm X-free partitions pay no overhead, and X-capable partitions stay well within the 2x budget.
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
Replace the speculative Open Questions section with Design Decisions documenting the choices made during implementation: conservative whole-SRAM X granularity, skipping reset-aware analysis, VCD X output enabled, partition-level granularity, runtime CLI flag, and state buffer layout with metadata words 8/9.
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
Unit tests for vcd_io.rs xprop helpers (expand/split roundtrip, cycle structure preservation, X-mask template correctness) and flatten.rs xprop fields (effective_state_size, xprop_state_offset, partition metadata words 8/9).
CI updates: run xprop benchmark alongside event_buffer, add E2E xprop simulation steps to both Metal and CUDA jobs (map --xprop, sim --xprop, verify VCD contains X values).
Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
Summary
- […] `shared_state_x[256]`, `shared_writeouts_x[256]`) and SRAM X-mask shadow
- `--xprop` flag on `loom map` and `loom sim`, VCD output emits `Value::X` for unknown primary outputs

Test plan
- `cargo test` passes (unit tests for X-source analysis, CPU kernel correctness)
- `cargo bench --bench xprop` runs successfully
- […] `--xprop` and verify X-capable pin/partition stats in log output
- […] `--xprop` and verify VCD contains X values for uninitialised outputs
- […] `--xprop` and verify identical behaviour to baseline (backward compat)
- […] `.gemparts` (without xprop fields) and verify `xprop_enabled == false`