Add selective X-propagation by robtaylor · Pull Request #21 · ChipFlow/Loom

robtaylor · 2026-02-23T21:11:16Z

Summary

- Static analysis identifies DFF Q outputs and SRAM read ports as X-sources, then computes their forward cone with fixpoint iteration, classifying ~5% of signals as X-capable
- Partition-level classification: only X-capable partitions run the dual-lane (value + X-mask) kernel; X-free partitions have zero overhead
- Full GPU kernel support (Metal + CUDA) with shared memory sideband (shared_state_x[256], shared_writeouts_x[256]) and SRAM X-mask shadow
- CLI: --xprop flag on loom map and loom sim; VCD output emits Value::X for unknown primary outputs
- CPU reference kernel + sanity check for GPU validation
- Criterion benchmarks confirm: X-free ~0% overhead, X-capable ~1.5-1.6x (within the 2x budget)

Test plan

- cargo test passes (unit tests for X-source analysis, CPU kernel correctness)
- cargo bench --bench xprop runs successfully
- Map a design with --xprop and verify X-capable pin/partition stats in log output
- Simulate with --xprop and verify VCD contains X values for uninitialised outputs
- Simulate without --xprop and verify identical behaviour to baseline (backward compat)
- Load old .gemparts (without xprop fields) and verify xprop_enabled == false

The prefix-based `is_sequential_cell()` matched `starts_with("dl")`, which incorrectly classified `dlygate4sd3` (a combinational delay buffer used for hold-time fixing) as a sequential element. This inserted a phantom DFF in the logic path and broke simulation. Replace prefix matching with an exhaustive table of 32 sequential cells, derived from the PDK by grepping for `udp_dff`/`udp_dlatch` primitives in the behavioral Verilog models. Regression introduced in d11e914. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
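
The difference between the two matching strategies can be sketched as follows. This is a minimal illustration, not the code from this commit: the three-entry table is a small subset chosen for the example (the real table has 32 entries), and `base_cell_name` and `is_sequential_by_prefix` are hypothetical helpers that reconstruct the old behaviour from the description above.

```rust
/// Illustrative subset only; the table in the commit lists 32 cells.
const SEQUENTIAL_CELLS: &[&str] = &["dfxtp", "dfrtp", "dlxtp"];

/// Hypothetical helper: strip a SKY130-style library prefix and a numeric
/// drive-strength suffix, e.g. "sky130_fd_sc_hd__dfxtp_1" -> "dfxtp".
fn base_cell_name(cell: &str) -> &str {
    let name = cell.rsplit("__").next().unwrap_or(cell);
    match name.rfind('_') {
        Some(i)
            if i + 1 < name.len()
                && name[i + 1..].chars().all(|c| c.is_ascii_digit()) =>
        {
            &name[..i]
        }
        _ => name,
    }
}

/// Reconstruction of the buggy approach: any base name starting with "dl"
/// is treated as a latch, which also catches the delay buffer dlygate4sd3.
fn is_sequential_by_prefix(cell: &str) -> bool {
    let base = base_cell_name(cell);
    base.starts_with("df") || base.starts_with("dl")
}

/// Table-based approach: only exact matches against known cells count.
fn is_sequential_cell(cell: &str) -> bool {
    let base = base_cell_name(cell);
    SEQUENTIAL_CELLS.iter().any(|&known| known == base)
}
```

With this sketch, `sky130_fd_sc_hd__dlygate4sd3_1` is sequential according to the prefix check but not according to the table lookup, while `dfxtp` and `dlxtp` cells are accepted by both.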

Covers the full process from library detection through cell classification, behavioral model parsing, AIG decomposition, and testing. Based on the SKY130 enablement experience, including the dlygate4sd3 misclassification pitfall. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)

Proposes a compile-time static analysis approach to identify X-capable signals in the AIG, enabling mixed two-state/four-state simulation. Only partitions with genuinely unknown signals pay the ~2.3x ALU overhead; the rest continue at full two-state speed. Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)

Add compute_x_sources() and compute_x_capable_pins() to AIG for identifying signals that can carry unknown (X) values during simulation. Algorithm: mark DFF Q outputs and SRAM read data ports as X sources, forward-propagate through AND gates (leveraging topological order), then fixpoint-iterate through DFF feedback loops until convergence. This is Stage 1 of the selective X-propagation feature: compile-time analysis that identifies the ~5% of signals needing X-aware simulation. Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
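
The analysis described above can be sketched on a toy netlist. This is an assumption-laden illustration rather than the AIG code in this commit: the `Node` enum, `x_sources`, and `compute_x_capable` are hypothetical stand-ins, and the sketch takes the seed set as a parameter so that the DFF feedback step is visible.

```rust
/// Toy netlist node (hypothetical; the real AIG representation differs).
#[derive(Clone, Copy)]
enum Node {
    Input,
    And(usize, usize),
    /// Q output of a flip-flop; `d` may index a later node (feedback loop).
    DffQ { d: usize },
    SramRead,
}

/// X sources as described in the commit: DFF Q outputs and SRAM read ports.
fn x_sources(nodes: &[Node]) -> Vec<usize> {
    let mut out = Vec::new();
    for (i, n) in nodes.iter().enumerate() {
        if matches!(n, Node::DffQ { .. } | Node::SramRead) {
            out.push(i);
        }
    }
    out
}

/// Forward cone of `sources`, iterated to a fixpoint. AND gates are assumed
/// to appear after their operands, so one sweep covers combinational logic;
/// further sweeps are only needed to carry X through DFF feedback.
fn compute_x_capable(nodes: &[Node], sources: &[usize]) -> Vec<bool> {
    let mut x = vec![false; nodes.len()];
    for &s in sources {
        x[s] = true;
    }
    loop {
        let mut changed = false;
        for i in 0..nodes.len() {
            let reaches = match nodes[i] {
                Node::And(a, b) => x[a] || x[b],
                Node::DffQ { d } => x[d],
                Node::Input | Node::SramRead => false,
            };
            if reaches && !x[i] {
                x[i] = true;
                changed = true;
            }
        }
        if !changed {
            return x;
        }
    }
}
```

The set only ever grows and is bounded by the number of nodes, so the loop terminates. Calling `compute_x_capable(&nodes, &x_sources(&nodes))` follows the seeding described in the commit message.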

Add xprop_enabled, partition_x_capable, and xprop_state_offset fields to FlattenedScriptV1. Add from_with_xprop() constructor that accepts x_capable_pins from AIG analysis and classifies partitions. Classification: a partition is X-capable if any of its AIG pins are X-capable, with fixpoint propagation for inter-partition reads. Metadata words 8 (is_x_capable) and 9 (xmask_state_offset) are patched into each partition's script for GPU kernel consumption. Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
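
A minimal sketch of the classification step, under assumptions: `Partition` and `classify_partitions` are hypothetical names, and the rule "a partition that reads from an X-capable partition becomes X-capable" is one reading of the inter-partition fixpoint described above, not a copy of `from_with_xprop()`.

```rust
/// Hypothetical partition summary: the AIG pins it contains and the
/// partitions whose outputs it reads.
struct Partition {
    pins: Vec<usize>,
    reads_from: Vec<usize>,
}

/// A partition starts X-capable if any of its pins is X-capable, then the
/// flag is propagated along inter-partition reads until nothing changes.
fn classify_partitions(parts: &[Partition], x_capable_pins: &[bool]) -> Vec<bool> {
    let mut capable: Vec<bool> = parts
        .iter()
        .map(|p| p.pins.iter().any(|&pin| x_capable_pins[pin]))
        .collect();
    loop {
        let mut changed = false;
        for (i, p) in parts.iter().enumerate() {
            if !capable[i] && p.reads_from.iter().any(|&src| capable[src]) {
                capable[i] = true;
                changed = true;
            }
        }
        if !changed {
            return capable;
        }
    }
}
```

In the real script the resulting flag would be what metadata word 8 carries for each partition; the sketch stops at the boolean vector.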

Implement simulate_block_v1_xprop() that mirrors the standard kernel but tracks a parallel X-mask sideband through global reads, boomerang stages, SRAM, clock enable gating, and DFF writeout. Non-X-capable partitions delegate to the standard kernel with zero overhead. Also adds sanity_check_cpu_xprop() for GPU validation and 13 unit tests covering the AND gate X-prop formula, DFF clock-enable behavior, SRAM X-mask semantics, and end-to-end kernel execution. Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
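
For readers unfamiliar with dual-lane simulation, the AND gate rule can be written bit-parallel over a 32-bit word. The sketch below uses the standard four-state formulation (a known 0 on either input forces a known 0 output); it is an assumption that the kernel's formula is equivalent, and `Lanes`, `and_xprop`, and `not_xprop` are hypothetical names.

```rust
/// One 32-bit word of signals: `val` holds values, `x` marks unknown bits.
/// Convention in this sketch: `val` is 0 wherever `x` is 1.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lanes {
    val: u32,
    x: u32,
}

/// AND with X-propagation. The output bit is unknown only if at least one
/// input is unknown and neither input is a known 0.
fn and_xprop(a: Lanes, b: Lanes) -> Lanes {
    let x = (a.x | b.x) & (a.val | a.x) & (b.val | b.x);
    Lanes { val: a.val & b.val & !x, x }
}

/// Inverted AIG edge: flips known bits, leaves unknown bits unknown.
fn not_xprop(a: Lanes) -> Lanes {
    Lanes { val: !a.val & !a.x, x: a.x }
}
```

Per bit: `0 & X` gives a known 0, `1 & X` gives X, and `X & X` gives X. The extra work over the two-state `a & b` is the source of the overhead that only X-capable partitions pay.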

Wire selective X-propagation through the CLI and simulation pipeline:
- Add --xprop flag to `loom map` (informational X-analysis report)
- Add --xprop flag to `loom sim` (enables X-prop in FlattenedScript)
- Add xprop field to DesignArgs; load_design runs compute_x_capable_pins and builds script via from_with_xprop when enabled
- Add write_output_vcd_xprop to emit Value::X for X-masked output signals
- Log X-capable partition count and warn about X transitions in VCD

Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)

Dual-lane X-mask tracking through all kernel phases: global read, boomerang shuffle/reduction (hier[0]-[12]), writeout hooks, SRAM/duplicate permutation, SRAM commit with X-mask shadow, clock enable permutation, and DFF writeout with X-aware gating. X-free partitions (metadata word 8 == 0) execute the unchanged code path with zero overhead: the is_x_capable branch is uniform per threadgroup, so there is no warp/SIMD divergence. Shared memory additions: shared_state_x[256] and shared_writeouts_x[256] (+2KB, well within the 32KB threadgroup memory limit). Updates all Rust dispatch code (loom.rs, metal_test.rs, cuda_test.rs, cuda_dummy_test.rs) to allocate and pass sram_xmask buffers. Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
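
One plausible semantics for the X-aware DFF writeout is sketched below in Rust rather than shader code. It treats the clock enable as a mux select between the new data and the held value, and it is an assumption that the kernel behaves this way; `Reg` and `dff_writeout` are hypothetical names.

```rust
/// Value/X-mask pair for 32 flip-flops. `val` is 0 wherever `x` is 1.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Reg {
    val: u32,
    x: u32,
}

/// Next state of a clock-enabled DFF with an X-mask sideband.
/// - enable known 1: take d (value and mask)
/// - enable known 0: hold q (value and mask)
/// - enable unknown: result is known only if d and q are known and equal
fn dff_writeout(q: Reg, d: Reg, en: Reg) -> Reg {
    let en1 = en.val & !en.x; // enable is a known 1
    let en0 = !en.val & !en.x; // enable is a known 0
    let x = (en1 & d.x) | (en0 & q.x) | (en.x & (d.x | q.x | (d.val ^ q.val)));
    // Where the enable is unknown but the result is known, d and q agree,
    // so either value can be used.
    let val = ((en1 & d.val) | (en0 & q.val) | (en.x & d.val)) & !x;
    Reg { val, x }
}
```

A more conservative variant would mark every bit with an unknown enable as X; that costs fewer operations at the price of more pessimistic results.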

Fix critical state buffer sizing: when xprop is enabled, the GPU kernel expects a doubled state buffer (values + X-mask) per cycle. Add `effective_state_size()` method and `expand_states_for_xprop()` / `split_xprop_states()` helpers for correct buffer management.

State buffer layout: [values (reg_io_state_size) | xmask (reg_io_state_size)] per cycle. X-mask initialized to 0xFFFFFFFF for DFF positions (unknown) and 0 for primary input positions (known from VCD).

Post-simulation diagnostics:
- First-cycle-X-free detection with log message
- Warning when X values persist at primary outputs at final cycle
- CPU sanity check now uses xprop variant when enabled

Wire xprop through all dispatch paths (loom sim Metal/CUDA, metal_test, cuda_test, cuda_dummy_test) with effective_state_size.

Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)
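
The per-cycle layout can be illustrated with a pair of helpers. These are not the `expand_states_for_xprop()` / `split_xprop_states()` implementations, whose signatures are not shown here; `expand_states`, `split_states`, and the word-granular `xmask_template` argument are assumptions made for the sketch.

```rust
/// Interleave an X-mask block after each cycle's values:
/// [values (state_size) | xmask (state_size)] per cycle.
/// `xmask_template` holds the initial mask for one cycle, e.g. 0xFFFF_FFFF
/// for DFF words and 0 for primary-input words.
fn expand_states(values: &[u32], state_size: usize, xmask_template: &[u32]) -> Vec<u32> {
    assert!(state_size > 0);
    assert_eq!(xmask_template.len(), state_size);
    assert_eq!(values.len() % state_size, 0);
    let mut out = Vec::with_capacity(values.len() * 2);
    for cycle in values.chunks(state_size) {
        out.extend_from_slice(cycle);
        out.extend_from_slice(xmask_template);
    }
    out
}

/// Inverse of `expand_states`: separate values and X-masks again.
fn split_states(expanded: &[u32], state_size: usize) -> (Vec<u32>, Vec<u32>) {
    assert!(state_size > 0);
    assert_eq!(expanded.len() % (2 * state_size), 0);
    let mut values = Vec::with_capacity(expanded.len() / 2);
    let mut xmask = Vec::with_capacity(expanded.len() / 2);
    for cycle in expanded.chunks(2 * state_size) {
        values.extend_from_slice(&cycle[..state_size]);
        xmask.extend_from_slice(&cycle[state_size..]);
    }
    (values, xmask)
}
```

Under this layout the effective state size per cycle is twice the two-state size, which is the sizing the fix above accounts for.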

Measures three scenarios across varying IO counts and stage depths:
- two_state: baseline kernel (no xprop)
- xprop_xfree: X-aware kernel on non-X-capable partition (zero overhead)
- xprop_xcapable: X-aware kernel on X-capable partition (~1.5-1.6x)

Results confirm X-free partitions pay no overhead, and X-capable partitions stay well within the 2x budget.

Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)

Replace the speculative Open Questions section with Design Decisions documenting the choices made during implementation: conservative whole-SRAM X granularity, skipping reset-aware analysis, VCD X output enabled, partition-level granularity, runtime CLI flag, and state buffer layout with metadata words 8/9. Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)

Unit tests for vcd_io.rs xprop helpers (expand/split roundtrip, cycle structure preservation, X-mask template correctness) and flatten.rs xprop fields (effective_state_size, xprop_state_offset, partition metadata words 8/9). CI updates: run xprop benchmark alongside event_buffer, add E2E xprop simulation steps to both Metal and CUDA jobs (map --xprop, sim --xprop, verify VCD contains X values). Co-developed-by: Claude Code v2.1.39 (claude-opus-4-6)

robtaylor force-pushed the feature/multi-clock-domains branch from 2f71182 to 11bb424 on February 23, 2026 23:39

robtaylor force-pushed the multivalue branch from d9ff348 to 4523682 on February 24, 2026 00:32

An error occurred while trying to automatically change base from feature/multi-clock-domains to main on February 24, 2026 01:38

robtaylor force-pushed the feature/multi-clock-domains branch from fbefda6 to 3b7fc26 on February 24, 2026 17:19

robtaylor added 2 commits on February 24, 2026 17:31

An error occurred while trying to automatically change base from feature/multi-clock-domains to main on February 24, 2026 17:31

robtaylor added 10 commits on February 24, 2026 17:40

robtaylor force-pushed the multivalue branch from 4523682 to ced0ca4 on February 24, 2026 22:09

robtaylor wants to merge 12 commits into feature/multi-clock-domains from multivalue
