Loom

Loom is a GPU-accelerated RTL logic simulator. Like a Jacquard loom weaving patterns from punched cards, Loom maps gate-level netlists onto a virtual manycore Boolean processor and executes them on GPUs, delivering a 5-40x speedup over CPU-based RTL simulators.

Loom builds on the excellent GEM research by Zizheng Guo, Yanqing Zhang, Runsheng Wang, Yibo Lin, and Haoxing Ren at NVIDIA Research. ChipFlow extends their work with:

  • Metal backend for Apple Silicon Macs (in addition to the original CUDA backend)
  • Liberty timing support — load real cell delays from Liberty files (e.g. SKY130) for timing-annotated simulation
  • SDF back-annotation — post-layout timing from Standard Delay Format files
  • Setup/hold violation detection — both CPU and GPU-side checking
  • Significant performance optimizations to the partition mapping pipeline
  • CI/CD with automated testing across both backends

Roadmap: Timing Simulation

The goal is GPU-accelerated gate-level simulation with real cell timing — a first for open source. Current status:

| Component | Status |
|---|---|
| Liberty file parsing | Done: loads SKY130 HD cell delays |
| Gate delay computation | Done: per-AIG-pin delays from Liberty |
| SDF back-annotation | Done: post-layout delays from SDF files |
| CPU timing simulation | Done: arrival time propagation with setup/hold checking |
| GPU timing simulation | Done: setup/hold violation detection on GPU (Metal + CUDA) |
| SKY130 timing test suite | Done: post-P&R test circuits with SDF |
| Unified loom sim CLI | Done: timing constraints wired to both Metal and CUDA kernels |
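
To make "arrival time propagation with setup/hold checking" concrete, here is a minimal Python sketch of the general technique. It is not Loom's implementation: the `Gate` class, the function names, and the single per-gate delay are illustrative assumptions, and the clock is assumed ideal (no skew, launch edge at time 0).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Gate:
    """Hypothetical gate record; real Liberty delays are per pin and per edge."""
    output: str        # net driven by this gate
    inputs: List[str]  # nets read by this gate
    delay_ps: int      # one delay per gate (a simplification)


def propagate_arrivals(
    gates: List[Gate], launch_ps: Dict[str, int]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Propagate earliest and latest arrival times.

    `gates` must be in topological order; `launch_ps` holds the arrival
    times of the source nets (for example, flip-flop clock-to-Q delays).
    """
    earliest = dict(launch_ps)
    latest = dict(launch_ps)
    for gate in gates:
        earliest[gate.output] = min(earliest[n] for n in gate.inputs) + gate.delay_ps
        latest[gate.output] = max(latest[n] for n in gate.inputs) + gate.delay_ps
    return earliest, latest


def check_setup_hold(
    earliest_ps: int, latest_ps: int, period_ps: int, setup_ps: int, hold_ps: int
) -> List[str]:
    """Check one flip-flop data pin against an ideal clock."""
    violations = []
    # Setup: the latest data change must settle setup_ps before the next edge.
    if latest_ps > period_ps - setup_ps:
        violations.append("setup")
    # Hold: the earliest data change must not occur within hold_ps of the edge.
    if earliest_ps < hold_ps:
        violations.append("hold")
    return violations


# Two-gate path: n1 = f(a, b), d = g(n1); d feeds a flip-flop.
gates = [Gate("n1", ["a", "b"], 120), Gate("d", ["n1"], 80)]
earliest, latest = propagate_arrivals(gates, {"a": 50, "b": 70})
print(check_setup_hold(earliest["d"], latest["d"],
                       period_ps=300, setup_ps=40, hold_ps=30))  # prints ['setup']
```

In this example the latest arrival at `d` is 270 ps, which misses the 260 ps setup deadline of a 300 ps clock; relaxing the period clears the violation.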

Next steps:

  1. Timing-aware bit packing for improved GPU utilization
  2. Multi-clock domain support

Quick Start

Requires the Rust toolchain.

git clone https://github.com/ChipFlow/Loom.git
cd Loom
git submodule update --init --recursive

Build (Metal - macOS)

cargo build -r --features metal --bin loom

Build (CUDA - Linux)

Requires an installed CUDA toolkit.

cargo build -r --features cuda --bin loom

Usage

Loom operates in two phases:

  1. Map your synthesized gate-level netlist to a .gemparts file (one-time cost):
cargo run -r --bin loom -- map design.gv design.gemparts
  2. Simulate with a VCD input waveform:
# Metal (macOS) - use NUM_BLOCKS=1
cargo run -r --features metal --bin loom -- sim design.gv design.gemparts input.vcd output.vcd 1

# CUDA (Linux) - set NUM_BLOCKS to 2x your GPU's SM count
cargo run -r --features cuda --bin loom -- sim design.gv design.gemparts input.vcd output.vcd NUM_BLOCKS

# With SDF timing back-annotation:
cargo run -r --features metal --bin loom -- sim design.gv design.gemparts input.vcd output.vcd 1 \
  --sdf design.sdf --sdf-corner typ
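
For background on `--sdf-corner typ`: SDF files write delay values as `min:typ:max` triples, and a corner picks one field from each triple. The Python sketch below shows that selection on a single triple. The function name is hypothetical, and that Loom accepts `min` or `max` in addition to the `typ` shown above is an assumption; see docs/usage.md for the supported values.

```python
from typing import Optional

# Position of each corner inside an SDF "min:typ:max" triple.
CORNERS = {"min": 0, "typ": 1, "max": 2}


def select_corner(triple: str, corner: str = "typ") -> Optional[float]:
    """Pick one value from an SDF delay triple such as '(0.10:0.15:0.22)'.

    Returns None when the chosen field is empty, as in '(0.1::0.3)'.
    A lone value such as '(0.5)' applies to every corner.
    """
    fields = triple.strip().strip("()").split(":")
    if len(fields) == 1:
        fields = fields * 3
    if len(fields) != 3:
        raise ValueError(f"not an SDF delay value: {triple!r}")
    text = fields[CORNERS[corner]].strip()
    return float(text) if text else None


print(select_corner("(0.10:0.15:0.22)", "typ"))  # prints 0.15
```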

See docs/usage.md for full documentation including synthesis preparation, VCD scope handling, and troubleshooting.
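
The two phases can also be driven from a script. The following Python sketch only assembles the command lines shown above and applies the NUM_BLOCKS guidance (1 for Metal, twice the SM count for CUDA). The helper names are hypothetical, and the SM count has to be supplied by the caller since this sketch does not query the GPU.

```python
import subprocess
from typing import List, Optional


def num_blocks_for(backend: str, sm_count: Optional[int] = None) -> int:
    """NUM_BLOCKS per the guidance above: 1 on Metal, 2x SM count on CUDA."""
    if backend == "metal":
        return 1
    if backend == "cuda":
        if sm_count is None:
            raise ValueError("CUDA needs the GPU's SM count")
        return 2 * sm_count
    raise ValueError(f"unknown backend: {backend}")


def map_command(netlist: str, parts: str) -> List[str]:
    """Phase 1: map the gate-level netlist to a .gemparts file."""
    return ["cargo", "run", "-r", "--bin", "loom", "--", "map", netlist, parts]


def sim_command(backend: str, netlist: str, parts: str,
                input_vcd: str, output_vcd: str,
                sm_count: Optional[int] = None,
                sdf: Optional[str] = None, sdf_corner: str = "typ") -> List[str]:
    """Phase 2: simulate with a VCD input waveform, optionally with SDF timing."""
    cmd = ["cargo", "run", "-r", "--features", backend, "--bin", "loom", "--",
           "sim", netlist, parts, input_vcd, output_vcd,
           str(num_blocks_for(backend, sm_count))]
    if sdf is not None:
        cmd += ["--sdf", sdf, "--sdf-corner", sdf_corner]
    return cmd


def run_pipeline(backend: str, netlist: str, parts: str,
                 input_vcd: str, output_vcd: str, **sim_options) -> None:
    """Run both phases in order, stopping on the first failure."""
    subprocess.run(map_command(netlist, parts), check=True)
    subprocess.run(sim_command(backend, netlist, parts, input_vcd, output_vcd,
                               **sim_options), check=True)


# Example: a CUDA GPU with a hypothetical 80 SMs gets NUM_BLOCKS = 160.
print(" ".join(sim_command("cuda", "design.gv", "design.gemparts",
                           "input.vcd", "output.vcd", sm_count=80)))
```

Since mapping is a one-time cost, a real script would likely skip `map_command` when the `.gemparts` file already exists and the netlist has not changed.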

Documentation

Browse the full documentation online or build it locally with mdbook:

mdbook serve   # serves at http://localhost:3000

Limitations

  • Only supports non-interactive testbenches (static VCD input waveforms)
  • Synchronous logic only (no latches or async sequential logic)
  • Clock gates must use the CKLNQD module from aigpdk.v
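
Because stimulus must be a static VCD file, simple testbenches can be generated by a script instead of dumped from another simulator. The Python sketch below writes a clock and an active-high reset in standard VCD syntax. The scope name `testbench`, the signal names `clk` and `rst`, and the timing values are placeholders; the scope and port names your design needs are covered under VCD scope handling in docs/usage.md.

```python
def make_stimulus_vcd(cycles: int, reset_cycles: int = 2,
                      half_period: int = 5, scope: str = "testbench") -> str:
    """Build a VCD with a free-running clock and a reset released after
    `reset_cycles` clock cycles. Times are in units of the timescale (1 ns)."""
    lines = [
        "$timescale 1ns $end",
        f"$scope module {scope} $end",
        "$var wire 1 ! clk $end",   # '!' is the short identifier for clk
        '$var wire 1 " rst $end',   # '"' is the short identifier for rst
        "$upscope $end",
        "$enddefinitions $end",
        "#0",
        "0!",   # clk starts low
        '1"',   # rst starts asserted
    ]
    for cycle in range(cycles):
        rise = (2 * cycle + 1) * half_period
        fall = rise + half_period
        lines += [f"#{rise}", "1!", f"#{fall}", "0!"]
        if cycle + 1 == reset_cycles:
            lines.append('0"')  # release reset on a falling clock edge
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("input.vcd", "w") as f:
        f.write(make_stimulus_vcd(cycles=100))
```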

Benchmarks

Pre-synthesized benchmark designs are in benchmarks/dataset/ (git submodule). See benchmarks/README.md for instructions.

Available designs: NVDLA, Rocket, Gemmini.

Citation

Loom builds on the GEM research. Please cite the original paper if you find this work useful.

@inproceedings{gem,
 author = {Guo, Zizheng and Zhang, Yanqing and Wang, Runsheng and Lin, Yibo and Ren, Haoxing},
 booktitle = {Proceedings of the 62nd Annual Design Automation Conference 2025},
 organization = {IEEE},
 title = {{GEM}: {GPU}-Accelerated Emulator-Inspired {RTL} Simulation},
 year = {2025}
}

License

Apache-2.0. See LICENSE for details.
