Description
Problem Statement
The current rtl_macro_placer frequently fails or produces suboptimal results for designs with:
- a large number of small SRAM macros
- clustered memory architectures
- structured layouts (e.g., systolic arrays, tiled accelerators)
This is evident from multiple issues:
Typical failure:
Root Cause Analysis
The current macro placement approach:
- relies primarily on simulated annealing
- assumes macros are few, large, and loosely connected
However, modern accelerator workloads (ML/AI designs) exhibit:
- many small SRAM macros
- strong locality and communication constraints
- structured layouts (grid / tiled / systolic)
Key Mismatch:
OpenROAD → generic macro placement
Modern designs → structured + memory-dense
This leads to:
- ineffective exploration of the placement search space
- invalid placements (MPL-0040 errors)
- routing congestion
- degraded PPA (power, performance, area)
Suggested Solution
Inspiration from Industry (Vivado)
Modern tools such as Xilinx Vivado handle similar challenges using:
- hierarchical placement (clustering)
- constraint-driven floorplanning (Pblocks)
- architecture-aware optimization (BRAM/DSP awareness)
- multi-stage placement (coarse → refine)
Proposed Enhancement: ML-Aware Macro Placement Mode
Introduce a new mode:
rtl_macro_placer -mode ml_aware
This mode enables structure-aware, memory-aware placement.
Proposed Approach
1. Macro Clustering (Hierarchical Placement)
- Build a connectivity graph of macros
- Cluster macros based on:
  - communication intensity
  - shared memory structures
- Treat clusters as super-macros
This reduces search complexity and improves locality.
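A minimal sketch of the clustering step, assuming the connectivity graph is given as a dict of edge weights between macro names (for example, the number of shared nets or bits). The technique shown is plain greedy agglomerative clustering: repeatedly merge the most strongly connected pair of clusters until no remaining connection reaches `min_weight` or a merge would exceed `max_cluster_size`. The function name and both thresholds are hypothetical; the real placer would build this graph from the OpenDB netlist.

```python
from collections import defaultdict


def cluster_macros(names, edges, min_weight, max_cluster_size):
    """Greedy agglomerative clustering of macros by connection weight (hypothetical).

    names: list of macro names.
    edges: dict mapping (name_a, name_b) -> connection weight.
    Returns a sorted list of clusters, each a sorted list of macro names.
    """
    cluster_of = {name: i for i, name in enumerate(names)}
    members = {i: {name} for i, name in enumerate(names)}
    while True:
        # Sum edge weights between every pair of distinct clusters.
        between = defaultdict(float)
        for (a, b), w in edges.items():
            ca, cb = cluster_of[a], cluster_of[b]
            if ca != cb:
                between[(min(ca, cb), max(ca, cb))] += w
        # Only merges that are strong enough and respect the size cap qualify.
        candidates = [
            (w, pair)
            for pair, w in between.items()
            if w >= min_weight
            and len(members[pair[0]]) + len(members[pair[1]]) <= max_cluster_size
        ]
        if not candidates:
            break
        # Merge the most strongly connected pair; each result acts as a super-macro.
        _, (keep, drop) = max(candidates)
        for name in members.pop(drop):
            cluster_of[name] = keep
            members[keep].add(name)
    return sorted(sorted(group) for group in members.values())


edges = {("s0", "s1"): 64, ("s2", "s3"): 32, ("s1", "s2"): 4}
print(cluster_macros(["s0", "s1", "s2", "s3"], edges, min_weight=16, max_cluster_size=4))
```

Here `s0`/`s1` and `s2`/`s3` end up as two super-macros because the weak `s1`–`s2` link falls below `min_weight`.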
2. Structure / Grid-Aware Placement
- Detect structured patterns such as:
  - systolic arrays
  - tiled compute blocks
- Enforce:
  - row/column alignment
  - regular spacing
This matches how real accelerators are laid out.
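A rough sketch of the grid-aware idea, assuming that macros with identical dimensions are candidate tiles of one array. This is a crude stand-in for real structure detection, which would also need netlist or hierarchy analysis to recognize a systolic array. Each detected group is then snapped onto a near-square grid with row/column alignment and uniform spacing. Both function names and the `spacing` parameter are hypothetical.

```python
import math
from collections import defaultdict


def group_identical_macros(sizes):
    """Group macro names by (width, height); groups of 2+ are candidate tiles."""
    groups = defaultdict(list)
    for name, (w, h) in sizes.items():
        groups[(w, h)].append(name)
    return {dims: sorted(names) for dims, names in groups.items() if len(names) > 1}


def grid_positions(names, width, height, origin, spacing):
    """Place same-size macros on a near-square grid, aligned in rows and columns."""
    cols = math.ceil(math.sqrt(len(names)))
    x0, y0 = origin
    positions = {}
    for i, name in enumerate(names):
        row, col = divmod(i, cols)  # row-major fill keeps pitch regular
        positions[name] = (x0 + col * (width + spacing), y0 + row * (height + spacing))
    return positions


sizes = {"a": (10, 20), "b": (10, 20), "c": (10, 20), "d": (10, 20), "x": (5, 5)}
for (w, h), tiles in group_identical_macros(sizes).items():
    print(grid_positions(tiles, w, h, origin=(0, 0), spacing=2))
```

The four identical 10×20 macros form a 2×2 grid with a pitch of 12 in x and 22 in y; the odd-sized macro `x` is left to the regular placer.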
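3. Enhanced Cost Function
Extend the placement cost:
Cost = α * wirelength
     + β * overlap
     + γ * congestion
     + δ * clustering_penalty
     - ε * locality_reward
The locality reward is subtracted so that better locality lowers the cost being minimized. The new terms add:
- locality awareness
- communication-distance penalties
- cluster compactness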
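A minimal sketch of how this cost could be evaluated for one candidate placement. Assumptions: macros are axis-aligned rectangles `(x, y, w, h)`, wirelength is half-perimeter wirelength (HPWL) over nets given as lists of macro names, the clustering penalty is the Manhattan distance of each macro to its cluster centroid (one way to measure compactness), and the congestion and locality-reward values are computed elsewhere and passed in. The `Weights` defaults are hypothetical and would need per-design tuning.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Weights:
    # Hypothetical defaults; real values would need tuning per design.
    alpha: float = 1.0     # wirelength
    beta: float = 100.0    # overlap (large, since overlap is illegal)
    gamma: float = 1.0     # congestion
    delta: float = 1.0     # clustering penalty
    epsilon: float = 1.0   # locality reward (subtracted)


def hpwl(nets, centers):
    """Half-perimeter wirelength over nets given as lists of macro names."""
    total = 0.0
    for net in nets:
        xs = [centers[n][0] for n in net]
        ys = [centers[n][1] for n in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def overlap_area(rects):
    """Sum of pairwise overlap areas; rects map name -> (x, y, w, h)."""
    total = 0.0
    for (x1, y1, w1, h1), (x2, y2, w2, h2) in combinations(rects.values(), 2):
        dx = min(x1 + w1, x2 + w2) - max(x1, x2)
        dy = min(y1 + h1, y2 + h2) - max(y1, y2)
        if dx > 0 and dy > 0:
            total += dx * dy
    return total


def compactness_penalty(clusters, centers):
    """Sum of Manhattan distances from each macro to its cluster centroid."""
    total = 0.0
    for members in clusters:
        cx = sum(centers[m][0] for m in members) / len(members)
        cy = sum(centers[m][1] for m in members) / len(members)
        total += sum(abs(centers[m][0] - cx) + abs(centers[m][1] - cy) for m in members)
    return total


def placement_cost(rects, nets, clusters, congestion, locality_reward, w=Weights()):
    centers = {n: (x + wd / 2, y + ht / 2) for n, (x, y, wd, ht) in rects.items()}
    return (w.alpha * hpwl(nets, centers)
            + w.beta * overlap_area(rects)
            + w.gamma * congestion
            + w.delta * compactness_penalty(clusters, centers)
            - w.epsilon * locality_reward)
```

For two overlapping 10×10 macros at `(0, 0)` and `(5, 0)` in one cluster and on one net, the terms are HPWL 5, overlap 50, and compactness 5, so the large β makes the overlap dominate the cost.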
4. Congestion-Aware Placement
- Estimate routing congestion early
- Penalize dense macro regions
This helps prevent routing failures after placement.
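One cheap way to penalize dense macro regions early is a bin-based density map, sketched below under stated assumptions: the die is split into square bins of side `bin_size`, each bin records how much of its area macros cover, and any coverage above a hypothetical `target` fraction adds to the penalty. This is only a proxy; a net-based routing estimate would model congestion more directly. It also assumes all macros lie inside the die and treats boundary bins as full-size.

```python
import math


def macro_density_penalty(rects, die_w, die_h, bin_size, target=0.7):
    """Sum of per-bin macro coverage above `target` (hypothetical congestion proxy).

    rects map name -> (x, y, w, h), all assumed to lie inside the die.
    """
    nx = math.ceil(die_w / bin_size)
    ny = math.ceil(die_h / bin_size)
    covered = [[0.0] * nx for _ in range(ny)]
    for x, y, w, h in rects.values():
        # Visit only the bins this rectangle can touch.
        for by in range(int(y // bin_size), min(ny, math.ceil((y + h) / bin_size))):
            for bx in range(int(x // bin_size), min(nx, math.ceil((x + w) / bin_size))):
                dx = min(x + w, (bx + 1) * bin_size) - max(x, bx * bin_size)
                dy = min(y + h, (by + 1) * bin_size) - max(y, by * bin_size)
                if dx > 0 and dy > 0:
                    covered[by][bx] += dx * dy
    bin_area = bin_size * bin_size
    # Only the excess over the target density is penalized.
    return sum(max(0.0, area / bin_area - target) for row in covered for area in row)
```

A macro that fills one bin completely is penalized, while the same macro straddling four bins (25% coverage each) is not, which pushes the placer to spread memory-dense regions out.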
5. Multi-Stage Placement (Vivado-style)
- Stage 1 (Clustering): group macros into clusters
- Stage 2 (Coarse Placement): place clusters globally
- Stage 3 (Refinement): expand clusters and place individual macros
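The three stages can be sketched as below, assuming Stage 1 has already produced `clusters` (for example, with a connectivity-based clustering). To keep the example short, Stage 2 simply gives each cluster a region placed left to right, sized from its total macro area and never narrower than its widest macro; a real coarse stage would optimize cluster positions, for instance with annealing on super-macros. Stage 3 expands each cluster and shelf-packs its macros inside that region. All names here are hypothetical.

```python
import math


def coarse_place(clusters, sizes, spacing):
    """Stage 2: give each cluster a region (x0, width), placed left to right."""
    regions, x = [], 0.0
    for members in clusters:
        area = sum(sizes[m][0] * sizes[m][1] for m in members)
        # Region must be at least as wide as its widest macro.
        width = max(math.sqrt(area), max(sizes[m][0] for m in members))
        regions.append((x, width))
        x += width + spacing
    return regions


def refine(clusters, regions, sizes):
    """Stage 3: expand each cluster and shelf-pack its macros inside its region."""
    positions = {}
    for members, (x0, width) in zip(clusters, regions):
        cx, cy, shelf_h = 0.0, 0.0, 0.0
        # Tallest macros first keeps shelves tight; name breaks ties deterministically.
        for m in sorted(members, key=lambda n: (-sizes[n][1], n)):
            w, h = sizes[m]
            if cx + w > width:  # no room on this shelf: start a new one
                cx, cy, shelf_h = 0.0, cy + shelf_h, 0.0
            positions[m] = (x0 + cx, cy)
            cx += w
            shelf_h = max(shelf_h, h)
    return positions


def multi_stage_place(clusters, sizes, spacing=5.0):
    regions = coarse_place(clusters, sizes, spacing)
    return refine(clusters, regions, sizes)


sizes = {"a": (10, 10), "b": (10, 10), "c": (10, 10), "d": (10, 10), "e": (40, 10)}
print(multi_stage_place([["a", "b", "c", "d"], ["e"]], sizes))
```

The first cluster becomes a 2×2 block in a 20-wide region; the single wide macro `e` gets its own region starting at x = 25 (20 + 5 spacing).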
6. Failure Recovery Mechanism
Instead of:
FAIL → exit
Introduce:
FAIL → adaptive retry:
- reduce clustering granularity
- adjust placement density
- change annealing parameters
- retry with different seeds
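A minimal sketch of the adaptive retry loop, assuming the placer can be called as a function that takes a parameter set and returns `None` on failure. The `PlacerParams` fields and the relaxation policy are hypothetical. "Adjust clustering granularity" is read here as halving the maximum cluster size so refinement has more freedom; the opposite reading (coarser clusters) would be just as easy to plug in.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlacerParams:
    # Hypothetical knobs; names and defaults are illustrative only.
    max_cluster_size: int = 16
    target_density: float = 0.8
    initial_temperature: float = 1.0
    cooling_rate: float = 0.95
    seed: int = 0


def relax(p):
    """Hypothetical relaxation policy applied after each failed attempt."""
    return replace(
        p,
        max_cluster_size=max(1, p.max_cluster_size // 2),   # finer clusters
        target_density=max(0.5, p.target_density - 0.05),   # spread macros out
        initial_temperature=p.initial_temperature * 2,      # explore more widely
        cooling_rate=min(0.99, p.cooling_rate + 0.01),      # anneal more slowly
        seed=p.seed + 1,                                     # new random start
    )


def place_with_retry(place_fn, params=PlacerParams(), max_attempts=5):
    """Call place_fn until it succeeds, relaxing parameters after each failure."""
    tried = []
    for _ in range(max_attempts):
        tried.append(params)
        result = place_fn(params)
        if result is not None:
            return result, tried
        params = relax(params)
    raise RuntimeError(f"placement failed after {max_attempts} attempts")


# A stand-in placer that only succeeds once clusters are small enough.
result, tried = place_with_retry(lambda p: "ok" if p.max_cluster_size <= 4 else None)
print(result, [p.seed for p in tried])
```

The loop still fails loudly after a bounded number of attempts, so a genuinely infeasible floorplan is reported instead of retrying forever.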
Additional Context
I work in the VLSI domain and would be interested in contributing to:
- macro clustering implementation
- cost function improvements
- benchmarking and evaluation
Looking forward to feedback from maintainers.