Skip to content

[gpl] Include internal power in MBFF clustering cost#10075

Open
openroad-ci wants to merge 1 commit intoThe-OpenROAD-Project:masterfrom
The-OpenROAD-Project-staging:mbff-improve-power-modeling
Open

[gpl] Include internal power in MBFF clustering cost#10075
openroad-ci wants to merge 1 commit intoThe-OpenROAD-Project:masterfrom
The-OpenROAD-Project-staging:mbff-improve-power-modeling

Conversation

@openroad-ci
Copy link
Copy Markdown
Collaborator

@openroad-ci openroad-ci commented Apr 7, 2026

Summary

The MBFF algorithm previously used only leakage power to decide whether
to replace single-bit flip-flops with multi-bit cells. For flip-flops,
internal power dominates total power, and MBFF cells share scan (SE/SI)
and clock structures across bits, giving 40-90% savings on those pins.
Ignoring internal power caused clustering to increase total power on
some PDKs.

Add getInternalEnergy() which sums average internal energy across all
pins (CK, D, Q, SE, SI) from Liberty internal_power tables. Use this
alongside leakage in three places:

  • SetRatios: norm_power_ uses total estimated power (leakage +
    internal_energy * clock_activity) so the ILP cost function reflects
    total power, not just leakage.
  • ReadLibs: select best tray per size by minimum total estimated power
    instead of minimum leakage, so cells with lower total power (e.g. SVT
    over LVT) are preferred even when their leakage is higher.
  • SetVars: select the single-bit baseline cell by lowest total estimated
    power, with both leakage and internal energy paired from the same cell.

Clock period is obtained from SDC before ReadLibs runs so tray selection
can account for internal power. Falls back to leakage-only when no
clock is defined or no internal_power tables exist.

Type of Change

  • New feature

Verification

  • I have verified that the local build succeeds (./etc/Build.sh).
  • I have run the relevant tests and they pass.
  • My code follows the repository's formatting guidelines.
  • I have signed my commits (DCO).

@maliberty
Copy link
Copy Markdown
Member

@mikesinouye please test this on your internal designs. I'm seeing improvement on some of our designs.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates clock pin internal power into the MBFF cost metric, ensuring more accurate power estimation during clustering. The changes include adding a method to calculate clock pin energy, updating the cost calculation logic, and caching clock periods. I have identified a potential division-by-zero risk in the cost normalization logic, a performance concern regarding redundant STA lookups within the SetVars loop, and a violation of the style guide regarding the use of magic numbers for clock activity calculations.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

clang-tidy review says "All clean, LGTM! 👍"

// Include clock pin internal power in the cost metric.
// Clock activity = 2 transitions / period (known precisely from SDC).
const float clock_activity
= (clock_period_ > 0) ? (2.0 / clock_period_) : 0.0;
Copy link
Copy Markdown
Collaborator

@QuantamHD QuantamHD Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you grab the 2 / clock_period from the waveform duty cycle definition from OpenSTA. Clock::waveform

I know this is equal rise equal fall by default, but since it is possible to specify it I feel you should take it into account here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The power only depends on transition activity not duty cycle.

Copy link
Copy Markdown
Collaborator

@QuantamHD QuantamHD Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you can specify multiple transitions in the wave form. So my assumption was you would need to replace the 2.0 with waveform()->getSize()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it is legal but I've never seen it used. In any case sta doesn't handle it either:

float
Power::clockDuty(const Clock *clk)
{
  if (clk->isGenerated()) {
    const Clock *master = clk->masterClk();
    if (master == nullptr)
      return 0.5;  // punt
    else
      return clockDuty(master);
  }
  else {
    const FloatSeq *waveform = clk->waveform();
    float rise_time = (*waveform)[0];
    float fall_time = (*waveform)[1];
    float duty = (fall_time - rise_time) / clk->period();
    return duty;
  }
}
```

The MBFF algorithm previously used only leakage power to decide whether
to replace single-bit flip-flops with multi-bit cells.  For flip-flops,
internal power dominates total power, and MBFF cells share scan (SE/SI)
and clock structures across bits, giving 40-90% savings on those pins.
Ignoring internal power caused clustering to increase total power on
some PDKs.

Add getInternalEnergy() which sums average internal energy across all
pins (CK, D, Q, SE, SI) from Liberty internal_power tables.  Use this
alongside leakage in three places:

- SetRatios: norm_power_ uses total estimated power (leakage +
  internal_energy * clock_activity) so the ILP cost function reflects
  total power, not just leakage.
- ReadLibs: select best tray per size by minimum total estimated power
  instead of minimum leakage, so cells with lower total power (e.g. SVT
  over LVT) are preferred even when their leakage is higher.
- SetVars: select the single-bit baseline cell by lowest total estimated
  power, with both leakage and internal energy paired from the same cell.

Clock period is obtained from SDC before ReadLibs runs so tray selection
can account for internal power.  Falls back to leakage-only when no
clock is defined or no internal_power tables exist.

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
@openroad-ci openroad-ci force-pushed the mbff-improve-power-modeling branch from c194db0 to eee27b3 Compare April 9, 2026 18:26
@github-actions github-actions bot added the size/M label Apr 9, 2026
@maliberty maliberty changed the title [gpl] Include clock pin internal power in MBFF clustering cost [gpl] Include internal power in MBFF clustering cost Apr 9, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

clang-tidy review says "All clean, LGTM! 👍"

@maliberty
Copy link
Copy Markdown
Member

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eee27b3ea3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 2410 to 2412
SetVars(FFs[i]);
clock_period_ = getClockPeriod(ff_inst);
SetRatios(array_mask);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Set clock period before deriving single-bit power baseline

SetVars() now depends on clockActivity() to compute single_bit_power_, but in this loop clock_period_ is updated only after SetVars() runs. In designs with multiple clock periods, each chunk therefore uses the previous chunk’s period (or the bootstrap value) for the denominator baseline, while SetRatios() uses the current chunk’s period, producing inconsistent power normalization and skewed clustering decisions.

Useful? React with 👍 / 👎.

Comment on lines +2381 to +2385
// Determine clock period before ReadLibs so tray selection can use
// total estimated power (leakage + internal_energy * clock_activity).
if (!flops_.empty()) {
clock_period_ = getClockPeriod(insts_[0]);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid seeding tray power ranking from one arbitrary flop clock

ReadLibs() ranks best_master_ using activity = clockActivity(), but Run() initializes clock_period_ from only insts_[0] before that pass. Because tray ranking is cached globally while clustering later runs per clock-net group, multi-clock designs can permanently select a suboptimal tray master for domains whose period differs from that first flop, regressing total power optimization.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants