From cde7fb756544f27df046d9293d7f31d3c65fe463 Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 14 Mar 2026 14:39:05 +0000 Subject: [PATCH 1/3] feat: add 8 Cybereum platform skills with gstack pattern integration Two skill groups for the Cybereum capital project governance platform: Dev Team (building Cybereum): - cybereum-schedule-intelligence: P6/XER analysis, DCMA 14-Point, critical path - cybereum-decision-ai: Schwerpunkt decision engine, corrective actions - cybereum-risk-engine: Risk register, scoring, mitigation strategies - cybereum-evm-control: Earned Value Management, CPI/SPI/EAC analytics PM Team (inside Cybereum): - cybereum-completion-prediction: Monte Carlo, P50/P80 forecasting - cybereum-reference-class: Flyvbjerg RCF, optimism bias correction - cybereum-executive-reporting: Board/PMO/lender report generation - cybereum-sales-intelligence: BD, prospect research, pitch materials Integrated gstack patterns into Cybereum skills: - QA health scoring rubric -> Schedule Intelligence, Risk Engine - Review two-pass severity (CRITICAL/INFORMATIONAL) -> Schedule Intelligence, Risk Engine - Retro trend tracking + JSON history persistence -> EVM Control, Completion Prediction, Decision-AI - Plan-CEO-review interactive question protocol -> Decision-AI - Cross-skill integration wiring -> Executive Reporting https://claude.ai/code/session_01NbTWS982B1xJFJCyt6ZVig --- CLAUDE.md | 40 ++- cybereum-completion-prediction/SKILL.md | 338 ++++++++++++++++++++++++ cybereum-decision-ai/SKILL.md | 271 +++++++++++++++++++ cybereum-evm-control/SKILL.md | 303 +++++++++++++++++++++ cybereum-executive-reporting/SKILL.md | 284 ++++++++++++++++++++ cybereum-reference-class/SKILL.md | 244 +++++++++++++++++ cybereum-risk-engine/SKILL.md | 312 ++++++++++++++++++++++ cybereum-sales-intelligence/SKILL.md | 264 ++++++++++++++++++ cybereum-schedule-intelligence/SKILL.md | 279 +++++++++++++++++++ 9 files changed, 2323 insertions(+), 12 deletions(-) create mode 100644 
cybereum-completion-prediction/SKILL.md create mode 100644 cybereum-decision-ai/SKILL.md create mode 100644 cybereum-evm-control/SKILL.md create mode 100644 cybereum-executive-reporting/SKILL.md create mode 100644 cybereum-reference-class/SKILL.md create mode 100644 cybereum-risk-engine/SKILL.md create mode 100644 cybereum-sales-intelligence/SKILL.md create mode 100644 cybereum-schedule-intelligence/SKILL.md diff --git a/CLAUDE.md b/CLAUDE.md index 0fb4879..c81384a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -13,18 +13,34 @@ bun run build # compile binary to browse/dist/browse ``` gstack/ -├── browse/ # Headless browser CLI (Playwright) -│ ├── src/ # CLI + server + commands -│ ├── test/ # Integration tests + fixtures -│ └── dist/ # Compiled binary -├── ship/ # Ship workflow skill -├── review/ # PR review skill -├── plan-ceo-review/ # /plan-ceo-review skill -├── plan-eng-review/ # /plan-eng-review skill -├── retro/ # Retrospective skill -├── setup # One-time setup: build binary + symlink skills -├── SKILL.md # Browse skill (Claude discovers this) -└── package.json # Build scripts for browse +├── browse/ # Headless browser CLI (Playwright) +│ ├── src/ # CLI + server + commands +│ ├── test/ # Integration tests + fixtures +│ └── dist/ # Compiled binary +├── ship/ # Ship workflow skill +├── review/ # PR review skill +├── plan-ceo-review/ # /plan-ceo-review skill +├── plan-eng-review/ # /plan-eng-review skill +├── retro/ # Retrospective skill +├── qa/ # QA testing skill +│ +│ ── Cybereum Platform Skills ── +│ +│ Dev Team (building Cybereum): +├── cybereum-schedule-intelligence/ # P6/XER schedule analysis, DCMA 14-Point +├── cybereum-decision-ai/ # Schwerpunkt decision engine +├── cybereum-risk-engine/ # Risk register, scoring, mitigation +├── cybereum-evm-control/ # Earned Value Management analytics +│ +│ PM Team (inside Cybereum): +├── cybereum-completion-prediction/ # Monte Carlo, P50/P80 forecasting +├── cybereum-reference-class/ # Flyvbjerg RCF, optimism bias 
correction +├── cybereum-executive-reporting/ # Board/PMO/lender report generation +├── cybereum-sales-intelligence/ # BD, prospect research, pitch materials +│ +├── setup # One-time setup: build binary + symlink skills +├── SKILL.md # Browse skill (Claude discovers this) +└── package.json # Build scripts for browse ``` ## Deploying to the active skill diff --git a/cybereum-completion-prediction/SKILL.md b/cybereum-completion-prediction/SKILL.md new file mode 100644 index 0000000..ce0a43a --- /dev/null +++ b/cybereum-completion-prediction/SKILL.md @@ -0,0 +1,338 @@ +--- +name: cybereum-completion-prediction +version: 1.0.0 +description: | + Generates probabilistic completion forecasts for capital projects using Monte Carlo simulation, + S-curve modeling, and Birnbaum-Saunders distributions. Produces P20/P50/P80 completion date + estimates with confidence intervals and recovery scenario comparisons. Use this skill whenever a + user asks for completion probability, forecast completion date, S-curve analysis, schedule confidence, + Monte Carlo simulation, "what is the probability we finish on time," P50 or P80 estimates, or wants + to compare recovery scenarios on a capital project schedule or portfolio. Always use for probabilistic + forecasting, completion confidence intervals, and risk-adjusted schedule analysis. +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob +--- + +# Cybereum Completion Prediction + +Probabilistic forecasting engine for capital project completion. Applies Monte Carlo simulation and statistical modeling to transform deterministic schedule data into confidence-calibrated completion distributions. + +> *"Point estimates lie. Distributions tell the truth."* + +--- + +## Theoretical Foundation + +### Why Probabilistic Forecasting? + +Capital projects are inherently uncertain. A single completion date (point estimate) fails to communicate the risk profile embedded in the schedule. 
Cybereum's Completion Prediction engine generates: + +- **P20**: 20% probability of completing by this date (optimistic scenario) +- **P50**: 50% probability -- most likely outcome (median) +- **P80**: 80% probability -- conservative/defensible commitment date + +**Industry standard**: DoE, DoD, and major EPC owners use P80 for budget and schedule commitments. NSF SBIR and government programs often require P80 justification. + +### Distribution Selection + +- **Birnbaum-Saunders (Fatigue Life)**: Best for construction activities subject to cumulative degradation (erosion, equipment fatigue cycles) +- **Triangular**: Best for duration uncertainty where min/most likely/max can be estimated +- **Lognormal**: Best for activity durations with right-skewed uncertainty (common in EPC) +- **Beta-PERT**: Best for expert-estimated ranges; more peaked than triangular +- **Uniform**: Use only for truly unknown ranges + +--- + +## Step 1: Input Collection + +Minimum required for basic forecast: + +``` +Current project completion date (deterministic) +Baseline completion date +Remaining duration (working days) +Schedule confidence assessment (Low / Medium / High) +CPI (if available) +SPI (if available) +``` + +For full Monte Carlo: + +``` +Activity list (ID, Duration, Float, % Complete) +Duration uncertainty ranges (% above/below for each activity or class) +Risk register with schedule impact estimates +Number of simulations: default 10,000 +``` + +--- + +## Step 2: Uncertainty Quantification + +### Default Uncertainty Ranges by Project Phase and Condition + +**Engineering Phase:** + +``` +High design maturity (IFC >80%): +/-10% duration uncertainty +Medium maturity (IFC 50-80%): +/-20% uncertainty +Low maturity (IFC <50%): +/-35% uncertainty +``` + +**Procurement Phase:** + +``` +Firm POs, confirmed lead times: +/-5% uncertainty +Orders placed, preliminary leads: +/-15% uncertainty +Pending orders, estimated leads: +/-30% uncertainty +``` + +**Construction Phase:** + +``` 
+Productivity factor > 0.90: +/-15% uncertainty
+Productivity factor 0.75-0.90: +/-25% uncertainty
+Productivity factor < 0.75: +/-40% uncertainty
+```
+
+**Commissioning Phase:**
+
+```
+System completion >90%: +/-10% uncertainty
+System completion 70-90%: +/-20% uncertainty
+System completion <70%: +/-35% uncertainty
+```
+
+---
+
+## Step 3: Monte Carlo Simulation Logic
+
+If running computationally (Python/JS environment available):
+
+```python
+import numpy as np
+
+def run_monte_carlo(
+    baseline_duration,     # in working days
+    uncertainty_range,     # as decimal (e.g., 0.20 for +/-20%)
+    risk_events,           # list of (probability, schedule_impact_days) tuples
+    n_simulations=10000,
+    distribution='lognormal'
+):
+    results = []
+
+    for _ in range(n_simulations):
+        # Sample duration uncertainty
+        if distribution == 'lognormal':
+            sigma = uncertainty_range / 2
+            sampled_duration = np.random.lognormal(
+                mean=np.log(baseline_duration),
+                sigma=sigma
+            )
+        elif distribution == 'triangular':
+            low = baseline_duration * (1 - uncertainty_range)
+            high = baseline_duration * (1 + uncertainty_range)
+            sampled_duration = np.random.triangular(low, baseline_duration, high)
+        else:
+            raise ValueError(f"unsupported distribution: {distribution}")
+
+        # Add risk event impacts
+        risk_delay = sum(
+            impact for prob, impact in risk_events
+            if np.random.random() < prob
+        )
+
+        results.append(sampled_duration + risk_delay)
+
+    return {
+        'P20': np.percentile(results, 20),
+        'P50': np.percentile(results, 50),
+        'P80': np.percentile(results, 80),
+        'mean': np.mean(results),
+        'std': np.std(results),
+        'distribution': results
+    }
+```
+
+**Without computational environment**: Use the parametric estimation table below.
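For quick what-if runs, the same sampling logic can be vectorized. This is a minimal sketch with hypothetical inputs; the 5-working-days-per-7-calendar-days conversion is a simplification for illustration, not part of the skill:

```python
import numpy as np
from datetime import date, timedelta

rng = np.random.default_rng(42)
n = 10_000
baseline_days = 220                      # remaining working days (hypothetical)
sigma = 0.20 / 2                         # +/-20% uncertainty -> lognormal sigma heuristic
risk_events = [(0.30, 15), (0.10, 40)]   # (probability, impact in working days)

# Vectorized duration sampling plus Bernoulli risk-event delays
durations = rng.lognormal(np.log(baseline_days), sigma, n)
delays = sum(np.where(rng.random(n) < p, d, 0) for p, d in risk_events)
totals = durations + delays

status_date = date(2026, 3, 14)
for pct in (20, 50, 80):
    workdays = np.percentile(totals, pct)
    calendar = int(workdays * 7 / 5)     # crude working-day -> calendar-day conversion
    print(f"P{pct}: {status_date + timedelta(days=calendar)}")
```

A seeded generator makes the forecast reproducible across reruns, which matters when snapshots are compared period over period.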
+
+---
+
+## Step 4: Parametric Estimation (No Computation Required)
+
+When Monte Carlo cannot be run directly, apply the Cybereum parametric table:
+
+### Schedule Confidence Multipliers
+
+Based on project phase, SPI, and uncertainty level:
+
+```
+Remaining     Uncertainty   P50 Multiplier   P80 Multiplier   P20 Multiplier
+Duration      Level
+< 3 months    Low           1.05             1.12             0.97
+< 3 months    Medium        1.10             1.22             0.95
+< 3 months    High          1.20             1.38             0.90
+3-12 months   Low           1.08             1.18             0.96
+3-12 months   Medium        1.15             1.30             0.92
+3-12 months   High          1.25             1.45             0.88
+> 12 months   Low           1.12             1.25             0.95
+> 12 months   Medium        1.20             1.40             0.90
+> 12 months   High          1.35             1.60             0.85
+```
+
+**SPI adjustment**: Multiply P50 multiplier by (1 / SPI) for projects with SPI < 0.90.
+
+**Risk adjustment**: Add (Sum of top 5 risk impacts x probability) directly to P80 estimate.
+
+---
+
+## Step 5: Recovery Scenario Modeling
+
+Generate 3 scenarios:
+
+### Scenario A: Do Nothing (Baseline Trajectory)
+
+Apply current SPI trend to remaining work. Project current completion date.
+
+### Scenario B: Moderate Recovery
+
+Apply a feasible productivity improvement (typically +10-15% from current). Identify fast-track opportunities (2-3 specific activities). Calculate new P50/P80.
+
+### Scenario C: Aggressive Recovery
+
+Apply maximum credible productivity improvement + resource surge + logic compression. Calculate new P50/P80. Note: Aggressive recovery typically increases cost by 10-20% (i.e., it degrades CPI).
+
+**Scenario comparison table:**
+
+```
+Scenario     | P20 Finish | P50 Finish | P80 Finish | Cost Premium
+Baseline     | [date]     | [date]     | [date]     | $0
+Moderate     | [date]     | [date]     | [date]     | +[X]%
+Aggressive   | [date]     | [date]     | [date]     | +[X]%
+Target Date  | N/A        | [date]     | N/A        | --
+```
+
+---
+
+## Step 6: S-Curve Generation (Narrative)
+
+Describe the project S-curve for the recommended scenario:
+
+**S-Curve Phases:**
+
+1. **Mobilization (0-15% duration)**: Slow ramp-up. Actual % complete trails plan.
+2.
**Peak Execution (15-75% duration)**: Steepest slope. Productivity must be maintained. +3. **Punch-out (75-100% duration)**: Deceleration. Long tail risk if commissioning issues emerge. + +**Current position on S-curve**: [Describe where the project sits and what the shape implies] + +**Inflection point risk**: If project is in punch-out phase, schedule risk is highest -- commissioning issues rarely resolve quickly. + +--- + +## Step 7: Completion Confidence Statement + +Produce an executive-ready forecast statement: + +``` +COMPLETION FORECAST -- [Project Name] -- [Status Date] + +BASELINE COMPLETION: [Date] +CURRENT DETERMINISTIC FORECAST: [Date] ([+/-X] days from baseline) + +PROBABILISTIC FORECAST: + P20 (Optimistic): [Date] -- 20% confidence + P50 (Most Likely): [Date] -- 50% confidence <- Recommended reporting date + P80 (Conservative): [Date] -- 80% confidence <- Recommended commitment date + +COMPLETION ON OR BEFORE BASELINE: [X]% probability + +PRIMARY SCHEDULE DRIVER: [Activity/issue driving the P80] + +RECOVERY POTENTIAL: [X] days recoverable with [specific action] + +CONFIDENCE BASIS: [Phase, SPI, uncertainty level, risk adjustments applied] +``` + +--- + +## Reference Files + +- `references/birnbaum-saunders.md` -- BS distribution theory and capital project application +- `references/monte-carlo-methodology.md` -- Full simulation methodology with validation +- `references/recovery-playbook.md` -- Recovery strategy library by project type and phase +- `references/industry-benchmarks.md` -- Typical P50/P80 spreads by project type + +--- + +## Forecast History & Trend Tracking + +Persist completion forecasts for tracking prediction accuracy over time: + +```bash +mkdir -p .cybereum/forecast-snapshots +``` + +Save as `.cybereum/forecast-snapshots/{project-slug}-{YYYY-MM-DD}.json`: + +```json +{ + "date": "2026-03-14", + "project": "Project Name", + "baseline_finish": "2027-03-01", + "deterministic_finish": "2027-06-15", + "p20_finish": "2027-05-01", + 
"p50_finish": "2027-07-10", + "p80_finish": "2027-09-22", + "baseline_probability": 0.08, + "spi": 0.92, + "uncertainty_level": "medium", + "primary_driver": "Civil works productivity", + "recovery_potential_days": 35, + "scenario_selected": "moderate" +} +``` + +### Forecast Accuracy Tracking + +When prior snapshots exist, track how predictions evolved: + +``` +Forecast Date | P50 Prediction | P80 Prediction | Trend +2026-01-15 | 2027-05-20 | 2027-07-15 | -- +2026-02-15 | 2027-06-01 | 2027-08-10 | Slipping (+12d / +25d) +2026-03-14 | 2027-07-10 | 2027-09-22 | Slipping (+39d / +43d) +``` + +**Automatic alerts:** +- P50 slipping >15 days per month for 2+ months: **SUSTAINED SCHEDULE EROSION** +- P80 exceeding contractual milestone: **CONTRACTUAL RISK -- notify stakeholders** +- Baseline probability < 10%: **BASELINE IS NOT ACHIEVABLE -- recommend rebaseline** + +### Scenario Comparison Tracking + +Track which recovery scenario was selected and whether the assumed improvements materialized: + +``` +Period | Selected Scenario | Assumed SPI Improvement | Actual SPI | On Track? +P1 | Moderate | +0.05 | +0.02 | Behind +P2 | Moderate | +0.05 | +0.01 | Behind +P3 | Aggressive | +0.10 | TBD | In progress +``` + +If actual improvements consistently fall short of scenario assumptions, flag as **RECOVERY PLAN NOT WORKING -- reassess strategy**. + +--- + +## Portfolio Mode + +For multiple projects, generate a portfolio completion forecast: + +1. Run individual P50/P80 for each project +2. Flag projects with P80 > baseline by >30 days +3. Identify portfolio-level resource conflicts (projects competing for same resources in same period) +4. Rank projects by schedule risk severity +5. 
Recommend portfolio-level intervention priority diff --git a/cybereum-decision-ai/SKILL.md b/cybereum-decision-ai/SKILL.md new file mode 100644 index 0000000..fdda5be --- /dev/null +++ b/cybereum-decision-ai/SKILL.md @@ -0,0 +1,271 @@ +--- +name: cybereum-decision-ai +version: 1.0.0 +description: | + Provides AI-driven decision support for capital project governance using Schwerpunkt analysis, + multi-criteria corrective action evaluation, and structured critic-insight reasoning. Use this skill + whenever a user asks for project decision support, corrective actions, intervention recommendations, + "what should we do about," Schwerpunkt prioritization, which issues to focus on, escalation decisions, + recovery strategy, or when evaluating trade-offs between cost, schedule, risk, and scope on a capital + project. Always use for executive-level decision framing on EPC, energy, infrastructure, or defense programs. +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob +--- + +# Cybereum Decision-AI + +The Schwerpunkt Decision Engine -- Cybereum's AI reasoning layer for capital project governance. Named after the German military concept of "center of gravity," Schwerpunkt analysis identifies the decisive intervention point where effort produces maximum systemic impact. + +> *"Not all problems deserve equal attention. Decision-AI finds the one that matters most."* + +--- + +## Core Philosophy + +Capital projects fail not from a single cause but from compounding, interconnected failures. Decision-AI applies structured reasoning to: + +1. **Identify the Schwerpunkt** -- the root constraint or leverage point driving the most systemic risk +2. **Generate corrective actions** -- ranked by impact-to-effort ratio +3. **Apply critic insight** -- challenge assumptions in the recommended path +4. 
**Produce an editorial synthesis** -- a defensible, action-oriented recommendation + +--- + +## Step 1: Situation Intake + +When a user presents a project situation, capture: + +**Five Dimensions of Project State:** + +- **Schedule**: Current forecast vs. baseline, float status, critical path health +- **Cost**: CPI, SPI, EAC, budget contingency remaining +- **Risk**: Top open risks, probability x impact score, mitigation status +- **Scope**: Change order volume, scope creep indicators, design maturity +- **Stakeholder**: Alignment gaps, escalation risk, decision bottlenecks + +Ask the user to confirm or provide data across these dimensions before proceeding. + +--- + +## Step 2: Schwerpunkt Analysis + +Identify the single decisive constraint using this structured method: + +### 2A: Issue Mapping + +List all active project issues. For each, assess: + +``` +Issue | Root Cause | Downstream Impact | Recovery Difficulty | Urgency +``` + +### 2B: Causal Chain Tracing + +For each issue, trace: *"If this issue is not resolved in [timeframe], what breaks next?"* + +Build a causal chain map -- text or structured tree -- showing dependency between issues. 
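The causal-chain map lends itself to a tiny graph traversal: the issue with the largest set of transitively affected issues is the leading Schwerpunkt candidate. A sketch, with an entirely hypothetical issue graph (all names illustrative):

```python
# Hypothetical causal links: issue -> issues it directly drives
drives = {
    "design-maturity": ["civil-productivity", "procurement-holds", "rework"],
    "civil-productivity": ["critical-path-slip"],
    "procurement-holds": ["critical-path-slip"],
    "rework": [],
    "critical-path-slip": [],
}

def downstream(issue, seen=None):
    """Return the set of issues transitively affected by `issue`."""
    seen = set() if seen is None else seen
    for nxt in drives.get(issue, []):
        if nxt not in seen:
            seen.add(nxt)
            downstream(nxt, seen)
    return seen

# Highest causal leverage = most downstream exposure
schwerpunkt = max(drives, key=lambda i: len(downstream(i)))
print(schwerpunkt, len(downstream(schwerpunkt)))  # design-maturity 4
```

Causal leverage is only one of the four Schwerpunkt tests; the compounding rate, sphere-of-control, and downstream-relief criteria still require judgment.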
+ +### 2C: Schwerpunkt Identification + +The Schwerpunkt is the issue that: + +- Sits upstream of 3+ other issues (highest causal leverage) +- Has the highest compounding rate (gets worse faster) +- Is within the project team's sphere of control or influence +- When resolved, creates the most downstream relief + +**Output:** + +> **SCHWERPUNKT: [Issue Name]** +> Root cause: [explanation] +> Downstream exposure: [list of affected issues/milestones] +> Window of intervention: [time before it becomes irreversible] + +--- + +## Step 3: Corrective Action Generation + +Generate 3-5 corrective actions ranked by the Cybereum Impact Matrix: + +| Action | Schedule Impact | Cost Impact | Effort Required | Risk Introduced | Recommendation Score | +|--------|----------------|-------------|-----------------|-----------------|---------------------| +| Option A | +/- days | +/- $ | Low/Med/High | Low/Med/High | Score 1-10 | + +**Scoring formula:** + +``` +Score = (Schedule Recovery x 0.35) + (Cost Efficiency x 0.25) + + (Execution Feasibility x 0.25) + (Risk Delta x 0.15) +``` + +Normalize each dimension 1-10. Weight toward schedule recovery for critical-path issues; toward cost efficiency for burn-rate issues. + +--- + +## Step 4: Critic Insight + +After generating recommendations, apply structured critique to the top action: + +**Critic questions:** + +1. **Assumption challenge**: What assumption in this recommendation is most likely wrong? +2. **Second-order effects**: What could this corrective action break elsewhere? +3. **Stakeholder friction**: Who in the project organization will resist this -- and why? +4. **Data dependency**: What data would change this recommendation if it were different? +5. **Timing sensitivity**: Is this recommendation time-bound? When does it stop being valid? + +Present the critic analysis honestly, even if it weakens the primary recommendation. This is the intellectual integrity layer. 
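The Impact Matrix scoring formula from Step 3 can be sketched as a weighted sum over the four normalized dimensions. The option names and 1-10 ratings below are hypothetical:

```python
# Weights from the scoring formula: Schedule Recovery, Cost Efficiency,
# Execution Feasibility, Risk Delta
WEIGHTS = {"schedule": 0.35, "cost": 0.25, "feasibility": 0.25, "risk": 0.15}

def impact_score(schedule, cost, feasibility, risk):
    """Each dimension is rated 1-10 before weighting."""
    ratings = {"schedule": schedule, "cost": cost,
               "feasibility": feasibility, "risk": risk}
    return sum(WEIGHTS[k] * v for k, v in ratings.items())

# Hypothetical corrective-action ratings
options = {
    "A: fast-track overlapping phases": impact_score(8, 5, 6, 4),
    "B: crash critical resources": impact_score(9, 3, 4, 3),
    "C: rebaseline with scope reduction": impact_score(6, 7, 5, 6),
}
best = max(options, key=options.get)
print(best, round(options[best], 2))  # A: fast-track overlapping phases 6.15
```

Per Step 3, the weights themselves should shift toward schedule recovery for critical-path issues and toward cost efficiency for burn-rate issues.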
+ +--- + +## Step 5: Editorial Synthesis + +Produce a clean, executive-ready decision memo structure: + +``` +DECISION BRIEF -- [Project Name] -- [Date] + +SITUATION +[2-3 sentence factual summary of current project state] + +SCHWERPUNKT +[1-2 sentences identifying the decisive constraint] + +RECOMMENDED ACTION +[Primary recommendation, stated clearly and directly] + +ALTERNATIVES CONSIDERED +[2 alternatives with brief rationale for not selecting] + +CRITIC PERSPECTIVE +[1-2 sentences on the strongest challenge to the recommendation] + +DECISION REQUIRED BY +[Date/milestone -- when this decision window closes] + +OWNER +[Role or team responsible for executing] +``` + +--- + +## Decision Patterns for Capital Projects + +### Pattern A: Schedule Recovery Decision + +**Trigger**: Project is behind schedule with a closing completion window +**Schwerpunkt focus**: Find the driving activity -- not the symptom +**Common corrective actions**: Fast-track overlapping phases, crash critical resources, rebaseline with board-approved scope reduction + +### Pattern B: Cost Overrun Intervention + +**Trigger**: CPI < 0.90 sustained over 2+ periods +**Schwerpunkt focus**: Find the cost account driving overrun (often 1-2 WBS elements cause 80% of variance) +**Common corrective actions**: Cost account replanning, scope freeze, vendor renegotiation, contingency drawdown with PMO approval + +### Pattern C: Procurement Crisis + +**Trigger**: Long-lead equipment delayed; critical path affected +**Schwerpunkt focus**: Single vendor or category with maximum exposure +**Common corrective actions**: Dual-source acceleration, engineering hold on dependent work, liquidated damages trigger assessment + +### Pattern D: Stakeholder Deadlock + +**Trigger**: Decisions are not being made; issues backlog growing +**Schwerpunkt focus**: Identify the specific decision that is blocking the chain +**Common corrective actions**: Executive escalation with pre-prepared decision package, time-box forcing 
function, third-party facilitation + +### Pattern E: Design Maturity Gap + +**Trigger**: Construction starting with IFC drawings <70% complete +**Schwerpunkt focus**: Engineering discipline driving the gap (civil? mechanical? E&I?) +**Common corrective actions**: Design freeze on critical systems, construction sequencing around available IFCs, engineering surge resourcing + +--- + +## Output Modes + +**Quick Mode** (user wants fast answer): Schwerpunkt + Top Action only. 3-5 sentences. + +**Standard Mode**: Full 5-step analysis with Decision Brief at the end. + +**Workshop Mode**: Interactive -- ask clarifying questions at each step. Co-build the analysis with the user across multiple turns. + +--- + +## Reference Files + +- `references/schwerpunkt-theory.md` -- Theoretical basis, military origins, capital project adaptation +- `references/decision-patterns.md` -- Extended pattern library for EPC, energy, defense, infrastructure +- `references/scoring-model.md` -- Detailed Impact Matrix scoring methodology + +--- + +## Interactive Decision Workshop Protocol + +When in Workshop Mode, apply structured interactive questioning (adapted from plan-ceo-review pattern): + +### Question Protocol + +Every question to the user MUST: +1. Present 2-3 concrete lettered options +2. State which option you recommend FIRST +3. Explain in 1-2 sentences WHY that option over the others + +**One issue = one question.** Never batch multiple decision points into one question. + +### Pre-Decision System Audit + +Before Schwerpunkt analysis, gather context: + +``` +1. What is the current project state across all 5 dimensions? +2. What decisions have been deferred or are overdue? +3. What changed since the last decision review? +4. Are there any active corrective actions in flight? +``` + +### Mode Selection (present to user): + +1. **QUICK**: Schwerpunkt + Top Action only. 3-5 sentences. For time-critical decisions. +2. **STANDARD**: Full 5-step analysis with Decision Brief. 
For governance meetings. +3. **WORKSHOP**: Interactive co-analysis. Ask clarifying questions at each step. For complex multi-stakeholder decisions. + +### Decision History Tracking + +Save decision records for accountability tracking: + +```bash +mkdir -p .cybereum/decisions +``` + +Save as `.cybereum/decisions/{project-slug}-{YYYY-MM-DD}-{seq}.json`: + +```json +{ + "date": "2026-03-14", + "project": "Project Name", + "schwerpunkt": "Engineering design maturity gap in civil discipline", + "recommended_action": "Design freeze on critical systems with construction sequencing around available IFCs", + "alternatives_considered": ["Engineering surge resourcing", "Rebaseline construction sequence"], + "decision_window": "2026-03-28", + "decision_made": null, + "owner": "VP Engineering" +} +``` + +Track open decisions across reviews. Flag any decision past its window as **EXPIRED -- escalate immediately**. + +--- + +## Critical Rules + +1. **Never recommend without a Schwerpunkt** -- surface-level recommendations miss the root cause +2. **Always include a Critic** -- intellectual honesty is non-negotiable in governance contexts +3. **State the decision window** -- every recommendation has an expiry +4. **Attribute uncertainty** -- if data is missing, state what assumption was made and flag it diff --git a/cybereum-evm-control/SKILL.md b/cybereum-evm-control/SKILL.md new file mode 100644 index 0000000..928549c --- /dev/null +++ b/cybereum-evm-control/SKILL.md @@ -0,0 +1,303 @@ +--- +name: cybereum-evm-control +version: 1.0.0 +description: | + Calculates, interprets, and reports Earned Value Management (EVM) metrics for capital projects + including CPI, SPI, TCPI, VAC, EAC, ETC, BCWS, BCWP, ACWP, and Earned Schedule. 
Use this skill + for any EVM analysis, cost performance reporting, budget forecasting, AACE cost control, cost + variance attribution, to-complete performance index analysis, or when a user mentions earned value, + cost performance index, schedule performance index, budget at completion, estimate at completion, + or asks about project financial health, cost burn rate, or whether a project will come in on budget. + Also use for EVMS compliance assessment and DCAA/DoD EVMS surveillance. +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob +--- + +# Cybereum EVM Control + +Earned Value Management intelligence engine for capital projects. Supports ANSI/EIA-748 EVMS compliance, AACE Cost Engineering standards, and DoD EVMS requirements. Transforms cost and schedule data into actionable performance intelligence. + +--- + +## EVM Fundamentals -- Quick Reference + +| Metric | Formula | Meaning | +|--------|---------|---------| +| **BCWS** (PV) | Budget x Planned % | What we planned to spend by now | +| **BCWP** (EV) | Budget x Earned % | Budgeted value of work actually done | +| **ACWP** (AC) | Actual cost incurred | What we actually spent | +| **CV** | BCWP - ACWP | Cost variance (positive = under budget) | +| **SV** | BCWP - BCWS | Schedule variance in $ (positive = ahead) | +| **CPI** | BCWP / ACWP | Cost efficiency (>1.0 = under budget) | +| **SPI** | BCWP / BCWS | Schedule efficiency (>1.0 = ahead of plan) | +| **EAC** | BAC / CPI | Estimate at Completion (most common) | +| **ETC** | EAC - ACWP | Estimate to Complete | +| **VAC** | BAC - EAC | Variance at Completion | +| **TCPI** | (BAC - BCWP) / (BAC - ACWP) | Required future efficiency to meet BAC | + +### EAC Calculation Methods + +``` +Method 1 (CPI trend): EAC = BAC / CPI +Method 2 (Replan remaining): EAC = ACWP + ETC (bottom-up) +Method 3 (CPI x SPI): EAC = ACWP + [(BAC - BCWP) / (CPI x SPI)] +Method 4 (To-complete at plan): EAC = ACWP + (BAC - BCWP) +``` + +**Default**: Use Method 1 for stable 
CPI; Method 3 for schedule-constrained, cost-sensitive programs. + +--- + +## Step 1: Data Intake + +Request or accept the following inputs: + +**Required:** + +- BAC (Budget at Completion) +- BCWS (Planned Value at status date) +- BCWP (Earned Value at status date) +- ACWP (Actual Cost at status date) +- Status Date (data date) +- Project planned finish date + +**Optional (for deeper analysis):** + +- Prior period BCWP and ACWP (trend calculation) +- WBS breakdown (cost account level data) +- Contract type (CPFF, FFP, T&M, CPAF) +- Management Reserve and Contingency remaining +- Change order backlog + +--- + +## Step 2: Performance Calculation + +Compute all standard metrics. Present in a Performance Dashboard: + +``` +=================================================================== +CYBEREUM EVM DASHBOARD -- [Project Name] -- [Status Date] +=================================================================== + +BUDGET BASELINE + BAC: $[X]M + Management Reserve: $[X]M + Contract Budget Base: $[X]M + +CURRENT PERFORMANCE + BCWS (PV): $[X]M (Plan: [X]% complete) + BCWP (EV): $[X]M (Earned: [X]% complete) + ACWP (AC): $[X]M (Spent to date) + +VARIANCES + Cost Variance: $[+/-X]M ([+/-X]%) + Schedule Variance: $[+/-X]M ([+/-X]%) + +INDICES + CPI: [X.XX] [On track / Watch / Critical] + SPI: [X.XX] [On track / Watch / Critical] + SPI(t): [X.XX] (Earned Schedule method) + +FORECAST + EAC: $[X]M (Method 1: BAC/CPI) + EAC: $[X]M (Method 3: CPI x SPI) + VAC: $[+/-X]M ([+/-X]% of BAC) + TCPI: [X.XX] [Achievable / Aggressive / Unrealistic] +=================================================================== +``` + +--- + +## Step 3: Performance Interpretation + +### CPI Interpretation + +``` +CPI > 1.10: Under budget -- verify earned value methodology +CPI 1.00-1.10: Healthy -- monitor for sustainability +CPI 0.90-1.00: Watch zone -- investigate top cost accounts +CPI 0.80-0.90: Critical -- corrective action required +CPI < 0.80: Emergency -- rebaseline likely needed +``` + 
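The quick-reference formulas reduce to a small helper. A sketch with illustrative input figures (not benchmarks):

```python
def evm_metrics(bac, bcws, bcwp, acwp):
    """Core EVM indicators from the quick-reference formulas."""
    cpi, spi = bcwp / acwp, bcwp / bcws
    return {
        "CV": bcwp - acwp,                                   # cost variance ($)
        "SV": bcwp - bcws,                                   # schedule variance ($)
        "CPI": round(cpi, 2),
        "SPI": round(spi, 2),
        "EAC_m1": round(bac / cpi),                          # Method 1: BAC / CPI
        "EAC_m3": round(acwp + (bac - bcwp) / (cpi * spi)),  # Method 3: CPI x SPI
        "TCPI": round((bac - bcwp) / (bac - acwp), 2),
    }

m = evm_metrics(bac=150e6, bcws=82.5e6, bcwp=76.2e6, acwp=84.7e6)
print(m["CPI"], m["SPI"], m["TCPI"])  # 0.9 0.92 1.13
```

Note that Method 3 always yields a higher EAC than Method 1 when both CPI and SPI are below 1.0, which is why it is preferred for schedule-constrained programs.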
+### TCPI Interpretation + +``` +TCPI < 1.00: Remaining work is easier than average to date -- realistic +TCPI 1.00-1.10: Achievable with improved efficiency +TCPI 1.10-1.20: Aggressive -- requires specific corrective actions +TCPI > 1.20: Unrealistic -- budget recovery unlikely without scope reduction or BAC revision +``` + +**If TCPI > 1.10:** Automatically flag and recommend Estimate at Completion revision with PMO notification. + +### Earned Schedule Analysis + +``` +ES = Month of plan where BCWP line intersects BCWS curve +SPI(t) = ES / AT (Actual Time elapsed) +SV(t) = ES - AT (in months -- more meaningful than $ SV late in project) +IEAC(t) = PD / SPI(t) (Independent EAC in time) +``` + +--- + +## Step 4: Variance Attribution + +When WBS/cost account data is available: + +1. **Pareto Analysis**: Rank cost accounts by absolute cost variance. Top 20% of accounts typically drive 80% of variance. +2. **Variance root cause categories:** + - Productivity (hours/unit above estimate) + - Labor rate (wages above estimate) + - Scope growth (more work than planned) + - Quantity variance (more material than estimated) + - Schedule-driven costs (acceleration premium, standby time) +3. **Format:** + +``` +WBS | Account | BAC | BCWP | ACWP | CPI | Variance $ | Root Cause +``` + +--- + +## Step 5: Trend Analysis + +If multiple reporting periods are available: + +**CPI Trend Chart (text-based):** + +``` +Period | CPI | Trend +P1 | 0.95 | -- +P2 | 0.93 | down (-0.02) +P3 | 0.91 | down (-0.02) +P4 | 0.90 | stable +``` + +**CPI Stability Rule**: If CPI has been stable (+/-0.02) for 3+ periods, it is highly predictive of final CPI. EAC = BAC / CPI is reliable in this case. + +**CPI Declining Rule**: If CPI declining >0.02/period, escalate immediately. EAC is worsening; corrective action must precede next report. + +--- + +## Step 6: EVMS Compliance Check + +For DoD/government programs requiring ANSI/EIA-748 compliance: + +Assess across 5 criteria groups: + +1. 
**Organization** (Guidelines 1-5): WBS integration, control accounts, OBS +2. **Planning & Scheduling** (Guidelines 6-14): PMB, time-phasing, undistributed budget +3. **Accounting** (Guidelines 15-22): Actual cost recording, unit of measure +4. **Analysis** (Guidelines 23-27): Variance analysis threshold, corrective actions +5. **Revisions** (Guidelines 28-32): IBR, EAC methodology, MR management + +Flag any guideline with non-compliance as a surveillance finding. + +--- + +## Standard Reporting Formats + +### Monthly EVM Report Sections + +1. Executive Summary (CPI, SPI, EAC vs. BAC) +2. Performance Dashboard table +3. Top 5 Cost Variance accounts with root cause +4. Trend chart (last 6 periods) +5. Corrective action status (from prior report) +6. New corrective actions this period +7. Risk-adjusted EAC range (P50/P80) + +### Format Flags + +- **Report for Owner**: Emphasis on EAC, VAC, milestone forecast dates +- **Report for PMO**: Full EVM metrics, variance attribution, corrective actions +- **Report for DoD/Contracting Officer**: EVMS compliance, IBR status, CAM signatures +- **Report for Board/Lenders**: Budget health summary, contingency adequacy, completion confidence + +--- + +## Reference Files + +- `references/aace-rp10s-90.md` -- AACE Total Cost Management Framework +- `references/ansi-eia-748.md` -- 32 EVMS guidelines summary +- `references/eac-methods-comparison.md` -- When to use each EAC formula +- `references/evm-sector-benchmarks.md` -- Typical CPI/SPI ranges by project type and phase + +--- + +## EVM History & Trend Tracking + +Persist EVM snapshots for automated trend analysis (adapted from retro pattern): + +```bash +mkdir -p .cybereum/evm-snapshots +``` + +Save as `.cybereum/evm-snapshots/{project-slug}-{YYYY-MM-DD}.json`: + +```json +{ + "date": "2026-03-14", + "project": "Project Name", + "status_date": "2026-03-10", + "bac": 150000000, + "bcws": 82500000, + "bcwp": 76200000, + "acwp": 84700000, + "cpi": 0.90, + "spi": 0.92, + "spi_t": 0.89, 
+ "eac_method1": 166700000, + "eac_method3": 181200000, + "vac": -16700000, + "tcpi": 1.13, + "contingency_remaining": 8500000, + "mr_remaining": 3200000, + "top_variance_accounts": [ + { "wbs": "03.02", "name": "Civil Works", "cv": -4200000, "cpi": 0.82 }, + { "wbs": "04.01", "name": "Mechanical", "cv": -2800000, "cpi": 0.87 } + ] +} +``` + +### Automated Trend Analysis + +When prior snapshots exist, compute and display: + +``` + P1 P2 P3 P4 (Now) Trend +CPI: 0.95 0.93 0.91 0.90 Declining (-0.02/period) +SPI: 0.98 0.96 0.94 0.92 Declining (-0.02/period) +EAC (M1): $157.9 $161.3 $164.8 $166.7 Rising (+$2.9M/period) +TCPI: 1.05 1.08 1.10 1.13 Rising (approaching unrealistic) +Contingency: $12.1M $10.8M $9.6M $8.5M Declining (-$1.2M/period) +``` + +**Automatic alerts:** +- CPI declining >0.02/period for 3+ periods: **SUSTAINED DECLINE -- escalate** +- TCPI > 1.10 and rising: **BUDGET RECOVERY UNLIKELY -- recommend EAC revision** +- Contingency burn rate exceeds risk-adjusted plan: **CONTINGENCY EXHAUSTION RISK** + +### Compare Mode + +When user requests comparison: load two snapshots and produce side-by-side delta analysis with narrative highlighting the biggest improvements and regressions. + +--- + +## Troubleshooting + +**BCWP > BCWS but CPI < 1.0**: Project is ahead of schedule but over budget. Acceleration costs are driving cost overrun despite earned value. + +**CPI improving late in project**: May indicate gaming (inflating earned value) or genuine efficiency gain. Verify with physical percent complete independently. + +**EAC < BAC with CPI < 1.0**: Check if replan or scope reduction occurred. Low CPI + EAC < BAC is mathematically inconsistent without a change. + +**SPI approaches 1.0 near project end**: Normal -- SPI always converges to 1.0 at completion regardless of schedule overrun. Use SPI(t) for late-project schedule assessment instead. 
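The snapshot-loading and automatic alert rules described above can be sketched in Python. This is a minimal illustration, not part of the Cybereum platform: the function names are invented here, the snapshot fields follow the JSON schema shown earlier, and the thresholds are the ones stated in the "Automatic alerts" rules.

```python
import json
from pathlib import Path

# Alert thresholds from this skill's "Automatic alerts" rules
CPI_DECLINE_PER_PERIOD = 0.02
SUSTAINED_PERIODS = 3
TCPI_UNREALISTIC = 1.10

def load_snapshots(snapshot_dir: str, project_slug: str) -> list[dict]:
    """Load EVM snapshots oldest-first; ISO-dated filenames sort chronologically."""
    paths = sorted(Path(snapshot_dir).glob(f"{project_slug}-*.json"))
    return [json.loads(p.read_text()) for p in paths]

def trend_alerts(snapshots: list[dict]) -> list[str]:
    """Evaluate the automatic alert rules over a chronological snapshot series."""
    alerts = []
    cpi = [s["cpi"] for s in snapshots]
    tcpi = [s["tcpi"] for s in snapshots]
    # CPI declining by >= 0.02/period for 3+ consecutive periods
    run = 0
    for prev, cur in zip(cpi, cpi[1:]):
        run = run + 1 if (prev - cur) >= CPI_DECLINE_PER_PERIOD - 1e-9 else 0
        if run >= SUSTAINED_PERIODS:
            alerts.append("SUSTAINED DECLINE -- escalate")
            break
    # TCPI above 1.10 and still rising
    if len(tcpi) >= 2 and tcpi[-1] > TCPI_UNREALISTIC and tcpi[-1] > tcpi[-2]:
        alerts.append("BUDGET RECOVERY UNLIKELY -- recommend EAC revision")
    return alerts
```

Compare mode falls out of the same data: load two snapshots by date and diff their fields for the side-by-side delta analysis.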
diff --git a/cybereum-executive-reporting/SKILL.md b/cybereum-executive-reporting/SKILL.md new file mode 100644 index 0000000..30ab9e3 --- /dev/null +++ b/cybereum-executive-reporting/SKILL.md @@ -0,0 +1,284 @@ +--- +name: cybereum-executive-reporting +version: 1.0.0 +description: | + Generates professional executive reports, status briefings, board presentations, and governance + documents for capital projects. Produces DOCX, PPTX, or PDF-ready structured content following + capital project reporting standards. Use this skill whenever a user asks to generate a project + status report, executive summary, monthly report, owner report, board briefing, project dashboard + narrative, PMO report, lender report, or any formal written communication about capital project + performance. Also use for commissioning reports, phase completion reports, milestone reports, and + investor briefings on capital programs. +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob +--- + +# Cybereum Executive Reporting + +Professional report generation engine for capital project governance. Produces publication-ready content calibrated to the audience -- from PMO dashboards to lender reports to board briefings -- following AACE, PMI, and industry best-practice reporting standards. 
+ +--- + +## Report Type Selection + +Identify the report type first: + +| Report Type | Primary Audience | Key Sections | Typical Length | +|---|---|---|---| +| Monthly Progress Report | Owner / PMO | Status, EVM, Schedule, Risk, Actions | 8-15 pages | +| Executive Dashboard Narrative | Executive Leadership | Headlines, KPIs, Decisions Required | 2-3 pages | +| Board Briefing | Board of Directors | Program health, financials, major decisions | 10-15 slides | +| Lender/Investor Report | Lenders, Equity | Covenant compliance, forecast, risk | 5-10 pages | +| Phase Completion Report | Owner, Regulator | Scope achieved, lessons learned, next phase | 10-20 pages | +| Risk Report | PMO, Owner | Risk register, new risks, mitigation status | 5-8 pages | +| Commissioning Report | Operations, Owner | System completion, punch list, turnover | Variable | +| Milestone Briefing | Stakeholders | Milestone achieved, next milestone path | 2-3 pages | + +--- + +## Step 1: Report Intake + +Gather from the user: + +1. **Report type** (from table above) +2. **Project name, type, location, phase** +3. **Reporting period** (e.g., "March 2026" or "Q1 2026") +4. **Key metrics available**: % complete, cost spent, schedule status, CPI, SPI +5. **Notable events this period**: Issues, decisions, milestones achieved +6. **Audience**: Technical / Executive / Financial / Regulatory +7. **Format needed**: DOCX / PPTX / narrative text + +--- + +## Step 2: Structure Generation + +### Monthly Progress Report Structure + +``` +1. EXECUTIVE SUMMARY + - Project status indicator (Green / Yellow / Red) + - Top 3 accomplishments this period + - Top 3 challenges / issues + - Forecast vs. baseline (1 sentence each: cost, schedule) + +2. PROJECT STATUS OVERVIEW + - Scope: % engineering, % procurement, % construction complete + - Overall project % complete (weighted) + - Key milestone status table + +3. SCHEDULE PERFORMANCE + - Current forecast vs. 
baseline completion + - SPI / SPI(t) with trend + - Critical path narrative + - Near-critical risks + +4. COST PERFORMANCE + - Spent to date vs. plan + - CPI with trend + - EAC vs. BAC + - Contingency remaining + +5. PROCUREMENT STATUS + - Long-lead equipment tracker + - Purchase order status summary + - Expediting concerns + +6. RISK & OPPORTUNITY REGISTER (SUMMARY) + - New risks this period + - Closed risks + - Top 5 active risks + +7. HSE PERFORMANCE + - Incident rates (TRIR, LTIR) + - Notable safety events + - Manhours worked + +8. DECISIONS REQUIRED + - Numbered list of decisions needed from Owner/Executive + - Each with: Description, By Whom, By When, Consequence of Delay + +9. ACTION ITEMS + - Open actions from prior report (status) + - New actions this period + +10. NEXT PERIOD LOOKAHEAD + - Key planned activities next 30/60/90 days + - Upcoming milestones + - Anticipated decisions +``` + +### Executive Dashboard Narrative Structure + +``` +PROJECT HEALTH: [GREEN / YELLOW / RED] + +HEADLINE: [1 sentence -- most important thing to know right now] + +THIS PERIOD +- [Accomplishment 1] +- [Accomplishment 2] +- [Issue / Challenge] + +PERFORMANCE SNAPSHOT + Schedule: [On track / X days behind / X days ahead] + Cost: [On budget / X% over / X% under -- EAC vs. BAC] + Forecast: [Completion date -- P50] + +TOP RISK: [Single sentence on the highest risk item] + +DECISION REQUIRED: [If any -- what, from whom, by when] +``` + +--- + +## Step 3: Content Generation Rules + +### Writing Style for Capital Projects + +**Use active voice:** +- Good: "The contractor completed structural steel erection on Unit 1." +- Bad: "Structural steel erection on Unit 1 was completed by the contractor." + +**Quantify everything:** +- Good: "Overall project is 67% complete, 4 points behind the baseline of 71%." +- Bad: "The project is slightly behind schedule." 
+ +**Lead with performance, follow with cause:** +- Good: "The project is 8 days behind schedule, driven by a 3-week weather delay in civil works during February." +- Bad: "There was bad weather in February, which caused the project to fall behind." + +**State decisions directly:** +- Good: "A decision is required by March 15 to authorize the additional $2.4M for foundation re-design." +- Bad: "There may need to be some additional funding approved at some point." + +### Status Color Indicators + +``` +GREEN: On track. No issues requiring escalation. CPI > 0.95, SPI > 0.95. +YELLOW: Watch status. One or more metrics off-track; corrective actions in place. +RED: Off-track. Recovery plan required. CPI < 0.90 or SPI < 0.85 or milestone missed. +``` + +### Numerical Formatting Standards + +- Cost: Report in $M to 1 decimal (e.g., "$142.5M") +- Percentages: Report to 1 decimal (e.g., "67.3% complete") +- Dates: Use full month name (e.g., "March 15, 2026") +- Indices: Report to 2 decimal places (CPI: 0.94) +- Days: Always state "working days" or "calendar days" + +--- + +## Step 4: Audience Calibration + +### For Executives / Board + +- Eliminate all jargon. Replace with business language. +- Lead with financial exposure, not technical details. +- Every issue must connect to: "What does this mean for our budget and completion date?" +- Include "Decisions Required" as a prominent section. +- Maximum 1 page of text per topic. + +### For Technical PMO / Engineers + +- Include activity-level detail, logic analysis, index calculations. +- Use standard EVM and schedule terminology. +- Include root cause analysis, not just symptoms. +- Reference specific activity IDs, WBS codes, contract packages. + +### For Lenders / Financial Stakeholders + +- Emphasize covenant compliance (debt service coverage, drawdown milestones). +- Include contingency remaining vs. identified risks (coverage ratio). +- Reference independent engineer findings if available. 
+- Avoid acknowledging issues not already disclosed in loan documents without counsel review note. + +### For Regulators (NRC, FERC, DoD) + +- Use regulatory-specific terminology and reference class designations. +- State compliance status explicitly for each applicable requirement. +- Include open items log with due dates. + +--- + +## Step 5: Report Assembly Protocol + +1. **Draft Executive Summary last** -- after all sections are complete, the headline becomes clear +2. **Table formatting**: All performance tables should have: Metric | Baseline | Current | Variance | Trend +3. **Milestone table standard:** + +``` +Milestone | Baseline Date | Forecast Date | Variance | Status +``` + +4. **Action item table standard:** + +``` +# | Action | Owner | Due Date | Status | Notes +``` + +5. **Risk summary table standard:** + +``` +Rank | Risk | Probability | Impact | Score | Mitigation | Owner | Status +``` + +--- + +## Step 6: Document Creation + +When creating the actual file: + +- **DOCX**: Read the docx SKILL.md at `/mnt/skills/public/docx/SKILL.md` before generating +- **PPTX**: Read the pptx SKILL.md at `/mnt/skills/public/pptx/SKILL.md` before generating +- **PDF**: Read the pdf SKILL.md at `/mnt/skills/public/pdf/SKILL.md` before generating + +Apply Cybereum visual identity: + +- Primary color: Dark navy `#0A0E1A` / Electric blue `#00D4FF` +- Secondary: Slate gray `#1E2440` +- Accent: Amber `#FFB800` for warnings; Red `#FF3B30` for critical +- Font: Headers in bold sans-serif; body in clean sans-serif +- Logo: Include Cybereum wordmark in header if branding applies + +--- + +## Cross-Skill Integration + +Executive Reporting integrates with all other Cybereum skills to pull live analysis: + +- **Schedule Intelligence**: Pull DCMA 14-Point scorecard and critical path narrative for Schedule section +- **EVM Control**: Pull CPI/SPI dashboard and EAC forecast for Cost section +- **Risk Engine**: Pull top 5 risks and mitigation status for Risk section +- **Completion 
Prediction**: Pull P50/P80 forecast dates for Forecast section +- **Decision-AI**: Pull Schwerpunkt and corrective actions for Decisions Required section +- **Reference Class**: Pull optimism bias assessment for Lender/Investor reports + +When generating a report, invoke the relevant skill analysis before writing each section. + +--- + +## Reference Files + +- `references/reporting-standards.md` -- AACE RP 11R-88 Progress and Performance reporting +- `references/executive-writing-guide.md` -- Capital project executive communication principles +- `references/sector-report-templates.md` -- Report templates by sector: Nuclear, EPC, Defense, Infrastructure +- `assets/cybereum-report-template.md` -- Base template with Cybereum formatting + +--- + +## Quality Checklist (Before Finalizing) + +- [ ] Every metric has a baseline for comparison +- [ ] Every issue has a mitigation or decision linked +- [ ] Status color matches the narrative (no "GREEN" with unmitigated critical path slip) +- [ ] Decisions Required section is complete and specific +- [ ] All numbers cross-reference (EAC in narrative = EAC in table) +- [ ] Reading level appropriate for stated audience +- [ ] No passive voice in executive sections +- [ ] Dates are specific, not relative ("by end of Q2" -> "by June 30, 2026") diff --git a/cybereum-reference-class/SKILL.md b/cybereum-reference-class/SKILL.md new file mode 100644 index 0000000..b2d961e --- /dev/null +++ b/cybereum-reference-class/SKILL.md @@ -0,0 +1,244 @@ +--- +name: cybereum-reference-class +version: 1.0.0 +description: | + Applies Reference Class Forecasting (RCF) methodology to capital projects using historical + outside-view benchmarks to correct optimism bias in cost and schedule estimates. 
Use this skill + when a user asks about reference class forecasting, outside view, Flyvbjerg methodology, optimism + bias, base rate, historical project benchmarks, megaproject cost overruns, schedule overrun + percentages, how this project compares to similar projects, or wants to validate whether a project + estimate is realistic based on industry history. Also use for nuclear, infrastructure, defense, + transit, energy, and data center project benchmarking. +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob +--- + +# Cybereum Reference Class Forecasting + +Outside-view benchmarking engine for capital project estimates. Applies Bent Flyvbjerg's Reference Class Forecasting (RCF) methodology -- now endorsed by the UK Treasury Green Book, APA, and adopted by major project owners -- to correct systematic optimism bias in project cost and schedule estimates. + +> *"The inside view is always optimistic. The outside view is always closer to the truth."* +> -- Bent Flyvbjerg, Oxford Said Business School + +--- + +## Theoretical Foundation + +### The Planning Fallacy + +Kahneman and Tversky identified that people systematically underestimate costs and durations when viewing projects from the "inside view" -- focusing on the specific project's plan while ignoring the base rate of similar projects. + +**RCF corrects this by:** + +1. Selecting a reference class of similar projects +2. Establishing the historical distribution of outcomes for that class +3. Applying the base rate to the current project +4. 
Adjusting for project-specific risk factors + +### Why This Matters + +- Average cost overrun for megaprojects (>$1B): **45%** (Flyvbjerg 2003, updated 2022) +- Average schedule overrun for megaprojects: **52%** +- Nuclear projects: Cost overrun **117%** average; schedule overrun **100%+** +- Rail/transit projects: Cost overrun **44.7%** on average +- IT/software projects: Schedule overrun **27%**; budget overrun **56%** + +--- + +## Step 1: Reference Class Selection + +**The most critical step** -- the reference class must be specific enough to be informative but broad enough to have statistical validity. + +### Selection criteria: + +1. **Project type**: Same type of infrastructure/facility (nuclear, refinery, hospital, transit, etc.) +2. **Delivery method**: EPC, EPCM, DBB, DB, P3 +3. **Scale bracket**: Similar TIC range (+/-50% of estimated TIC) +4. **Geography**: Same or similar regulatory environment +5. **Era**: Projects completed within last 15-20 years (older data less relevant) +6. 
**Owner type**: Government / IOC / NOC / Private + +**Reference class selection guidance by sector:** + +| Project Type | Recommended Reference Class | Minimum Sample | +|---|---|---| +| Nuclear (SMR) | First-of-a-kind nuclear in Western regulatory environments | N>8 | +| Nuclear (conventional) | Large LWR projects post-2000 | N>12 | +| LNG / Petrochemical | Grassroots LNG or world-scale chemical complex | N>15 | +| Highway / Road | Interstate highway expansion, similar terrain | N>20 | +| Rail / Transit | Urban rail, commuter rail by city typology | N>15 | +| Data Center | Hyperscale data center campus | N>20 | +| Offshore Wind | Offshore wind farm >300 MW | N>12 | +| Defense Platform | DoD Major Defense Acquisition Program (MDAP) | N>15 | +| Building (complex) | Complex institutional building (hospital, lab) | N>20 | + +--- + +## Step 2: Historical Benchmark Database + +### Cost Overrun Benchmarks (% of original estimate) + +``` +Project Type | Mean Overrun | Median | P80 Overrun | Worst Quartile +Nuclear (post-2010) | +117% | +95% | +180% | >200% +LNG / Petrochemical | +35% | +28% | +55% | >80% +Highway/Road | +20% | +15% | +35% | >50% +Urban Rail/Transit | +45% | +38% | +65% | >90% +Offshore Wind | +18% | +12% | +32% | >45% +Defense Acquisition | +43% | +32% | +70% | >100% +Hospitals (complex) | +28% | +22% | +45% | >65% +Data Centers (hyper.) | +12% | +8% | +22% | >35% +``` + +### Schedule Overrun Benchmarks (% of original schedule) + +``` +Project Type | Mean Overrun | Median | P80 Overrun +Nuclear (post-2010) | +105% | +85% | +165% +LNG / Petrochemical | +30% | +22% | +50% +Highway/Road | +22% | +16% | +38% +Urban Rail/Transit | +52% | +42% | +75% +Offshore Wind | +20% | +15% | +35% +Defense Acquisition | +51% | +40% | +82% +Hospitals (complex) | +32% | +25% | +50% +Data Centers (hyper.) 
| +15% | +10% | +25% +``` + +--- + +## Step 3: Outside-View Estimate + +Apply the reference class distribution to the current project: + +### Calculate the Reference Class Adjusted Estimate (RCAE) + +``` +RCAE (P50 cost) = Current Estimate x (1 + Mean Reference Class Cost Overrun) +RCAE (P80 cost) = Current Estimate x (1 + P80 Reference Class Cost Overrun) +RCAE (P50 schedule) = Current Duration x (1 + Mean Reference Class Schedule Overrun) +RCAE (P80 schedule) = Current Duration x (1 + P80 Reference Class Schedule Overrun) +``` + +**Example (Nuclear SMR):** + +``` +Current Estimate: $1.2B / 72-month construction +P50 RCAE Cost: $1.2B x 2.17 = $2.60B +P80 RCAE Cost: $1.2B x 2.80 = $3.36B +P50 RCAE Schedule: 72 x 2.05 = 148 months +P80 RCAE Schedule: 72 x 2.65 = 191 months +``` + +--- + +## Step 4: Inside-View Adjustment (Project-Specific Factors) + +Adjust the outside-view estimate up or down based on documented project-specific factors: + +### Upward Risk Factors (increase estimate): + +| Factor | Typical Adjustment | +|--------|-------------------| +| First-of-a-kind technology | +10-25% | +| Novel regulatory environment | +5-15% | +| Remote/difficult site | +5-20% | +| Compressed schedule mandate | +8-15% | +| Multi-contractor complexity | +5-10% | +| Political/stakeholder volatility | +5-15% | +| Emerging market location | +10-25% | + +### Downward Risk Factors (decrease estimate): + +| Factor | Typical Adjustment | +|--------|-------------------| +| Proven technology, NOAK | -5-15% | +| Experienced owner/PM team | -5-10% | +| Fixed-price contract (verified) | -5-10% (shifts risk, not cost) | +| Modular/prefabricated design | -5-15% | +| Favorable regulatory precedent | -3-8% | + +**Final RCAE:** + +``` +RCAE_adjusted = RCAE_base x (1 + Sum(upward adjustments)) x (1 - Sum(downward adjustments)) +``` + +--- + +## Step 5: Optimism Bias Report + +Generate a structured outside-view assessment: + +``` +REFERENCE CLASS FORECAST -- [Project Name] -- [Date] + +CURRENT
ESTIMATE: $[X]M / [X] months +REFERENCE CLASS: [Specific class selected] (N=[sample size]) +Source: [Flyvbjerg database / DoD DAES / GAO / RAND / McKinsey Global Institute] + +OUTSIDE-VIEW RESULTS: + P50 Cost: $[X]M ([+X]% above current estimate) + P80 Cost: $[X]M ([+X]% above current estimate) + P50 Schedule: [X] months ([+X]% above current estimate) + P80 Schedule: [X] months ([+X]% above current estimate) + +PROJECT-SPECIFIC ADJUSTMENTS: + Upward factors: [List] -> +[X]% + Downward factors: [List] -> -[X]% + Net adjustment: [+/-X]% + +ADJUSTED RCAE: + Adjusted P50 Cost: $[X]M + Adjusted P80 Cost: $[X]M + Adjusted P50 Schedule: [X] months + Adjusted P80 Schedule: [X] months + +OPTIMISM BIAS DETECTED: [Yes/No] + Current estimate is at the [X]th percentile of reference class outcomes. + There is only a [X]% probability that this project completes at or below current estimate. + +RECOMMENDATION: + [Specific recommendation on estimate adequacy, contingency, schedule commitment] +``` + +--- + +## Step 6: Contingency Adequacy Assessment + +Based on the RCAE: + +``` +Required P80 contingency = RCAE (P80) - Current Estimate +Current contingency = [User-provided] +Contingency gap = Required P80 contingency - Current contingency + +Contingency adequacy rating: + Gap <= 0: Adequate (current contingency sufficient for P80) + Gap 1-10%: Marginal -- monitor closely + Gap 10-25%: Insufficient -- additional contingency recommended + Gap > 25%: Severely underestimated -- reestimate required +``` + +--- + +## Reference Files + +- `references/flyvbjerg-database.md` -- Summary of Flyvbjerg's megaproject database (2,062 projects) +- `references/uk-treasury-guidance.md` -- HM Treasury Green Book RCF methodology +- `references/sector-benchmarks-detailed.md` -- Expanded benchmarks by sector, geography, era +- `references/optimism-bias-research.md` -- Academic foundation: Kahneman, Lovallo, Flyvbjerg + +--- + +## Application Notes + +**For nuclear programs**: Aalo Atomics, NuScale, and 
other SMR developers face FOAK risks that push cost and schedule outcomes toward the worst quartile of the nuclear reference class. Reference Class Forecasting is essential for credible investor and regulatory communication. + +**For government/DoD programs**: The DoD uses Independent Cost Estimates (ICE) which implicitly apply reference class logic. Cybereum RCF aligns with GAO Schedule Assessment Guide and CAPE ICE methodology. + +**Investor communication**: P80 RCAE provides a defensible, intellectually honest cost/schedule range for lender diligence, SBIR/STTR reporting, and board-level governance. diff --git a/cybereum-risk-engine/SKILL.md b/cybereum-risk-engine/SKILL.md new file mode 100644 index 0000000..c6a9cf8 --- /dev/null +++ b/cybereum-risk-engine/SKILL.md @@ -0,0 +1,312 @@ +--- +name: cybereum-risk-engine +version: 1.0.0 +description: | + Identifies, classifies, scores, and generates mitigation strategies for capital project risks + including external risks (geopolitical, regulatory, supply chain, weather, labor market, commodity + price) and internal risks (design maturity, contractor performance, procurement status, resource + availability). Use this skill whenever a user asks to identify risks, run a risk assessment, build + a risk register, generate project-specific risks, evaluate risk exposure, calculate risk-adjusted + contingency, or asks "what risks should we be worried about" on any EPC, energy, infrastructure, + nuclear, defense, or infrastructure capital project. Also use for risk pipeline generation, risk + scoring, and risk-based schedule contingency. +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob +--- + +# Cybereum Risk Engine + +AI-powered risk intelligence for capital projects. Combines LLM-driven external risk generation with structured internal risk assessment to produce a project-specific, actionable risk register calibrated to AACE, PMI, and DoD risk management standards. 
+ +--- + +## Risk Taxonomy + +Cybereum classifies risks across two primary axes: + +### External Risks (Outside Project Control) + +| Category | Examples | +|----------|----------| +| Geopolitical | Sanctions, trade restrictions, export controls, country risk | +| Regulatory & Permitting | Environmental permits, NEPA, FERC, NRC, local zoning | +| Supply Chain | Long-lead equipment, material shortages, single-source dependencies | +| Commodity Price | Steel, copper, cement, rare earth, fuel | +| Labor Market | Craft labor availability, wage inflation, union actions | +| Climate & Weather | Extreme weather windows, hurricane season, permafrost | +| Technology | Emerging technology performance risk, obsolescence | +| Financial Market | Financing availability, interest rate changes, FX exposure | + +### Internal Risks (Within Project Sphere) + +| Category | Examples | +|----------|----------| +| Design Maturity | IFC completeness %, design changes, interdisciplinary conflicts | +| Contractor Performance | Schedule adherence, quality non-conformances, workforce productivity | +| Procurement | Long-lead status, expediting effectiveness, vendor qualification | +| Scope Definition | Scope gaps, change order volume, owner-supplied items | +| Estimating | Basis of estimate confidence, contingency adequacy, scope inclusions | +| Integration | Interfaces between packages, system tie-ins, commissioning readiness | +| Organizational | Staffing gaps, decision authority, owner-contractor alignment | + +--- + +## Step 1: Project Context Intake + +Before generating risks, establish: + +1. **Project type**: Oil & gas / Power / Nuclear / Infrastructure / Defense / Data center +2. **Phase**: FEED / Detailed Engineering / Procurement / Construction / Commissioning +3. **Location**: Country, region, site conditions +4. **Scale**: TIC estimate range, duration, workforce peak +5. **Key constraints**: Fixed completion date, permitting milestones, regulatory approvals pending +6. 
**Existing risk register**: If provided, extend it; if not, generate from scratch + +--- + +## Step 2: External Risk Generation (LLM Pipeline) + +Generate project-specific external risks by reasoning across all eight external categories. + +**For each risk, produce:** + +``` +Risk ID | Category | Risk Description | Trigger Conditions | +Probability (1-5) | Impact (1-5) | Risk Score (PxI) | +Early Warning Indicators | Mitigation Strategy | Owner | Status +``` + +**Risk generation prompt pattern:** + +``` +For a [project type] in [location] currently in [phase]: +- What supply chain risks are most likely given current market conditions? +- What regulatory/permitting risks exist for this project type? +- What commodity price exposures affect the critical cost accounts? +- What labor market conditions create workforce risk? +``` + +Generate minimum 15 external risks. Flag the top 5 by risk score as **Priority External Risks**. + +--- + +## Step 3: Internal Risk Assessment + +Assess internal risks based on project data provided: + +### Design Maturity Risk Matrix + +``` +IFC Completeness | Risk Level | Contingency Implication +>90% | Low | 5-8% on remaining scope +70-90% | Moderate | 8-15% +50-70% | High | 15-25% +<50% | Critical | 25-40%+ +``` + +### Contractor Performance Scoring + +- SPI < 0.90: Flag as High performance risk +- Productivity factor < 0.80: Flag as Critical +- NCR rate trending up: Flag as Quality risk +- Labour turnover > 20%: Flag as workforce stability risk + +### Procurement Risk Assessment + +For each long-lead item: + +``` +Equipment | Vendor | Lead Time | Current Status | Float on Path | +Risk Level | Action Required +``` + +Flag any long-lead item where: Lead Time > Float on driving path + +--- + +## Step 4: Risk Scoring and Prioritization + +### Probability x Impact Matrix + +``` + Impact +P 1 (Negligible) | 2 (Minor) | 3 (Moderate) | 4 (Major) | 5 (Catastrophic) +5 (VH) 5 10 15 20 25 +4 (H) 4 8 12 16 20 +3 (M) 3 6 9 12 15 +2 (L) 2 4 6 8 10 +1 
(VL) 1 2 3 4 5 +``` + +Priority risks (score >= 12) require active mitigation. + +### Risk-Adjusted Contingency + +Calculate P50 and P80 contingency reserve using a simplified expected-value approximation (in lieu of full Monte Carlo simulation): + +``` +P50 Contingency = Sum(Probability fraction x Estimated Cost Impact) x 0.5 +P80 Contingency = Sum(Probability fraction x Estimated Cost Impact) x 0.8 +  (Probability fraction = probability rating / 5) +Apply to: TIC estimate as % contingency line +``` + +--- + +## Step 5: Mitigation Strategy Generation + +For each Priority Risk (score >= 12), generate: + +1. **Mitigation action**: Specific, executable step to reduce probability or impact +2. **Contingency plan**: What to do if the risk materializes +3. **Early warning trigger**: The observable signal that activates contingency +4. **Owner**: Role responsible for mitigation execution +5. **Due date**: When mitigation must be implemented to remain effective + +**Mitigation quality check:** + +- Does it address root cause or just symptom? +- Is it within project team's control? +- Does it have a measurable outcome? +- Is the cost of mitigation less than the expected risk value?
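The scoring and contingency arithmetic above can be sketched in Python. This is an illustrative sketch, not platform code: the `Risk` class is invented here, the probability-fraction normalization (rating / 5) is an assumption about how the 1-5 rating maps to a likelihood, and the 0.5 / 0.8 factors are this skill's simplified stand-ins rather than an AACE RP 40R-08 calculation.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    risk_id: str
    probability: int    # 1-5 rating (1 = Very Low ... 5 = Very High)
    impact: int         # 1-5 rating (1 = Negligible ... 5 = Catastrophic)
    cost_impact: float  # estimated cost impact if the risk materializes, $

    @property
    def score(self) -> int:
        # Risk Score = Probability x Impact (1-25)
        return self.probability * self.impact

    @property
    def is_priority(self) -> bool:
        # Priority risks (score >= 12) require active mitigation
        return self.score >= 12

def contingency(risks: list[Risk]) -> tuple[float, float]:
    """Simplified P50/P80 contingency: probability rating normalized to a
    fraction (rating / 5) times cost impact, scaled by the 0.5 / 0.8 factors."""
    expected = sum((r.probability / 5) * r.cost_impact for r in risks)
    return expected * 0.5, expected * 0.8
```

A register with a 4x4 risk carrying $1.0M exposure and a 2x3 risk carrying $0.5M exposure yields one priority risk and a P50/P80 contingency pair derived from the $1.0M combined expected value.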
+ +--- + +## Step 6: Risk Register Output + +Produce a structured risk register in this format: + +### Executive Risk Summary + +- Total risks identified: [N] +- Priority risks (score >= 12): [N] +- Risk-adjusted contingency recommendation: [P50: $X | P80: $Y] +- Top 3 risks requiring immediate owner action + +### Risk Register Table + +Full table with all fields per Step 2 + +### Risk Heatmap Description + +Text-based 5x5 matrix showing risk distribution + +### Mitigation Action Plan + +For priority risks only -- owner, action, due date, status + +--- + +## Sector-Specific Risk Libraries + +### Nuclear (NRC-regulated) + +High-priority risk areas: Design certification, quality assurance program, ITAAC completion, supplier qualification, NRC inspection findings + +### Defense / DoD Programs + +High-priority risk areas: Export control (ITAR/EAR), clearance requirements, government furnished equipment (GFE), DCAA audit exposure, CDRL deliverables + +### Energy Infrastructure (FERC/NEPA) + +High-priority risk areas: ROW acquisition, environmental permits, interconnection queue, off-take agreement execution + +### EPC / Industrial + +High-priority risk areas: Module fabrication quality, vendor-supplied engineering, multi-contract interface, productivity benchmarks + +--- + +## Reference Files + +- `references/risk-taxonomy-detail.md` -- Expanded risk category definitions and examples +- `references/contingency-methodology.md` -- AACE RP 40R-08 contingency determination +- `references/sector-risk-libraries.md` -- Nuclear, defense, energy, EPC sector-specific risks +- `references/early-warning-indicators.md` -- KPI thresholds that signal emerging risks + +--- + +## Two-Pass Risk Review Protocol + +Apply risk findings in two severity tiers (adapted from review checklist pattern): + +**Pass 1 -- CRITICAL (requires immediate mitigation):** +- Any risk with score >= 16 (High probability x Major/Catastrophic impact) +- Negative float driven by risk materialization +- Single-source
procurement dependencies on critical path +- Regulatory compliance gaps with deadline exposure +- Safety/environmental risks with no mitigation in place + +**Pass 2 -- MONITORING (track and prepare):** +- Risks with score 9-15 +- Emerging risks identified but not yet scored +- Risks with mitigations in place but approaching trigger thresholds +- Market/commodity risks requiring hedge decisions + +Output format: +``` +Risk Review: N risks (X critical, Y monitoring) + +**CRITICAL** (immediate action): +- [RISK-ID] Risk description (Score: PxI = XX) + Mitigation: specific action required + Owner: [role] | Due: [date] + +**MONITORING** (track): +- [RISK-ID] Risk description (Score: PxI = XX) + Early warning: [indicator to watch] +``` + +--- + +## Risk Register History & Trends + +Persist risk assessments for trend tracking: + +```bash +mkdir -p .cybereum/risk-snapshots +``` + +Save as `.cybereum/risk-snapshots/{project-slug}-{YYYY-MM-DD}.json`: + +```json +{ + "date": "2026-03-14", + "project": "Project Name", + "total_risks": 42, + "critical_risks": 5, + "monitoring_risks": 12, + "closed_since_last": 3, + "new_since_last": 7, + "p50_contingency": 12500000, + "p80_contingency": 18200000, + "top_risk": "Long-lead transformer delivery delay", + "risk_score_distribution": { "1-4": 15, "5-8": 10, "9-15": 12, "16-25": 5 } +} +``` + +When prior snapshots exist, show risk trend: +``` + Last Now Delta +Total Risks: 38 -> 42 +4 (growing) +Critical: 3 -> 5 +2 (escalating) +Contingency (P80): $15.2M -> $18.2M +$3.0M +New Risks: -- -> 7 7 new this period +Closed Risks: -- -> 3 3 resolved +``` + +--- + +## Output Modes + +**Quick Scan**: Top 10 risks, 1-line each, with score. Under 2 minutes. + +**Full Register**: Complete risk register with mitigations. For project setup or periodic review. + +**Risk Update**: Compare current risks to previous register. Flag new risks, closed risks, score changes. 
+ +**Contingency Justification**: Formal AACE-style contingency memo for budget approval. diff --git a/cybereum-sales-intelligence/SKILL.md b/cybereum-sales-intelligence/SKILL.md new file mode 100644 index 0000000..d78c671 --- /dev/null +++ b/cybereum-sales-intelligence/SKILL.md @@ -0,0 +1,264 @@ +--- +name: cybereum-sales-intelligence +version: 1.0.0 +description: | + Generates sales intelligence, prospect research, customized pitch content, outreach messaging, + and competitive positioning for Cybereum capital project governance platform. Use this skill for + any sales development, business development, prospect research, outreach email drafting, pitch + deck customization, competitive analysis, account research, or meeting preparation for Cybereum + prospects in EPC, energy, defense, infrastructure, nuclear, or aerospace sectors. Also use to + research specific companies like Northrop Grumman, SpaceX, BAE Systems, Rheinmetall, McKinsey, + AECOM, Bechtel, Fluor, Los Alamos, Department of Energy, and similar targets. +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob + - WebSearch + - WebFetch +--- + +# Cybereum Sales Intelligence + +BD and go-to-market intelligence engine for Cybereum. Produces deep prospect research, operationally resonant outreach, and customized pitch materials for capital project governance sales in defense, EPC, energy, and infrastructure sectors. + +--- + +## Cybereum Value Proposition (Master) + +**What Cybereum is:** + +> Cybereum is an AI-powered capital project governance platform -- the GPS for Capital Project Management. It combines a temporal knowledge graph with AI-driven decision support to deliver forecasting, risk detection, and corrective action recommendations for complex capital programs in EPC, energy, infrastructure, and defense. + +**Core differentiators:** + +1. 
**Temporal Knowledge Graph**: Projects evolve -- Cybereum tracks how the graph changes over time, enabling true causal analysis, not just snapshots +2. **Dyeus AI Engine**: Named reasoning engine providing Schwerpunkt decision intelligence, not generic chatbot responses +3. **Reference Class Forecasting**: Calibrated against industry-wide historical data, correcting optimism bias in estimates +4. **Patent-protected**: 2 USPTO patents; NSF SBIR funded -- institutional validation +5. **Domain-native**: Built specifically for capital projects, not adapted from generic PM tools + +**Positioning statement:** + +> "Where Primavera P6 tells you what happened, Cybereum tells you what's going to happen and what to do about it." + +--- + +## Step 1: Prospect Research Protocol + +When researching a prospect, systematically build: + +### Company Profile + +- Organization type (Prime contractor / Subcontractor / Owner / Government agency / Consultancy) +- Revenue and headcount scale +- Capital project portfolio size and type +- Key programs or projects currently active +- Recent wins, losses, or notable news + +### Decision Maker Mapping + +- Chief Project Officer / VP of Projects / Head of Capital Programs +- Head of Digital / CIO / Head of AI or Innovation +- Head of Finance / CFO (for cost control angle) +- Procurement / Contracts lead (for SaaS purchasing) + +### Pain Point Hypothesis + +For each prospect type, apply the standard pain hypothesis: + +| Prospect Type | Most Likely Pain | Cybereum Hook | +|---|---|---| +| Prime Defense Contractor | EVMS compliance burden, DCAA audit exposure, program cost growth | EVM Control + Decision-AI automates EVMS reporting and early warning | +| EPC Firm | Schedule slippage, procurement delays, multi-contract coordination | Schedule Intelligence + Completion Prediction + Risk Engine | +| Project Owner (Energy) | Contractor performance, budget overruns, change order proliferation | Full platform governance -- owner-side visibility | +| 
Government Agency | Program reporting, IG/GAO audit readiness, independent estimate validation | Reference Class Forecasting + Executive Reporting | +| Nuclear Developer | FOAK cost uncertainty, NRC interface, schedule confidence for financing | Reference Class + Completion Prediction + Decision-AI | +| Management Consultancy | Client delivery differentiation, AI capability demonstration | White-label or API integration for client engagements | + +### Active Program Intelligence + +Research: What capital programs is this company currently executing? What are the scale, phase, and reported challenges? + +**Use web search for real-time intelligence:** +- Search for recent press releases, earnings calls, project announcements +- Check government contract awards (USASpending.gov, FPDS) +- Review SEC filings for capital expenditure disclosures +- Search industry publications (ENR, Power Engineering, Defense News) + +--- + +## Step 2: Outreach Message Generation + +### Core Outreach Principles + +1. **Lead with their pain, not your product** -- first sentence must reference something specific to their world +2. **Operational language** -- use their terminology (EVMS, DCAA, IFC, FEED, commissioning, P6, EVM) +3. **Specificity over generality** -- reference specific programs, press releases, or challenges when known +4. **One claim, one proof point** -- don't list features; make one sharp claim with evidence +5. 
**Short CTA** -- 30-minute discovery call, specific offer (demo, briefing, benchmark analysis) + +### Email Structure + +``` +Subject: [Specific hook -- reference their program or a known industry challenge] + +[Opening: 1 sentence referencing their specific context -- program, announcement, challenge] + +[Problem framing: 1-2 sentences on the specific capital project governance pain for their situation] + +[Cybereum value: 1-2 sentences on how Cybereum addresses this specifically -- not generically] + +[Proof point: 1 sentence -- NSF SBIR, 2 USPTO patents, or specific capability that matters to them] + +[CTA: Single, low-friction ask -- 30-minute call, or offer of a benchmark analysis for their program type] + +[Signature] +``` + +### LinkedIn Message Structure (300 chars) + +``` +[Name], saw [specific trigger]. Cybereum built [one capability] specifically for [their challenge]. Worth a 20-minute call? -- Ananth +``` + +--- + +## Step 3: Pitch Deck Customization + +When building a customized pitch deck for a specific prospect: + +### Five-Slide Structure (Cybereum Standard) + +``` +Slide 1: THEIR WORLD -- Their specific program challenge. Open with their language. +Slide 2: WHY CURRENT TOOLS FAIL -- Primavera gap, Excel gap, BI tool gap for their use case +Slide 3: CYBEREUM -- Temporal knowledge graph + Dyeus AI. Platform overview diagram. 
+Slide 4: FOR [PROSPECT NAME] SPECIFICALLY -- Use cases mapped to their programs +Slide 5: PROOF + NEXT STEP -- NSF SBIR, patents, advisors, call to action +``` + +### Customization variables by sector: + +**Defense / DoD:** +- Lead slide: EVMS compliance cost, program cost growth statistics (DoD DAES data) +- Terminology: CDRL, DCAA, EVMS, Work Authorization, OBS/WBS, IBR, CAM +- Proof points: DoD EVMS Guidebook alignment, GAO Schedule Assessment compatibility + +**EPC (Bechtel, Fluor, AECOM, Jacobs):** +- Lead slide: Schedule slippage statistics for mega-EPC projects +- Terminology: IFC, FEED, lump sum, EPCM, P6, critical path, TIC, BOE +- Proof points: Flyvbjerg data on cost/schedule overruns; productivity benchmarks + +**Nuclear (Aalo, NuScale, X-energy, Terrestrial):** +- Lead slide: FOAK nuclear cost and schedule uncertainty; investor risk +- Terminology: ITAAC, IFC, NRC, FOAK, NOAK, Class 5/4/3/2/1 estimates +- Proof points: Reference Class Forecasting + SMR benchmark data + +**Management Consulting (McKinsey, BCG, Bain):** +- Lead slide: Consulting differentiation through AI-powered project intelligence +- Terminology: Capital project advisory, transformation, digital, performance improvement +- Proof points: Platform as white-label or API for client delivery + +--- + +## Step 4: Competitive Intelligence + +### Primary Competitors + +| Competitor | What They Do | Cybereum Advantage | +|---|---|---| +| Oracle Primavera P6 | Schedule management | P6 is a recorder; Cybereum is a predictor. P6 tells you what happened. 
| +| Procore | Construction management | Procore is field ops; Cybereum is executive governance and forecasting | +| Hexagon EcoSys | EPC project controls | EcoSys is data aggregation; Cybereum is AI decision intelligence | +| InEight | Estimating & project controls | InEight is estimate-based; Cybereum adds temporal AI reasoning | +| Bentley SYNCHRO | 4D BIM scheduling | SYNCHRO is simulation; Cybereum is governance intelligence | +| Microsoft Project | Basic scheduling | Not competitive at capital project scale | +| Custom Excel/BI | Spreadsheet reporting | Cybereum replaces the manual layer with AI-native analysis | + +**Winning positioning against all competitors:** + +> "All of these tools capture data. Cybereum is the layer that reasons about it -- telling you what's going to happen before it does and what to do about it." + +--- + +## Step 5: Account-Specific Intelligence (Named Prospects) + +### McKinsey Capital Projects Practice + +- Contact: Justin Dahl (known contact; P6 schedule file pending for pilot) +- Leverage: Justina Gallegos advisor connection +- Angle: Differentiate McKinsey client delivery with Cybereum AI layer; white-label positioning +- Status: Active -- pilot stalled, waiting on P6 XER from Justin +- Next action: Follow-up sequence through Justina or direct reengagement + +### Aalo Atomics + +- Contact: Robert Kessler, Head of AI +- Angle: Reference Class Forecasting for SMR programs; investor-ready forecast confidence +- Prepared asset: RCF briefing for nuclear programs +- Key message: FOAK nuclear risk quantification; Dyeus for SMR program governance + +### Department of Energy + +- Angle: Program performance, independent cost validation, audit readiness +- Programs: Loan Programs Office portfolios; Office of Nuclear Energy SMR programs +- Regulatory fit: GAO Schedule Assessment Guide alignment, DOE Order 413.3B + +### Los Alamos National Lab + +- Angle: Capital infrastructure programs, security-sensitive project governance +- Programs: 
PF-4 recapitalization; CMRR project history creates RCF opportunity +- Key message: AI-powered cost/schedule risk for national lab infrastructure + +### California High-Speed Rail (CAHSR) + +- Angle: Mega-project schedule recovery, reference class benchmarking, political accountability +- Context: Project is dramatically over budget and schedule -- perfect RCF case study +- Key message: Cybereum as the governance intelligence layer the project desperately needs + +--- + +## Step 6: Advisor Leverage Playbook + +### Justina Gallegos + +- Network: McKinsey; DOE; infrastructure finance +- Highest-value action: Forwardable email to McKinsey capital projects leadership +- Secondary: DOE program connections; infrastructure equity introductions + +### Patrick Suermann + +- Network: DoD / Military construction; SAME (Society of American Military Engineers) +- Highest-value action: Introduction to DoD construction program offices (USACE, NAVFAC) +- Secondary: Defense contractor BD leads + +### Rick Rundell + +- Network: NSF SBIR program; academic capital project community +- Highest-value action: Phase II SBIR support and validation quotes + +--- + +## Reference Files + +- `references/cybereum-platform-detail.md` -- Technical platform depth for technical buyers +- `references/prospect-database.md` -- Known prospects, contacts, status, next actions +- `references/sector-pain-libraries.md` -- Detailed pain point research by sector +- `references/objection-handling.md` -- Common objections and responses + +--- + +## Output Modes + +**Prospect Brief**: 1-page intelligence brief on a specific company + +**Outreach Draft**: Cold email or LinkedIn message, ready to send + +**Pitch Deck**: Custom 5-slide narrative for specific prospect (use pptx skill) + +**Competitive Response**: How to position against a specific competitor in an active deal + +**Meeting Prep**: Background + talking points + questions for a specific stakeholder meeting diff --git 
a/cybereum-schedule-intelligence/SKILL.md b/cybereum-schedule-intelligence/SKILL.md new file mode 100644 index 0000000..cb2f4e8 --- /dev/null +++ b/cybereum-schedule-intelligence/SKILL.md @@ -0,0 +1,279 @@ +--- +name: cybereum-schedule-intelligence +version: 1.0.0 +description: | + Analyzes capital project schedules including P6 XER/XML files, Primavera exports, and schedule data + for critical path analysis, float erosion, logic gaps, near-critical activity detection, and schedule + health scoring. Use this skill whenever a user uploads or references a schedule file, mentions XER, P6, + Primavera, CPM, critical path, activity IDs, WBS, float, or asks to review/audit/analyze a project + schedule. Always use this skill for schedule risk, delay analysis, recovery planning, or any question + about project timeline performance. +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob +--- + +# Cybereum Schedule Intelligence + +Advanced capital project schedule analytics engine for EPC, energy, infrastructure, and defense programs. Applies temporal knowledge graph principles and construction industry best practices (DCMA 14-Point, AACE RP 49R-06, GAO Schedule Assessment Guide). + +## Core Capabilities + +- **Critical Path Analysis**: Identify true critical and near-critical paths (total float <= 10 days) +- **Schedule Health Scoring**: DCMA 14-Point assessment with numeric scoring +- **Float Erosion Detection**: Track float consumption trends over schedule updates +- **Logic Gap Identification**: Missing predecessors, successors, open ends, dangling activities +- **Compression Analytics**: Detect schedule compression, unrealistic durations, resource overloads +- **Milestone Variance**: Planned vs. 
actual milestone slippage with recovery trajectory +- **WBS Integrity Check**: Hierarchy completeness, orphaned activities, level-of-detail gaps + +--- + +## Step 1: Ingest and Parse Schedule + +When a user provides a schedule file: + +- **XER files**: Parse using the xer_analyzer library pattern -- split by `%T` table delimiters, extract `TASK`, `TASKPRED`, `PROJECT`, `PROJWBS`, `RSRC` tables +- **XML/P6XML**: Navigate the `Project`, `WBS`, and `Activity` element nodes +- **CSV/Excel exports**: Map columns to standard fields (Activity ID, Name, Start, Finish, Duration, Total Float, Free Float, Predecessors, Successors) + +**Key fields to extract:** + +``` +Activity ID | WBS Code | Activity Name | Type | Duration (OD/RD) | +ES | EF | LS | LF | Total Float | Free Float | % Complete | +Predecessor List | Successor List | Constraint Type | Constraint Date | +Calendar | Resource | Cost Account +``` + +**Before analysis, confirm:** + +1. Data date (status date) -- required for variance calculations +2. Project planned finish vs. current forecast finish +3. Number of activities, milestones, and WBS levels + +--- + +## Step 2: Schedule Health Assessment (DCMA 14-Point) + +Run all 14 checks and score each. Report percentage failing each check.
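As an illustration of how that scoring can work, here is a minimal Python sketch. It assumes activities have already been parsed into simple records; the field names (`total_float`, `has_hard_constraint`) are illustrative placeholders rather than actual XER column names, and only a subset of the 14 checks is shown.

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    """Illustrative subset of parsed schedule fields (not raw XER columns)."""
    act_id: str
    total_float: float            # working days
    remaining_duration: float     # working days
    predecessors: list = field(default_factory=list)
    successors: list = field(default_factory=list)
    has_hard_constraint: bool = False

def dcma_metrics(activities):
    """Score a subset of the 14 DCMA checks as percentages or counts."""
    n = len(activities) or 1

    def pct(k):
        return round(100 * k / n, 1)

    return {
        # Check 1: open ends (missing predecessor or successor) -- target <5%
        "open_ends_pct": pct(sum(1 for a in activities
                                 if not a.predecessors or not a.successors)),
        # Check 5: hard constraints -- target <5%
        "hard_constraints_pct": pct(sum(a.has_hard_constraint for a in activities)),
        # Check 6: high float (>44 working days) -- target <5%
        "high_float_pct": pct(sum(a.total_float > 44 for a in activities)),
        # Check 7: negative float -- target 0, always critical
        "negative_float_count": sum(a.total_float < 0 for a in activities),
        # Check 8: high duration (>44 working days) -- target <5%
        "high_duration_pct": pct(sum(a.remaining_duration > 44 for a in activities)),
    }
```

Each metric maps directly to a row of the check table, so the pass/fail thresholds stay auditable against the cited DCMA targets.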
+ +| # | Check | Threshold | Cybereum Scoring | +|---|-------|-----------|-----------------| +| 1 | Logic (missing predecessors) | <5% open ends | Critical if >10% | +| 2 | Leads (negative lags) | 0% | Flag any | +| 3 | Lags | <5% | Review if >10% | +| 4 | Relationship types | >90% FS | Note SS/FF patterns | +| 5 | Hard constraints | <5% | Critical if >10% | +| 6 | High float (>44 days) | <5% | Review outliers | +| 7 | Negative float | 0% | Critical -- always flag | +| 8 | High duration (>44 days) | <5% | Review level of detail | +| 9 | Invalid dates | 0% | Critical | +| 10 | Resources | >90% loaded | Note unloaded critical activities | +| 11 | Missed tasks | <5% | Flag activities finishing past baseline | +| 12 | Critical path test | Pass | Verify network responds to an injected delay | +| 13 | CPLI (Critical Path Length Index) | >=0.95 | Flag if <0.80 | +| 14 | BEI (Baseline Execution Index) | >=0.95 | Flag if <0.80 | + +**Output: Schedule Health Score** (0-100) with severity tier: + +- 85-100: Healthy +- 70-84: Moderate Risk +- 50-69: High Risk +- <50: Critical -- Immediate Intervention Required + +--- + +## Step 3: Critical Path and Near-Critical Analysis + +1. **Identify critical path**: Activities with Total Float = 0 (or <= threshold) +2. **Near-critical band**: Activities with Float 1-10 days -- list top 20 by lowest float +3. **Critical path narrative**: Describe the end-to-end logic chain in plain language +4. **Longest path check**: Compare CPM critical path vs. longest path algorithm result +5.
**Parallel critical paths**: Flag if >1 path has 0 float -- indicates schedule fragility + +**Format the critical path as:** + +``` +[Start Milestone] -> [Engineering Phase] -> [Procurement Lead Item] -> +[Fabrication] -> [Civil/Structural] -> [Mechanical Install] -> +[Systems Completion] -> [Project Completion] +``` + +--- + +## Step 4: Risk and Anomaly Detection + +Automatically flag: + +- **Logic gaps**: Open-start or open-finish activities (except project start/end) +- **Constraint overuse**: Hard-constrained activities masking float +- **Resource overloads**: >100% allocation on any resource in any period +- **Round-number durations**: Activities with exact round-number durations (30, 60, 90 days) -- may be placeholders +- **Backward logic**: Finish-to-Start relationships with negative lag +- **Out-of-sequence progress**: Activities with % complete but no actual start date +- **Milestone logic**: Key milestones with no driving predecessor + +--- + +## Step 5: Variance and Delay Attribution + +For schedule updates (two snapshots): + +1. **Slippage calculation**: Current forecast finish - Baseline finish (in working days and calendar days) +2. **Variance contributors**: Activities that slipped most -- sort by Finish Variance (days) +3. **Float erosion**: Activities where float has decreased >10 days since last update +4. **Earned Schedule**: Calculate SPI(t) = ES / AT where ES = earned schedule, AT = actual time +5. **Delay attribution categories**: Owner-caused / Contractor-caused / Force Majeure / Procurement / Design + +--- + +## Step 6: Recovery Scenario Analysis + +When asked for recovery options: + +1. **Compression opportunities**: Fast-tracking candidates (FS -> SS), crashing candidates (add resources) +2. **Logic re-sequencing**: Activities that can be parallelized without technical risk +3. **Scope reduction**: Activities that could be deferred for scope relief +4. **Recovery timeline**: Model compressed duration to estimate recovery weeks +5.
**Risk trade-offs**: Each recovery option rated by Cost Impact / Schedule Risk / Execution Risk + +--- + +## Output Format + +Always produce: + +### Executive Summary (3-5 sentences) + +- Current forecast vs. baseline finish +- Health score and primary risk +- Top recommendation + +### Schedule Health Scorecard + +Table of all 14 DCMA checks with pass/fail and count + +### Critical Path Summary + +Narrative + activity list with float values + +### Top 10 Risk Activities + +Ranked by float, with flags for logic issues + +### Recommended Actions + +Numbered list, prioritized by schedule impact + +--- + +## Reference Files + +- `references/dcma-14-point.md` -- Detailed DCMA check methodology and thresholds +- `references/aace-rp49r06.md` -- AACE schedule quality metrics +- `references/xer-parsing-guide.md` -- XER table structure and field mapping +- `references/industry-benchmarks.md` -- EPC, energy, defense schedule norms + +--- + +## Schedule History & Trend Tracking + +Persist schedule analysis results for trend comparison across updates (adapted from retro pattern): + +### Save Schedule Snapshot + +After each analysis, save a JSON snapshot: + +```bash +mkdir -p .cybereum/schedule-snapshots +``` + +Save as `.cybereum/schedule-snapshots/{project-slug}-{YYYY-MM-DD}.json`: + +```json +{ + "date": "2026-03-14", + "project": "Project Name", + "data_date": "2026-03-10", + "health_score": 72, + "dcma_checks": { + "logic": { "pass": true, "pct": 3.2 }, + "leads": { "pass": true, "pct": 0 }, + "lags": { "pass": true, "pct": 2.1 } + }, + "critical_path_length": 245, + "near_critical_count": 18, + "total_float_avg": 12.4, + "negative_float_count": 0, + "forecast_finish": "2027-06-15", + "baseline_finish": "2027-03-01", + "slippage_days": 76, + "activity_count": 2400, + "open_ends_pct": 4.1 +} +``` + +### Trend Comparison + +If prior snapshots exist, load the most recent and show deltas: + +``` + Last Now Delta +Health Score: 78 -> 72 -6 (declining) +Slippage: 45d -> 76d +31d 
(worsening) +Open Ends: 2.8% -> 4.1% +1.3pp +Near-Critical: 12 -> 18 +6 activities +Neg Float: 0 -> 0 stable +``` + +Flag any metric that has worsened for 3+ consecutive snapshots as a **sustained negative trend**. + +--- + +## Two-Pass Review Protocol + +Apply findings in two severity tiers (adapted from review checklist pattern): + +**Pass 1 -- CRITICAL (requires immediate action):** +- Negative float on any activity +- Missing logic on critical path activities +- CPLI < 0.80 +- BEI < 0.80 +- Invalid dates or out-of-sequence progress + +**Pass 2 -- INFORMATIONAL (monitor and address):** +- High float outliers (>44 days) +- High duration activities (>44 days) +- Lag overuse (>10%) +- Resource unloaded critical activities +- Round-number placeholder durations + +Output format: +``` +Schedule Review: N issues (X critical, Y informational) + +**CRITICAL** (requires action): +- [Activity ID] Problem description + Fix: suggested corrective action + +**Issues** (monitoring): +- [WBS/Activity] Problem description + Fix: suggested corrective action +``` + +--- + +## Troubleshooting + +**XER won't parse**: Check file encoding (UTF-8 vs. Latin-1). Look for `%T TASK` delimiter. Some P6 exports use `\r\n` line endings. + +**Float calculation mismatch**: Verify calendar assignments. Night/weekend calendars change float significantly. + +**Critical path doesn't reach project end**: Check for missing successor on last activity before project finish milestone. + +**Negative float with no hard constraints**: Look for out-of-sequence actual progress forcing backward calculation. 
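The `%T` table splitting described in Step 1, plus the encoding fallback from the first troubleshooting entry, can be sketched as a small dependency-free parser. This is a simplified illustration rather than a full XER reader: real exports carry additional record types (such as the trailing `%E` marker), which this sketch simply ignores.

```python
from pathlib import Path

def parse_xer(path: str) -> dict:
    """Split an XER export into tables keyed by name (TASK, TASKPRED, ...).

    XER files are tab-delimited: a %T record starts a table, %F lists its
    field names, and each %R record is one row. Encoding varies by P6
    export settings, so fall back from UTF-8 to Latin-1.
    """
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")

    tables, name, fields = {}, None, []
    for line in text.splitlines():  # handles both \n and \r\n endings
        parts = line.split("\t")
        tag = parts[0]
        if tag == "%T":
            name, fields = parts[1], []
            tables[name] = []
        elif tag == "%F":
            fields = parts[1:]
        elif tag == "%R" and name is not None:
            tables[name].append(dict(zip(fields, parts[1:])))
    return tables
```

`parse_xer("project.xer")["TASK"]` then yields one dict per activity; the actual column names come from the export's own `%F` records, so downstream code should not hard-code them.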
From 672acedd0aa41d8159efc9ba04787d49426eaa83 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 15 Mar 2026 13:28:38 +0000 Subject: [PATCH 2/3] feat: adapt all workflow skills for Cybereum development organization Ship: bun test/build instead of Rails/Vitest, Cybereum skill commit ordering, cross-skill consistency in PR body Review: Cybereum-specific checklist replacing Rails checks -- Data & Calculation Integrity, Graph Consistency, LLM Trust Boundary, Skill Content Quality, Cross-Skill Consistency, File Parsing Safety QA: calculation verification (EVM formulas, risk scoring, DCMA thresholds, completion prediction multipliers, RCF benchmarks), cross-skill consistency checks, output completeness validation, Cybereum health scoring rubric Plan-CEO-Review: Cybereum product context (temporal knowledge graph, Dyeus AI, 8 analytical skills, competitive positioning), capital project governance framing for all 10 review sections Plan-Eng-Review: Cybereum architecture context (cross-skill data flows, snapshot schemas, calculation engine correctness, industry standard compliance) Retro: Cybereum skill development tracking section (which skills touched, cross- skill consistency risk, skill coverage, snapshot schema changes) https://claude.ai/code/session_01NbTWS982B1xJFJCyt6ZVig --- plan-ceo-review/SKILL.md | 508 ++++++-------------------------- plan-eng-review/SKILL.md | 196 ++++++------ qa/SKILL.md | 385 +++++++++++------------- qa/references/issue-taxonomy.md | 121 ++++---- retro/SKILL.md | 28 +- review/SKILL.md | 22 +- review/checklist.md | 131 ++++---- ship/SKILL.md | 161 +++------- 8 files changed, 565 insertions(+), 987 deletions(-) diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index 8ac026c..5312c8f 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -2,10 +2,10 @@ name: plan-ceo-review version: 1.0.0 description: | - CEO/founder-mode plan review. 
Rethink the problem, find the 10-star product, - challenge premises, expand scope when it creates a better product. Three modes: - SCOPE EXPANSION (dream big), HOLD SCOPE (maximum rigor), SCOPE REDUCTION - (strip to essentials). + CEO/founder-mode plan review for Cybereum. Rethink the problem through the lens + of capital project governance, find the 10-star product, challenge premises, + expand scope when it creates a better platform. Three modes: SCOPE EXPANSION + (dream big), HOLD SCOPE (maximum rigor), SCOPE REDUCTION (strip to essentials). allowed-tools: - Read - Grep @@ -14,471 +14,149 @@ allowed-tools: - AskUserQuestion --- -# Mega Plan Review Mode +# Mega Plan Review Mode -- Cybereum + +## Cybereum Product Context + +Cybereum is an AI-powered capital project governance platform -- the GPS for Capital Project Management. It combines a temporal knowledge graph with AI-driven decision support (Dyeus AI Engine) to deliver forecasting, risk detection, and corrective action recommendations for complex capital programs. + +**Core platform components:** +- **Temporal Knowledge Graph**: Projects evolve -- Cybereum tracks how the graph changes over time for causal analysis +- **Dyeus AI Engine**: Named reasoning engine providing Schwerpunkt decision intelligence +- **8 Analytical Skills**: Schedule Intelligence, Decision-AI, Risk Engine, EVM Control, Completion Prediction, Reference Class Forecasting, Executive Reporting, Sales Intelligence +- **Patent-protected**: 2 USPTO patents; NSF SBIR funded + +**Target markets:** EPC, energy, nuclear, defense, infrastructure capital programs + +**Key competitors to beat:** Oracle Primavera P6, Procore, Hexagon EcoSys, InEight + +**Positioning:** "Where Primavera P6 tells you what happened, Cybereum tells you what's going to happen and what to do about it." ## Philosophy You are not here to rubber-stamp this plan. 
You are here to make it extraordinary, catch every landmine before it explodes, and ensure that when this ships, it ships at the highest possible standard. But your posture depends on what the user needs: -* SCOPE EXPANSION: You are building a cathedral. Envision the platonic ideal. Push scope UP. Ask "what would make this 10x better for 2x the effort?" The answer to "should we also build X?" is "yes, if it serves the vision." You have permission to dream. -* HOLD SCOPE: You are a rigorous reviewer. The plan's scope is accepted. Your job is to make it bulletproof — catch every failure mode, test every edge case, ensure observability, map every error path. Do not silently reduce OR expand. +* SCOPE EXPANSION: You are building a cathedral. Envision the platonic ideal of capital project governance AI. Push scope UP. Ask "what would make this 10x better for 2x the effort?" The answer to "should we also build X?" is "yes, if it serves the vision." You have permission to dream. +* HOLD SCOPE: You are a rigorous reviewer. The plan's scope is accepted. Your job is to make it bulletproof -- catch every failure mode, test every edge case, ensure observability, map every error path. Do not silently reduce OR expand. * SCOPE REDUCTION: You are a surgeon. Find the minimum viable version that achieves the core outcome. Cut everything else. Be ruthless. -Critical rule: Once the user selects a mode, COMMIT to it. Do not silently drift toward a different mode. If EXPANSION is selected, do not argue for less work during later sections. If REDUCTION is selected, do not sneak scope back in. Raise concerns once in Step 0 — after that, execute the chosen mode faithfully. +Critical rule: Once the user selects a mode, COMMIT to it. Do not silently drift toward a different mode. If EXPANSION is selected, do not argue for less work during later sections. If REDUCTION is selected, do not sneak scope back in. Raise concerns once in Step 0 -- after that, execute the chosen mode faithfully. 
Do NOT make any code changes. Do NOT start implementation. Your only job right now is to review the plan with maximum rigor and the appropriate level of ambition. ## Prime Directives -1. Zero silent failures. Every failure mode must be visible — to the system, to the team, to the user. If a failure can happen silently, that is a critical defect in the plan. -2. Every error has a name. Don't say "handle errors." Name the specific exception class, what triggers it, what rescues it, what the user sees, and whether it's tested. rescue StandardError is a code smell — call it out. +1. Zero silent failures. Every failure mode must be visible -- to the system, to the team, to the user. If a failure can happen silently, that is a critical defect in the plan. +2. Every error has a name. Don't say "handle errors." Name the specific exception, what triggers it, what catches it, what the user sees, and whether it's tested. 3. Data flows have shadow paths. Every data flow has a happy path and three shadow paths: nil input, empty/zero-length input, and upstream error. Trace all four for every new flow. -4. Interactions have edge cases. Every user-visible interaction has edge cases: double-click, navigate-away-mid-action, slow connection, stale state, back button. Map them. -5. Observability is scope, not afterthought. New dashboards, alerts, and runbooks are first-class deliverables, not post-launch cleanup items. -6. Diagrams are mandatory. No non-trivial flow goes undiagrammed. ASCII art for every new data flow, state machine, processing pipeline, dependency graph, and decision tree. -7. Everything deferred must be written down. Vague intentions are lies. TODOS.md or it doesn't exist. -8. Optimize for the 6-month future, not just today. If this plan solves today's problem but creates next quarter's nightmare, say so explicitly. -9. You have permission to say "scrap it and do this instead." If there's a fundamentally better approach, table it. I'd rather hear it now. +4. 
Calculation correctness is non-negotiable. Every EVM formula, risk score, schedule metric, and reference class benchmark must be verifiable against its cited standard (ANSI/EIA-748, DCMA, AACE, Flyvbjerg). +5. Cross-skill consistency is mandatory. The same metric cannot mean different things in different skills. Terminology, thresholds, and formulas must be unified. +6. Observability is scope, not afterthought. New dashboards, alerts, and trend tracking are first-class deliverables. +7. Diagrams are mandatory. No non-trivial flow goes undiagrammed. ASCII art for every new data flow, state machine, processing pipeline, and decision tree. +8. Everything deferred must be written down. Vague intentions are lies. TODOS.md or it doesn't exist. +9. Optimize for the 6-month future, not just today. If this plan solves today's problem but creates next quarter's nightmare, say so explicitly. +10. You have permission to say "scrap it and do this instead." If there's a fundamentally better approach, table it. ## Engineering Preferences (use these to guide every recommendation) -* DRY is important — flag repetition aggressively. -* Well-tested code is non-negotiable; I'd rather have too many tests than too few. -* I want code that's "engineered enough" — not under-engineered (fragile, hacky) and not over-engineered (premature abstraction, unnecessary complexity). -* I err on the side of handling more edge cases, not fewer; thoughtfulness > speed. -* Bias toward explicit over clever. +* DRY is important -- flag repetition aggressively, especially across skills. +* Well-tested code is non-negotiable; calculation-heavy code needs exhaustive edge case tests. +* I want code that's "engineered enough" -- not under-engineered (fragile, hacky) and not over-engineered (premature abstraction, unnecessary complexity). +* Bias toward explicit over clever. Capital project professionals must be able to audit every calculation. 
* Minimal diff: achieve the goal with the fewest new abstractions and files touched. -* Observability is not optional — new codepaths need logs, metrics, or traces. -* Security is not optional — new codepaths need threat modeling. -* Deployments are not atomic — plan for partial states, rollbacks, and feature flags. -* ASCII diagrams in code comments for complex designs — Models (state transitions), Services (pipelines), Controllers (request flow), Concerns (mixin behavior), Tests (non-obvious setup). -* Diagram maintenance is part of the change — stale diagrams are worse than none. +* Industry standard compliance is not optional -- new codepaths must cite their methodology source. +* Temporal integrity matters -- every data mutation must preserve the knowledge graph's causal chain. +* ASCII diagrams in code comments for complex designs. +* Diagram maintenance is part of the change. ## Priority Hierarchy Under Context Pressure Step 0 > System audit > Error/rescue map > Test diagram > Failure modes > Opinionated recommendations > Everything else. -Never skip Step 0, the system audit, the error/rescue map, or the failure modes section. These are the highest-leverage outputs. +Never skip Step 0, the system audit, the error/rescue map, or the failure modes section. ## PRE-REVIEW SYSTEM AUDIT (before Step 0) -Before doing anything else, run a system audit. This is not the plan review — it is the context you need to review the plan intelligently. -Run the following commands: +Before doing anything else, run a system audit: ``` -git log --oneline -30 # Recent history -git diff main --stat # What's already changed -git stash list # Any stashed work -grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l -find . -name "*.rb" -newer Gemfile.lock | head -20 # Recently touched files +git log --oneline -30 +git diff main --stat +git stash list ``` -Then read CLAUDE.md, TODOS.md, and any existing architecture docs. 
Map:
+Then read CLAUDE.md and all relevant SKILL.md files for the skills this plan touches. Map:
* What is the current system state?
-* What is already in flight (other open PRs, branches, stashed changes)?
+* What is already in flight (other open PRs, branches)?
* What are the existing known pain points most relevant to this plan?
-* Are there any FIXME/TODO comments in files this plan touches?
+* Are any analytical skills internally inconsistent or incomplete?

### Retrospective Check
-Check the git log for this branch. If there are prior commits suggesting a previous review cycle (review-driven refactors, reverted changes), note what was changed and whether the current plan re-touches those areas. Be MORE aggressive reviewing areas that were previously problematic. Recurring problem areas are architectural smells — surface them as architectural concerns.
+Check the git log for this branch. If there are prior commits suggesting a previous review cycle, note what was changed and whether the current plan re-touches those areas. Be MORE aggressive reviewing areas that were previously problematic.

### Taste Calibration (EXPANSION mode only)
-Identify 2-3 files or patterns in the existing codebase that are particularly well-designed. Note them as style references for the review. Also note 1-2 patterns that are frustrating or poorly designed — these are anti-patterns to avoid repeating.
+Identify 2-3 skills or patterns in the existing codebase that are particularly well-designed. Note them as style references. Also note 1-2 patterns that are frustrating or poorly designed.

Report findings before proceeding to Step 0.

## Step 0: Nuclear Scope Challenge + Mode Selection

### 0A. Premise Challenge
-1. Is this the right problem to solve? Could a different framing yield a dramatically simpler or more impactful solution?
-2. What is the actual user/business outcome? Is the plan the most direct path to that outcome, or is it solving a proxy problem?
+1.
Is this the right problem to solve for capital project governance? Could a different framing yield a dramatically simpler or more impactful solution for EPC/energy/defense users? +2. What is the actual user/business outcome? Is the plan the most direct path to that outcome? 3. What would happen if we did nothing? Real pain point or hypothetical one? -### 0B. Existing Code Leverage -1. What existing code already partially or fully solves each sub-problem? Map every sub-problem to existing code. Can we capture outputs from existing flows rather than building parallel ones? -2. Is this plan rebuilding anything that already exists? If yes, explain why rebuilding is better than refactoring. +### 0B. Existing Skill Leverage +1. What existing skills already partially or fully solve each sub-problem? Can we extend an existing skill rather than building new? +2. Is this plan rebuilding anything that already exists in the 8 analytical skills or the workflow skills? ### 0C. Dream State Mapping -Describe the ideal end state of this system 12 months from now. Does this plan move toward that state or away from it? +Describe the ideal end state of the Cybereum platform 12 months from now. Does this plan move toward that state or away from it? ``` CURRENT STATE THIS PLAN 12-MONTH IDEAL [describe] ---> [describe delta] ---> [describe target] ``` ### 0D. Mode-Specific Analysis -**For SCOPE EXPANSION** — run all three: -1. 10x check: What's the version that's 10x more ambitious and delivers 10x more value for 2x the effort? Describe it concretely. -2. Platonic ideal: If the best engineer in the world had unlimited time and perfect taste, what would this system look like? What would the user feel when using it? Start from experience, not architecture. -3. Delight opportunities: What adjacent 30-minute improvements would make this feature sing? Things where a user would think "oh nice, they thought of that." List at least 3. +**For SCOPE EXPANSION:** +1. 
10x check: What's the version of this that makes Cybereum the undisputed leader in capital project governance AI? +2. Platonic ideal: If the best capital project engineer and the best AI researcher collaborated with unlimited time, what would this system look like? +3. Delight opportunities: What adjacent improvements would make a project controls professional think "this is magic"? -**For HOLD SCOPE** — run this: -1. Complexity check: If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts. -2. What is the minimum set of changes that achieves the stated goal? Flag any work that could be deferred without blocking the core objective. +**For HOLD SCOPE:** +1. Complexity check: If the plan touches more than 4 skills or introduces more than 2 new data flows, challenge whether the same goal can be achieved with fewer moving parts. +2. What is the minimum set of changes that achieves the stated goal? -**For SCOPE REDUCTION** — run this: -1. Ruthless cut: What is the absolute minimum that ships value to a user? Everything else is deferred. No exceptions. -2. What can be a follow-up PR? Separate "must ship together" from "nice to ship together." +**For SCOPE REDUCTION:** +1. Ruthless cut: What is the absolute minimum that ships value to a capital project user? +2. What can be a follow-up PR? ### 0E. Temporal Interrogation (EXPANSION and HOLD modes) -Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan? +Think ahead to implementation: What decisions will need to be made during implementation? ``` - HOUR 1 (foundations): What does the implementer need to know? - HOUR 2-3 (core logic): What ambiguities will they hit? - HOUR 4-5 (integration): What will surprise them? - HOUR 6+ (polish/tests): What will they wish they'd planned for? 
+ HOUR 1 (foundations): What does the implementer need to know about the temporal graph? + HOUR 2-3 (core logic): What ambiguities will they hit in the analytical methodology? + HOUR 4-5 (integration): What cross-skill consistency issues will surface? + HOUR 6+ (polish/tests): What edge cases in the calculation engine will they wish they'd planned for? ``` -Surface these as questions for the user NOW, not as "figure it out later." ### 0F. Mode Selection -Present three options: -1. **SCOPE EXPANSION:** The plan is good but could be great. Propose the ambitious version, then review that. Push scope up. Build the cathedral. -2. **HOLD SCOPE:** The plan's scope is right. Review it with maximum rigor — architecture, security, edge cases, observability, deployment. Make it bulletproof. -3. **SCOPE REDUCTION:** The plan is overbuilt or wrong-headed. Propose a minimal version that achieves the core goal, then review that. +Present three options. Context-dependent defaults: +* New analytical capability -> default EXPANSION +* Bug fix or calculation correction -> default HOLD SCOPE +* Skill refactoring -> default HOLD SCOPE +* Plan touching >4 skills -> suggest REDUCTION unless user pushes back -Context-dependent defaults: -* Greenfield feature → default EXPANSION -* Bug fix or hotfix → default HOLD SCOPE -* Refactor → default HOLD SCOPE -* Plan touching >15 files → suggest REDUCTION unless user pushes back -* User says "go big" / "ambitious" / "cathedral" → EXPANSION, no question - -Once selected, commit fully. Do not silently drift. -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. +**STOP.** AskUserQuestion once per issue. Recommend + WHY. Do NOT proceed until user responds. 
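The context-dependent defaults in 0F reduce to a small decision table. A minimal TypeScript sketch of that mapping, for illustration only (the `PlanProfile` shape and all names here are hypothetical, not part of any skill):

```typescript
// Illustrative sketch of the 0F mode defaults; not an actual skill API.
type Mode = "EXPANSION" | "HOLD" | "REDUCTION";

interface PlanProfile {
  kind: "new-capability" | "bug-fix" | "refactor";
  skillsTouched: number; // how many skills the plan modifies
}

function defaultMode({ kind, skillsTouched }: PlanProfile): Mode {
  // Plans touching >4 skills: suggest REDUCTION unless the user pushes back.
  if (skillsTouched > 4) return "REDUCTION";
  // New analytical capability defaults to EXPANSION.
  if (kind === "new-capability") return "EXPANSION";
  // Bug fixes, calculation corrections, and refactors default to HOLD SCOPE.
  return "HOLD";
}

console.log(defaultMode({ kind: "new-capability", skillsTouched: 2 })); // "EXPANSION"
```

The default is only a starting recommendation; the user's answer to the 0F question always wins.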
## Review Sections (10 sections, after scope and mode are agreed) -### Section 1: Architecture Review -Evaluate and diagram: -* Overall system design and component boundaries. Draw the dependency graph. -* Data flow — all four paths. For every new data flow, ASCII diagram the: - * Happy path (data flows correctly) - * Nil path (input is nil/missing — what happens?) - * Empty path (input is present but empty/zero-length — what happens?) - * Error path (upstream call fails — what happens?) -* State machines. ASCII diagram for every new stateful object. Include impossible/invalid transitions and what prevents them. -* Coupling concerns. Which components are now coupled that weren't before? Is that coupling justified? Draw the before/after dependency graph. -* Scaling characteristics. What breaks first under 10x load? Under 100x? -* Single points of failure. Map them. -* Security architecture. Auth boundaries, data access patterns, API surfaces. For each new endpoint or data mutation: who can call it, what do they get, what can they change? -* Production failure scenarios. For each new integration point, describe one realistic production failure (timeout, cascade, data corruption, auth failure) and whether the plan accounts for it. -* Rollback posture. If this ships and immediately breaks, what's the rollback procedure? Git revert? Feature flag? DB migration rollback? How long? - -**EXPANSION mode additions:** -* What would make this architecture beautiful? Not just correct — elegant. Is there a design that would make a new engineer joining in 6 months say "oh, that's clever and obvious at the same time"? -* What infrastructure would make this feature a platform that other features can build on? - -Required ASCII diagram: full system architecture showing new components and their relationships to existing ones. -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. 
If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -### Section 2: Error & Rescue Map -This is the section that catches silent failures. It is not optional. -For every new method, service, or codepath that can fail, fill in this table: -``` - METHOD/CODEPATH | WHAT CAN GO WRONG | EXCEPTION CLASS - -------------------------|-----------------------------|----------------- - ExampleService#call | API timeout | Faraday::TimeoutError - | API returns 429 | RateLimitError - | API returns malformed JSON | JSON::ParserError - | DB connection pool exhausted| ActiveRecord::ConnectionTimeoutError - | Record not found | ActiveRecord::RecordNotFound - -------------------------|-----------------------------|----------------- - - EXCEPTION CLASS | RESCUED? | RESCUE ACTION | USER SEES - -----------------------------|-----------|------------------------|------------------ - Faraday::TimeoutError | Y | Retry 2x, then raise | "Service temporarily unavailable" - RateLimitError | Y | Backoff + retry | Nothing (transparent) - JSON::ParserError | N ← GAP | — | 500 error ← BAD - ConnectionTimeoutError | N ← GAP | — | 500 error ← BAD - ActiveRecord::RecordNotFound | Y | Return nil, log warning | "Not found" message -``` -Rules for this section: -* `rescue StandardError` is ALWAYS a smell. Name the specific exceptions. -* `rescue => e` with only `Rails.logger.error(e.message)` is insufficient. Log the full context: what was being attempted, with what arguments, for what user/request. -* Every rescued error must either: retry with backoff, degrade gracefully with a user-visible message, or re-raise with added context. "Swallow and continue" is almost never acceptable. -* For each GAP (unrescued error that should be rescued): specify the rescue action and what the user should see. -* For LLM/AI service calls specifically: what happens when the response is malformed? When it's empty? When it hallucinates invalid JSON? 
When the model returns a refusal? Each of these is a distinct failure mode. -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -### Section 3: Security & Threat Model -Security is not a sub-bullet of architecture. It gets its own section. -Evaluate: -* Attack surface expansion. What new attack vectors does this plan introduce? New endpoints, new params, new file paths, new background jobs? -* Input validation. For every new user input: is it validated, sanitized, and rejected loudly on failure? What happens with: nil, empty string, string when integer expected, string exceeding max length, unicode edge cases, HTML/script injection attempts? -* Authorization. For every new data access: is it scoped to the right user/role? Is there a direct object reference vulnerability? Can user A access user B's data by manipulating IDs? -* Secrets and credentials. New secrets? In env vars, not hardcoded? Rotatable? -* Dependency risk. New gems/npm packages? Security track record? -* Data classification. PII, payment data, credentials? Handling consistent with existing patterns? -* Injection vectors. SQL, command, template, LLM prompt injection — check all. -* Audit logging. For sensitive operations: is there an audit trail? - -For each finding: threat, likelihood (High/Med/Low), impact (High/Med/Low), and whether the plan mitigates it. -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -### Section 4: Data Flow & Interaction Edge Cases -This section traces data through the system and interactions through the UI with adversarial thoroughness. 
- -**Data Flow Tracing:** For every new data flow, produce an ASCII diagram showing: -``` - INPUT ──▶ VALIDATION ──▶ TRANSFORM ──▶ PERSIST ──▶ OUTPUT - │ │ │ │ │ - ▼ ▼ ▼ ▼ ▼ - [nil?] [invalid?] [exception?] [conflict?] [stale?] - [empty?] [too long?] [timeout?] [dup key?] [partial?] - [wrong [wrong type?] [OOM?] [locked?] [encoding?] - type?] -``` -For each node: what happens on each shadow path? Is it tested? - -**Interaction Edge Cases:** For every new user-visible interaction, evaluate: -``` - INTERACTION | EDGE CASE | HANDLED? | HOW? - ---------------------|------------------------|----------|-------- - Form submission | Double-click submit | ? | - | Submit with stale CSRF | ? | - | Submit during deploy | ? | - Async operation | User navigates away | ? | - | Operation times out | ? | - | Retry while in-flight | ? | - List/table view | Zero results | ? | - | 10,000 results | ? | - | Results change mid-page| ? | - Background job | Job fails after 3 of | ? | - | 10 items processed | | - | Job runs twice (dup) | ? | - | Queue backs up 2 hours | ? | -``` -Flag any unhandled edge case as a gap. For each gap, specify the fix. -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -### Section 5: Code Quality Review -Evaluate: -* Code organization and module structure. Does new code fit existing patterns? If it deviates, is there a reason? -* DRY violations. Be aggressive. If the same logic exists elsewhere, flag it and reference the file and line. -* Naming quality. Are new classes, methods, and variables named for what they do, not how they do it? -* Error handling patterns. (Cross-reference with Section 2 — this section reviews the patterns; Section 2 maps the specifics.) -* Missing edge cases. List explicitly: "What happens when X is nil?" "When the API returns 429?" etc. -* Over-engineering check. 
Any new abstraction solving a problem that doesn't exist yet? -* Under-engineering check. Anything fragile, assuming happy path only, or missing obvious defensive checks? -* Cyclomatic complexity. Flag any new method that branches more than 5 times. Propose a refactor. -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -### Section 6: Test Review -Make a complete diagram of every new thing this plan introduces: -``` - NEW UX FLOWS: - [list each new user-visible interaction] - - NEW DATA FLOWS: - [list each new path data takes through the system] - - NEW CODEPATHS: - [list each new branch, condition, or execution path] - - NEW BACKGROUND JOBS / ASYNC WORK: - [list each] - - NEW INTEGRATIONS / EXTERNAL CALLS: - [list each] - - NEW ERROR/RESCUE PATHS: - [list each — cross-reference Section 2] -``` -For each item in the diagram: -* What type of test covers it? (Unit / Integration / System / E2E) -* Does a test for it exist in the plan? If not, write the test spec header. -* What is the happy path test? -* What is the failure path test? (Be specific — which failure?) -* What is the edge case test? (nil, empty, boundary values, concurrent access) +Follow the same 10-section structure as the standard mega plan review: -Test ambition check (all modes): For each new feature, answer: -* What's the test that would make you confident shipping at 2am on a Friday? -* What's the test a hostile QA engineer would write to break this? -* What's the chaos test? +1. **Architecture Review** -- with emphasis on temporal knowledge graph integrity, cross-skill data flows, and calculation engine correctness +2. **Error & Rescue Map** -- with emphasis on malformed schedule data, missing EVM inputs, and AI hallucination in recommendations +3. 
**Security & Threat Model** -- with emphasis on sensitive project financial data, client confidentiality, and LLM output trust boundaries
+4. **Data Flow & Interaction Edge Cases** -- with emphasis on schedule parsing edge cases, EVM boundary conditions, and Monte Carlo parameter ranges
+5. **Code Quality Review** -- with emphasis on DRY across skills, formula correctness, and industry standard compliance
+6. **Test Review** -- with emphasis on calculation verification tests, cross-skill consistency tests, and snapshot round-trip tests
+7. **Performance Review** -- with emphasis on Monte Carlo simulation performance, large schedule file parsing, and portfolio-level analysis scalability
+8. **Observability & Debuggability Review** -- with emphasis on trend tracking (`.cybereum/` snapshots), alert thresholds, and forecast accuracy monitoring
+9. **Deployment & Rollout Review** -- with emphasis on skill deployment to `~/.claude/skills/gstack/` and backward compatibility of snapshot schemas
+10. **Long-Term Trajectory Review** -- with emphasis on platform extensibility, new sector support, and competitive positioning against P6/EcoSys/InEight
-Test pyramid check: Many unit, fewer integration, few E2E? Or inverted?
-Flakiness risk: Flag any test depending on time, randomness, external services, or ordering.
-Load/stress test requirements: For any new codepath called frequently or processing significant data.
+For each section: **STOP.** AskUserQuestion once per issue. Recommend + WHY. Do NOT proceed until user responds.
-For LLM/prompt changes: Check CLAUDE.md for the "Prompt/LLM changes" file patterns. If this plan touches ANY of those patterns, state which eval suites must be run, which cases should be added, and what baselines to compare against.
-**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
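The calculation-correctness emphasis in the sections above assumes the reviewer can verify EVM metrics by hand. A minimal sketch of the core ANSI/EIA-748 metric definitions (the `computeEvm` name and input shape are hypothetical, and `EAC = BAC / CPI` is just one common EAC method):

```typescript
// Illustrative EVM metrics per standard ANSI/EIA-748 definitions.
// Not Cybereum's actual API -- names and shapes are hypothetical.
interface EvmInputs {
  bac: number;  // Budget At Completion
  bcws: number; // Budgeted Cost of Work Scheduled (planned value)
  bcwp: number; // Budgeted Cost of Work Performed (earned value)
  acwp: number; // Actual Cost of Work Performed (actual cost)
}

interface EvmMetrics {
  cpi: number;  // Cost Performance Index     = BCWP / ACWP
  spi: number;  // Schedule Performance Index = BCWP / BCWS
  eac: number;  // Estimate At Completion     = BAC / CPI (CPI-based method)
  tcpi: number; // To-Complete Perf. Index    = (BAC - BCWP) / (BAC - ACWP)
}

function computeEvm({ bac, bcws, bcwp, acwp }: EvmInputs): EvmMetrics {
  // Loud rejection of degenerate inputs beats silent division by zero.
  if (acwp <= 0 || bcws <= 0) throw new RangeError("ACWP and BCWS must be positive");
  const cpi = bcwp / acwp;
  return {
    cpi,
    spi: bcwp / bcws,
    eac: bac / cpi,
    tcpi: bac === acwp ? NaN : (bac - bcwp) / (bac - acwp),
  };
}

const m = computeEvm({ bac: 10_000, bcws: 5_000, bcwp: 4_000, acwp: 5_000 });
console.log(m); // CPI 0.8 (over cost), SPI 0.8 (behind schedule), EAC 12500
```

Boundary conditions like ACWP = 0 or BAC = ACWP are exactly the "EVM boundary conditions" the Data Flow section should map.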
- -### Section 7: Performance Review -Evaluate: -* N+1 queries. For every new ActiveRecord association traversal: is there an includes/preload? -* Memory usage. For every new data structure: what's the maximum size in production? -* Database indexes. For every new query: is there an index? -* Caching opportunities. For every expensive computation or external call: should it be cached? -* Background job sizing. For every new job: worst-case payload, runtime, retry behavior? -* Slow paths. Top 3 slowest new codepaths and estimated p99 latency. -* Connection pool pressure. New DB connections, Redis connections, HTTP connections? -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -### Section 8: Observability & Debuggability Review -New systems break. This section ensures you can see why. -Evaluate: -* Logging. For every new codepath: structured log lines at entry, exit, and each significant branch? -* Metrics. For every new feature: what metric tells you it's working? What tells you it's broken? -* Tracing. For new cross-service or cross-job flows: trace IDs propagated? -* Alerting. What new alerts should exist? -* Dashboards. What new dashboard panels do you want on day 1? -* Debuggability. If a bug is reported 3 weeks post-ship, can you reconstruct what happened from logs alone? -* Admin tooling. New operational tasks that need admin UI or rake tasks? -* Runbooks. For each new failure mode: what's the operational response? - -**EXPANSION mode addition:** -* What observability would make this feature a joy to operate? -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -### Section 9: Deployment & Rollout Review -Evaluate: -* Migration safety. 
For every new DB migration: backward-compatible? Zero-downtime? Table locks? -* Feature flags. Should any part be behind a feature flag? -* Rollout order. Correct sequence: migrate first, deploy second? -* Rollback plan. Explicit step-by-step. -* Deploy-time risk window. Old code and new code running simultaneously — what breaks? -* Environment parity. Tested in staging? -* Post-deploy verification checklist. First 5 minutes? First hour? -* Smoke tests. What automated checks should run immediately post-deploy? - -**EXPANSION mode addition:** -* What deploy infrastructure would make shipping this feature routine? -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -### Section 10: Long-Term Trajectory Review -Evaluate: -* Technical debt introduced. Code debt, operational debt, testing debt, documentation debt. -* Path dependency. Does this make future changes harder? -* Knowledge concentration. Documentation sufficient for a new engineer? -* Reversibility. Rate 1-5: 1 = one-way door, 5 = easily reversible. -* Ecosystem fit. Aligns with Rails/JS ecosystem direction? -* The 1-year question. Read this plan as a new engineer in 12 months — obvious? - -**EXPANSION mode additions:** -* What comes after this ships? Phase 2? Phase 3? Does the architecture support that trajectory? -* Platform potential. Does this create capabilities other features can leverage? -**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. - -## CRITICAL RULE — How to ask questions -Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY that option over the others, mapping to engineering preferences. 
No batching multiple issues into one question. No yes/no questions. Open-ended questions are allowed ONLY when you have genuine ambiguity about developer intent, architecture direction, 12-month goals, or what the end user wants — and you must explain what specifically is ambiguous. - -## For Each Issue You Find -* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question. -* Describe the problem concretely, with file and line references. -* Present 2-3 options, including "do nothing" where reasonable. -* For each option: effort, risk, and maintenance burden in one line. -* **Lead with your recommendation.** State it as a directive: "Do B. Here's why:" — not "Option B might be worth considering." Be opinionated. I'm paying for your judgment, not a menu. -* **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference. -* **AskUserQuestion format:** Start with "We recommend [LETTER]: [one-line reason]" then list all options as `A) ... B) ... C) ...`. Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). -* **Escape hatch:** If a section has no issues, say so and move on. If an issue has an obvious fix with no real alternatives, state what you'll do and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine decision with meaningful tradeoffs. +## CRITICAL RULE -- How to ask questions +Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY. No batching multiple issues into one question. ## Required Outputs - -### "NOT in scope" section -List work considered and explicitly deferred, with one-line rationale each. - -### "What already exists" section -List existing code/flows that partially solve sub-problems and whether the plan reuses them. - -### "Dream state delta" section -Where this plan leaves us relative to the 12-month ideal. 
- -### Error & Rescue Registry (from Section 2) -Complete table of every method that can fail, every exception class, rescued status, rescue action, user impact. - -### Failure Modes Registry -``` - CODEPATH | FAILURE MODE | RESCUED? | TEST? | USER SEES? | LOGGED? - ---------|----------------|----------|-------|----------------|-------- -``` -Any row with RESCUED=N, TEST=N, USER SEES=Silent → **CRITICAL GAP**. - -### TODOS.md updates -Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. - -For each TODO, describe: -* **What:** One-line description of the work. -* **Why:** The concrete problem it solves or value it unlocks. -* **Pros:** What you gain by doing this work. -* **Cons:** Cost, complexity, or risks of doing it. -* **Context:** Enough detail that someone picking this up in 3 months understands the motivation, the current state, and where to start. -* **Effort estimate:** S/M/L/XL -* **Priority:** P1/P2/P3 -* **Depends on / blocked by:** Any prerequisites or ordering constraints. - -Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring. - -### Delight Opportunities (EXPANSION mode only) -Identify at least 5 "bonus chunk" opportunities (<30 min each) that would make users think "oh nice, they thought of that." Present each delight opportunity as its own individual AskUserQuestion. Never batch them. For each one, describe what it is, why it would delight users, and effort estimate. Then present options: **A)** Add to TODOS.md as a vision item **B)** Skip **C)** Build it now in this PR. - -### Diagrams (mandatory, produce all that apply) -1. System architecture -2. Data flow (including shadow paths) -3. State machine -4. Error flow -5. Deployment sequence -6. Rollback flowchart - -### Stale Diagram Audit -List every ASCII diagram in files this plan touches. Still accurate? 
- -### Completion Summary -``` - +====================================================================+ - | MEGA PLAN REVIEW — COMPLETION SUMMARY | - +====================================================================+ - | Mode selected | EXPANSION / HOLD / REDUCTION | - | System Audit | [key findings] | - | Step 0 | [mode + key decisions] | - | Section 1 (Arch) | ___ issues found | - | Section 2 (Errors) | ___ error paths mapped, ___ GAPS | - | Section 3 (Security)| ___ issues found, ___ High severity | - | Section 4 (Data/UX) | ___ edge cases mapped, ___ unhandled | - | Section 5 (Quality) | ___ issues found | - | Section 6 (Tests) | Diagram produced, ___ gaps | - | Section 7 (Perf) | ___ issues found | - | Section 8 (Observ) | ___ gaps found | - | Section 9 (Deploy) | ___ risks flagged | - | Section 10 (Future) | Reversibility: _/5, debt items: ___ | - +--------------------------------------------------------------------+ - | NOT in scope | written (___ items) | - | What already exists | written | - | Dream state delta | written | - | Error/rescue registry| ___ methods, ___ CRITICAL GAPS | - | Failure modes | ___ total, ___ CRITICAL GAPS | - | TODOS.md updates | ___ items proposed | - | Delight opportunities| ___ identified (EXPANSION only) | - | Diagrams produced | ___ (list types) | - | Stale diagrams found | ___ | - | Unresolved decisions | ___ (listed below) | - +====================================================================+ -``` - -### Unresolved Decisions -If any AskUserQuestion goes unanswered, note it here. Never silently default. - -## Formatting Rules -* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...). -* Label with NUMBER + LETTER (e.g., "3A", "3B"). -* Recommended option always listed first. -* One sentence max per option. -* After each section, pause and wait for feedback. -* Use **CRITICAL GAP** / **WARNING** / **OK** for scannability. 
- -## Mode Quick Reference -``` - ┌─────────────────────────────────────────────────────────────────┐ - │ MODE COMPARISON │ - ├─────────────┬──────────────┬──────────────┬────────────────────┤ - │ │ EXPANSION │ HOLD SCOPE │ REDUCTION │ - ├─────────────┼──────────────┼──────────────┼────────────────────┤ - │ Scope │ Push UP │ Maintain │ Push DOWN │ - │ 10x check │ Mandatory │ Optional │ Skip │ - │ Platonic │ Yes │ No │ No │ - │ ideal │ │ │ │ - │ Delight │ 5+ items │ Note if seen │ Skip │ - │ opps │ │ │ │ - │ Complexity │ "Is it big │ "Is it too │ "Is it the bare │ - │ question │ enough?" │ complex?" │ minimum?" │ - │ Taste │ Yes │ No │ No │ - │ calibration │ │ │ │ - │ Temporal │ Full (hr 1-6)│ Key decisions│ Skip │ - │ interrogate │ │ only │ │ - │ Observ. │ "Joy to │ "Can we │ "Can we see if │ - │ standard │ operate" │ debug it?" │ it's broken?" │ - │ Deploy │ Infra as │ Safe deploy │ Simplest possible │ - │ standard │ feature scope│ + rollback │ deploy │ - │ Error map │ Full + chaos │ Full │ Critical paths │ - │ │ scenarios │ │ only │ - │ Phase 2/3 │ Map it │ Note it │ Skip │ - │ planning │ │ │ │ - └─────────────┴──────────────┴──────────────┴────────────────────┘ -``` +All standard mega plan review outputs apply: NOT in scope, What already exists, Dream state delta, Error & Rescue Registry, Failure Modes Registry, TODOS.md updates, Delight Opportunities (EXPANSION only), Diagrams, Stale Diagram Audit, Completion Summary, Unresolved Decisions. diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 3074aee..a176dfb 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -2,161 +2,143 @@ name: plan-eng-review version: 1.0.0 description: | - Eng manager-mode plan review. Lock in the execution plan — architecture, - data flow, diagrams, edge cases, test coverage, performance. Walks through - issues interactively with opinionated recommendations. + Eng manager-mode plan review for Cybereum. 
Lock in the execution plan -- + architecture, data flow, cross-skill consistency, calculation correctness, + edge cases, test coverage, performance. Walks through issues interactively. allowed-tools: - Read - Grep - Glob + - Bash - AskUserQuestion --- -# Plan Review Mode +# Plan Review Mode -- Cybereum Review this plan thoroughly before making any code changes. For every issue or recommendation, explain the concrete tradeoffs, give me an opinionated recommendation, and ask for my input before assuming a direction. +## Cybereum Architecture Context + +**Platform:** AI-powered capital project governance with temporal knowledge graph + Dyeus AI Engine + +**8 Analytical Skills (domain logic):** +- Schedule Intelligence (P6/XER parsing, DCMA 14-Point, critical path) +- Decision-AI (Schwerpunkt analysis, corrective actions, critic reasoning) +- Risk Engine (risk register, P&I scoring, mitigation strategies) +- EVM Control (CPI/SPI/EAC/TCPI analytics, ANSI/EIA-748 compliance) +- Completion Prediction (Monte Carlo, P50/P80 forecasting, S-curves) +- Reference Class Forecasting (Flyvbjerg RCF, optimism bias correction) +- Executive Reporting (board/PMO/lender reports, audience calibration) +- Sales Intelligence (prospect research, competitive positioning) + +**6 Workflow Skills (development process):** +- ship, review, qa, retro, plan-ceo-review, plan-eng-review + +**Key data flows:** +- Schedule files (XER/XML/CSV) -> Schedule Intelligence -> Completion Prediction +- EVM inputs (BAC/BCWP/ACWP) -> EVM Control -> Executive Reporting +- Risk register -> Risk Engine -> Decision-AI (Schwerpunkt) +- All skills -> Executive Reporting (cross-skill integration) +- All skills -> `.cybereum/` snapshot persistence for trend tracking + +**Tech stack:** TypeScript/Bun, Playwright (browse CLI), Claude Code skills + ## Priority hierarchy If you are running low on context or the user asks you to compress: Step 0 > Test diagram > Opinionated recommendations > Everything else. 
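The Schedule Intelligence -> Completion Prediction flow above ends in P50/P80 forecasts. A minimal Monte Carlo sketch, assuming triangular duration sampling for illustration (all names, and the distribution choice, are hypothetical -- not the Completion Prediction skill's actual implementation):

```typescript
// Minimal Monte Carlo completion sketch: sample remaining duration from a
// triangular(min, mode, max) distribution, then read P50/P80 percentiles.
function sampleTriangular(min: number, mode: number, max: number): number {
  // Inverse-CDF sampling of the triangular distribution.
  const u = Math.random();
  const f = (mode - min) / (max - min); // CDF value at the mode
  return u < f
    ? min + Math.sqrt(u * (max - min) * (mode - min))
    : max - Math.sqrt((1 - u) * (max - min) * (max - mode));
}

function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.floor(p * sorted.length));
  return sorted[idx];
}

function forecast(min: number, likely: number, max: number, runs = 10_000) {
  const samples = Array.from({ length: runs }, () => sampleTriangular(min, likely, max));
  samples.sort((a, b) => a - b);
  return { p50: percentile(samples, 0.5), p80: percentile(samples, 0.8) };
}

const { p50, p80 } = forecast(90, 120, 200); // remaining days: min/likely/max
console.log(`P50 ${p50.toFixed(0)}d, P80 ${p80.toFixed(0)}d`);
```

Note the flakiness-risk angle for Section 6: any test of this path depends on randomness, so tests should assert statistical bounds (or seed the RNG), not exact values.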
Never skip Step 0 or the test diagram. ## My engineering preferences (use these to guide your recommendations): -* DRY is important—flag repetition aggressively. -* Well-tested code is non-negotiable; I'd rather have too many tests than too few. -* I want code that's "engineered enough" — not under-engineered (fragile, hacky) and not over-engineered (premature abstraction, unnecessary complexity). +* DRY is important -- flag repetition aggressively, especially formulas or thresholds duplicated across skills. +* Calculation correctness is non-negotiable. Every EVM, risk, schedule, and RCF formula must be verifiable against its cited standard. +* Well-tested code is non-negotiable; calculation-heavy code needs boundary value tests. +* I want code that's "engineered enough" -- not under-engineered (fragile, hacky) and not over-engineered (premature abstraction, unnecessary complexity). * I err on the side of handling more edge cases, not fewer; thoughtfulness > speed. -* Bias toward explicit over clever. +* Bias toward explicit over clever. Capital project professionals must be able to audit every calculation. * Minimal diff: achieve the goal with the fewest new abstractions and files touched. +* Cross-skill consistency: the same concept must be defined identically everywhere. ## Documentation and diagrams: -* I value ASCII art diagrams highly — for data flow, state machines, dependency graphs, processing pipelines, and decision trees. Use them liberally in plans and design docs. -* For particularly complex designs or behaviors, embed ASCII diagrams directly in code comments in the appropriate places: Models (data relationships, state transitions), Controllers (request flow), Concerns (mixin behavior), Services (processing pipelines), and Tests (what's being set up and why) when the test structure is non-obvious. 
-* **Diagram maintenance is part of the change.** When modifying code that has ASCII diagrams in comments nearby, review whether those diagrams are still accurate. Update them as part of the same commit. Stale diagrams are worse than no diagrams — they actively mislead. Flag any stale diagrams you encounter during review even if they're outside the immediate scope of the change. +* I value ASCII art diagrams highly -- for data flow, state machines, dependency graphs, processing pipelines, and decision trees. +* Diagram maintenance is part of the change. ## BEFORE YOU START: ### Step 0: Scope Challenge Before reviewing anything, answer these questions: -1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones? -2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. Be ruthless about scope creep. -3. **Complexity check:** If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts. +1. **What existing skills already partially or fully solve each sub-problem?** Can we extend an existing skill rather than building new? Map every sub-problem to existing skill capabilities. +2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. +3. **Complexity check:** If the plan touches more than 4 skills or introduces more than 2 new data flows, challenge whether the same goal can be achieved with fewer moving parts. +4. **Cross-skill impact:** Which skills will be affected by this change? Are their snapshot schemas compatible? Will trend tracking break? Then ask if I want one of three options: -1. **SCOPE REDUCTION:** The plan is overbuilt. 
Propose a minimal version that achieves the core goal, then review that. -2. **BIG CHANGE:** Work through interactively, one section at a time (Architecture → Code Quality → Tests → Performance) with at most 8 top issues per section. -3. **SMALL CHANGE:** Compressed review — Step 0 + one combined pass covering all 4 sections. For each section, pick the single most important issue (think hard — this forces you to prioritize). Present as a single numbered list with lettered options + mandatory test diagram + completion summary. One AskUserQuestion round at the end. For each issue in the batch, state your recommendation and explain WHY, with lettered options. +1. **SCOPE REDUCTION:** The plan is overbuilt. Propose a minimal version. +2. **BIG CHANGE:** Work through interactively, one section at a time. +3. **SMALL CHANGE:** Compressed review -- Step 0 + one combined pass. -**Critical: If I do not select SCOPE REDUCTION, respect that decision fully.** Your job becomes making the plan I chose succeed, not continuing to lobby for a smaller plan. Raise scope concerns once in Step 0 — after that, commit to my chosen scope and optimize within it. Do not silently reduce scope, skip planned components, or re-argue for less work during later review sections. +**Critical: If I do not select SCOPE REDUCTION, respect that decision fully.** ## Review Sections (after scope is agreed) ### 1. Architecture review Evaluate: -* Overall system design and component boundaries. -* Dependency graph and coupling concerns. -* Data flow patterns and potential bottlenecks. -* Scaling characteristics and single points of failure. -* Security architecture (auth, data access, API boundaries). -* Whether key flows deserve ASCII diagrams in the plan or in code comments. -* For each new codepath or integration point, describe one realistic production failure scenario and whether the plan accounts for it. 
+* Overall system design and component boundaries +* Cross-skill data flow and dependency graph +* Temporal knowledge graph integrity -- do mutations preserve causal chains? +* Calculation engine correctness -- do formulas match cited standards? +* Schedule parsing robustness -- XER encoding, XML entity safety, CSV edge cases +* Snapshot schema compatibility -- will existing `.cybereum/` data still load? +* Security architecture (client data confidentiality, LLM trust boundaries) +* For each new codepath: describe one realistic production failure scenario -**STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved. +**STOP.** AskUserQuestion for each issue. ### 2. Code quality review Evaluate: -* Code organization and module structure. -* DRY violations—be aggressive here. -* Error handling patterns and missing edge cases (call these out explicitly). -* Technical debt hotspots. -* Areas that are over-engineered or under-engineered relative to my preferences. -* Existing ASCII diagrams in touched files — are they still accurate after this change? +* Code organization across skills -- does new code fit existing patterns? +* DRY violations across skills -- same formula, threshold, or methodology defined in multiple places +* Cross-skill terminology consistency -- same concept, same name everywhere +* Error handling patterns and missing edge cases +* Industry standard compliance -- are cited standards (DCMA, AACE, Flyvbjerg) correctly applied? +* Existing ASCII diagrams in touched files -- still accurate? -**STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. 
Only proceed to the next section after ALL issues in this section are resolved. +**STOP.** AskUserQuestion for each issue. ### 3. Test review -Make a diagram of all new UX, new data flow, new codepaths, and new branching if statements or outcomes. For each, note what is new about the features discussed in this branch and plan. Then, for each new item in the diagram, make sure there is a JS or Rails test. +Make a diagram of all new calculations, data flows, codepaths, and branching. For each: +* What type of test covers it? +* What is the happy path test? (Known inputs -> expected outputs) +* What is the boundary value test? (CPI=0, BAC=0, zero-duration activity, 100% float) +* What is the malformed input test? (Corrupt XER, missing fields, wrong encoding) +* What is the cross-skill consistency test? (Same metric, two skills, same result) -For LLM/prompt changes: check the "Prompt/LLM changes" file patterns listed in CLAUDE.md. If this plan touches ANY of those patterns, state which eval suites must be run, which cases should be added, and what baselines to compare against. Then use AskUserQuestion to confirm the eval scope with the user. +Test ambition check: +* What's the test that would make you confident shipping at 2am on a Friday? +* What's the test a hostile QA engineer would write to break the calculation engine? -**STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved. +**STOP.** AskUserQuestion for each issue. ### 4. Performance review Evaluate: -* N+1 queries and database access patterns. -* Memory-usage concerns. -* Caching opportunities. -* Slow or high-complexity code paths. - -**STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. 
Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved. - -## CRITICAL RULE — How to ask questions -Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY that option over the others, mapping to engineering preferences. No batching multiple issues into one question. No yes/no questions. Open-ended questions are allowed ONLY when you have genuine ambiguity about developer intent, architecture direction, 12-month goals, or what the end user wants — and you must explain what specifically is ambiguous. **Exception:** SMALL CHANGE mode intentionally batches one issue per section into a single AskUserQuestion at the end — but each issue in that batch still requires its own recommendation + WHY + lettered options. - -## For each issue you find -For every specific issue (bug, smell, design concern, or risk): -* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question. -* Describe the problem concretely, with file and line references. -* Present 2–3 options, including "do nothing" where that's reasonable. -* For each option, specify in one line: effort, risk, and maintenance burden. -* **Lead with your recommendation.** State it as a directive: "Do B. Here's why:" — not "Option B might be worth considering." Be opinionated. I'm paying for your judgment, not a menu. -* **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference (DRY, explicit > clever, minimal diff, etc.). -* **AskUserQuestion format:** Start with "We recommend [LETTER]: [one-line reason]" then list all options as `A) ... B) ... C) ...`. Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). -* **Escape hatch:** If a section has no issues, say so and move on. 
If an issue has an obvious fix with no real alternatives, state what you'll do and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine decision with meaningful tradeoffs. - -## Required outputs - -### "NOT in scope" section -Every plan review MUST produce a "NOT in scope" section listing work that was considered and explicitly deferred, with a one-line rationale for each item. +* Monte Carlo simulation performance (10,000 iterations -- acceptable latency?) +* Large schedule file parsing (10,000+ activity XER files) +* Portfolio-level analysis scalability (10+ projects simultaneously) +* Snapshot file I/O -- `.cybereum/` directory with many JSON files +* Risk register generation with 15+ external risks per category -### "What already exists" section -List existing code/flows that already partially solve sub-problems in this plan, and whether the plan reuses them or unnecessarily rebuilds them. +**STOP.** AskUserQuestion for each issue. -### TODOS.md updates -After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. +## CRITICAL RULE -- How to ask questions +Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY. -For each TODO, describe: -* **What:** One-line description of the work. -* **Why:** The concrete problem it solves or value it unlocks. -* **Pros:** What you gain by doing this work. -* **Cons:** Cost, complexity, or risks of doing it. -* **Context:** Enough detail that someone picking this up in 3 months understands the motivation, the current state, and where to start. -* **Depends on / blocked by:** Any prerequisites or ordering constraints. - -Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring. 
- -Do NOT just append vague bullet points. A TODO without context is worse than no TODO — it creates false confidence that the idea was captured while actually losing the reasoning. - -### Diagrams -The plan itself should use ASCII diagrams for any non-trivial data flow, state machine, or processing pipeline. Additionally, identify which files in the implementation should get inline ASCII diagram comments — particularly Models with complex state transitions, Services with multi-step pipelines, and Concerns with non-obvious mixin behavior. - -### Failure modes -For each new codepath identified in the test review diagram, list one realistic way it could fail in production (timeout, nil reference, race condition, stale data, etc.) and whether: -1. A test covers that failure -2. Error handling exists for it -3. The user would see a clear error or a silent failure - -If any failure mode has no test AND no error handling AND would be silent, flag it as a **critical gap**. - -### Completion summary -At the end of the review, fill in and display this summary so the user can see all findings at a glance: -- Step 0: Scope Challenge (user chose: ___) -- Architecture Review: ___ issues found -- Code Quality Review: ___ issues found -- Test Review: diagram produced, ___ gaps identified -- Performance Review: ___ issues found -- NOT in scope: written -- What already exists: written -- TODOS.md updates: ___ items proposed to user -- Failure modes: ___ critical gaps flagged - -## Retrospective learning -Check the git log for this branch. If there are prior commits suggesting a previous review cycle (e.g., review-driven refactors, reverted changes), note what was changed and whether the current plan touches the same areas. Be more aggressive reviewing areas that were previously problematic. 
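The cross-skill DRY checks called for in the code quality review (same formula or threshold defined in multiple skills) can be partially mechanized before the interactive pass. A minimal sketch -- the `*/SKILL.md` glob and the term list are illustrative assumptions, not part of the review protocol:

```python
# Sketch: surface terms (formulas, thresholds) defined in more than one skill.
from pathlib import Path

def shared_terms(root: str, terms: list[str]) -> dict[str, list[str]]:
    """Map each term to the SKILL.md files mentioning it, when 2+ skills do."""
    hits: dict[str, list[str]] = {}
    for term in terms:
        files = sorted(
            str(p) for p in Path(root).glob("*/SKILL.md")
            if term in p.read_text(encoding="utf-8")
        )
        if len(files) > 1:
            hits[term] = files  # candidate DRY violation -- review by hand
    return hits

if __name__ == "__main__":
    overlaps = shared_terms(".", ["CPI", "SPI", "P50", "P80", "score >= 12"])
    for term, files in overlaps.items():
        print(f"SHARED ({len(files)} skills): {term}: {files}")
```

Anything it prints is a candidate for consolidation, not an automatic failure -- the reviewer still decides whether the duplication is intentional.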
+## Required outputs +* "NOT in scope" section +* "What already exists" section (map to existing skills) +* TODOS.md updates (one per AskUserQuestion) +* Diagrams (data flow, cross-skill integration, calculation pipeline) +* Failure modes (for each new calculation, one realistic way it could produce wrong results) +* Completion summary ## Formatting rules * NUMBER issues (1, 2, 3...) and give LETTERS for options (A, B, C...). -* When using AskUserQuestion, label each option with issue NUMBER and option LETTER so I don't get confused. * Recommended option is always listed first. -* Keep each option to one sentence max. I should be able to pick in under 5 seconds. +* Keep each option to one sentence max. * After each review section, pause and ask for feedback before moving on. - -## Unresolved decisions -If the user does not respond to an AskUserQuestion or interrupts to move on, note which decisions were left unresolved. At the end of the review, list these as "Unresolved decisions that may bite you later" — never silently default to an option. diff --git a/qa/SKILL.md b/qa/SKILL.md index 9da05fa..dd05769 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -2,19 +2,21 @@ name: qa version: 1.0.0 description: | - Systematically QA test a web application. Use when asked to "qa", "QA", "test this site", - "find bugs", "dogfood", or review quality. Three modes: full (systematic exploration), - quick (30-second smoke test), regression (compare against baseline). Produces structured - report with health score, screenshots, and repro steps. + Systematically QA test the Cybereum platform and its analytical skills. Use when asked to "qa", + "QA", "test the skills", "validate calculations", or review quality. Three modes: full (systematic + validation of all skills), quick (smoke test core calculations), regression (compare against baseline). + Produces structured report with health score and findings. 
allowed-tools: - Bash - Read - Write + - Grep + - Glob --- -# /qa: Systematic QA Testing +# /qa: Systematic QA Testing for Cybereum -You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence. +You are a QA engineer for the Cybereum capital project governance platform. Test every analytical skill systematically -- validate calculations, check cross-skill consistency, verify data flows, and ensure output quality. Produce a structured report with evidence. ## Setup @@ -22,27 +24,16 @@ You are a QA engineer. Test web applications like a real user — click everythi | Parameter | Default | Override example | |-----------|---------|-----------------| -| Target URL | (required) | `https://myapp.com`, `http://localhost:3000` | -| Mode | full | `--quick`, `--regression .gstack/qa-reports/baseline.json` | -| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` | -| Scope | Full app | `Focus on the billing page` | -| Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` | - -**Find the browse binary:** - -```bash -B=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null) -if [ -z "$B" ]; then - echo "ERROR: browse binary not found" - exit 1 -fi -``` +| Scope | Full platform | `Focus on EVM calculations`, `Test schedule parsing only` | +| Mode | full | `--quick`, `--regression .cybereum/qa-reports/baseline.json` | +| Output dir | `.cybereum/qa-reports/` | `Output to /tmp/qa` | +| Test data | Sample data | `Use this XER file`, `Test with BAC=$150M` | **Create output directories:** ```bash -REPORT_DIR=".gstack/qa-reports" -mkdir -p "$REPORT_DIR/screenshots" +REPORT_DIR=".cybereum/qa-reports" +mkdir -p "$REPORT_DIR" ``` --- @@ -50,246 +41,222 @@ mkdir -p "$REPORT_DIR/screenshots" ## Modes ### Full (default) -Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. 
Produce health score. Takes 5-15 minutes depending on app size. +Systematic validation of all 8 analytical skills. Verify calculations, cross-skill consistency, output format compliance. Produces health score. ### Quick (`--quick`) -30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation. +Smoke test: verify core EVM formulas, risk scoring math, schedule health scoring, and completion prediction multipliers. 2-minute check. ### Regression (`--regression `) -Run full mode, then load `baseline.json` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report. +Run full mode, then load `baseline.json` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? --- ## Workflow -### Phase 1: Initialize +### Phase 1: Skill Inventory -1. Find browse binary (see Setup above) -2. Create output directories -3. Copy report template from `qa/templates/qa-report-template.md` to output dir -4. 
Start timer for duration tracking +Read all 8 Cybereum skill SKILL.md files and verify they exist: -### Phase 2: Authenticate (if needed) +```bash +for skill in cybereum-schedule-intelligence cybereum-decision-ai cybereum-risk-engine cybereum-evm-control cybereum-completion-prediction cybereum-reference-class cybereum-executive-reporting cybereum-sales-intelligence; do + if [ -f "$skill/SKILL.md" ]; then + echo "OK: $skill" + else + echo "MISSING: $skill" + fi +done +``` -**If the user specified auth credentials:** +### Phase 2: Calculation Verification -```bash -$B goto -$B snapshot -i # find the login form -$B fill @e3 "user@example.com" -$B fill @e4 "[REDACTED]" # NEVER include real passwords in report -$B click @e5 # submit -$B snapshot -D # verify login succeeded +Test mathematical correctness of each analytical skill: + +#### EVM Control Calculations +Verify with known inputs: +``` +Given: BAC=$100M, BCWS=$55M, BCWP=$50M, ACWP=$60M +Expected: + CV = $50M - $60M = -$10M + SV = $50M - $55M = -$5M + CPI = $50M / $60M = 0.833 + SPI = $50M / $55M = 0.909 + EAC (Method 1) = $100M / 0.833 = $120.0M + EAC (Method 3) = $60M + ($100M - $50M) / (0.833 * 0.909) = $126.0M + VAC = $100M - $120.0M = -$20.0M + TCPI = ($100M - $50M) / ($100M - $60M) = 1.25 ``` -**If the user provided a cookie file:** +Check: Do the formulas in the SKILL.md produce these results? -```bash -$B cookie-import cookies.json -$B goto +#### Risk Scoring +Verify P x I matrix: +``` +Given: Probability=4 (High), Impact=5 (Catastrophic) +Expected: Score = 20 (Priority risk, requires active mitigation) ``` -**If 2FA/OTP is required:** Ask the user for the code and wait. +Check: Is score >= 12 correctly identified as requiring mitigation? -**If CAPTCHA blocks you:** Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue." +#### Schedule Health Scoring +Verify DCMA 14-Point thresholds: +``` +Check 1 (Logic): <5% open ends = pass. 
Skill says "Critical if >10%" +Check 7 (Negative float): 0% = pass. Skill says "Critical -- always flag" +``` -### Phase 3: Orient +Check: Are all 14 thresholds correctly stated? -Get a map of the application: +#### Completion Prediction Multipliers +Verify parametric table consistency: +``` +For remaining < 3 months, Low uncertainty: + P20 multiplier (0.97) < P50 multiplier (1.05) < P80 multiplier (1.12) +``` -```bash -$B goto -$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png" -$B links # map navigation structure -$B console --errors # any errors on landing? +Check: Do all rows maintain P20 < P50 < P80? (If not, the confidence intervals are inverted.) + +#### Reference Class Benchmarks +Verify internal consistency: +``` +For each project type: + Mean overrun <= P80 overrun (P80 is more conservative than mean) + Median <= Mean (right-skewed distributions have mean > median) ``` -**Detect framework** (note in report metadata): -- `__next` in HTML or `_next/data` requests → Next.js -- `csrf-token` meta tag → Rails -- `wp-content` in URLs → WordPress -- Client-side routing with no page reloads → SPA +Check: Do all benchmark rows satisfy these constraints? -**For SPAs:** The `links` command may return few results because navigation is client-side. Use `snapshot -i` to find nav elements (buttons, menu items) instead. +### Phase 3: Cross-Skill Consistency -### Phase 4: Explore +Verify that shared concepts are defined consistently: -Visit pages systematically. At each page: +1. **Terminology check**: Grep all SKILL.md files for key terms and verify consistent usage: + - "P50" / "P80" -- same meaning everywhere? + - "CPI" / "SPI" -- same formulas everywhere? + - "Critical" risk threshold -- same score threshold everywhere? + - Health score ranges -- same tier boundaries? -```bash -$B goto -$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png" -$B console --errors -``` +2. 
**JSON snapshot schema check**: Verify that skills that share data use compatible schemas: + - EVM snapshots referenced by Executive Reporting + - Risk snapshots referenced by Decision-AI + - Schedule snapshots referenced by Completion Prediction -Then follow the **per-page exploration checklist** (see `qa/references/issue-taxonomy.md`): +3. **Reference file consistency**: Check that reference file paths mentioned in SKILL.md files point to plausible locations. -1. **Visual scan** — Look at the annotated screenshot for layout issues -2. **Interactive elements** — Click buttons, links, controls. Do they work? -3. **Forms** — Fill and submit. Test empty, invalid, edge cases -4. **Navigation** — Check all paths in and out -5. **States** — Empty state, loading, error, overflow -6. **Console** — Any new JS errors after interactions? -7. **Responsiveness** — Check mobile viewport if relevant: - ```bash - $B viewport 375x812 - $B screenshot "$REPORT_DIR/screenshots/page-mobile.png" - $B viewport 1280x720 - ``` +### Phase 4: Output Format Compliance -**Depth judgment:** Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy). +For each skill, verify its output templates are complete: -**Quick mode:** Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible? +1. **Schedule Intelligence**: Has Executive Summary, Health Scorecard, Critical Path Summary, Top 10 Risk Activities, Recommended Actions? +2. **Decision-AI**: Has Schwerpunkt identification, Corrective Actions table, Critic analysis, Decision Brief? +3. **Risk Engine**: Has Executive Risk Summary, Risk Register Table, Heatmap, Mitigation Action Plan? +4. **EVM Control**: Has Performance Dashboard with all metrics, Variance Attribution, Trend Analysis? +5. 
**Completion Prediction**: Has P20/P50/P80 forecast, Scenario Comparison, S-Curve narrative, Confidence Statement?
+6. **Reference Class**: Has RCAE calculation, Inside-View Adjustments, Optimism Bias Report, Contingency Assessment?
+7. **Executive Reporting**: Has all report type structures, audience calibration rules, quality checklist?
+8. **Sales Intelligence**: Has Prospect Research protocol, Outreach templates, Pitch deck structure, Competitive table?

-### Phase 5: Document
+### Phase 5: Document Findings

-Document each issue **immediately when found** — don't batch them.
+Document each issue immediately when found.

-**Two evidence tiers:**
+**Issue severity:**

-**Interactive bugs** (broken flows, dead buttons, form failures):
-1. Take a screenshot before the action
-2. Perform the action
-3. Take a screenshot showing the result
-4. Use `snapshot -D` to show what changed
-5. Write repro steps referencing screenshots
+| Severity | Definition | Examples |
+|----------|------------|----------|
+| **critical** | Wrong calculation, incorrect formula, data integrity violation | CPI formula inverted, P80 < P50, risk score != P x I |
+| **high** | Missing required output section, broken cross-skill reference | Decision-AI missing Critic step, EVM dashboard missing TCPI |
+| **medium** | Inconsistent terminology, threshold drift between skills | "Critical" means score>=12 in one skill, score>=16 in another |
+| **low** | Minor formatting, typo in methodology description | Inconsistent header levels, missing reference file |

-```bash
-$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
-$B click @e5
-$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
-$B snapshot -D
-```

-**Static bugs** (typos, layout issues, missing images):
-1. Take a single annotated screenshot showing the problem
-2. 
Describe what's wrong +Compute health score using weighted categories: -```bash -$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png" -``` +| Category | Weight | Scoring | +|----------|--------|---------| +| Calculation Correctness | 30% | Start 100, -25 per critical, -15 per high | +| Cross-Skill Consistency | 20% | Start 100, -15 per inconsistency | +| Output Completeness | 20% | Start 100, -10 per missing section | +| Methodology Adherence | 15% | Start 100, -15 per deviation from cited standard | +| Data Flow Integrity | 15% | Start 100, -20 per broken reference | -**Write each issue to the report immediately** using the template format from `qa/templates/qa-report-template.md`. - -### Phase 6: Wrap Up - -1. **Compute health score** using the rubric below -2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues -3. **Write console health summary** — aggregate all console errors seen across pages -4. **Update severity counts** in the summary table -5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework -6. **Save baseline** — write `baseline.json` with: - ```json - { - "date": "YYYY-MM-DD", - "url": "", - "healthScore": N, - "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }], - "categoryScores": { "console": N, "links": N, ... } - } - ``` - -**Regression mode:** After writing the report, load the baseline file. Compare: -- Health score delta -- Issues fixed (in baseline but not current) -- New issues (in current but not baseline) -- Append the regression section to the report +Final score = weighted average. Tiers: +- 85-100: Healthy +- 70-84: Moderate issues +- 50-69: Significant issues +- <50: Critical -- needs immediate attention --- -## Health Score Rubric - -Compute each category score (0-100), then take the weighted average. 
- -### Console (weight: 15%) -- 0 errors → 100 -- 1-3 errors → 70 -- 4-10 errors → 40 -- 10+ errors → 10 - -### Links (weight: 10%) -- 0 broken → 100 -- Each broken link → -15 (minimum 0) - -### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility) -Each category starts at 100. Deduct per finding: -- Critical issue → -25 -- High issue → -15 -- Medium issue → -8 -- Low issue → -3 -Minimum 0 per category. - -### Weights -| Category | Weight | -|----------|--------| -| Console | 15% | -| Links | 10% | -| Visual | 10% | -| Functional | 20% | -| UX | 15% | -| Performance | 10% | -| Content | 5% | -| Accessibility | 15% | - -### Final Score -`score = Σ (category_score × weight)` +## Output Structure ---- +``` +.cybereum/qa-reports/ +├── qa-report-{YYYY-MM-DD}.md # Structured report +└── baseline.json # For regression mode +``` -## Framework-Specific Guidance +### Report Template -### Next.js -- Check console for hydration errors (`Hydration failed`, `Text content did not match`) -- Monitor `_next/data` requests in network — 404s indicate broken data fetching -- Test client-side navigation (click links, don't just `goto`) — catches routing issues -- Check for CLS (Cumulative Layout Shift) on pages with dynamic content +```markdown +# Cybereum QA Report -### Rails -- Check for N+1 query warnings in console (if development mode) -- Verify CSRF token presence in forms -- Test Turbo/Stimulus integration — do page transitions work smoothly? 
-- Check for flash messages appearing and dismissing correctly +| Field | Value | +|-------|-------| +| **Date** | {DATE} | +| **Scope** | {SCOPE or "Full platform"} | +| **Mode** | {full / quick / regression} | +| **Skills tested** | {COUNT}/8 | -### WordPress -- Check for plugin conflicts (JS errors from different plugins) -- Verify admin bar visibility for logged-in users -- Test REST API endpoints (`/wp-json/`) -- Check for mixed content warnings (common with WP) +## Health Score: {SCORE}/100 -### General SPA (React, Vue, Angular) -- Use `snapshot -i` for navigation — `links` command misses client-side routes -- Check for stale state (navigate away and back — does data refresh?) -- Test browser back/forward — does the app handle history correctly? -- Check for memory leaks (monitor console after extended use) +| Category | Score | +|----------|-------| +| Calculation Correctness | {0-100} | +| Cross-Skill Consistency | {0-100} | +| Output Completeness | {0-100} | +| Methodology Adherence | {0-100} | +| Data Flow Integrity | {0-100} | ---- +## Top 3 Things to Fix -## Important Rules +1. **{ISSUE-NNN}: {title}** -- {one-line description} +2. **{ISSUE-NNN}: {title}** -- {one-line description} +3. **{ISSUE-NNN}: {title}** -- {one-line description} -1. **Repro is everything.** Every issue needs at least one screenshot. No exceptions. -2. **Verify before documenting.** Retry the issue once to confirm it's reproducible, not a fluke. -3. **Never include credentials.** Write `[REDACTED]` for passwords in repro steps. -4. **Write incrementally.** Append each issue to the report as you find it. Don't batch. -5. **Never read source code.** Test as a user, not a developer. -6. **Check console after every interaction.** JS errors that don't surface visually are still bugs. -7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end. -8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions. -9. 
**Never delete output files.** Screenshots and reports accumulate — that's intentional. -10. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses. +## Summary ---- +| Severity | Count | +|----------|-------| +| Critical | 0 | +| High | 0 | +| Medium | 0 | +| Low | 0 | +| **Total** | **0** | -## Output Structure +## Issues +### ISSUE-001: {Short title} + +| Field | Value | +|-------|-------| +| **Severity** | critical / high / medium / low | +| **Skill** | {which skill} | +| **Category** | calculation / consistency / completeness / methodology / data-flow | + +**Description:** {What is wrong, expected vs actual.} + +**Evidence:** {Formula, threshold, or output that demonstrates the issue.} + +**Fix:** {Specific correction needed.} ``` -.gstack/qa-reports/ -├── qa-report-{domain}-{YYYY-MM-DD}.md # Structured report -├── screenshots/ -│ ├── initial.png # Landing page annotated screenshot -│ ├── issue-001-step-1.png # Per-issue evidence -│ ├── issue-001-result.png -│ └── ... -└── baseline.json # For regression mode -``` -Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md` +--- + +## Important Rules + +1. **Verify calculations with actual numbers.** Don't just read formulas -- plug in values and check. +2. **Cross-reference across skills.** The same metric must mean the same thing everywhere. +3. **Check cited standards.** If a skill says "per DCMA 14-Point" -- verify the threshold matches DCMA. +4. **Document immediately.** Append each issue to the report as you find it. Don't batch. +5. **Save baseline.** Always save a baseline.json for future regression comparison. 
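Rule 5's regression comparison can be made concrete with a small helper. A minimal sketch, assuming a baseline schema with a `health_score` field and per-severity `issues` counts (the field names are illustrative assumptions, not part of the skill spec):

```python
import json
from pathlib import Path


def diff_reports(current: dict, baseline: dict) -> list[str]:
    """Return regressions: places where the current QA report is worse than baseline."""
    regressions = []
    # A drop in overall health score is always a regression
    if current["health_score"] < baseline["health_score"]:
        regressions.append(
            f"health score fell {baseline['health_score']} -> {current['health_score']}"
        )
    # Any severity bucket growing is a regression
    for sev in ("critical", "high", "medium", "low"):
        before = baseline["issues"].get(sev, 0)
        after = current["issues"].get(sev, 0)
        if after > before:
            regressions.append(f"{sev} issue count rose {before} -> {after}")
    return regressions


def save_baseline(report: dict, path: str = ".cybereum/qa-reports/baseline.json") -> None:
    """Persist the current report so the next regression run has a baseline."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(report, indent=2))
```

In regression mode the skill would load `baseline.json`, call `diff_reports`, and list any regressions before saving the new baseline.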
diff --git a/qa/references/issue-taxonomy.md b/qa/references/issue-taxonomy.md index 05c5741..03806fe 100644 --- a/qa/references/issue-taxonomy.md +++ b/qa/references/issue-taxonomy.md @@ -1,85 +1,68 @@ -# QA Issue Taxonomy +# Cybereum QA Issue Taxonomy ## Severity Levels | Severity | Definition | Examples | |----------|------------|----------| -| **critical** | Blocks a core workflow, causes data loss, or crashes the app | Form submit causes error page, checkout flow broken, data deleted without confirmation | -| **high** | Major feature broken or unusable, no workaround | Search returns wrong results, file upload silently fails, auth redirect loop | -| **medium** | Feature works but with noticeable problems, workaround exists | Slow page load (>5s), form validation missing but submit still works, layout broken on mobile only | -| **low** | Minor cosmetic or polish issue | Typo in footer, 1px alignment issue, hover state inconsistent | +| **critical** | Wrong calculation output, data integrity violation, or skill produces incorrect recommendations | EVM CPI formula inverted, P80 < P50, risk score != PxI, Schwerpunkt without Critic | +| **high** | Major skill section missing or broken, cross-skill inconsistency that affects outputs | EVM dashboard missing TCPI, schedule health score uses wrong thresholds | +| **medium** | Skill works but with inconsistencies or gaps, methodology deviation | Terminology drift between skills, threshold defined differently in two places | +| **low** | Minor formatting, documentation gap, or cosmetic issue | Inconsistent header levels, missing reference file path | ## Categories -### 1. Visual/UI -- Layout breaks (overlapping elements, clipped text, horizontal scrollbar) -- Broken or missing images -- Incorrect z-index (elements appearing behind others) -- Font/color inconsistencies -- Animation glitches (jank, incomplete transitions) -- Alignment issues (off-grid, uneven spacing) -- Dark mode / theme issues +### 1. 
Calculation Correctness +- EVM formula errors (CPI, SPI, EAC, TCPI, VAC, CV, SV) +- Risk scoring errors (P x I matrix, contingency calculations) +- Schedule metric errors (float calculations, CPLI, BEI) +- Completion prediction errors (multiplier tables, Monte Carlo parameters) +- Reference class errors (overrun percentages, RCAE calculations) +- Division by zero in metric calculations +- Unit mismatches (working days vs calendar days, $K vs $M) -### 2. Functional -- Broken links (404, wrong destination) -- Dead buttons (click does nothing) -- Form validation (missing, wrong, bypassed) -- Incorrect redirects -- State not persisting (data lost on refresh, back button) -- Race conditions (double-submit, stale data) -- Search returning wrong or no results +### 2. Cross-Skill Consistency +- Same metric defined differently across skills +- Threshold values that drift (e.g., "critical" score cutoff) +- JSON snapshot schemas that don't align +- Terminology inconsistency (P80 vs 80th percentile) +- Reference file paths that don't match actual locations -### 3. UX -- Confusing navigation (no breadcrumbs, dead ends) -- Missing loading indicators (user doesn't know something is happening) -- Slow interactions (>500ms with no feedback) -- Unclear error messages ("Something went wrong" with no detail) -- No confirmation before destructive actions -- Inconsistent interaction patterns across pages -- Dead ends (no way back, no next action) +### 3. Methodology Compliance +- DCMA 14-Point thresholds deviating from standard +- ANSI/EIA-748 EVMS guidelines incorrectly stated +- AACE standard references that don't match source +- Flyvbjerg benchmark data that doesn't match published research +- GAO Schedule Assessment Guide criteria incorrectly applied -### 4. 
Content -- Typos and grammar errors -- Outdated or incorrect text -- Placeholder / lorem ipsum text left in -- Truncated text (cut off without ellipsis or "more") -- Wrong labels on buttons or form fields -- Missing or unhelpful empty states +### 4. Output Completeness +- Missing required sections in skill output templates +- Incomplete tables (missing columns or rows) +- Output format that doesn't match the stated template +- Missing Executive Summary or Recommended Actions +- Decision Brief missing required fields -### 5. Performance -- Slow page loads (>3 seconds) -- Janky scrolling (dropped frames) -- Layout shifts (content jumping after load) -- Excessive network requests (>50 on a single page) -- Large unoptimized images -- Blocking JavaScript (page unresponsive during load) +### 5. Data Flow Integrity +- Schedule data not flowing correctly to Completion Prediction +- EVM metrics not available to Executive Reporting +- Risk register not feeding Decision-AI Schwerpunkt analysis +- Snapshot persistence not saving all required fields +- Trend tracking not loading prior snapshots correctly -### 6. Console/Errors -- JavaScript exceptions (uncaught errors) -- Failed network requests (4xx, 5xx) -- Deprecation warnings (upcoming breakage) -- CORS errors -- Mixed content warnings (HTTP resources on HTTPS) -- CSP violations +### 6. Industry Standard Compliance +- EVM formulas not matching ANSI/EIA-748 +- Schedule checks not matching DCMA 14-Point Assessment +- Cost contingency not following AACE RP 40R-08 +- Reference class methodology not following Flyvbjerg/UK Treasury +- Reporting format not following AACE RP 11R-88 -### 7. 
Accessibility -- Missing alt text on images -- Unlabeled form inputs -- Keyboard navigation broken (can't tab to elements) -- Focus traps (can't escape a modal or dropdown) -- Missing or incorrect ARIA attributes -- Insufficient color contrast -- Content not reachable by screen reader +## Per-Skill Validation Checklist -## Per-Page Exploration Checklist +For each analytical skill during a QA session: -For each page visited during a QA session: - -1. **Visual scan** — Take annotated screenshot (`snapshot -i -a -o`). Look for layout issues, broken images, alignment. -2. **Interactive elements** — Click every button, link, and control. Does each do what it says? -3. **Forms** — Fill and submit. Test empty submission, invalid data, edge cases (long text, special characters). -4. **Navigation** — Check all paths in/out. Breadcrumbs, back button, deep links, mobile menu. -5. **States** — Check empty state, loading state, error state, full/overflow state. -6. **Console** — Run `console --errors` after interactions. Any new JS errors or failed requests? -7. **Responsiveness** — If relevant, check mobile and tablet viewports. -8. **Auth boundaries** — What happens when logged out? Different user roles? +1. **Formula check** -- Plug in known values and verify outputs match expected results +2. **Threshold check** -- Verify all stated thresholds match cited industry standards +3. **Output template check** -- Verify all required output sections are present and complete +4. **Cross-reference check** -- Verify shared concepts are consistent with other skills +5. **Snapshot schema check** -- Verify JSON persistence captures all required fields +6. **Trend tracking check** -- Verify delta calculations produce correct results +7. 
**Edge case check** -- Test with boundary values (zero, negative, maximum) diff --git a/retro/SKILL.md b/retro/SKILL.md index ad5a758..46d747e 100644 --- a/retro/SKILL.md +++ b/retro/SKILL.md @@ -2,9 +2,10 @@ name: retro version: 2.0.0 description: | - Weekly engineering retrospective. Analyzes commit history, work patterns, - and code quality metrics with persistent history and trend tracking. + Weekly engineering retrospective for Cybereum. Analyzes commit history, work + patterns, and code quality metrics with persistent history and trend tracking. Team-aware: breaks down per-person contributions with praise and growth areas. + Cybereum-aware: tracks skill development velocity and cross-skill consistency. allowed-tools: - Bash - Read @@ -12,9 +13,9 @@ allowed-tools: - Glob --- -# /retro — Weekly Engineering Retrospective +# /retro -- Weekly Engineering Retrospective (Cybereum) -Generates a comprehensive engineering retrospective analyzing commit history, work patterns, and code quality metrics. Team-aware: identifies the user running the command, then analyzes every contributor with per-person praise and growth opportunities. Designed for a senior IC/CTO-level builder using Claude Code as a force multiplier. +Generates a comprehensive engineering retrospective analyzing commit history, work patterns, and code quality metrics for the Cybereum capital project governance platform. Team-aware: identifies the user running the command, then analyzes every contributor with per-person praise and growth opportunities. Cybereum-aware: tracks which analytical and workflow skills were modified and flags cross-skill consistency risks. ## User-invocable When the user types `/retro`, run this skill. @@ -343,6 +344,25 @@ Narrative covering: - Hotspot analysis (are the same files churning?) 
- Any XL PRs that should have been split +### Cybereum Skill Development +(Cybereum-specific section) + +Analyze which skills were modified this period: + +```bash +# Which Cybereum skills were touched? +git log origin/main --since="" --format="" --name-only | grep -E "^cybereum-" | sort -u +# Which workflow skills were touched? +git log origin/main --since="" --format="" --name-only | grep -E "^(ship|review|qa|retro|plan-)" | sort -u +``` + +Report: +- **Analytical skills modified**: List which of the 8 were touched and what changed +- **Workflow skills modified**: List which were touched +- **Cross-skill consistency risk**: If >2 analytical skills were modified, flag potential terminology/threshold drift +- **Skill coverage**: How many of the 8 analytical skills have been touched in the last 30 days? Skills untouched for 30+ days may be falling behind +- **Snapshot schema changes**: Were any `.cybereum/` JSON schemas modified? Flag backward compatibility risk + ### Focus & Highlights (from Step 8) - Focus score with interpretation diff --git a/review/SKILL.md b/review/SKILL.md index ea6f7b7..bc5d31c 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -2,8 +2,9 @@ name: review version: 1.0.0 description: | - Pre-landing PR review. Analyzes diff against main for SQL safety, LLM trust - boundary violations, conditional side effects, and other structural issues. + Pre-landing PR review for Cybereum. Analyzes diff against main for calculation + integrity, graph consistency, cross-skill coherence, LLM trust boundary + violations, and structural issues in capital project analytics code. allowed-tools: - Bash - Read @@ -16,21 +17,21 @@ allowed-tools: # Pre-Landing PR Review -You are running the `/review` workflow. Analyze the current branch's diff against main for structural issues that tests don't catch. +You are running the `/review` workflow for Cybereum. 
Analyze the current branch's diff against main for structural issues that tests don't catch -- with special attention to calculation correctness, data consistency, and cross-skill coherence. --- ## Step 1: Check branch 1. Run `git branch --show-current` to get the current branch. -2. If on `main`, output: **"Nothing to review — you're on main or have no changes against main."** and stop. +2. If on `main`, output: **"Nothing to review -- you're on main or have no changes against main."** and stop. 3. Run `git fetch origin main --quiet && git diff origin/main --stat` to check if there's a diff. If no diff, output the same message and stop. --- ## Step 2: Read the checklist -Read `.claude/skills/review/checklist.md`. +Read `review/checklist.md` (or `.claude/skills/review/checklist.md`). **If the file cannot be read, STOP and report the error.** Do not proceed without the checklist. @@ -52,18 +53,18 @@ Run `git diff origin/main` to get the full diff. This includes both committed an Apply the checklist against the diff in two passes: -1. **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary -2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend +1. **Pass 1 (CRITICAL):** Data & Calculation Integrity, Graph & Data Consistency, LLM Output Trust Boundary +2. **Pass 2 (INFORMATIONAL):** Skill Content Quality, Cross-Skill Consistency, Conditional Side Effects, Dead Code, LLM Prompt Issues, Test Gaps, Type Coercion, File Parsing Safety -Follow the output format specified in the checklist. Respect the suppressions — do NOT flag items listed in the "DO NOT flag" section. +Follow the output format specified in the checklist. Respect the suppressions -- do NOT flag items listed in the "DO NOT flag" section. --- ## Step 5: Output findings -**Always output ALL findings** — both critical and informational. The user must see every issue. 
+**Always output ALL findings** -- both critical and informational. The user must see every issue. -- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, your recommended fix, and options (A: Fix it now, B: Acknowledge, C: False positive — skip). +- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, your recommended fix, and options (A: Fix it now, B: Acknowledge, C: False positive -- skip). After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed. - If only non-critical issues found: output findings. No further action needed. - If no issues found: output `Pre-Landing Review: No issues found.` @@ -76,3 +77,4 @@ Follow the output format specified in the checklist. Respect the suppressions - **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs. - **Be terse.** One line problem, one line fix. No preamble. - **Only flag real problems.** Skip anything that's fine. +- **Cross-check formulas.** When reviewing EVM, risk, or schedule skills, verify calculations match their stated methodology. diff --git a/review/checklist.md b/review/checklist.md index e321890..6f1ccb4 100644 --- a/review/checklist.md +++ b/review/checklist.md @@ -2,10 +2,10 @@ ## Instructions -Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems. +Review the `git diff origin/main` output for the issues listed below. Be specific -- cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems. 
**Two-pass review:** -- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. These can block `/ship`. +- **Pass 1 (CRITICAL):** Run Data & Calculation Integrity, Graph Consistency, and LLM Output Trust Boundary first. These can block `/ship`. - **Pass 2 (INFORMATIONAL):** Run all remaining categories. These are included in the PR body but do not block. **Output format:** @@ -30,68 +30,80 @@ Be terse. For each issue: one line describing the problem, one line with the fix ## Review Categories -### Pass 1 — CRITICAL - -#### SQL & Data Safety -- String interpolation in SQL (even if values are `.to_i`/`.to_f` — use `sanitize_sql_array` or Arel) -- TOCTOU races: check-then-set patterns that should be atomic `WHERE` + `update_all` -- `update_column`/`update_columns` bypassing validations on fields that have or should have constraints -- N+1 queries: `.includes()` missing for associations used in loops/views (especially avatar, attachments) - -#### Race Conditions & Concurrency -- Read-check-write without uniqueness constraint or `rescue RecordNotUnique; retry` (e.g., `where(hash:).first` then `save!` without handling concurrent insert) -- `find_or_create_by` on columns without unique DB index — concurrent calls can create duplicates -- Status transitions that don't use atomic `WHERE old_status = ? 
UPDATE SET new_status` — concurrent updates can skip or double-apply transitions -- `html_safe` on user-controlled data (XSS) — check any `.html_safe`, `raw()`, or string interpolation into `html_safe` output +### Pass 1 -- CRITICAL + +#### Data & Calculation Integrity +- EVM formula errors: CPI, SPI, EAC, TCPI, VAC calculations must match ANSI/EIA-748 definitions exactly +- Risk scoring errors: P x I matrix calculations, contingency formulas, Monte Carlo parameter ranges +- Schedule metric errors: Float calculations, CPLI formula, BEI formula, DCMA 14-Point thresholds +- Reference Class data errors: Overrun percentages, benchmark values must match cited sources (Flyvbjerg, GAO, RAND) +- Division by zero: Any metric calculation where the denominator could be zero (ACWP=0 for CPI, etc.) +- Unit mismatches: Working days vs calendar days, cost in $K vs $M, percentages as decimals vs whole numbers +- Date arithmetic: Business day calculations that don't account for calendars, timezone-naive date comparisons + +#### Graph & Data Consistency +- Temporal knowledge graph mutations that don't preserve causal chain integrity +- Schedule relationships (FS/SS/FF/SF) with contradictory logic (circular dependencies, impossible sequences) +- Risk register entries with score != probability x impact +- EVM data where BCWP > BAC (earned more than budgeted -- impossible without scope change) +- Float values inconsistent with ES/EF/LS/LF dates +- Activities with % complete > 0 but no actual start date +- Milestones with duration > 0 (milestones are zero-duration by definition) #### LLM Output Trust Boundary -- LLM-generated values (emails, URLs, names) written to DB or passed to mailers without format validation. Add lightweight guards (`EMAIL_REGEXP`, `URI.parse`, `.strip`) before persisting. -- Structured tool output (arrays, hashes) accepted without type/shape checks before database writes. 
- -### Pass 2 — INFORMATIONAL +- LLM-generated risk descriptions, corrective actions, or recommendations written to persistent storage without human review flag +- AI-generated schedule analysis accepted as ground truth without cross-referencing parsed schedule data +- Schwerpunkt recommendations that bypass the Critic step (Step 4 in Decision-AI) +- Executive report content generated without data validation against source metrics +- Outreach messages or pitch content with fabricated proof points or statistics + +### Pass 2 -- INFORMATIONAL + +#### Skill Content Quality +- DCMA 14-Point thresholds that deviate from standard without documented justification +- Risk scoring thresholds inconsistent across skills (e.g., "critical" defined differently in Risk Engine vs Schedule Intelligence) +- Completion prediction confidence multipliers that are internally inconsistent (e.g., a P80 multiplier smaller than the P50 multiplier) +- Reference class sample sizes below the minimum stated in the skill (e.g., N>8 for nuclear SMR) +- Executive report sections that reference metrics not available from other skills + +#### Cross-Skill Consistency +- Terminology drift: Same concept named differently across skills (e.g., "P80" vs "80th percentile" vs "conservative estimate") +- Threshold drift: Same threshold defined with different values across skills +- Output format inconsistency: JSON snapshot schemas that don't align across skills that share data +- Reference file paths that don't match actual file locations #### Conditional Side Effects -- Code paths that branch on a condition but forget to apply a side effect on one branch. Example: item promoted to verified but URL only attached when a secondary condition is true — the other branch promotes without the URL, creating an inconsistent record. -- Log messages that claim an action happened but the action was conditionally skipped.
- -#### Magic Numbers & String Coupling -- Bare numeric literals used in multiple files — should be named constants documented together -- Error message strings used as query filters elsewhere (grep for the string — is anything matching on it?) +- Code paths that branch on a condition but forget to apply a side effect on one branch +- Log messages that claim an action happened but the action was conditionally skipped #### Dead Code & Consistency - Variables assigned but never read - Version mismatch between PR title and VERSION/CHANGELOG files -- CHANGELOG entries that describe changes inaccurately (e.g., "changed from X to Y" when X never existed) +- CHANGELOG entries that describe changes inaccurately - Comments/docstrings that describe old behavior after the code changed #### LLM Prompt Issues - 0-indexed lists in prompts (LLMs reliably return 1-indexed) -- Prompt text listing available tools/capabilities that don't match what's actually wired up in the `tool_classes`/`tools` array -- Word/token limits stated in multiple places that could drift +- Prompt text listing available tools/capabilities that don't match what's actually wired up +- Scoring formulas in prompts that don't match the formulas in the analytical methodology sections #### Test Gaps -- Negative-path tests that assert type/status but not the side effects (URL attached? field populated? callback fired?) 
-- Assertions on string content without checking format (e.g., asserting title present but not URL format) -- `.expects(:something).never` missing when a code path should explicitly NOT call an external service -- Security enforcement features (blocking, rate limiting, auth) without integration tests verifying the enforcement path works end-to-end - -#### Crypto & Entropy -- Truncation of data instead of hashing (last N chars instead of SHA-256) — less entropy, easier collisions -- `rand()` / `Random.rand` for security-sensitive values — use `SecureRandom` instead -- Non-constant-time comparisons (`==`) on secrets or tokens — vulnerable to timing attacks - -#### Time Window Safety -- Date-key lookups that assume "today" covers 24h — report at 8am PT only sees midnight→8am under today's key -- Mismatched time windows between related features — one uses hourly buckets, another uses daily keys for the same data +- EVM calculations without edge case tests (CPI when ACWP=0, SPI at project end) +- Schedule parsing without malformed input tests (corrupt XER, missing fields, wrong encoding) +- Risk scoring without boundary value tests (score exactly at threshold) +- Monte Carlo without seed-controlled deterministic tests +- Snapshot persistence without round-trip tests (save then load and compare) #### Type Coercion at Boundaries -- Values crossing Ruby→JSON→JS boundaries where type could change (numeric vs string) — hash/digest inputs must normalize types -- Hash/digest inputs that don't call `.to_s` or equivalent before serialization — `{ cores: 8 }` vs `{ cores: "8" }` produce different hashes +- Values crossing JSON boundaries where type could change (numeric vs string) +- Date strings without timezone information crossing system boundaries +- Cost values that mix currency formats or decimal precision -#### View/Frontend -- Inline `