Skip to content

perf(ci): comprehensive pipeline optimization — target 2m push, 8m release #371

@danielmeppiel

Description

@danielmeppiel

Summary

The current build-release.yml critical path is 17-18m for tag releases (excluding macOS ARM queue which adds 0-6h). After PR #368 (macOS consolidation, merged), this plan targets ~2m for push-to-main and ~8m for tag releases through 9 validated optimizations across all 3 workflows. All changes were validated through structured expert panel analysis with parallel red-team and domain-specific assessments.

Problem

Measured Baselines

Job Duration Runner
test (Linux x86) 55s ubuntu-24.04
test (Linux ARM) 62s ubuntu-24.04-arm
test (Windows) 92-136s windows-latest
Build binary (Linux x86) 47s ubuntu-24.04
Build binary (Linux ARM) 50s ubuntu-24.04-arm
Build binary (Windows) 66-80s windows-latest
macOS Intel (consolidated) 115s macos-15-intel
macOS ARM (consolidated) ~120s + queue macos-latest
Integration tests (Linux x86) 135-238s ubuntu-24.04
Release validation (any) 67-99s varies
create-release ~60s ubuntu-24.04

Optimizations

O1: Remove setup-node@v4 from unit test/build jobs

Risk: Low
Savings: ~15-25s per job (4 jobs total)

  • Safe to remove from: test + build in build-release.yml, test in ci.yml, smoke-test in ci-integration.yml
  • MUST KEEP in: integration-tests and release-validation (apm runtime setup copilot → npm install -g)
  • Red-team validated: exact boundary identified

O2: Replace curl | sh with astral-sh/setup-uv@v7

Risk: Low
Savings: ~20-40s per job + supply chain hardening

O3: Merge test + build into build-and-test

Risk: Low
Savings: ~1.5m (eliminates runner re-provisioning)

  • Red-team validated: no outputs consumed, identical matrices, dependency extras are a superset
  • macOS jobs become root nodes (no needs:) — they run their own tests
  • 8 downstream needs: arrays rewritten

O5: Cache UPX installation

Risk: Low
Savings: ~15-30s per job

  • Linux: cache binary or pin GitHub Release download
  • macOS: direct release binary download instead of brew install

O9: Decouple live inference from release pipeline

Risk: Low (expert panel unanimous)
Savings: Eliminates 14.9% false-negative rate, reduces secret exposure

  • Remove apm run tests from integration-tests and release-validation
  • Remove GH_MODELS_PAT from release pipeline jobs
  • Create ci-runtime.yml — nightly + manual dispatch + path-filtered, Linux x86_64 only
  • Keep hero scenario validation up to apm compile step (validates 95% of plumbing)

Rationale:

  • APM's core UVPs (install, compile, audit, governance) require zero live inference
  • apm run is explicitly marked EXPERIMENTAL in documentation
  • 8 inference executions × 2% failure rate = 14.9% false-negative per release
  • GH_MODELS_PAT in 12+ jobs = unnecessary secret exposure surface
  • Industry precedent: npm, cargo, pip, Homebrew, Docker CLI all isolate external API tests from release gates

O7: Fix ci-integration.yml circular dependency

Risk: Medium
Problem: When ci.yml has a bug → ci-integration approval skips → all tests skip → failure status → CI-fixing PR blocked

Design a bypass for PRs that only modify .github/workflows/

O8: Add PR traceability to ci-integration.yml

Risk: None
Improvement: DX enhancement

::notice annotation with originating PR URL in smoke-test job

Pipeline Diagrams

Current Pipeline (tag release)

graph LR
    test["🧪 test<br/>(3 platforms)"]
    build["📦 build<br/>(3 platforms)"]
    mac_intel["🍎 macOS Intel<br/>(build+validate)"]
    mac_arm["🍎 macOS ARM<br/>(build+validate)"]
    integ["🔬 integration-tests<br/>(3 platforms)"]
    relval["✅ release-validation<br/>(3 platforms)"]
    release["🏷️ create-release"]
    compat["🔗 gh-aw-compat"]
    pypi["📤 publish-pypi"]
    brew["🍺 update-homebrew"]
    scoop["🪣 update-scoop"]

    test --> build
    test --> mac_intel
    test --> mac_arm
    test --> integ
    build --> integ
    test --> relval
    build --> relval
    integ --> relval
    test --> release
    build --> release
    mac_intel --> release
    mac_arm --> release
    integ --> release
    relval --> release
    release --> compat
    compat --> pypi
    pypi --> brew
    pypi --> scoop
Loading

Proposed Pipeline (tag release, after optimization)

graph LR
    bat["🧪📦 build-and-test<br/>(3 platforms)"]
    mac_intel["🍎 macOS Intel<br/>(build+validate)"]
    mac_arm["🍎 macOS ARM<br/>(build+validate)"]
    integ["🔬 integration-tests<br/>(3 platforms, no inference)"]
    relval["✅ release-validation<br/>(3 platforms, no inference)"]
    release["🏷️ create-release"]
    compat["🔗 gh-aw-compat"]
    pypi["📤 publish-pypi"]
    brew["🍺 update-homebrew"]
    scoop["🪣 update-scoop"]

    bat --> integ
    bat --> relval
    integ --> relval
    bat --> release
    mac_intel --> release
    mac_arm --> release
    integ --> release
    relval --> release
    release --> compat
    compat --> pypi
    pypi --> brew
    pypi --> scoop
Loading

Separate: Runtime Inference Validation (ci-runtime.yml)

graph LR
    trigger["⏰ Nightly / 🔀 Path filter / 👆 Manual"]
    inference["🤖 live-inference-smoke<br/>(Linux x86_64 only)"]
    report["📊 Status annotation"]

    trigger --> inference
    inference --> report
Loading

Implementation Phases

Phase 1: Quick wins (no structural change)

Phase 2: Structural optimizations

  • O3: Merge test+build → build-and-test, rewrite 8 needs: arrays, macOS as root nodes
  • O9: Decouple inference from release pipeline, create ci-runtime.yml

Phase 3: DX improvements

  • O8: PR traceability annotation in ci-integration.yml
  • O7: CI circular dependency bypass

Expected Outcomes

Pipeline Current Target Improvement
PR CI (ci.yml) 1.7m ~1.3m -24%
PR CI + Integration 7-11m ~5m -45%
Push to main 4m ~2m -50%
Tag release (no queue) 17-18m ~8m -53%
Tag release (with queue) 30m-6h+ 8m + 1× ARM queue eliminated 3 queue slots

Validation Methodology

All changes were validated through a structured expert panel process:

  • Round 1 (Red Team): 4 parallel agents assessed structural changes (Node.js removal, setup-uv migration, test+build merge, ci-integration analysis)
  • Round 2 (Inference Panel): 3 parallel domain experts assessed live inference in CI/CD (Product Strategy Architect, Release Reliability Architect, Supply Chain Security Engineer)

Decisions and Trade-offs

Rejected optimizations

  • O4: Parallelize integration tests with builds — Integration tests use the compiled binary (verified: test-integration.sh downloads artifact, sets up binary in PATH). Dependency is correct.
  • O6: Flatten release tailgh-aw-compat downloads the just-released binary via microsoft/apm-action@v1. Chain must stay sequential.

Inference decoupling rationale (O9)

  • APM's core UVPs (install, compile, audit, governance) require zero live inference
  • apm run is explicitly marked EXPERIMENTAL in documentation
  • 8 inference executions × 2% failure rate = 14.9% false-negative per release
  • GH_MODELS_PAT in 12+ jobs = unnecessary secret exposure surface
  • Industry precedent: npm, cargo, pip, Homebrew, Docker CLI all isolate external API tests from release gates

Metadata

Metadata

Labels

CI/CDDeprecated: use area/ci-cd. Kept for issue history; will be removed in milestone 0.10.0.performanceDeprecated: use type/performance. Kept for issue history; will be removed in milestone 0.10.0.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions