CI Operator Runbook: Failure Signatures to Replay Commands

Maps common CI failure signatures to exact replay commands, key artifact paths, and remediation steps.

Bead: bd-1f42.8.9 Policy: docs/testing-policy.md QA Runbook: docs/qa-runbook.md

Quick Reference: Replay from Any Failure

# 1. Replay all failed suites from a previous E2E run
./scripts/e2e/run_all.sh --rerun-from tests/e2e_results/<ts>/summary.json

# 2. Replay a single suite
cargo test --test <suite_name> -- --nocapture

# 3. Replay a single test function
cargo test --test <suite_name> <test_name> -- --nocapture

# 4. Replay with debug output
RUST_LOG=debug RUST_BACKTRACE=1 cargo test --test <suite_name> -- --nocapture

# 5. Replay CI gate failures
cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact

Failure Signature Map

Non-mock compliance gate failure

Signature: non_mock_compliance_gate ... FAILED

Artifacts:

docs/non-mock-rubric.json (rubric thresholds)
docs/test_double_inventory.json (current inventory)

Replay:

cargo test --test non_mock_compliance_gate -- --nocapture

Remediation:

Check which module fell below its floor threshold.
Review docs/non-mock-rubric.json for the affected module's floor values.
Migrate mock/stub usages to VCR or real implementations.
See docs/testing-policy.md "Allowlisted Exceptions" for the approval process.

Extension conformance gate failure

Signature: conformance_must_pass_gate ... FAILED

Artifacts:

tests/ext_conformance/reports/gate/must_pass_gate_verdict.json
tests/ext_conformance/reports/conformance_summary.json

Replay:

cargo test --test ext_conformance_generated --features ext-conformance \
  -- conformance_must_pass_gate --nocapture --exact

Remediation:

Check conformance_summary.json for pass/fail/N/A counts.
Look for newly failing extensions in the summary.
Common causes: missing node shim, new hostcall not dispatched, QuickJS module resolution.
See docs/conformance-operator-playbook.md for debugging workflows.

Cross-platform matrix failure

Signature: cross_platform_matrix ... FAILED

Artifacts:

tests/cross_platform_reports/linux/platform_report.json

Replay:

cargo test --test ci_cross_platform_matrix -- cross_platform_matrix --nocapture --exact

Remediation:

Read the platform report to identify which checks failed.
Common causes: missing system dependencies, path separator issues, permission differences.
Fix the platform-specific code and re-run.

Evidence bundle validation failure

Signature: build_evidence_bundle ... FAILED

Artifacts:

tests/evidence_bundle/index.json

Replay:

cargo test --test ci_evidence_bundle -- build_evidence_bundle --nocapture --exact

Remediation:

Evidence bundle validates that all required artifacts exist and are well-formed.
Check for missing artifact files (summary.json, environment.json, etc.).
Ensure scripts/e2e/run_all.sh completed all post-run phases.

Certification remediation backlog missing or stale

Signature: certification/readiness checks fail on missing extension_remediation_backlog.json or schema mismatch.

Artifacts:

tests/full_suite_gate/certification_dossier.json
tests/full_suite_gate/extension_remediation_backlog.json
tests/full_suite_gate/extension_remediation_backlog.md

Replay:

cargo test --test qa_certification_dossier -- certification_dossier --nocapture --exact

Remediation:

Regenerate certification artifacts and backlog in a single run (command above).
Verify backlog schema is pi.qa.extension_remediation_backlog.v1.
Ensure the backlog summary/entries are non-empty when conformance failures exist.
Re-run dependent gates after artifact refresh.

Suite classification guard failure

Signature: suite_classification gate fails

Artifacts:

tests/suite_classification.toml

Replay:

cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact

Remediation:

A new test file in tests/ is not listed in tests/suite_classification.toml.
Classify the file into [suite.unit], [suite.vcr], or [suite.e2e].
Keep entries sorted alphabetically within each suite.

Waiver lifecycle failure

Signature: waiver_lifecycle_audit ... FAILED or waiver_lifecycle gate fails

Artifacts:

tests/full_suite_gate/waiver_audit.json
tests/suite_classification.toml (waiver entries)

Replay:

cargo test --test ci_full_suite_gate -- waiver_lifecycle_audit --nocapture --exact

Remediation:

Check waiver_audit.json for expired or invalid waivers.
Expired waivers must be either renewed (new expires date, max +30 days) or removed.
Invalid waivers are missing required fields; add all 7 fields.
See docs/qa-runbook.md "Waiver Lifecycle" for the full schema.

Provider streaming regression

Signature: provider_streaming or e2e_provider_streaming test failures

Artifacts:

tests/fixtures/vcr/ (VCR cassettes)

Replay:

# VCR-backed
VCR_MODE=playback cargo test --test provider_streaming -- --nocapture

# E2E
cargo test --test e2e_provider_streaming -- --nocapture

Remediation:

Check if VCR cassettes are stale (model IDs changed, API format updated).
Verify api_key: Some("vcr-playback".to_string()) in StreamOptions.
For URL mismatches: VCR uses strict URL matching; ensure model ID in test matches cassette.

E2E TUI test failure

Signature: e2e_tui tests fail

Artifacts:

E2E results directory

Replay:

cargo test --test e2e_tui -- --nocapture

Remediation:

TUI tests require tmux. Verify tmux is installed and accessible.
Set PI_TEST_MODE=1 for deterministic rendering.
VCR cassettes provide provider responses; check cassette freshness.

Flaky test (passes locally, fails on CI)

Signature: Inconsistent pass/fail across runs on the same commit.

Replay:

# Run with same parallelism as CI
cargo test --test <suite> -- --nocapture --test-threads=1

# Multiple runs to detect flakiness
for i in $(seq 1 5); do
    cargo test --test <suite> -- <test_name> --exact --nocapture || echo "FAIL on run $i"
done

Remediation:

Classify the flake per taxonomy (FLAKE-TIMING/ENV/NET/RES/EXT/LOGIC).
Add quarantine entry to tests/suite_classification.toml.
See docs/testing-policy.md "Flaky-Test Quarantine" for the full lifecycle.

Parity Incident Response (DROPIN-162)

This section defines the operator workflow for parity regressions that threaten strict drop-in claims.

Incident triggers (open incident immediately)

tests/e2e_results/<ts>/triage_diff.json has status = "regression" or summary.regression_count > 0.
tests/full_suite_gate/full_suite_verdict.json shows a failed blocking gate affecting parity/test-log evidence (e2e_log_contract, suite_classification, conformance_pass_rate, evidence_bundle, or other blocking gate).
docs/dropin-certification-verdict.json is missing or has overall_verdict != CERTIFIED when release messaging needs strict drop-in wording.
CI parity suite gate fails (PARITY GATE FAIL) in .github/workflows/ci.yml.

Severity and response targets

Severity	Criteria	Response target
`SEV-1`	Blocking parity regression on `main` or release cut path	Assign owner + post incident context within 30 minutes
`SEV-2`	New regression in PR/branch with no current release block	Assign owner + post context within 4 hours
`SEV-3`	Evidence/documentation drift without active behavior regression	Assign owner + post context within 1 business day

Evidence bundle for every parity incident

Collect and attach these artifacts to the incident bead and Agent Mail thread:

tests/e2e_results/<ts>/summary.json
tests/e2e_results/<ts>/triage_diff.json
tests/e2e_results/<ts>/replay_bundle.json
tests/e2e_results/<ts>/failure_diagnostics_index.json
tests/full_suite_gate/full_suite_verdict.json
tests/full_suite_gate/full_suite_events.jsonl
tests/full_suite_gate/full_suite_report.md
tests/evidence_bundle/index.json
docs/dropin-certification-contract.json
docs/dropin-certification-verdict.json (if present in the run)

Response flow

Capture a reproducible baseline diff:

./scripts/e2e/run_all.sh --profile ci \
  --diff-from tests/e2e_results/<baseline-ts>/summary.json

Run gate replay commands for failing lanes:

cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact
cargo test --test ci_full_suite_gate -- preflight_fast_fail --nocapture --exact
cargo test --test ci_full_suite_gate -- full_certification --nocapture --exact

Extract exact per-gate remediation commands from the verdict:

python3 - <<'PY'
import json
from pathlib import Path
p = Path("tests/full_suite_gate/full_suite_verdict.json")
if not p.exists():
    raise SystemExit("missing full_suite_verdict.json")
data = json.loads(p.read_text(encoding="utf-8"))
for gate in data.get("gates", []):
    if gate.get("status") == "fail":
        print(f"{gate['id']}: {gate.get('reproduce_command', 'N/A')}")
PY

Create/update the owning bead and notify the swarm in-thread (thread_id = bead id) with: failing gate IDs, triage_diff.status, top ranked_diagnostics, and one-command replay.
Apply fix and rerun:

./scripts/e2e/run_all.sh --rerun-from tests/e2e_results/<ts>/summary.json
cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact

Close only when all exit criteria are true:
- triage_diff.status is not regression.
- Blocking full-suite gates pass.
- Drop-in wording guard is satisfied (overall_verdict = CERTIFIED) for release claims.
- Bead + Agent Mail thread contain artifact links and final remediation note.

Escalation path

If unresolved beyond response target: escalate to maintainer in the same bead thread.
If release train is active and SEV-1 persists: freeze strict drop-in messaging until parity incident is closed.
Use rollback mode (CI_GATE_PROMOTION_MODE=rollback) only as a short-lived emergency control; record rationale + expiry in the incident bead and restore strict after fix.

PERF-3X Gate Incident Addendum (bd-3ar8v.6.4)

When the incident affects performance certification (not only drop-in wording), also apply this fail-closed checklist:

Treat missing/stale PERF-3X artifacts as blocking failures:
- tests/full_suite_gate/perf3x_bead_coverage_audit.json
- tests/full_suite_gate/practical_finish_checkpoint.json
- tests/perf/reports/budget_summary.json
- tests/perf/reports/perf_comparison.json
- tests/perf/reports/stress_triage.json
- tests/perf/reports/parameter_sweeps.json
Attach tests/full_suite_gate/certification_events.jsonl plus perf event streams:
- tests/perf/reports/budget_events.jsonl
- tests/perf/reports/perf_comparison_events.jsonl
- tests/perf/reports/stress_events.jsonl
- tests/perf/reports/parameter_sweeps_events.jsonl
Use the log-query playbooks in docs/qa-runbook.md under PERF-3X Regression Triage (bd-3ar8v.6.4) for attribution and replay targeting.
Do not close the incident until detection, attribution, mitigation, and verification are all recorded in the bead thread with artifact links.

PERF-3X signature: `parameter_sweeps_integrity` gate failure

Signature: full_suite_verdict.json contains gate parameter_sweeps_integrity with status = "fail" and detail mentioning parameter_sweeps.* schema/readiness/source contract drift.

Artifacts:

tests/perf/reports/parameter_sweeps.json
tests/perf/reports/parameter_sweeps_events.jsonl
tests/perf/reports/phase1_matrix_validation.json
tests/full_suite_gate/full_suite_verdict.json

Replay:

rch exec -- cargo test --test release_evidence_gate -- \
  parameter_sweeps_contract_links_phase1_matrix_and_readiness --nocapture --exact
rch exec -- cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact

Remediation:

Enforce artifact schema pi.perf.parameter_sweeps.v1.
Enforce source_identity contract (source_artifact = "phase1_matrix_validation" and source_artifact_path references phase1_matrix_validation.json).
Enforce readiness invariants:
- status = ready -> ready_for_phase5 = true and blocking_reasons = []
- status = blocked -> ready_for_phase5 = false and non-empty blocking_reasons
Ensure selected_defaults are positive integers and sweep_plan.dimensions includes required knobs.
Re-run full-suite gate and re-attach updated parameter_sweeps artifact + event stream.

PERF-3X signature: `practical_finish_checkpoint` readiness drift

Signature: gate practical_finish_checkpoint fails with detail like technical PERF-3X issue(s) still open or Fail-closed practical-finish source read error.

Artifacts:

tests/full_suite_gate/practical_finish_checkpoint.json
.beads/issues.jsonl (or fallback .beads/beads.base.jsonl)
tests/full_suite_gate/full_suite_verdict.json
tests/full_suite_gate/certification_events.jsonl

Replay:

rch exec -- cargo test --test ci_full_suite_gate -- \
  practical_finish_report_fails_when_technical_open_issues_remain --nocapture --exact
rch exec -- cargo test --test release_readiness -- practical_finish_checkpoint_ -- --nocapture
rch exec -- cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact

Remediation:

Verify practical_finish_checkpoint.json schema is pi.perf3x.practical_finish_checkpoint.v1.
Ensure required contract fields are coherent: status, non-empty detail, technical_completion_reached, residual_open_scope, and count equality (open_perf3x_count = technical_open_count + docs_or_report_open_count).
Close or re-scope remaining technical PERF-3X issues; only docs/report residuals are allowed.
Re-run full-suite gate and attach refreshed checkpoint artifact + certification events before closure.

FrankenNode claim signature: `claim_tier_order_drift`

Signature: claim-contract validation reports tier-order drift (or missing canonical sequence) for:

TIER-1-EXTENSION-HOST-PARITY
TIER-2-TARGETED-RUNTIME-PARITY
TIER-3-FULL-NODE-BUN-REPLACEMENT

Artifacts:

docs/franken-node-claim-gating-contract.json
tests/full_suite_gate/franken_node_claim_verdict.json
tests/full_suite_gate/practical_finish_checkpoint.json

Replay:

rch exec -- cargo test --test franken_node_claim_contract -- \
  franken_node_claim_contract_declares_expected_tier_order -- --nocapture
rch exec -- cargo test --test release_evidence_gate -- \
  franken_node_claim_contract_is_present_and_valid --nocapture --exact

Remediation:

Restore canonical tier order in claim_tiers to Tier-1 -> Tier-2 -> Tier-3.
Ensure every tier still carries non-empty required_evidence, allowed_claim_language, and forbidden_claim_language.
Keep strict replacement gating fail-closed (overall_verdict = CERTIFIED required) and regenerate franken_node_claim_verdict.json before incident closure.

FrankenNode kernel-boundary signature: `kernel_boundary_drift`

Signature: kernel-extraction boundary contract/report drift is detected in manifest validation output, especially missing module ownership coverage, duplicate ownership, or banned cross-boundary pair regressions.

Artifacts:

docs/franken-node-kernel-extraction-boundary-manifest.json
tests/full_suite_gate/franken_node_kernel_boundary_drift_report.json
tests/full_suite_gate/practical_finish_checkpoint.json

Replay:

rch exec -- cargo test --test franken_node_kernel_extraction_boundary_manifest -- \
  kernel_boundary_manifest_ -- --nocapture
rch exec -- cargo test --test qa_docs_policy_validation -- \
  franken_node_mission_contract_tier_mapping_declares_required_checks_and_phase6_beads -- --nocapture

Remediation:

Ensure drift report checks remain present and fail-closed: kernel_boundary.all_modules_mapped_or_deferred, kernel_boundary.no_duplicate_domain_ownership, and kernel_boundary.banned_cross_boundary_pairs_absent.
Restore strict tier evidence linkage tokens in mission contract: docs/franken-node-kernel-extraction-boundary-manifest.json and tests/full_suite_gate/franken_node_kernel_boundary_drift_report.json.
Re-run the replay commands and attach refreshed artifacts before clearing the incident.

FrankenNode compat harness signature: `node_runtime_unavailable_or_shimmed`

Signature: semantic compatibility harness fails or hard-skips because a real Node runtime is unavailable, or Bun's node shim is incorrectly treated as Node. Typical signals include Node.js not found and SKIP: generate_compatibility_matrix requires both Node.js and Bun.

Artifacts:

tests/franken_node_compat_harness.rs
tests/franken_node_compat/fixtures/
tests/full_suite_gate/full_suite_verdict.json

Replay:

rch exec -- cargo test --test franken_node_compat_harness -- \
  node_detection_rejects_bun_node_shim_when_present -- --nocapture
rch exec -- cargo test --test franken_node_compat_harness -- \
  generate_compatibility_matrix -- --nocapture

Remediation:

Keep find_node() and is_real_node() aligned with fail-closed detection: Bun's /home/ubuntu/.bun/bin/node shim must not pass as real Node.
Preserve deterministic skip diagnostics when Node/Bun are unavailable: SKIP: Node.js not found on this machine, SKIP: Bun not found on this machine, and SKIP: generate_compatibility_matrix requires both Node.js and Bun.
After runtime availability is corrected, re-run harness replay commands and attach refreshed verdict artifacts before clearing the incident.

Evidence Artifact Interpretation

summary.json

The primary run summary. Key fields:

Field	Meaning
`failed_names`	List of failed E2E suite names
`failed_unit_names`	List of failed unit target names
`passed_suites` / `total_suites`	E2E suite pass rate
`replay_bundle.one_command_replay`	One-command to replay all failures
`triage_diff`	Baseline comparison (if `--diff-from` was used)

replay_bundle.json

Consolidated replay commands and environment context:

Field	Meaning
`one_command_replay`	Single command to reproduce all failures
`environment.profile`	Run profile (quick/focused/ci/full)
`environment.vcr_mode`	VCR mode during the run
`environment.git_sha`	Git commit of the run
`failed_suites[].cargo_replay`	Per-suite cargo test command
`failed_suites[].targeted_replay`	Single-test cargo command
`failed_suites[].digest_path`	Path to per-suite failure digest

failure_digest.json

Per-suite failure analysis:

Field	Meaning
`root_cause_class`	Classification: assertion_failure, timeout, panic, etc.
`impacted_scenario_ids`	List of failed test names
`first_failing_assertion`	Location and message of first failure
`remediation_pointer.replay_command`	Runner-level replay
`remediation_pointer.suite_replay_command`	Suite-level cargo test
`remediation_pointer.targeted_test_replay_command`	Single-test cargo test

triage_diff.json

Baseline comparison for regressions:

Field	Meaning
`status`	`regression`, `stable`, or `known_failures_only`
`summary.regression_count`	New failures vs baseline
`ranked_diagnostics`	Severity-ranked list of changes
`recommended_commands.runner_repro_command`	Replay all problem targets
`recommended_commands.ranked_repro_commands`	Prioritized per-target commands

Shard Workflow

The CI runner supports sharding for parallel execution:

# Run shard 0 of 3 for E2E suites
./scripts/e2e/run_all.sh --profile ci --shard-kind suite --shard-index 0 --shard-total 3

# Run shard 1 of 4 for unit targets
./scripts/e2e/run_all.sh --profile ci --shard-kind unit --shard-index 1 --shard-total 4

Shard context is captured in:

environment.json: shard.kind, shard.index, shard.total
summary.json: same shard fields
replay_bundle.json: environment.shard_kind, shard_index, shard_total

To replay a specific shard's failures, use the --rerun-from flag with that shard's summary.json.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI Operator Runbook: Failure Signatures to Replay Commands

Quick Reference: Replay from Any Failure

Failure Signature Map

Non-mock compliance gate failure

Extension conformance gate failure

Cross-platform matrix failure

Evidence bundle validation failure

Certification remediation backlog missing or stale

Suite classification guard failure

Waiver lifecycle failure

Provider streaming regression

E2E TUI test failure

Flaky test (passes locally, fails on CI)

Parity Incident Response (DROPIN-162)

Incident triggers (open incident immediately)

Severity and response targets

Evidence bundle for every parity incident

Response flow

Escalation path

PERF-3X Gate Incident Addendum (bd-3ar8v.6.4)

PERF-3X signature: `parameter_sweeps_integrity` gate failure

PERF-3X signature: `practical_finish_checkpoint` readiness drift

FrankenNode claim signature: `claim_tier_order_drift`

FrankenNode kernel-boundary signature: `kernel_boundary_drift`

FrankenNode compat harness signature: `node_runtime_unavailable_or_shimmed`

Evidence Artifact Interpretation

summary.json

replay_bundle.json

failure_digest.json

triage_diff.json

Shard Workflow

FilesExpand file tree

ci-operator-runbook.md

Latest commit

History

ci-operator-runbook.md

File metadata and controls

CI Operator Runbook: Failure Signatures to Replay Commands

Quick Reference: Replay from Any Failure

Failure Signature Map

Non-mock compliance gate failure

Extension conformance gate failure

Cross-platform matrix failure

Evidence bundle validation failure

Certification remediation backlog missing or stale

Suite classification guard failure

Waiver lifecycle failure

Provider streaming regression

E2E TUI test failure

Flaky test (passes locally, fails on CI)

Parity Incident Response (DROPIN-162)

Incident triggers (open incident immediately)

Severity and response targets

Evidence bundle for every parity incident

Response flow

Escalation path

PERF-3X Gate Incident Addendum (bd-3ar8v.6.4)

PERF-3X signature: parameter_sweeps_integrity gate failure

PERF-3X signature: practical_finish_checkpoint readiness drift

FrankenNode claim signature: claim_tier_order_drift

FrankenNode kernel-boundary signature: kernel_boundary_drift

FrankenNode compat harness signature: node_runtime_unavailable_or_shimmed

Evidence Artifact Interpretation

summary.json

replay_bundle.json

failure_digest.json

triage_diff.json

Shard Workflow

PERF-3X signature: `parameter_sweeps_integrity` gate failure

PERF-3X signature: `practical_finish_checkpoint` readiness drift

FrankenNode claim signature: `claim_tier_order_drift`

FrankenNode kernel-boundary signature: `kernel_boundary_drift`

FrankenNode compat harness signature: `node_runtime_unavailable_or_shimmed`