Maps common CI failure signatures to exact replay commands, key artifact paths, and remediation steps.
Bead: bd-1f42.8.9
Policy: docs/testing-policy.md
QA Runbook: docs/qa-runbook.md
# 1. Replay all failed suites from a previous E2E run
./scripts/e2e/run_all.sh --rerun-from tests/e2e_results/<ts>/summary.json
# 2. Replay a single suite
cargo test --test <suite_name> -- --nocapture
# 3. Replay a single test function
cargo test --test <suite_name> <test_name> -- --nocapture
# 4. Replay with debug output
RUST_LOG=debug RUST_BACKTRACE=1 cargo test --test <suite_name> -- --nocapture
# 5. Replay CI gate failures
cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact
Signature: non_mock_compliance_gate ... FAILED
Artifacts:
- docs/non-mock-rubric.json (rubric thresholds)
- docs/test_double_inventory.json (current inventory)
Replay:
cargo test --test non_mock_compliance_gate -- --nocapture
Remediation:
- Check which module fell below its floor threshold.
- Review docs/non-mock-rubric.json for the affected module's floor values.
- Migrate mock/stub usages to VCR or real implementations.
- See docs/testing-policy.md "Allowlisted Exceptions" for the approval process.
Signature: conformance_must_pass_gate ... FAILED
Artifacts:
- tests/ext_conformance/reports/gate/must_pass_gate_verdict.json
- tests/ext_conformance/reports/conformance_summary.json
Replay:
cargo test --test ext_conformance_generated --features ext-conformance \
  -- conformance_must_pass_gate --nocapture --exact
Remediation:
- Check conformance_summary.json for pass/fail/N/A counts.
- Look for newly failing extensions in the summary.
- Common causes: missing node shim, new hostcall not dispatched, QuickJS module resolution.
- See docs/conformance-operator-playbook.md for debugging workflows.
Signature: cross_platform_matrix ... FAILED
Artifacts:
tests/cross_platform_reports/linux/platform_report.json
Replay:
cargo test --test ci_cross_platform_matrix -- cross_platform_matrix --nocapture --exact
Remediation:
- Read the platform report to identify which checks failed.
- Common causes: missing system dependencies, path separator issues, permission differences.
- Fix the platform-specific code and re-run.
Signature: build_evidence_bundle ... FAILED
Artifacts:
tests/evidence_bundle/index.json
Replay:
cargo test --test ci_evidence_bundle -- build_evidence_bundle --nocapture --exact
Remediation:
- Evidence bundle validates that all required artifacts exist and are well-formed.
- Check for missing artifact files (summary.json, environment.json, etc.).
- Ensure scripts/e2e/run_all.sh completed all post-run phases.
Signature: certification/readiness checks fail on missing
extension_remediation_backlog.json or schema mismatch.
Artifacts:
- tests/full_suite_gate/certification_dossier.json
- tests/full_suite_gate/extension_remediation_backlog.json
- tests/full_suite_gate/extension_remediation_backlog.md
Replay:
cargo test --test qa_certification_dossier -- certification_dossier --nocapture --exact
Remediation:
- Regenerate certification artifacts and backlog in a single run (command above).
- Verify backlog schema is pi.qa.extension_remediation_backlog.v1.
- Ensure the backlog summary/entries are non-empty when conformance failures exist.
- Re-run dependent gates after artifact refresh.
Signature: suite_classification gate fails
Artifacts:
tests/suite_classification.toml
Replay:
cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact
Remediation:
- A new test file in tests/ is not listed in tests/suite_classification.toml.
- Classify the file into [suite.unit], [suite.vcr], or [suite.e2e].
- Keep entries sorted alphabetically within each suite.
Signature: waiver_lifecycle_audit ... FAILED or waiver_lifecycle gate fails
Artifacts:
- tests/full_suite_gate/waiver_audit.json
- tests/suite_classification.toml (waiver entries)
Replay:
cargo test --test ci_full_suite_gate -- waiver_lifecycle_audit --nocapture --exact
Remediation:
- Check waiver_audit.json for expired or invalid waivers.
- Expired waivers must be either renewed (new expires date, max +30 days) or removed.
- Invalid waivers are missing required fields; add all 7 fields.
- See docs/qa-runbook.md "Waiver Lifecycle" for the full schema.
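The renewal rule above (a new expires date at most 30 days out) can be checked mechanically before editing the waiver. This is a minimal sketch, not the gate's actual logic: the function name is hypothetical, and it assumes expires is an ISO-8601 date string and that the 30-day window is measured from the day of renewal.

```python
from datetime import date, timedelta

MAX_RENEWAL_DAYS = 30  # from the remediation note: renewals are capped at +30 days


def renewal_is_valid(new_expires: str, renewed_on: date) -> bool:
    """Return True if the renewed waiver expires after renewed_on and within 30 days.

    Assumption: `new_expires` is an ISO-8601 date (YYYY-MM-DD). The real waiver
    schema lives in docs/qa-runbook.md and may differ.
    """
    expires = date.fromisoformat(new_expires)
    return renewed_on < expires <= renewed_on + timedelta(days=MAX_RENEWAL_DAYS)


renewed_on = date(2025, 1, 10)
print(renewal_is_valid("2025-02-09", renewed_on))  # exactly 30 days out
print(renewal_is_valid("2025-02-10", renewed_on))  # 31 days out
```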
Signature: provider_streaming or e2e_provider_streaming test failures
Artifacts:
tests/fixtures/vcr/(VCR cassettes)
Replay:
# VCR-backed
VCR_MODE=playback cargo test --test provider_streaming -- --nocapture
# E2E
cargo test --test e2e_provider_streaming -- --nocapture
Remediation:
- Check if VCR cassettes are stale (model IDs changed, API format updated).
- Verify api_key: Some("vcr-playback".to_string()) in StreamOptions.
- For URL mismatches: VCR uses strict URL matching; ensure model ID in test matches cassette.
Signature: e2e_tui tests fail
Artifacts:
- E2E results directory
Replay:
cargo test --test e2e_tui -- --nocapture
Remediation:
- TUI tests require tmux. Verify tmux is installed and accessible.
- Set PI_TEST_MODE=1 for deterministic rendering.
- VCR cassettes provide provider responses; check cassette freshness.
Signature: Inconsistent pass/fail across runs on the same commit.
Replay:
# Run with same parallelism as CI
cargo test --test <suite> -- --nocapture --test-threads=1
# Multiple runs to detect flakiness
for i in $(seq 1 5); do
  cargo test --test <suite> -- <test_name> --exact --nocapture || echo "FAIL on run $i"
done
Remediation:
- Classify the flake per taxonomy (FLAKE-TIMING/ENV/NET/RES/EXT/LOGIC).
- Add quarantine entry to tests/suite_classification.toml.
- See docs/testing-policy.md "Flaky-Test Quarantine" for the full lifecycle.
This section defines the operator workflow for parity regressions that threaten strict drop-in claims.
- tests/e2e_results/<ts>/triage_diff.json has status = "regression" or summary.regression_count > 0.
- tests/full_suite_gate/full_suite_verdict.json shows a failed blocking gate affecting parity/test-log evidence (e2e_log_contract, suite_classification, conformance_pass_rate, evidence_bundle, or other blocking gate).
- docs/dropin-certification-verdict.json is missing or has overall_verdict != CERTIFIED when release messaging needs strict drop-in wording.
- CI parity suite gate fails (PARITY GATE FAIL) in .github/workflows/ci.yml.
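The first condition in this list is easy to test from a script. The sketch below is illustrative only: the helper names are hypothetical, and it assumes summary.regression_count is an integer nested under a top-level summary object, as the field path suggests.

```python
import json
from pathlib import Path


def is_parity_regression(triage_diff: dict) -> bool:
    """True when the diff reports a regression by status or by count."""
    summary = triage_diff.get("summary") or {}
    return (
        triage_diff.get("status") == "regression"
        or summary.get("regression_count", 0) > 0
    )


def load_triage_diff(run_dir: str) -> dict:
    """Read triage_diff.json from a tests/e2e_results/<ts> directory."""
    path = Path(run_dir) / "triage_diff.json"
    return json.loads(path.read_text(encoding="utf-8"))


# Inline samples stand in for real artifacts.
print(is_parity_regression({"status": "stable", "summary": {"regression_count": 0}}))
print(is_parity_regression({"status": "stable", "summary": {"regression_count": 2}}))
```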
| Severity | Criteria | Response target |
|---|---|---|
| SEV-1 | Blocking parity regression on main or release cut path | Assign owner + post incident context within 30 minutes |
| SEV-2 | New regression in PR/branch with no current release block | Assign owner + post context within 4 hours |
| SEV-3 | Evidence/documentation drift without active behavior regression | Assign owner + post context within 1 business day |
Collect and attach these artifacts to the incident bead and Agent Mail thread:
- tests/e2e_results/<ts>/summary.json
- tests/e2e_results/<ts>/triage_diff.json
- tests/e2e_results/<ts>/replay_bundle.json
- tests/e2e_results/<ts>/failure_diagnostics_index.json
- tests/full_suite_gate/full_suite_verdict.json
- tests/full_suite_gate/full_suite_events.jsonl
- tests/full_suite_gate/full_suite_report.md
- tests/evidence_bundle/index.json
- docs/dropin-certification-contract.json
- docs/dropin-certification-verdict.json (if present in the run)
- Capture a reproducible baseline diff:
./scripts/e2e/run_all.sh --profile ci \
  --diff-from tests/e2e_results/<baseline-ts>/summary.json
- Run gate replay commands for failing lanes:
cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact
cargo test --test ci_full_suite_gate -- preflight_fast_fail --nocapture --exact
cargo test --test ci_full_suite_gate -- full_certification --nocapture --exact
- Extract exact per-gate remediation commands from the verdict:
python3 - <<'PY'
import json
from pathlib import Path
p = Path("tests/full_suite_gate/full_suite_verdict.json")
if not p.exists():
    raise SystemExit("missing full_suite_verdict.json")
data = json.loads(p.read_text(encoding="utf-8"))
for gate in data.get("gates", []):
    if gate.get("status") == "fail":
        print(f"{gate['id']}: {gate.get('reproduce_command', 'N/A')}")
PY
- Create/update the owning bead and notify the swarm in-thread (thread_id = bead id) with: failing gate IDs, triage_diff.status, top ranked_diagnostics, and one-command replay.
- Apply fix and rerun:
./scripts/e2e/run_all.sh --rerun-from tests/e2e_results/<ts>/summary.json
cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact
- Close only when all exit criteria are true:
  - triage_diff.status is not regression.
  - Blocking full-suite gates pass.
  - Drop-in wording guard is satisfied (overall_verdict = CERTIFIED) for release claims.
  - Bead + Agent Mail thread contain artifact links and final remediation note.
- If unresolved beyond response target: escalate to maintainer in the same bead thread.
- If release train is active and SEV-1 persists: freeze strict drop-in messaging until parity incident is closed.
- Use rollback mode (CI_GATE_PROMOTION_MODE=rollback) only as a short-lived emergency control; record rationale + expiry in the incident bead and restore strict after fix.
When the incident affects performance certification (not only drop-in wording), also apply this fail-closed checklist:
- Treat missing/stale PERF-3X artifacts as blocking failures:
  - tests/full_suite_gate/perf3x_bead_coverage_audit.json
  - tests/full_suite_gate/practical_finish_checkpoint.json
  - tests/perf/reports/budget_summary.json
  - tests/perf/reports/perf_comparison.json
  - tests/perf/reports/stress_triage.json
  - tests/perf/reports/parameter_sweeps.json
- Attach tests/full_suite_gate/certification_events.jsonl plus perf event streams:
  - tests/perf/reports/budget_events.jsonl
  - tests/perf/reports/perf_comparison_events.jsonl
  - tests/perf/reports/stress_events.jsonl
  - tests/perf/reports/parameter_sweeps_events.jsonl
- Use the log-query playbooks in docs/qa-runbook.md under PERF-3X Regression Triage (bd-3ar8v.6.4) for attribution and replay targeting.
- Do not close the incident until detection, attribution, mitigation, and verification are all recorded in the bead thread with artifact links.
Signature: full_suite_verdict.json contains gate parameter_sweeps_integrity with
status = "fail" and detail mentioning parameter_sweeps.* schema/readiness/source contract drift.
Artifacts:
- tests/perf/reports/parameter_sweeps.json
- tests/perf/reports/parameter_sweeps_events.jsonl
- tests/perf/reports/phase1_matrix_validation.json
- tests/full_suite_gate/full_suite_verdict.json
Replay:
rch exec -- cargo test --test release_evidence_gate -- \
parameter_sweeps_contract_links_phase1_matrix_and_readiness --nocapture --exact
rch exec -- cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact
Remediation:
- Enforce artifact schema pi.perf.parameter_sweeps.v1.
- Enforce source_identity contract (source_artifact = "phase1_matrix_validation" and source_artifact_path references phase1_matrix_validation.json).
- Enforce readiness invariants:
  - status = ready -> ready_for_phase5 = true and blocking_reasons = []
  - status = blocked -> ready_for_phase5 = false and non-empty blocking_reasons
- Ensure selected_defaults are positive integers and sweep_plan.dimensions includes required knobs.
- Re-run full-suite gate and re-attach updated parameter_sweeps artifact + event stream.
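The two readiness invariants listed above translate directly into a small check. This is a sketch under stated assumptions rather than the gate's implementation: the function name is hypothetical, and where the readiness object sits inside parameter_sweeps.json is not specified here, so the caller passes in whichever object holds the three fields.

```python
def readiness_violations(readiness: dict) -> list[str]:
    """Return human-readable violations of the readiness invariants."""
    status = readiness.get("status")
    ready = readiness.get("ready_for_phase5")
    reasons = readiness.get("blocking_reasons")
    problems = []
    if status == "ready":
        # ready -> ready_for_phase5 = true and blocking_reasons = []
        if ready is not True:
            problems.append("status=ready requires ready_for_phase5=true")
        if reasons != []:
            problems.append("status=ready requires blocking_reasons=[]")
    elif status == "blocked":
        # blocked -> ready_for_phase5 = false and non-empty blocking_reasons
        if ready is not False:
            problems.append("status=blocked requires ready_for_phase5=false")
        if not isinstance(reasons, list) or not reasons:
            problems.append("status=blocked requires non-empty blocking_reasons")
    else:
        problems.append(f"unexpected status: {status!r}")
    return problems


sample = {"status": "blocked", "ready_for_phase5": False, "blocking_reasons": []}
for problem in readiness_violations(sample):
    print(problem)
```

Returning a list instead of raising keeps every violation visible in one pass, which is convenient when attaching findings to the incident bead.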
Signature: gate practical_finish_checkpoint fails with detail like
technical PERF-3X issue(s) still open or Fail-closed practical-finish source read error.
Artifacts:
- tests/full_suite_gate/practical_finish_checkpoint.json
- .beads/issues.jsonl (or fallback .beads/beads.base.jsonl)
- tests/full_suite_gate/full_suite_verdict.json
- tests/full_suite_gate/certification_events.jsonl
Replay:
rch exec -- cargo test --test ci_full_suite_gate -- \
practical_finish_report_fails_when_technical_open_issues_remain --nocapture --exact
rch exec -- cargo test --test release_readiness -- practical_finish_checkpoint_ --nocapture
rch exec -- cargo test --test ci_full_suite_gate -- full_suite_gate --nocapture --exact
Remediation:
- Verify practical_finish_checkpoint.json schema is pi.perf3x.practical_finish_checkpoint.v1.
- Ensure required contract fields are coherent: status, non-empty detail, technical_completion_reached, residual_open_scope, and count equality (open_perf3x_count = technical_open_count + docs_or_report_open_count).
- Close or re-scope remaining technical PERF-3X issues; only docs/report residuals are allowed.
- Re-run full-suite gate and attach refreshed checkpoint artifact + certification events before closure.
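The count equality and the "only docs/report residuals" rule can be spot-checked before re-running the gate. The following is a minimal sketch, assuming the count fields and detail sit at the top level of practical_finish_checkpoint.json; the function name is hypothetical and the real gate may check more.

```python
def checkpoint_problems(checkpoint: dict) -> list[str]:
    """Check detail and count coherence for a practical-finish checkpoint."""
    problems = []
    if not str(checkpoint.get("detail", "")).strip():
        problems.append("detail must be non-empty")

    open_total = checkpoint.get("open_perf3x_count")
    technical = checkpoint.get("technical_open_count")
    docs = checkpoint.get("docs_or_report_open_count")
    if None in (open_total, technical, docs):
        problems.append("one or more count fields are missing")
        return problems

    # open_perf3x_count = technical_open_count + docs_or_report_open_count
    if open_total != technical + docs:
        problems.append(
            f"count mismatch: {open_total} != {technical} + {docs}"
        )
    # Only docs/report residuals may remain open.
    if technical > 0:
        problems.append(f"{technical} technical PERF-3X issue(s) still open")
    return problems


sample = {
    "detail": "2 docs residuals remain",
    "open_perf3x_count": 2,
    "technical_open_count": 0,
    "docs_or_report_open_count": 2,
}
print(checkpoint_problems(sample))
```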
Signature: claim-contract validation reports tier-order drift (or missing canonical sequence) for:
- TIER-1-EXTENSION-HOST-PARITY
- TIER-2-TARGETED-RUNTIME-PARITY
- TIER-3-FULL-NODE-BUN-REPLACEMENT
Artifacts:
- docs/franken-node-claim-gating-contract.json
- tests/full_suite_gate/franken_node_claim_verdict.json
- tests/full_suite_gate/practical_finish_checkpoint.json
Replay:
rch exec -- cargo test --test franken_node_claim_contract -- \
  franken_node_claim_contract_declares_expected_tier_order --nocapture
rch exec -- cargo test --test release_evidence_gate -- \
  franken_node_claim_contract_is_present_and_valid --nocapture --exact
Remediation:
- Restore canonical tier order in claim_tiers to Tier-1 -> Tier-2 -> Tier-3.
- Ensure every tier still carries non-empty required_evidence, allowed_claim_language, and forbidden_claim_language.
- Keep strict replacement gating fail-closed (overall_verdict = CERTIFIED required) and regenerate franken_node_claim_verdict.json before incident closure.
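Tier-order drift is a plain sequence comparison once the tier identifiers have been pulled out of claim_tiers. The sketch below assumes you have already extracted the identifiers into a list in file order. How each tier entry names itself inside the contract JSON is not shown here, so that extraction step is left to the reader, and the function name is hypothetical.

```python
CANONICAL_TIER_ORDER = [
    "TIER-1-EXTENSION-HOST-PARITY",
    "TIER-2-TARGETED-RUNTIME-PARITY",
    "TIER-3-FULL-NODE-BUN-REPLACEMENT",
]


def tier_order_is_canonical(tier_ids: list[str]) -> bool:
    """True only when all three tiers appear exactly once, in canonical order."""
    return list(tier_ids) == CANONICAL_TIER_ORDER


# A swapped pair and a missing tier both count as drift.
swapped = [CANONICAL_TIER_ORDER[1], CANONICAL_TIER_ORDER[0], CANONICAL_TIER_ORDER[2]]
print(tier_order_is_canonical(CANONICAL_TIER_ORDER))
print(tier_order_is_canonical(swapped))
print(tier_order_is_canonical(CANONICAL_TIER_ORDER[:2]))
```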
Signature: kernel-extraction boundary contract/report drift is detected in manifest validation output, especially missing module ownership coverage, duplicate ownership, or banned cross-boundary pair regressions.
Artifacts:
- docs/franken-node-kernel-extraction-boundary-manifest.json
- tests/full_suite_gate/franken_node_kernel_boundary_drift_report.json
- tests/full_suite_gate/practical_finish_checkpoint.json
Replay:
rch exec -- cargo test --test franken_node_kernel_extraction_boundary_manifest -- \
  kernel_boundary_manifest_ --nocapture
rch exec -- cargo test --test qa_docs_policy_validation -- \
  franken_node_mission_contract_tier_mapping_declares_required_checks_and_phase6_beads --nocapture
Remediation:
- Ensure drift report checks remain present and fail-closed: kernel_boundary.all_modules_mapped_or_deferred, kernel_boundary.no_duplicate_domain_ownership, and kernel_boundary.banned_cross_boundary_pairs_absent.
- Restore strict tier evidence linkage tokens in mission contract: docs/franken-node-kernel-extraction-boundary-manifest.json and tests/full_suite_gate/franken_node_kernel_boundary_drift_report.json.
- Re-run the replay commands and attach refreshed artifacts before clearing the incident.
Signature: semantic compatibility harness fails or hard-skips because a real Node runtime is unavailable, or Bun's node shim is incorrectly treated as Node. Typical signals include "Node.js not found" and "SKIP: generate_compatibility_matrix requires both Node.js and Bun".
Artifacts:
- tests/franken_node_compat_harness.rs
- tests/franken_node_compat/fixtures/
- tests/full_suite_gate/full_suite_verdict.json
Replay:
rch exec -- cargo test --test franken_node_compat_harness -- \
  node_detection_rejects_bun_node_shim_when_present --nocapture
rch exec -- cargo test --test franken_node_compat_harness -- \
  generate_compatibility_matrix --nocapture
Remediation:
- Keep find_node() and is_real_node() aligned with fail-closed detection: Bun's /home/ubuntu/.bun/bin/node shim must not pass as real Node.
- Preserve deterministic skip diagnostics when Node/Bun are unavailable: "SKIP: Node.js not found on this machine", "SKIP: Bun not found on this machine", and "SKIP: generate_compatibility_matrix requires both Node.js and Bun".
- After runtime availability is corrected, re-run harness replay commands and attach refreshed verdict artifacts before clearing the incident.
summary.json is the primary run summary. Key fields:
| Field | Meaning |
|---|---|
| failed_names | List of failed E2E suite names |
| failed_unit_names | List of failed unit target names |
| passed_suites / total_suites | E2E suite pass rate |
| replay_bundle.one_command_replay | One-command to replay all failures |
| triage_diff | Baseline comparison (if --diff-from was used) |
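As a quick way to read these fields, the sketch below turns a parsed summary into a one-line status. It is illustrative only: the function name is hypothetical, and it assumes passed_suites and total_suites are integer counts and that both failure lists hold plain strings.

```python
def describe_run(summary: dict) -> str:
    """Summarize pass rate and failed targets from a parsed summary.json."""
    passed = summary.get("passed_suites", 0)
    total = summary.get("total_suites", 0)
    rate = (passed / total * 100) if total else 0.0
    failed = list(summary.get("failed_names", [])) + list(
        summary.get("failed_unit_names", [])
    )
    failed_text = ", ".join(failed) or "none"
    return f"{passed}/{total} E2E suites passed ({rate:.1f}%); failed targets: {failed_text}"


# Inline sample; in practice load tests/e2e_results/<ts>/summary.json with json.loads.
sample = {
    "passed_suites": 8,
    "total_suites": 10,
    "failed_names": ["e2e_tui", "e2e_provider_streaming"],
    "failed_unit_names": [],
}
print(describe_run(sample))
```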
replay_bundle.json consolidates replay commands and environment context:
| Field | Meaning |
|---|---|
| one_command_replay | Single command to reproduce all failures |
| environment.profile | Run profile (quick/focused/ci/full) |
| environment.vcr_mode | VCR mode during the run |
| environment.git_sha | Git commit of the run |
| failed_suites[].cargo_replay | Per-suite cargo test command |
| failed_suites[].targeted_replay | Single-test cargo command |
| failed_suites[].digest_path | Path to per-suite failure digest |
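To list the per-suite replay commands without opening the file by hand, something like the following works. It is a minimal sketch: the function name is hypothetical, and it assumes failed_suites is a list of objects whose cargo_replay values are ready-to-run command strings.

```python
def suite_replay_commands(bundle: dict) -> list[str]:
    """Collect failed_suites[].cargo_replay, skipping entries without one."""
    return [
        suite["cargo_replay"]
        for suite in bundle.get("failed_suites", [])
        if suite.get("cargo_replay")
    ]


# Inline sample; in practice load tests/e2e_results/<ts>/replay_bundle.json.
sample_bundle = {
    "one_command_replay": "./scripts/e2e/run_all.sh --rerun-from tests/e2e_results/<ts>/summary.json",
    "failed_suites": [
        {"cargo_replay": "cargo test --test e2e_tui -- --nocapture"},
        {"targeted_replay": "cargo test --test e2e_tui <test_name> -- --nocapture"},
    ],
}
for command in suite_replay_commands(sample_bundle):
    print(command)
```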
The per-suite failure digest (located via failed_suites[].digest_path) carries the failure analysis:
| Field | Meaning |
|---|---|
| root_cause_class | Classification: assertion_failure, timeout, panic, etc. |
| impacted_scenario_ids | List of failed test names |
| first_failing_assertion | Location and message of first failure |
| remediation_pointer.replay_command | Runner-level replay |
| remediation_pointer.suite_replay_command | Suite-level cargo test |
| remediation_pointer.targeted_test_replay_command | Single-test cargo test |
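When several suites fail at once, tallying root_cause_class across their digests shows whether the failures share a cause. The sketch below assumes each digest has already been parsed into a dict; the function name and the "unknown" fallback label are hypothetical.

```python
from collections import Counter


def count_root_causes(digests: list[dict]) -> Counter:
    """Tally root_cause_class across parsed per-suite failure digests."""
    return Counter(d.get("root_cause_class", "unknown") for d in digests)


# Inline samples stand in for digests loaded from failed_suites[].digest_path.
sample_digests = [
    {"root_cause_class": "timeout"},
    {"root_cause_class": "assertion_failure"},
    {"root_cause_class": "timeout"},
    {},
]
for cause, count in count_root_causes(sample_digests).most_common():
    print(f"{cause}: {count}")
```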
triage_diff.json holds the baseline comparison for regressions:
| Field | Meaning |
|---|---|
| status | regression, stable, or known_failures_only |
| summary.regression_count | New failures vs baseline |
| ranked_diagnostics | Severity-ranked list of changes |
| recommended_commands.runner_repro_command | Replay all problem targets |
| recommended_commands.ranked_repro_commands | Prioritized per-target commands |
The CI runner supports sharding for parallel execution:
# Run shard 0 of 3 for E2E suites
./scripts/e2e/run_all.sh --profile ci --shard-kind suite --shard-index 0 --shard-total 3
# Run shard 1 of 4 for unit targets
./scripts/e2e/run_all.sh --profile ci --shard-kind unit --shard-index 1 --shard-total 4
Shard context is captured in:
- environment.json: shard.kind, shard.index, shard.total
- summary.json: same shard fields
- replay_bundle.json: environment.shard_kind, shard_index, shard_total
To replay a specific shard's failures, use the --rerun-from flag with that shard's summary.json.
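To reproduce a full sharded run locally, the per-shard invocations can be generated rather than typed out. This sketch only assembles command strings from the flags shown in this section. The function name is hypothetical, and it assumes shard indices are zero-based and run from 0 to shard-total minus 1, as the "shard 0 of 3" example suggests.

```python
def shard_commands(kind: str, total: int, profile: str = "ci") -> list[str]:
    """Build one run_all.sh invocation per shard index (0 .. total-1)."""
    if total < 1:
        raise ValueError("shard total must be at least 1")
    return [
        "./scripts/e2e/run_all.sh"
        f" --profile {profile}"
        f" --shard-kind {kind}"
        f" --shard-index {index}"
        f" --shard-total {total}"
        for index in range(total)
    ]


for command in shard_commands("suite", 3):
    print(command)
```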