Merged

Commits
26 commits
cdfd328
feat: TTD Hardening Sprint S1 — Gates and Evidence Integrity
Feb 14, 2026
f3dd84a
docs: update CHANGELOG for TTD Hardening Sprint S1
Feb 15, 2026
e4dec7a
fix(ci): fix classify-changes job dependencies
Feb 15, 2026
f5d00a4
fix(ci): use yq to convert policy to JSON to avoid js-yaml dependency
Feb 15, 2026
2da9849
fix(ci): add CI and scripts to det-policy.yaml
Feb 15, 2026
c7e7cbb
docs: add missing docstrings to scripts and benchmarks
Feb 15, 2026
1904654
docs: add missing docs to benchmark and allow for criterion macro
Feb 15, 2026
2d45853
fix: address CodeRabbit feedback on TTD hardening sprint
Feb 15, 2026
5de4666
chore: finalize PR with changelog and roadmap updates
flyingrobots Feb 15, 2026
62c9c3d
fix(ci): classify all repo files and improve evidence robustness
flyingrobots Feb 15, 2026
84828d7
fix(ci): use isolated directories for build reproducibility check
flyingrobots Feb 15, 2026
3faf5bf
fix(ci): ensure wasm target is added in isolated builds
flyingrobots Feb 15, 2026
92cd6f9
fix(ci): use hyphenated wasm filename in reproducibility check
flyingrobots Feb 15, 2026
5915fc0
fix(ci): improve build reproducibility check and evidence generation
flyingrobots Feb 15, 2026
4b67a8f
fix(ci): address PR #283 review feedback — security, scope, and corre…
flyingrobots Feb 15, 2026
86f93ba
docs: add det-gates backlog items to TASKS-DAG (#284, #285, #286, #287)
flyingrobots Feb 15, 2026
0428729
fix(ci): address round-2 review — run_none logic, iter_batched, claims
flyingrobots Feb 15, 2026
1d49446
fix(ci): round-3 review — regex escaping, zero-test guard, claims
flyingrobots Feb 15, 2026
a125ae0
fix(ci): anchor zero-test guard to prevent substring false positives
flyingrobots Feb 15, 2026
3b79157
fix(ci): round-4 review — permissions, idempotency, local sentinel, b…
flyingrobots Feb 21, 2026
18ded17
fix: address remaining PR #283 review feedback — round 5
flyingrobots Feb 21, 2026
4057812
fix(ci): round-6 review — G3 gate coverage + dind classification
flyingrobots Feb 21, 2026
f04c529
fix(ci): round-6 review — clarify G3 unconditional execution + max-cl…
flyingrobots Feb 22, 2026
a538897
fix(ci): round-7 review — sync guardrails, catch-all policy, macOS claim
flyingrobots Feb 22, 2026
023a669
fix(ci): round-8 — restore rustup target, add timeouts, early-exit op…
flyingrobots Feb 22, 2026
52ad723
docs: add backlog items for Rust caching and perf baseline comparison
flyingrobots Feb 22, 2026
333 changes: 333 additions & 0 deletions .github/workflows/det-gates.yml
@@ -0,0 +1,333 @@
# SPDX-License-Identifier: Apache-2.0
# © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots>
name: det-gates

on:
  pull_request:
  push:
    branches: [main]

permissions:
  contents: read

concurrency:
  group: det-gates-${{ github.head_ref || github.ref }}
  cancel-in-progress: true

jobs:
  classify-changes:
    name: classify-changes
    runs-on: ubuntu-latest
    timeout-minutes: 5
    outputs:
      run_full: ${{ steps.classify.outputs.run_full }}
      run_reduced: ${{ steps.classify.outputs.run_reduced }}
      run_none: ${{ steps.classify.outputs.run_none }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Detect changed files
        id: changed
        env:
          BASE_REF: ${{ github.base_ref }}
          EVENT_NAME: ${{ github.event_name }}
        run: |
          if [ "$EVENT_NAME" = "pull_request" ]; then
            git fetch origin "$BASE_REF" --depth=1
            git diff --name-only "origin/$BASE_REF...HEAD" > changed.txt
          else
            git diff --name-only HEAD~1..HEAD > changed.txt || true
          fi
Comment on lines +36 to +42
Contributor

@coderabbitai coderabbitai bot Feb 15, 2026

⚠️ Potential issue | 🟡 Minor

Push-to-main path only inspects the last commit — multi-commit merges or force pushes can slip through unclassified.

Line 33: git diff --name-only HEAD~1..HEAD captures changes from the immediate parent only. For a merge commit this includes all merged changes (correct), but for a force push landing multiple new non-merge commits, only the final commit's diff is analyzed. Files changed in earlier commits won't appear in changed.txt, so they bypass classification entirely.

If branch protection rules prevent force pushes to main, document that assumption. Otherwise, consider using the GitHub-provided ${{ github.event.before }} SHA (via env var) for push events to diff the full push range.

🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 28 - 34, The push-to-main path
currently uses `git diff --name-only HEAD~1..HEAD` which only inspects the
immediate parent commit and can miss changes from multi-commit force-pushes;
update the workflow to diff the full push range by using the GitHub event SHAs
(e.g., reference `${{ github.event.before }}` and `${{ github.sha }}` via
environment variables) instead of `HEAD~1..HEAD`, and/or document the assumption
that force pushes to main are disallowed; ensure the block that writes to
`changed.txt` (the branch handling around `EVENT_NAME`, `BASE_REF`, and the `git
diff` invocation) uses the event SHAs for push events so all changed files are
captured.

Owner Author

Acknowledged — The HEAD~1 fallback for push events only inspects the last commit. For squash merges this is correct; for multi-commit merges it could miss files. Deferring — the primary gate path is PRs, and main pushes serve as a safety net.

Contributor

@flyingrobots Understood — deferring is reasonable if your workflow enforces squash-merge to main. The HEAD~1 diff captures the entire squash commit's changes correctly in that scenario.

Would you like me to open a GitHub issue to track the multi-commit merge edge case for future hardening? That way you have a backlog item if branch-protection rules ever change or if you want defense-in-depth for accidental force-pushes.


✏️ Learnings added
Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 283
File: .github/workflows/det-gates.yml:28-34
Timestamp: 2026-02-15T20:20:25.895Z
Learning: In the det-gates.yml workflow for the flyingrobots/echo repository, the push-to-main path uses `HEAD~1..HEAD` for change detection. The primary gate enforcement happens on PRs; main pushes serve as a safety net only. The workflow assumes squash-merge to main, where `HEAD~1` correctly captures all changes in the squash commit.



🧠 Learnings used
Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 283
File: det-policy.yaml:136-151
Timestamp: 2026-02-15T16:30:15.709Z
Learning: In the det-policy.yaml system, the classify_changes.cjs script only reads class-level `required_gates` definitions; crate-level `required_gates` fields are not consumed by the script and should not be added to individual crate entries.

Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 0
File: :0-0
Timestamp: 2026-01-24T23:22:15.388Z
Learning: The user (flyingrobots) prefers extremely detailed, pedantic code reviews with no mercy — "GOD-TIER NIGHTMARE CODE REVIEWER" style. They want critical scrutiny, exhaustive analysis, identification of landmines, design questions, and actionable demands on every review.

          if [ "$EVENT_NAME" = "push" ] && [ ! -s changed.txt ]; then
            echo "Warning: empty changelist on push, treating as full run" >&2
            echo "det-policy.yaml" > changed.txt
          fi
          echo "Changed files:"
          cat changed.txt || true

      - name: Convert policy to JSON
        run: |
          yq -o=json det-policy.yaml > det-policy.json

      - name: Classify path impact from det-policy.yaml
        id: classify
        run: |
          node ./scripts/classify_changes.cjs det-policy.json changed.txt >> $GITHUB_OUTPUT

  determinism-linux:
    name: G1 determinism (linux)
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true'
    runs-on: ubuntu-latest
Comment on lines +59 to +63
Contributor

@coderabbitai coderabbitai bot Feb 15, 2026

🧹 Nitpick | 🔵 Trivial

No Rust/cargo caching across any of the six jobs — every PR pays full compilation cost from scratch.

Six jobs (determinism-linux, determinism-macos, static-inspection, decoder-security, perf-regression, build-repro) each start with a cold cargo cache. For Rust projects, compilation dominates CI time. Adding Swatinem/rust-cache@v2 (or actions/cache targeting ~/.cargo and target/) to the test/inspection jobs would cut CI time dramatically.

Exception: build-repro should NOT use target caching since it's verifying reproducibility from clean builds. Cargo registry caching (download cache only) is acceptable there.

⚡ Example — add rust-cache to test jobs
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2  # Add this after toolchain setup
      - name: Run parity tests (linux)
        run: ...

Also applies to: 80-84, 113-117, 147-151, 167-171, 185-189

🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 47 - 51, Add Rust/Cargo caching
to the CI jobs that currently start from cold builds: insert
Swatinem/rust-cache@v2 (or an actions/cache setup caching ~/.cargo and target/)
immediately after the Rust toolchain setup step in the jobs determinism-linux,
determinism-macos, static-inspection, decoder-security, and perf-regression to
avoid full recompiles on every PR; for build-repro do NOT cache the target
directory (only allow registry/download caching if desired) so the
reproducibility check still runs from a clean build.

Owner Author

Acknowledged — Adding Swatinem/rust-cache@v2 would significantly reduce CI build times. Deferring to a CI performance optimization pass.


    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run parity tests (linux)
        run: |
          cargo test -p echo-scene-port test_float_parity_with_js -- --nocapture 2>&1 | tee det-linux.log
          grep -q " 0 passed" det-linux.log && echo "FATAL: zero tests matched filter" && exit 1 || true
      - name: Run DIND suite (linux)
        run: |
          node scripts/dind-run-suite.mjs --mode run | tee dind-linux.log
      - name: Create digest table
        env:
          COMMIT_SHA: ${{ github.sha }}
          RUN_ID: ${{ github.run_id }}
        run: |
          mkdir -p artifacts
          echo "target,commit,run_id,digest" > artifacts/digest-table.csv
          echo "linux,${COMMIT_SHA},${RUN_ID},$(sha256sum dind-report.json | cut -d' ' -f1)" >> artifacts/digest-table.csv
      - name: Upload artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: det-linux-artifacts
          path: |
            det-linux.log
            dind-linux.log
            dind-report.json
            artifacts/digest-table.csv

  determinism-macos:
    name: G1 determinism (macos)
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true'
    runs-on: macos-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run parity tests (macos)
        run: |
          cargo test -p echo-scene-port test_float_parity_with_js -- --nocapture 2>&1 | tee det-macos.log
          grep -q " 0 passed" det-macos.log && echo "FATAL: zero tests matched filter" && exit 1 || true
      - name: Run DIND suite (macos)
        run: |
          node scripts/dind-run-suite.mjs --mode run | tee dind-macos.log
      - name: Create digest table
        env:
          COMMIT_SHA: ${{ github.sha }}
          RUN_ID: ${{ github.run_id }}
        run: |
          mkdir -p artifacts
          echo "target,commit,run_id,digest" > artifacts/digest-table.csv
          echo "macos,${COMMIT_SHA},${RUN_ID},$(shasum -a 256 dind-report.json | cut -d' ' -f1)" >> artifacts/digest-table.csv
      - name: Upload artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: det-macos-artifacts
          path: |
            det-macos.log
            dind-macos.log
            dind-report.json
            artifacts/digest-table.csv

  static-inspection:
    name: DET-001 Static Inspection
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true'
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - name: Install ripgrep
        run: command -v rg >/dev/null || (sudo apt-get update && sudo apt-get install -y ripgrep)
      - name: Compute DETERMINISM_PATHS from policy
        id: det_paths
        run: |
          PATHS=$(yq -o=json det-policy.yaml | jq -r '
            .crates | to_entries[] |
            select(.value.class == "DET_CRITICAL") |
            .value.paths[]' |
            grep '^crates/' | sed 's|/\*\*$||' | sort -u | tr '\n' ' ')
          echo "paths=$PATHS" >> "$GITHUB_OUTPUT"
      - name: Run determinism check
        id: det_check
        env:
          DETERMINISM_PATHS: ${{ steps.det_paths.outputs.paths }}
        run: |
          ./scripts/ban-nondeterminism.sh | tee static-inspection.log
      - name: Create report
        if: always()
        env:
          DET_OUTCOME: ${{ steps.det_check.outcome }}
        run: |
          if [ "$DET_OUTCOME" = "success" ]; then
            echo '{"claim_id": "DET-001", "status": "PASSED"}' > static-inspection.json
          else
            echo '{"claim_id": "DET-001", "status": "FAILED"}' > static-inspection.json
          fi
Comment on lines +156 to +165
Contributor

⚠️ Potential issue | 🔴 Critical

Evidence integrity gap: generate_evidence.cjs ignores the content of static-inspection.json.

The static-inspection job correctly writes PASSED or FAILED based on steps.det_check.outcome (lines 161-165). However, generate_evidence.cjs (invoked at line 304) determines DET-001 status purely by checking if the static-inspection directory exists and is non-empty — it never reads static-inspection.json to verify the actual outcome.

Attack scenario:

  1. ban-nondeterminism.sh exits non-zero → static-inspection.json contains "status": "FAILED"
  2. Upload step runs with if: always() → directory is uploaded
  3. generate_evidence.cjs sees the directory exists → marks DET-001 as VERIFIED
  4. validate_claims.cjs only validates structure, not outcome → passes

Net result: A failed DET-001 check produces a VERIFIED evidence claim. The entire evidence chain becomes untrustworthy.

🔥 Proposed fix — read and validate artifact content

Modify generate_evidence.cjs to read static-inspection.json and set DET-001 status based on its content:

+  const checkStaticInspection = () => {
+    try {
+      const report = JSON.parse(fs.readFileSync(
+        path.join(gatheredArtifactsDir, 'static-inspection', 'static-inspection.json'), 'utf8'));
+      return report.status === 'PASSED';
+    } catch {
+      return false;
+    }
+  };
+
   const claims = [
     {
       id: 'DET-001',
-      status: checkArtifact('static-inspection') ? 'VERIFIED' : 'UNVERIFIED',
+      status: checkStaticInspection() ? 'VERIFIED' : 'UNVERIFIED',
       evidence: { workflow, run_id: runId, commit_sha: commitSha, artifact_name: 'static-inspection' }
     },

Also applies to: 302-307

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/det-gates.yml around lines 156 - 165,
generate_evidence.cjs currently treats the presence of the static-inspection
directory as proof of DET-001 success; update it to open and parse
static-inspection.json (the artifact produced by the job that writes
{"claim_id":"DET-001","status":"PASSED"/"FAILED"}) and derive DET-001’s status
from the parsed "status" field (handle missing file, invalid JSON, or missing
field as a FAILED/unverified condition and surface an error), then use that
concrete status when creating the evidence claim instead of only checking
directory existence; ensure the code references the static-inspection.json
filename and DET-001 claim id so reviewers can locate the logic to change.

      - name: Upload inspection artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: static-inspection
          path: |
            static-inspection.log
            static-inspection.json

  decoder-security:
    name: G2 decoder security tests
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true' || needs.classify-changes.outputs.run_reduced == 'true'
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run codec tests
        run: |
          cargo test -p echo-scene-codec --lib cbor::tests -- --nocapture 2>&1 | tee sec-tests.log
          grep -q " 0 passed" sec-tests.log && echo "FATAL: zero tests matched filter" && exit 1 || true
      - name: Upload security artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: sec-artifacts
          path: |
            sec-tests.log
            docs/determinism/sec-claim-map.json

  perf-regression:
    name: G3 perf regression (criterion)
    needs: classify-changes
    # Runs for ALL non-NONCRITICAL changes (both DET_CRITICAL and DET_IMPORTANT).
    # G3 is staging-optional per RELEASE_POLICY.md but always executes here.
    if: needs.classify-changes.outputs.run_none != 'true'
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run benchmarks
        run: |
          cargo bench -p warp-benches --bench materialization_hotpath -- --output-format bencher | tee perf.log
      - name: Upload perf artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: perf-artifacts
          path: perf.log
Comment on lines 198 to 218
Contributor

@coderabbitai coderabbitai bot Feb 15, 2026

⚠️ Potential issue | 🟠 Major

G3 "perf regression" doesn't detect regressions — there is no baseline comparison.

The job runs Criterion benchmarks, captures output, and uploads it. That's a "benchmarks compile and run" gate. The name says "perf regression" but there is no historical baseline, no threshold, no comparison, and no failure criterion tied to performance. A 10x regression in materialization_hotpath passes this gate with a green check.

At minimum, either:

  1. Rename to "G3 perf baseline (criterion)" to set honest expectations, or
  2. Integrate critcmp or bencher.dev with a stored baseline and fail on regressions beyond a threshold.

If option 1 for now, document that regression detection is a future enhancement.

🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 167 - 183, The
"perf-regression" workflow job currently only runs Criterion and uploads
perf.log without any baseline comparison; either rename the job to reflect it is
a baseline run (change job name from perf-regression to something like "G3 perf
baseline (criterion)" and update the job display name) and add a step that
echoes a comment noting that regression detection is a future enhancement, or
implement a comparison step that runs critcmp/bencher.dev against a stored
baseline and fails the job when the benchmark "materialization_hotpath"
regresses beyond the configured threshold; locate the job block labeled
perf-regression and modify the name field or append a post-processing step after
the cargo bench command to perform the comparison and set a non-zero exit on
regression.

Owner Author

Acknowledged — G3 currently gates on benchmark compilation and execution, not regression detection. True regression detection requires baseline storage (e.g., criterion's baseline comparison or GitHub Action benchmark caching). This is a known limitation and a future enhancement.
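As a rough illustration of what baseline comparison could look like, a post-bench step could parse the bencher-format `perf.log` against a stored baseline and fail past a threshold. Everything here (file names, the 10% threshold, the assumed `test <name> ... bench: <ns> ns/iter (+/- <dev>)` line shape) is a hypothetical sketch, not the project's implementation:

```shell
#!/usr/bin/env bash
# Hypothetical regression check over bencher-format benchmark output.
set -euo pipefail

compare_perf() {
  local baseline="$1" current="$2" max_pct="${3:-10}"
  local fail=0 name base_ns cur_ns pct
  while read -r name base_ns; do
    # Find the same benchmark in the current log and extract its ns/iter value.
    cur_ns=$(awk -v n="$name" '$1=="test" && $2==n {
      for (i=1; i<NF; i++) if ($i=="bench:") { gsub(",","",$(i+1)); print $(i+1) }
    }' "$current")
    [ -n "$cur_ns" ] || continue
    # Regression percentage relative to the baseline measurement.
    pct=$(awk -v b="$base_ns" -v c="$cur_ns" 'BEGIN { printf "%.0f", (c-b)*100/b }')
    if [ "$pct" -gt "$max_pct" ]; then
      echo "REGRESSION: $name +${pct}% (${base_ns} -> ${cur_ns} ns/iter)"
      fail=1
    fi
  done < <(awk '$1=="test" {
    for (i=1; i<NF; i++) if ($i=="bench:") { gsub(",","",$(i+1)); print $2, $(i+1) }
  }' "$baseline")
  return "$fail"
}

# Demo: a 50% slowdown against a 10% threshold should trip the gate.
printf 'test hot ... bench: 100 ns/iter (+/- 2)\n' > baseline.log
printf 'test hot ... bench: 150 ns/iter (+/- 2)\n' > perf.log
compare_perf baseline.log perf.log 10 || echo "gate would fail"
```

The hard part remains baseline storage (cache, artifact from the main branch, or a bencher.dev project), which this sketch deliberately leaves out.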



  build-repro:
    name: G4 build reproducibility (wasm)
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true' || needs.classify-changes.outputs.run_reduced == 'true'
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - name: Setup Rust (Global)
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: wasm32-unknown-unknown
      - name: Checkout Build 1
        uses: actions/checkout@v4
        with:
          path: build1
      - name: Build 1
        run: |
          cd build1
          rustup target add wasm32-unknown-unknown
          cargo build --release --target wasm32-unknown-unknown -p ttd-browser
          sha256sum target/wasm32-unknown-unknown/release/ttd_browser.wasm > ../hash1.txt
          cp target/wasm32-unknown-unknown/release/ttd_browser.wasm ../build1.wasm
      - name: Checkout Build 2
        uses: actions/checkout@v4
        with:
          path: build2
      - name: Build 2
        run: |
          cd build2
          rustup target add wasm32-unknown-unknown
          cargo build --release --target wasm32-unknown-unknown -p ttd-browser
          sha256sum target/wasm32-unknown-unknown/release/ttd_browser.wasm > ../hash2.txt
          cp target/wasm32-unknown-unknown/release/ttd_browser.wasm ../build2.wasm
Comment on lines +235 to +252
Contributor

@coderabbitai coderabbitai bot Feb 15, 2026

🧹 Nitpick | 🔵 Trivial

Redundant rustup target add wasm32-unknown-unknown in both build steps — already declared in the toolchain setup.

Lines 202 and 213 both run rustup target add wasm32-unknown-unknown, but the dtolnay/rust-toolchain@stable step on lines 191–194 already specifies targets: wasm32-unknown-unknown. The target is already installed by the time these steps execute. Not a bug, but unnecessary noise in a hardening-grade workflow.

🧹 Proposed cleanup
       - name: Build 1
         run: |
           cd build1
-          rustup target add wasm32-unknown-unknown
           cargo build --release --target wasm32-unknown-unknown -p ttd-browser
           sha256sum target/wasm32-unknown-unknown/release/ttd_browser.wasm > ../hash1.txt
           cp target/wasm32-unknown-unknown/release/ttd_browser.wasm ../build1.wasm
       - name: Checkout Build 2
         uses: actions/checkout@v4
         with:
           path: build2
       - name: Build 2
         run: |
           cd build2
-          rustup target add wasm32-unknown-unknown
           cargo build --release --target wasm32-unknown-unknown -p ttd-browser
🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 199 - 216, Remove the redundant
rustup target installation lines from the "Build 1" and "Build 2" steps: delete
the `rustup target add wasm32-unknown-unknown` commands in the run blocks for
the steps named "Build 1" and "Build 2" because the toolchain step
(dtolnay/rust-toolchain@stable with targets: wasm32-unknown-unknown) already
installs that target; keep the remaining build commands (cargo build, sha256sum,
cp) unchanged.

Owner Author

Acknowledged — The rustup target add in each build step is redundant with the toolchain setup. Kept for now as a defensive measure since the steps run in separate checkout directories. Can clean up in a follow-up.


      - name: Compare hashes
        run: |
          diff hash1.txt hash2.txt || (echo "Reproducibility failure: Hashes differ!" && exit 1)
          echo "Hashes match: $(cat hash1.txt)"
      - name: Upload build artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: build-repro-artifacts
          path: |
            hash1.txt
            hash2.txt
            build1.wasm
            build2.wasm

  validate-evidence:
    name: Evidence schema / claim policy
    needs:
      - classify-changes
      - determinism-linux
      - determinism-macos
      - static-inspection
      - decoder-security
      - perf-regression
      - build-repro
    if: always() && needs.classify-changes.result == 'success' && needs.classify-changes.outputs.run_none != 'true'
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - name: Download all artifacts
        uses: actions/download-artifact@v4
        with:
          path: gathered-artifacts
      - name: Verify artifact presence
        env:
          RUN_FULL: ${{ needs.classify-changes.outputs.run_full }}
        run: |
          ls -R gathered-artifacts
          # Always required (run on both full and reduced)
          [ -d gathered-artifacts/sec-artifacts ] || (echo "Missing sec-artifacts" && exit 1)
          [ -d gathered-artifacts/build-repro-artifacts ] || (echo "Missing build-repro-artifacts" && exit 1)
          [ -d gathered-artifacts/perf-artifacts ] || (echo "Missing perf-artifacts" && exit 1)
          # Only required when run_full (these jobs are skipped for run_reduced)
          if [ "$RUN_FULL" = "true" ]; then
            [ -d gathered-artifacts/det-linux-artifacts ] || (echo "Missing det-linux-artifacts" && exit 1)
            [ -d gathered-artifacts/det-macos-artifacts ] || (echo "Missing det-macos-artifacts" && exit 1)
            [ -d gathered-artifacts/static-inspection ] || (echo "Missing static-inspection" && exit 1)
          fi
Comment on lines 268 to 301
Contributor

@coderabbitai coderabbitai bot Feb 15, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "validate_claims.cjs" 2>/dev/null | head -5

Repository: flyingrobots/echo

Length of output: 43


🏁 Script executed:

git ls-files | grep -E "(validate_claims|det-gates)" | head -20

Repository: flyingrobots/echo

Length of output: 120


🏁 Script executed:

cat scripts/validate_claims.cjs

Repository: flyingrobots/echo

Length of output: 2442


Content validation for FAILED evidence is missing — validate_claims.cjs only checks VERIFIED claims.

The validate-evidence job (lines 247–261) performs directory presence checks, not content validation. The static-inspection artifact passes this gate so long as the directory exists, regardless of whether its report contains "status": "FAILED".

The script at scripts/validate_claims.cjs (line 267) does not provide the missing safeguard. Its logic is:

for (const claim of data.claims) {
  if (claim.status === 'VERIFIED') {
    // validate required pointers...
  }
}

It exclusively validates claims marked VERIFIED and completely ignores claims with FAILED status. A FAILED static-inspection report survives the entire validation pipeline unchallenged.

Add an explicit content check in the validate-evidence job to reject evidence packs containing any failed claims, or confirm this gap is handled downstream (and document it visibly in the workflow).

🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 229 - 261, The workflow's
validate-evidence job and the validate_claims.cjs logic currently only check for
directory presence and validate claims with status === 'VERIFIED', ignoring any
claims with status === 'FAILED'; update the pipeline to explicitly fail when any
FAILED claims exist by either (A) modifying validate_claims.cjs (the loop over
data.claims and the claim.status check) to detect claim.status === 'FAILED' and
throw/exit non‑zero with a clear message, or (B) add a step in the
validate-evidence job that scans the gathered-artifacts (e.g., static-inspection
JSON reports) for any "status":"FAILED" and exit 1 if found; reference the
validate-evidence job and the validate_claims.cjs data.claims/claim.status logic
to locate where to implement this check and ensure the workflow fails visibly
when failed claims are present.

Owner Author

By design — validate_claims.cjs verifies that VERIFIED claims have valid immutable pointers (workflow, run_id, commit_sha, artifact_name). FAILED/UNVERIFIED claims don't claim to have evidence, so pointer validation is not applicable. The evidence pack correctly records the status.
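If this gap were ever closed in the workflow rather than in validate_claims.cjs, a guard step could scan the gathered artifacts for failed claim reports before the evidence pack is accepted. The function name, paths, and regex below are illustrative only, not part of this PR:

```shell
#!/usr/bin/env bash
# Hypothetical guard: reject an evidence pack if any gathered claim report
# records a FAILED status, independent of pointer validation.
set -euo pipefail

check_no_failed_claims() {
  local dir="$1"
  # List any artifact file containing a FAILED status field.
  if grep -rlE '"status":[[:space:]]*"FAILED"' "$dir"; then
    echo "ERROR: evidence pack contains failed claims" >&2
    return 1
  fi
  echo "No failed claims found"
}

# Demo against a throwaway artifact directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/static-inspection"
echo '{"claim_id": "DET-001", "status": "PASSED"}' > "$tmp/static-inspection/static-inspection.json"
check_no_failed_claims "$tmp"

echo '{"claim_id": "DET-001", "status": "FAILED"}' > "$tmp/static-inspection/static-inspection.json"
check_no_failed_claims "$tmp" || echo "gate would fail"
```

This keeps validate_claims.cjs's pointer-only contract intact while still making a FAILED report fail the gate visibly.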


      - name: Generate evidence pack
        run: |
          node scripts/generate_evidence.cjs gathered-artifacts
      - name: Validate evidence pointers
        run: |
          node scripts/validate_claims.cjs gathered-artifacts/evidence.json
      - name: Cross-check claim IDs against CLAIM_MAP
        run: |
          EVIDENCE_IDS=$(jq -r '.claims[].id' gathered-artifacts/evidence.json | sort)
          CLAIM_MAP_IDS=$(yq -o=json docs/determinism/CLAIM_MAP.yaml | jq -r '.claims | keys[]' | sort)
          EXTRA=$(comm -23 <(echo "$EVIDENCE_IDS") <(echo "$CLAIM_MAP_IDS"))
          MISSING=$(comm -13 <(echo "$EVIDENCE_IDS") <(echo "$CLAIM_MAP_IDS"))
          if [ -n "$EXTRA" ]; then
            echo "ERROR: Claims in evidence.json but not in CLAIM_MAP.yaml:" && echo "$EXTRA" && exit 1
          fi
          if [ -n "$MISSING" ]; then
            echo "ERROR: Claims in CLAIM_MAP.yaml but not in evidence.json:" && echo "$MISSING" && exit 1
          fi
          echo "All claim IDs synchronized"
      - name: Verify sec-claim-map test IDs exist
        run: |
          MISSING=""
          for tid in $(jq -r '.mappings[].test_id' docs/determinism/sec-claim-map.json); do
            fn_name="${tid##*::}"
            if ! grep -rq "fn ${fn_name}" crates/echo-scene-codec/src/; then
              MISSING="$MISSING $tid"
            fi
          done
          if [ -n "$MISSING" ]; then
            echo "ERROR: sec-claim-map.json references non-existent tests:$MISSING" && exit 1
          fi
          echo "All sec-claim-map test IDs verified"