Add nightly agent CLI integration tests by jwiegley · Pull Request #602 · git-ai-project/git-ai

jwiegley · 2026-02-26T23:32:46Z

Summary

- Adds `.github/workflows/nightly-agent-integration.yml`: a two-tier nightly workflow that installs real agent CLI binaries and verifies git-ai hook wiring and attribution end-to-end
- Adds `scripts/nightly/` with four helper scripts implementing the test logic
- Adds `NIGHTLY_INTEGRATION_PLAN.md` documenting the full design rationale and open questions

Test Architecture

Tier 1 — Hook Wiring (no API keys, free)

Builds git-ai from source, installs each agent CLI (Claude Code, Codex, Gemini, Droid, OpenCode) at both stable and latest versions via a dynamic matrix, then:

- Runs `git-ai install` and verifies the correct checkpoint commands appear in each agent's config file
- Exercises the full attribution pipeline with synthetic checkpoint data (via the `agent-v1` preset)

Tier 2 — Live Integration (requires API key secrets)

Runs each agent with a minimal deterministic prompt ("create hello.txt, commit it"), then verifies the file was created, a commit landed, and authorship notes are present in refs/notes/ai. Pre-release failures are non-blocking (continue-on-error: true).
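The three Tier 2 post-conditions (file created, commit landed, note present under `refs/notes/ai`) map onto plain git commands. A minimal Python sketch, assuming `git` is on `PATH`; the names `verify_live_run` and `min_commits` are illustrative and not taken from the PR's shell scripts:

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> subprocess.CompletedProcess:
    """Run a git command inside `repo`, capturing text output (never raises)."""
    return subprocess.run(["git", "-C", str(repo), *args], capture_output=True, text=True)


def verify_live_run(repo: Path, filename: str, min_commits: int = 2) -> list[str]:
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    # 1. The agent should have created the file.
    if not (repo / filename).is_file():
        failures.append(f"{filename} was not created")
    # 2. A commit should have landed (initial commit + agent commit by default).
    count = git(repo, "rev-list", "--count", "HEAD")
    if count.returncode != 0 or int(count.stdout.strip() or 0) < min_commits:
        failures.append(f"expected at least {min_commits} commits")
    # 3. git-ai's post-commit hook should have attached a note; git expands
    #    --ref=ai to refs/notes/ai.
    note = git(repo, "notes", "--ref=ai", "show", "HEAD")
    if note.returncode != 0 or not note.stdout.strip():
        failures.append("no authorship note on HEAD in refs/notes/ai")
    return failures
```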

Hook config paths (verified against `src/mdm/agents/*.rs`)

| Agent | Config file |
|---|---|
| Claude Code | `~/.claude/settings.json` |
| Codex | `~/.codex/config.toml` |
| Gemini CLI | `~/.gemini/settings.json` |
| Droid | `~/.factory/settings.json` |
| OpenCode | `~/.config/opencode/plugin/git-ai.ts` |
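A rough Python equivalent of the per-agent lookup in `verify-hook-wiring.sh`, using the paths from the table. It assumes every wired config file contains the literal string `checkpoint`; the names `HOOK_CONFIG` and `hook_configured` are hypothetical:

```python
from pathlib import Path

# Hook config locations relative to $HOME, one per agent.
HOOK_CONFIG = {
    "claude": ".claude/settings.json",
    "codex": ".codex/config.toml",
    "gemini": ".gemini/settings.json",
    "droid": ".factory/settings.json",
    "opencode": ".config/opencode/plugin/git-ai.ts",
}


def hook_configured(agent: str, home: Path) -> bool:
    """True if the agent's config file exists and mentions a git-ai checkpoint."""
    path = home / HOOK_CONFIG[agent]
    # Deliberately loose substring check: Codex stores the command as a TOML
    # array, so "checkpoint codex" never appears verbatim in its config.
    return path.is_file() and "checkpoint" in path.read_text(encoding="utf-8")
```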

Secrets required (Tier 2 only)

ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, FACTORY_API_KEY, SLACK_BOT_TOKEN, SLACK_CHANNEL_ID

Tier 1 runs without any secrets.

Cost estimate

~$0.05–0.25/night (weekdays only). See NIGHTLY_INTEGRATION_PLAN.md §6 for cost management strategies.

Test plan

- Verify the workflow YAML parses correctly in the Actions UI
- Trigger `workflow_dispatch` with `tier: tier1` to validate the hook-wiring jobs (no API keys needed)
- Add the `ANTHROPIC_API_KEY` secret and trigger `tier: both` to validate Claude Code Tier 2 end-to-end
- Review the open questions in NIGHTLY_INTEGRATION_PLAN.md §13 before enabling the nightly schedule

🤖 Generated with Claude Code

git-ai-cloud-dev · 2026-02-26T23:32:51Z

Stats powered by Git AI

🧠 you    █░░░░░░░░░░░░░░░░░░░  7%
🤖 ai     ░███████████████████  93%

More stats

1.0 lines generated for every 1 accepted
11 minutes waiting for AI
Top model: claude::claude-opus-4-6 (932 accepted lines, 932 generated lines)

AI code tracked with git-ai

git-ai-cloud · 2026-02-26T23:34:19Z

Stats powered by Git AI

🧠 you    ████░░░░░░░░░░░░░░░░  22%
🤖 ai     ░░░░████████████████  78%

More stats

0.9 lines generated for every 1 accepted
4 minutes waiting for AI
Top model: claude::claude-sonnet-4-6 (263 accepted lines, 238 generated lines)

AI code tracked with git-ai

git-ai-bot-svarlamov-dev · 2026-03-04T23:02:11Z

Stats powered by Git AI

🧠 you    █░░░░░░░░░░░░░░░░░░░  7%
🤖 ai     ░███████████████████  93%

More stats

1.0 lines generated for every 1 accepted
11 minutes waiting for AI
Top model: claude::claude-opus-4-6 (932 accepted lines, 932 generated lines)

AI code tracked with git-ai

Implements a two-tier nightly GitHub Actions workflow that verifies git-ai hooks fire correctly with real agent CLI binaries (Claude Code, Codex, Gemini CLI, Droid, OpenCode) on both stable and latest releases.

Tier 1 (no API keys): Installs each agent CLI, runs `git-ai install`, verifies hook config files contain the correct checkpoint commands, then exercises the full attribution pipeline with synthetic checkpoint data via the agent-v1 preset.

Tier 2 (live, requires API keys): Runs each agent with a deterministic prompt in a test repo and verifies authorship notes and blame output.

New files:
- .github/workflows/nightly-agent-integration.yml
- scripts/nightly/verify-hook-wiring.sh
- scripts/nightly/test-synthetic-checkpoint.sh
- scripts/nightly/test-live-agent.sh
- scripts/nightly/verify-attribution.sh

Hook config paths verified against src/mdm/agents/*.rs:
- claude: ~/.claude/settings.json
- codex: ~/.codex/config.toml
- gemini: ~/.gemini/settings.json
- droid: ~/.factory/settings.json
- opencode: ~/.config/opencode/plugin/git-ai.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Neither file belongs in the repo: .mcp.json is local tooling config and the plan document was a design scratch pad, not a deliverable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1. `scripts/nightly/test-synthetic-checkpoint.sh`: Fix the transcript message schema in the synthetic checkpoint JSON payload. The Rust Message enum uses `#[serde(tag = "type", rename_all = "snake_case")]`, so messages require `"type"` and `"text"` fields, not `"role"` and `"content"`. The old schema caused deserialization to fail for every Tier 1 run.
2. `.github/workflows/nightly-agent-integration.yml`: Fix the notify-on-failure condition. With `if: failure()`, GitHub Actions skips the job entirely when tier2-live-integration is skipped (e.g. when running tier1-only), silently swallowing Tier 1 failures. Replace it with an explicit `always()` guard that checks each dependency's result individually.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
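The decision logic of the repaired guard in item 2 can be modelled in a few lines. This is a sketch of the logic only, not GitHub's expression evaluator; the job name `tier1-hook-wiring` and the exact `if:` expression in the comment are assumptions, while `tier2-live-integration` and the `needs.<job>.result` values (`success`, `failure`, `cancelled`, `skipped`) come from GitHub Actions:

```python
# Hypothetical workflow guard this mirrors:
#   if: always() && (needs.tier1-hook-wiring.result == 'failure'
#                    || needs.tier2-live-integration.result == 'failure')
def should_notify(needs_results: dict[str, str]) -> bool:
    """Notify when any upstream job failed; a skipped job neither triggers nor blocks it."""
    return any(result == "failure" for result in needs_results.values())
```

Because `always()` keeps the job eligible to run, a skipped Tier 2 no longer hides a failed Tier 1.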

Add a pull_request `labeled` event trigger so the full nightly suite runs whenever someone applies the 'Integration' label to any PR — in addition to the existing nightly schedule and workflow_dispatch paths. The gate condition on the resolve-versions job ensures the downstream matrix jobs only run for the correct trigger, not for every label event. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The label is 'integration', not 'Integration'. GitHub label names are case-sensitive in Actions expressions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the placeholder hello.txt smoke test with real end-to-end tests that verify git-ai's entire attribution pipeline:

test-live-agent.sh:
- Seeds the test repo with a real Python module (utils/math_utils.py) containing add, subtract, and is_prime functions
- Runs the real agent CLI with a substantive prompt: add a fibonacci function using an iterative approach and commit it
- Falls back to a manual commit if the agent wrote code but didn't commit (the post-commit hook still fires and writes the authorship note as long as working log data was captured during the agent run)
- Idempotent across retry attempts

verify-attribution.sh:
- Checks the fibonacci function was actually added to the Python file
- Verifies ≥3 commits exist (initial + seed + agent)
- Fetches and parses the authorship note from refs/notes/ai
- Asserts schema_version = "authorship/3.0.0"
- Asserts at least one prompt session was recorded (hard fail)
- Fuzzy-matches agent_id.tool against the agent name
- Checks transcript messages were captured
- Verifies utils/math_utils.py appears in the attestation section
- Runs git-ai blame and checks AI attribution on fibonacci lines
- Saves all artefacts (raw note, parsed metadata, blame output) to RESULTS_DIR for upload

Workflow: increase the Tier 2 job timeout from 25→45 min and the retry timeout from 12→20 min to accommodate seeding + real agent API calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
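The manual-commit fallback in test-live-agent.sh might look like this in Python. It is a sketch assuming `git` on `PATH` resolves to the git-ai proxy (otherwise the post-commit hook never fires); `commit_if_dirty` is a hypothetical name:

```python
import subprocess
from pathlib import Path


def commit_if_dirty(repo: Path, message: str) -> bool:
    """Stage and commit leftover agent edits; return True if a commit was made.

    Returns False when the tree is clean (the agent already committed, or
    wrote nothing), which keeps the step safe to re-run across retries.
    """
    status = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    if not status.stdout.strip():
        return False
    subprocess.run(["git", "-C", str(repo), "add", "-A"], check=True)
    subprocess.run(["git", "-C", str(repo), "commit", "-q", "-m", message], check=True)
    return True
```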

The install-scripts-local workflow does more than validate install scripts — it verifies full end-to-end hook wiring between git-ai and Claude Code. Rename the workflow and job names to reflect that. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the fake claude binary stub with real npm-installed agent CLIs and add a matrix covering all four supported agents. This makes the End-to-End tests meaningful: install.sh now runs git-ai install-hooks against actual agent binaries, which auto-detect the installed tool and write real hook configuration to each agent's config directory. Verification uses the existing verify-hook-wiring.sh script (Unix) and equivalent inline PowerShell checks (Windows) to confirm hooks were written to the correct agent-specific location. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two bugs in the E2E test setup:

1. opencode npm package: the package is "opencode-ai", not "opencode". The bare "opencode" name returns a 404 from the npm registry. Fixed in both the E2E install workflow and the nightly agent integration workflow.
2. codex hook verification: the grep pattern "checkpoint codex" expects a JSON-style command string, but Codex config uses a TOML array whose elements are comma-separated: `notify = ["<bin>", "checkpoint", "codex", ...]`. Changed to grep for just "checkpoint", which appears in the array and is sufficient to confirm the hook is configured.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The same TOML array format issue that was fixed in verify-hook-wiring.sh for Unix also affects the Windows inline PowerShell check. Codex stores its hook as a TOML array (notify = ["<bin>", "checkpoint", "codex", ...]) so Select-String for "checkpoint codex" never matches. Changed to match just "checkpoint". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
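Why the old pattern never matched, plus a stricter alternative to the bare `checkpoint` grep, as a Python sketch. The regex is an illustration rather than what the scripts use, and the binary path in the sample config is hypothetical:

```python
import re

# A Codex notify hook is a TOML array, e.g. (binary path is hypothetical):
#   notify = ["/opt/git-ai/bin/git-ai", "checkpoint", "codex"]
# Require the two adjacent array elements; \s* also spans newlines, so the
# match still works if the array is split across lines.
CODEX_HOOK = re.compile(r'"checkpoint"\s*,\s*"codex"')


def codex_hook_present(config_text: str) -> bool:
    """True if the config contains "checkpoint", "codex" as adjacent array elements."""
    return CODEX_HOOK.search(config_text) is not None
```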

…n verify-attribution.sh

The `[ $? -eq 0 ] || fail "..."` guard was dead code under `set -euo pipefail`: if the python3 heredoc exits with code 1, `set -e` terminates the script immediately before the guard is reached, producing a silent exit with no diagnostic logged to $LOG. Replace with `if ! python3 ... <<'PYEOF' ... then fail "..." fi`, which is exempt from `set -e` and ensures the descriptive failure message is written to $LOG before exiting. Resolves Devin review comment BUG_pr-review-job-8b70596b_0002. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
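The `set -e` behaviour can be reproduced directly. This sketch drives bash from Python and uses `false` as a stand-in for the failing python3 heredoc; it assumes `bash` is on `PATH`:

```python
import subprocess

# Old pattern: `set -e` exits on `false`, so the `||` diagnostic never runs.
DEAD_GUARD = """set -euo pipefail
false
[ $? -eq 0 ] || echo "diagnostic: attribution check failed"
"""

# New pattern: a command tested by `if` is exempt from `set -e`.
LIVE_GUARD = """set -euo pipefail
if ! false; then
  echo "diagnostic: attribution check failed"
  exit 1
fi
"""


def run_bash(script: str) -> subprocess.CompletedProcess:
    """Run a script with `bash -c`, capturing stdout/stderr as text."""
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)
```

Both scripts exit with status 1, but only the `if !` form prints the diagnostic first.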

The Tier 1 and Tier 2 nightly jobs were calling `git-ai install` to set up agent hooks, but never creating the `git` → `git-ai` symlink in the release directory. When test scripts called `git commit`, the system git ran instead of the git-ai proxy, so the post-commit hook never fired and no authorship note was written to refs/notes/ai. Add `ln -sf .../git-ai .../git` in both the Tier 1 and Tier 2 "Install git-ai hooks in test repo" steps so that all `git` invocations inside test scripts (which prepend the release dir to PATH) route through git-ai and trigger the expected hook behaviour. Resolves Devin review comment BUG_pr-review-job-bf54cac596f44273b5f8565f81a63daf_0001. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
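The effect of the `git` → `git-ai` symlink on command lookup, sketched in Python. `wire_git_proxy` and `resolve_git` are hypothetical helpers; the sketch assumes a Unix filesystem that supports symlinks (the Windows job copies `git-ai.exe` to `git.exe` instead):

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def wire_git_proxy(release_dir: Path) -> Path:
    """Equivalent of `ln -sf <release_dir>/git-ai <release_dir>/git`."""
    link = release_dir / "git"
    if link.is_symlink() or link.exists():
        link.unlink()  # like -f: replace any existing file or link
    link.symlink_to((release_dir / "git-ai").resolve())
    return link


def resolve_git(release_dir: Path) -> Optional[str]:
    """Return the `git` a test script sees after prepending release_dir to PATH."""
    search_path = os.pathsep.join([str(release_dir), os.environ.get("PATH", "")])
    return shutil.which("git", path=search_path)
```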

The previous Lint (ubuntu-latest) check failed on `go-task/setup-task@v1` (not on any code change) — the same action passed on the identical commit via e2e-tests. No code changes; forcing a clean CI run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1. verify-attribution.sh: guard the empty-string fuzzy match. `"" in "claude"` is True in Python, so a missing agent_id.tool would always report PASS. Added `if tool and (...)` to require a non-empty tool string before the fuzzy match runs. Resolves Devin BUG_pr-review-job-032b242ab75044ebac035a42020d7fe3_0001.
2. test-live-agent.sh: add `sudo` to the ripgrep fallback install. `apt-get install` on GitHub Actions ubuntu-latest requires root. Without `sudo` the install failed silently (`2>/dev/null || true`), leaving `rg` absent and potentially causing the Gemini CLI to hang. Resolves Devin BUG_pr-review-job-6b947f0c5f1e475bb3ffbeba9e6056de_0001.
3. nightly-agent-integration.yml: deduplicate stable/latest matrix entries. `npm view <pkg> version` and `npm view <pkg> dist-tags.latest` return the same value, so the stable and latest channels always tested the same version, doubling CI cost for zero extra coverage. Now queries `dist-tags.next` for the latest channel (pre-release/canary), falling back to stable_ver if no `next` tag exists, and skips the latest entry entirely when it would duplicate stable. Resolves Devin BUG_pr-review-job-6b947f0c5f1e475bb3ffbeba9e6056de_0002.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
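The empty-string guard from item 1, as a standalone Python sketch. `tool_matches` is a hypothetical name, and the lower-casing is an added assumption not stated in the commit:

```python
from typing import Optional


def tool_matches(tool: Optional[str], agent: str) -> bool:
    """Fuzzy-match a note's agent_id.tool against the expected agent name.

    The bool(tool) guard matters: "" in "claude" is True in Python, so
    without it a missing tool string would always count as a match.
    """
    tool = (tool or "").strip().lower()
    agent = agent.strip().lower()
    return bool(tool) and (tool in agent or agent in tool)
```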

@latest

The previous fix queried dist-tags.next for latest_ver but still used @latest in the npm install command, which resolves to the stable release — identical to the stable channel and defeating the entire purpose of the latest matrix entry. Change the npm_pkg construction for the latest channel to use @next so the pre-release/canary version is actually installed when it exists. Resolves Devin BUG_pr-review-job-070479ba6d7041699555d4dfa9779fa3_0001. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

npm view <pkg> dist-tags.next exits with code 0 and returns an empty string (or "undefined") when the tag does not exist in npm 10+, rather than raising a non-zero exit. This meant CalledProcessError was never raised, latest_ver was set to "" or "undefined", the dedup check ("" != stable_ver) didn't fire, and a matrix entry was emitted with npm_pkg="<pkg>@next" — causing npm install to fail with ETARGET. Add an explicit check after .strip(): if the result is empty or equals the string "undefined", fall back to stable_ver, triggering the same deduplication skip as the CalledProcessError path. Resolves Devin BUG_pr-review-job-874dec7614a64a5e952cf18579ebc182_0001. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
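Taken together, the three dedup fixes amount to the following channel resolution. This is a Python sketch, not the workflow's actual matrix builder: `channel_entries` and the entry keys are hypothetical, pinning the stable channel to `<pkg>@<version>` is an assumption, and the `view` parameter exists so the logic can be exercised without npm:

```python
import subprocess
from typing import Callable

ViewFn = Callable[[str, str], str]


def npm_view(pkg: str, field: str) -> str:
    """`npm view <pkg> <field>`, stripped; raises CalledProcessError on non-zero exit."""
    result = subprocess.run(
        ["npm", "view", pkg, field], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def channel_entries(pkg: str, view: ViewFn = npm_view) -> list[dict]:
    stable_ver = view(pkg, "version")
    try:
        latest_ver = view(pkg, "dist-tags.next")
    except subprocess.CalledProcessError:
        latest_ver = stable_ver
    # npm 10+ may exit 0 with "" or "undefined" when the tag is missing.
    if not latest_ver or latest_ver == "undefined":
        latest_ver = stable_ver
    entries = [{"channel": "stable", "version": stable_ver, "npm_pkg": f"{pkg}@{stable_ver}"}]
    # Emit the latest channel only when it differs, and install it via @next.
    if latest_ver != stable_ver:
        entries.append({"channel": "latest", "version": latest_ver, "npm_pkg": f"{pkg}@next"})
    return entries
```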

- install-scripts-local.yml: replace the hardcoded `grep checkpoint claude` with a bash case statement matching the Windows switch, so each agent matrix entry verifies its own hook config file (claude→settings.json, codex→config.toml, gemini→settings.json, opencode→plugin file)
- nightly-agent-integration.yml: pass the workflow_dispatch `agents` input as the AGENTS_FILTER env var and filter the Python matrix builder, so that specifying e.g. `agents: "claude"` actually limits the matrix instead of unconditionally running all four npm agents

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The droid entry was appended unconditionally after the filtered npm-agent loop, so specifying `agents: "claude"` via workflow_dispatch would still include droid in the matrix. Wrap the append in the same filter check so droid is only included when the filter is absent, set to "all", or explicitly contains "droid". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
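A Python sketch of the filter with droid passing through the same check as the npm agents. It assumes AGENTS_FILTER is a comma-separated list; `selected` and `matrix_agents` are hypothetical names:

```python
from typing import Optional

NPM_AGENTS = ("claude", "codex", "gemini", "opencode")


def selected(agent: str, agents_filter: Optional[str]) -> bool:
    """An absent or "all" filter selects everything; otherwise match listed names."""
    raw = (agents_filter or "").strip()
    if not raw or raw == "all":
        return True
    wanted = {name.strip() for name in raw.split(",") if name.strip()}
    return agent in wanted


def matrix_agents(agents_filter: Optional[str]) -> list[str]:
    agents = [name for name in NPM_AGENTS if selected(name, agents_filter)]
    # Droid is not installed from npm, but it must honour the same filter.
    if selected("droid", agents_filter):
        agents.append("droid")
    return agents
```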

The bare `.vscode` entry would silently hide any new files added under .vscode/ from `git status`, requiring `git add -f` to track them, and misleads contributors into thinking the whole directory should be untracked. Replace it with `.vscode/*` + `!.vscode/settings.json` so that the tracked project settings file remains visible while any other editor-local files are still ignored. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
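The new ignore rules can be checked with `git check-ignore`, which exits 0 for an ignored path and 1 otherwise. This is a sketch assuming `git` is on `PATH`; `is_ignored` is a hypothetical helper:

```python
import subprocess
from pathlib import Path


def is_ignored(repo: Path, rel_path: str) -> bool:
    """`git check-ignore -q` exits 0 if the path is ignored, 1 if it is not."""
    result = subprocess.run(["git", "-C", str(repo), "check-ignore", "-q", rel_path])
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git check-ignore failed for {rel_path}")
    return result.returncode == 0
```

With `.vscode/*` followed by `!.vscode/settings.json`, `.vscode/launch.json` is ignored and `.vscode/settings.json` is not.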

Every matrix cell in the E2E install workflow now runs three additional phases after verifying agent hook configuration:

1. Simulate an AI commit: create a test git repo, wire the git→git-ai proxy symlink and post-commit hook (via `git-ai install`), then feed synthetic checkpoint data through `git-ai checkpoint agent-v1` and commit, exactly as the nightly Tier 1 tests do.
2. Verify attribution tracking: a new script, `scripts/nightly/verify-synthetic-attribution.sh`, checks that:
   - An authorship note exists on HEAD (post-commit hook fired)
   - The note contains parseable JSON with schema_version = authorship/3.0.0
   - At least one prompt session was recorded (prompt stored)
   - At least one transcript message was captured
   - `git-ai stats HEAD --json` shows ai_additions > 0
   - The test file appears in the note's attestation section
   - `git-ai blame` shows AI attribution markers
3. Upload a results artifact for every matrix cell (always).

The Windows job mirrors the Unix flow using PowerShell: it copies git-ai.exe as git.exe (a proxy that does not require developer mode for symlinks), builds the checkpoint JSON via ConvertTo-Json, and performs the same 8 attribution checks inline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 35 additional findings in Devin Review.

devin-ai-integration · 2026-03-10T19:49:22Z

.github/workflows/install-scripts-local.yml

+          $lines | Set-Content -Path $log
+          Write-Log "=== Synthetic attribution verification COMPLETE: $agent ==="


🟡 Windows verification log file written before final Write-Log call, losing the COMPLETE message

In the Windows "Verify attribution pipeline" step, $lines | Set-Content -Path $log at line 418 writes the log file to disk, but then Write-Log is called at line 419 which appends to $lines (via $lines.Add($msg)) after the file was already written. The "COMPLETE" message is printed to stdout via Write-Host but is missing from the log file that gets uploaded as an artifact. The pattern used for all other failure paths writes $lines | Set-Content then throws, but the happy-path final write was placed before the last log line.

Mismatched write sequence at lines 418-419

Line 418: $lines | Set-Content -Path $log (file written)
Line 419: Write-Log "=== Synthetic attribution verification COMPLETE: $agent ===" (adds to $lines AND Write-Host, but file already on disk)

Suggested change

-          $lines | Set-Content -Path $log
-          Write-Log "=== Synthetic attribution verification COMPLETE: $agent ==="
+          Write-Log "=== Synthetic attribution verification COMPLETE: $agent ==="
+          $lines | Set-Content -Path $log
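The same buffer-then-flush pattern in Python shows why the order matters. `BufferedLog` and `finish` are hypothetical stand-ins for `$lines`, `Write-Log`, and `Set-Content` in the PowerShell step:

```python
from pathlib import Path


class BufferedLog:
    """Collects log lines in memory and writes them to disk in one go."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.lines: list[str] = []

    def log(self, msg: str) -> None:
        print(msg)              # like Write-Host
        self.lines.append(msg)  # like $lines.Add($msg)

    def flush(self) -> None:
        # like $lines | Set-Content: only lines logged so far reach the file
        self.path.write_text("\n".join(self.lines) + "\n", encoding="utf-8")


def finish(log: BufferedLog, agent: str) -> None:
    # Append the final line before flushing; flushing first drops it from the file.
    log.log(f"=== Synthetic attribution verification COMPLETE: {agent} ===")
    log.flush()
```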


The agent-v1 checkpoint format stores an empty messages[] because conversation transcripts are only captured by live agent hooks, not synthetic checkpoints. This is expected behaviour — downgrade the check from a hard failure to a warning, consistent with how verify-attribution.sh handles the same condition for live agent runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Same fix as the bash script — synthetic checkpoints don't store conversation messages, so this should be a warning not a failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
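After these two commits, the metadata checks split into hard failures and warnings. A Python sketch follows; the JSON layout in the docstring (a `prompts` map keyed by session id, with `agent_id` and `messages` per session) is an assumption, not the documented authorship/3.0.0 schema:

```python
EXPECTED_SCHEMA = "authorship/3.0.0"


def check_note_metadata(meta: dict) -> tuple[list[str], list[str]]:
    """Return (failures, warnings) for parsed authorship-note metadata.

    Assumed layout (hypothetical):
      {"schema_version": "...",
       "prompts": {"<id>": {"agent_id": {"tool": "..."}, "messages": [...]}}}
    """
    failures: list[str] = []
    warnings: list[str] = []
    version = meta.get("schema_version")
    if version != EXPECTED_SCHEMA:
        failures.append(f"schema_version is {version!r}, expected {EXPECTED_SCHEMA!r}")
    prompts = meta.get("prompts") or {}
    if not prompts:
        failures.append("no prompt session recorded")
    # Synthetic agent-v1 checkpoints store an empty messages[], so only warn.
    if not any(session.get("messages") for session in prompts.values()):
        warnings.append("no transcript messages captured")
    return failures, warnings
```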

This comment was marked as resolved.


jwiegley added the integration label Feb 27, 2026

This comment was marked as resolved.


jwiegley force-pushed the johnw/nightly-integration branch from 12b324c to 8135e62 on March 10, 2026 01:37

This comment was marked as resolved.


jwiegley and others added 14 commits March 9, 2026 21:41

Remove .mcp.json and NIGHTLY_INTEGRATION_PLAN.md

ca7e09f


Fix integration label name to lowercase

e78714c


jwiegley and others added 3 commits March 9, 2026 21:41

jwiegley force-pushed the johnw/nightly-integration branch from 8135e62 to cbdfa6e on March 10, 2026 04:45

This comment was marked as resolved.


jwiegley and others added 2 commits March 10, 2026 09:24

devin-ai-integration bot reviewed Mar 10, 2026


jwiegley and others added 2 commits March 10, 2026 13:16

fix: downgrade Windows synthetic transcript check from fail to warn

0eb38cc


