Add nightly agent CLI integration tests #602
Merged
22 commits
0cfe439
Add nightly agent CLI integration test workflow
jwiegley ca7e09f
Remove .mcp.json and NIGHTLY_INTEGRATION_PLAN.md
jwiegley e0ea770
Fix two bugs flagged by Devin review
jwiegley dd11f29
Trigger integration tests when 'Integration' label is applied to a PR
jwiegley e78714c
Fix integration label name to lowercase
jwiegley 881ffb0
Rewrite Tier 2 integration tests for full end-to-end coverage
jwiegley 7afb0d2
ci: rename install scripts CI jobs to end-to-end tests
jwiegley fa8fb6b
ci: use real agent CLIs in E2E tests (claude, codex, gemini, opencode)
jwiegley 8099458
fix: correct opencode npm package name and codex hook grep pattern
jwiegley 8e80a3b
fix: fix codex hook grep in Windows PowerShell verification too
jwiegley 5f0b591
fix: use if-then pattern to prevent set -e swallowing python3 error i…
jwiegley 5f706cb
fix: add git→git-ai proxy symlink before install in nightly workflow
jwiegley 4e65b2e
ci: retrigger CI (transient go-task/setup-task infrastructure flake)
jwiegley 7f1c51f
fix: address three Devin review bugs in nightly scripts
jwiegley e07f3df
fix: use @next dist-tag when installing pre-release in latest channel
jwiegley 307cd6e
fix: guard against empty/undefined dist-tags.next output from npm 10+
jwiegley cbdfa6e
fix: address two remaining Devin review bugs in CI workflows
jwiegley ef57018
fix: guard droid matrix entry against AGENTS_FILTER
jwiegley 36965d8
fix: narrow .vscode gitignore to preserve tracked settings.json
jwiegley 96b5965
feat: add AI commit E2E verification to install-scripts-local tests
jwiegley 9a17c4c
fix: downgrade synthetic transcript check from fail to warn
jwiegley 0eb38cc
fix: downgrade Windows synthetic transcript check from fail to warn
jwiegley
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| # API Keys (Required to enable respective provider) | ||
| ANTHROPIC_API_KEY="your_anthropic_api_key_here" # Required: Format: sk-ant-api03-... | ||
| PERPLEXITY_API_KEY="your_perplexity_api_key_here" # Optional: Format: pplx-... | ||
| OPENAI_API_KEY="your_openai_api_key_here" # Optional, for OpenAI models. Format: sk-proj-... | ||
| GOOGLE_API_KEY="your_google_api_key_here" # Optional, for Google Gemini models. | ||
| MISTRAL_API_KEY="your_mistral_key_here" # Optional, for Mistral AI models. | ||
| XAI_API_KEY="YOUR_XAI_KEY_HERE" # Optional, for xAI models. | ||
| GROQ_API_KEY="YOUR_GROQ_KEY_HERE" # Optional, for Groq models. | ||
| OPENROUTER_API_KEY="YOUR_OPENROUTER_KEY_HERE" # Optional, for OpenRouter models. | ||
| AZURE_OPENAI_API_KEY="your_azure_key_here" # Optional, for Azure OpenAI models (requires endpoint in .taskmaster/config.json). | ||
| OLLAMA_API_KEY="your_ollama_api_key_here" # Optional: For remote Ollama servers that require authentication. | ||
| GITHUB_API_KEY="your_github_api_key_here" # Optional: For GitHub import/export features. Format: ghp_... or github_pat_... |
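
The "Format:" hints in the comments above can be turned into a quick pre-flight check before a run. The following is a minimal sketch and not part of this PR: `KEY_PREFIXES` and `check_keys` are hypothetical names, the prefixes are copied from the comments in the file, and a matching prefix says nothing about whether a key is actually valid with the provider.

```python
import os

# Prefixes copied from the "Format:" hints in the env file above.
# Providers without a documented format are left out rather than guessed.
KEY_PREFIXES = {
    "ANTHROPIC_API_KEY": ("sk-ant-api03-",),
    "PERPLEXITY_API_KEY": ("pplx-",),
    "OPENAI_API_KEY": ("sk-proj-",),
    "GITHUB_API_KEY": ("ghp_", "github_pat_"),
}


def check_keys(env):
    """Return {name: status}; status is 'missing', 'placeholder', 'bad-prefix' or 'ok'."""
    report = {}
    for name, prefixes in KEY_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            report[name] = "missing"
        elif value.lower().startswith("your_"):
            # Still the template value, e.g. "your_anthropic_api_key_here".
            report[name] = "placeholder"
        elif not value.startswith(prefixes):
            report[name] = "bad-prefix"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_keys(os.environ).items():
        print(f"{name}: {status}")
```

Taking the environment as a plain mapping keeps the check testable without touching `os.environ`.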
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,368 @@ | ||
| name: Nightly Agent CLI Integration Tests | ||
|
|
||
| on: | ||
| schedule: | ||
| - cron: '0 4 * * 1-5' # 4 AM UTC, weekdays only | ||
| pull_request: | ||
| types: [labeled] | ||
| workflow_dispatch: | ||
| inputs: | ||
| agents: | ||
| description: 'Comma-separated agents to test (or "all")' | ||
| default: 'all' | ||
| required: false | ||
| tier: | ||
| description: 'Test tier to run' | ||
| type: choice | ||
| default: 'both' | ||
| options: [tier1, tier2, both] | ||
|
|
||
| env: | ||
| GIT_AI_DEBUG: "1" | ||
| CARGO_INCREMENTAL: "0" | ||
|
|
||
| jobs: | ||
| # ── Version Resolution ───────────────────────────────────────────────────── | ||
| resolve-versions: | ||
| name: Resolve agent CLI versions | ||
| runs-on: ubuntu-latest | ||
| # Run on schedule, manual dispatch, or when the "integration" label is applied to a PR | ||
| if: >- | ||
| github.event_name == 'schedule' || | ||
| github.event_name == 'workflow_dispatch' || | ||
| (github.event_name == 'pull_request' && github.event.label.name == 'integration') | ||
| outputs: | ||
| matrix: ${{ steps.build-matrix.outputs.matrix }} | ||
| steps: | ||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: '22' | ||
|
|
||
| - id: build-matrix | ||
| name: Build dynamic test matrix | ||
| env: | ||
| AGENTS_FILTER: ${{ github.event.inputs.agents }} | ||
| run: | | ||
| python3 - <<'PY' | ||
| import json, subprocess, os | ||
|
|
||
| agents = { | ||
| "claude": {"pkg": "@anthropic-ai/claude-code", "key": "ANTHROPIC_API_KEY"}, | ||
| "codex": {"pkg": "@openai/codex", "key": "OPENAI_API_KEY"}, | ||
| "gemini": {"pkg": "@google/gemini-cli", "key": "GEMINI_API_KEY"}, | ||
| "opencode": {"pkg": "opencode-ai", "key": "ANTHROPIC_API_KEY"}, | ||
| } | ||
|
|
||
| # Filter agents when workflow_dispatch specifies a subset | ||
| agents_filter = os.environ.get("AGENTS_FILTER", "all").strip() | ||
| if agents_filter and agents_filter != "all": | ||
| allowed = {a.strip() for a in agents_filter.split(",")} | ||
| agents = {k: v for k, v in agents.items() if k in allowed} | ||
|
|
||
| headless_cmds = { | ||
| "claude": "claude -p --dangerously-skip-permissions --max-turns 3", | ||
| "codex": "codex exec --full-auto", | ||
| "gemini": "gemini --approval-mode=yolo", | ||
| "opencode": "opencode run --command", | ||
| } | ||
|
|
||
| matrix = {"include": []} | ||
| for agent, info in agents.items(): | ||
| try: | ||
| stable_ver = subprocess.check_output( | ||
| ["npm", "view", info["pkg"], "version"], | ||
| text=True, stderr=subprocess.DEVNULL | ||
| ).strip() | ||
| # Try the "next" dist-tag for a pre-release; fall back to stable | ||
| # to avoid doubling CI cost when no canary exists | ||
| try: | ||
| latest_ver = subprocess.check_output( | ||
| ["npm", "view", info["pkg"], "dist-tags.next"], | ||
| text=True, stderr=subprocess.DEVNULL | ||
| ).strip() | ||
| # npm 10+ exits 0 with empty output or "undefined" when the | ||
| # dist-tag doesn't exist, so CalledProcessError is not raised | ||
| if not latest_ver or latest_ver == "undefined": | ||
| latest_ver = stable_ver | ||
| except subprocess.CalledProcessError: | ||
| latest_ver = stable_ver # No pre-release; skip duplicate | ||
|
|
||
| except subprocess.CalledProcessError: | ||
| print(f"Warning: Could not resolve versions for {info['pkg']}", flush=True) | ||
| stable_ver = "latest" | ||
| latest_ver = "latest" | ||
|
|
||
| for channel in ["stable", "latest"]: | ||
| ver = stable_ver if channel == "stable" else latest_ver | ||
| # Skip the latest channel when it resolves to the same version as | ||
| # stable — no additional coverage, just wastes CI resources | ||
| if channel == "latest" and latest_ver == stable_ver: | ||
| continue | ||
| npm_pkg = f"{info['pkg']}@{ver}" if channel == "stable" else f"{info['pkg']}@next" | ||
| matrix["include"].append({ | ||
| "agent": agent, | ||
| "channel": channel, | ||
| "npm_pkg": npm_pkg, | ||
| "version": ver, | ||
| "api_key_var": info["key"], | ||
| "headless_cmd": headless_cmds[agent], | ||
| }) | ||
|
|
||
| # Droid uses curl installer (latest only, no npm version pinning) | ||
| # Respect AGENTS_FILTER so an explicit agents=claude excludes droid | ||
| if not agents_filter or agents_filter == "all" or "droid" in {a.strip() for a in agents_filter.split(",")}: | ||
| matrix["include"].append({ | ||
| "agent": "droid", | ||
| "channel": "latest", | ||
| "npm_pkg": "", | ||
| "version": "latest", | ||
| "api_key_var": "FACTORY_API_KEY", | ||
| "headless_cmd": "droid exec --auto high", | ||
| }) | ||
|
|
||
| with open(os.environ["GITHUB_OUTPUT"], "a") as f: | ||
| f.write(f"matrix={json.dumps(matrix)}\n") | ||
|
|
||
| print(f"Matrix built: {len(matrix['include'])} entries", flush=True) | ||
| PY | ||
|
|
||
|
|
||
| # ── Tier 1: Hook Wiring Verification ────────────────────────────────────── | ||
| tier1-hook-wiring: | ||
| name: 'Tier 1: ${{ matrix.agent }} ${{ matrix.channel }}' | ||
| needs: resolve-versions | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 20 | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: ${{ fromJson(needs.resolve-versions.outputs.matrix) }} | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| with: | ||
| fetch-depth: 0 | ||
|
|
||
| - uses: dtolnay/rust-toolchain@master | ||
| with: | ||
| toolchain: "1.93.0" | ||
|
|
||
| - uses: actions/cache@v4 | ||
| with: | ||
| path: | | ||
| ~/.cargo/registry | ||
| ~/.cargo/git | ||
| target | ||
| key: ${{ runner.os }}-cargo-release-${{ hashFiles('Cargo.lock') }} | ||
| restore-keys: | | ||
| ${{ runner.os }}-cargo-release- | ||
| ${{ runner.os }}-cargo- | ||
|
|
||
| - name: Build git-ai (release) | ||
| run: cargo build --release | ||
|
|
||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: '22' | ||
|
|
||
| - name: Install agent CLI — ${{ matrix.agent }} (${{ matrix.channel }}) | ||
| run: | | ||
| if [ "${{ matrix.agent }}" = "droid" ]; then | ||
| curl -fsSL https://app.factory.ai/cli | sh | ||
| echo "$HOME/.local/bin" >> "$GITHUB_PATH" | ||
| else | ||
| npm install -g "${{ matrix.npm_pkg }}" | ||
| fi | ||
|
|
||
| - name: Verify agent binary is present | ||
| run: | | ||
| case "${{ matrix.agent }}" in | ||
| claude) claude --version ;; | ||
| codex) codex --version ;; | ||
| gemini) gemini --help 2>&1 | head -3 || true ;; | ||
| droid) droid --version ;; | ||
| opencode) opencode --version ;; | ||
| esac | ||
|
|
||
| - name: Create test repository | ||
| run: | | ||
| mkdir -p /tmp/test-repo | ||
| cd /tmp/test-repo | ||
| git init | ||
| git config user.email "ci@git-ai.test" | ||
| git config user.name "CI Test" | ||
| echo "# Integration Test Repo" > README.md | ||
| git add README.md | ||
| git commit -m "Initial commit" | ||
|
|
||
| - name: Install git-ai hooks in test repo | ||
| run: | | ||
| export PATH="$GITHUB_WORKSPACE/target/release:$PATH" | ||
| # Symlink git-ai as "git" so the proxy intercepts commits and fires hooks | ||
| ln -sf "$GITHUB_WORKSPACE/target/release/git-ai" "$GITHUB_WORKSPACE/target/release/git" | ||
| cd /tmp/test-repo | ||
| git-ai install | ||
|
|
||
|
|
||
| - name: Verify hook wiring | ||
| run: | | ||
| export PATH="$GITHUB_WORKSPACE/target/release:$PATH" | ||
| bash "$GITHUB_WORKSPACE/scripts/nightly/verify-hook-wiring.sh" "${{ matrix.agent }}" | ||
|
|
||
| - name: Synthetic checkpoint test | ||
| run: | | ||
| export PATH="$GITHUB_WORKSPACE/target/release:$PATH" | ||
| cd /tmp/test-repo | ||
| bash "$GITHUB_WORKSPACE/scripts/nightly/test-synthetic-checkpoint.sh" \ | ||
| "${{ matrix.agent }}" \ | ||
| /tmp/test-repo | ||
|
|
||
| - name: Upload test results | ||
| if: always() | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: tier1-${{ matrix.agent }}-${{ matrix.channel }} | ||
| path: /tmp/test-results/ | ||
| retention-days: 7 | ||
| if-no-files-found: warn | ||
|
|
||
| # ── Tier 2: Live Agent Integration ──────────────────────────────────────── | ||
| tier2-live-integration: | ||
| name: 'Tier 2 (live): ${{ matrix.agent }} ${{ matrix.channel }}' | ||
| needs: [resolve-versions, tier1-hook-wiring] | ||
| if: ${{ github.event.inputs.tier != 'tier1' }} | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 45 | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: ${{ fromJson(needs.resolve-versions.outputs.matrix) }} | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| with: | ||
| fetch-depth: 0 | ||
|
|
||
| - uses: dtolnay/rust-toolchain@master | ||
| with: | ||
| toolchain: "1.93.0" | ||
|
|
||
| - uses: actions/cache@v4 | ||
| with: | ||
| path: | | ||
| ~/.cargo/registry | ||
| ~/.cargo/git | ||
| target | ||
| key: ${{ runner.os }}-cargo-release-${{ hashFiles('Cargo.lock') }} | ||
| restore-keys: | | ||
| ${{ runner.os }}-cargo-release- | ||
| ${{ runner.os }}-cargo- | ||
|
|
||
| - name: Build git-ai (release) | ||
| run: cargo build --release | ||
|
|
||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: '22' | ||
|
|
||
| - name: Install agent CLI — ${{ matrix.agent }} (${{ matrix.channel }}) | ||
| run: | | ||
| if [ "${{ matrix.agent }}" = "droid" ]; then | ||
| curl -fsSL https://app.factory.ai/cli | sh | ||
| echo "$HOME/.local/bin" >> "$GITHUB_PATH" | ||
| else | ||
| npm install -g "${{ matrix.npm_pkg }}" | ||
| fi | ||
|
|
||
| - name: Create test repository | ||
| run: | | ||
| mkdir -p /tmp/test-repo | ||
| cd /tmp/test-repo | ||
| git init | ||
| git config user.email "ci@git-ai.test" | ||
| git config user.name "CI Test" | ||
| echo "# Integration Test Repo" > README.md | ||
| git add README.md | ||
| git commit -m "Initial commit" | ||
|
|
||
| - name: Install git-ai hooks in test repo | ||
| run: | | ||
| export PATH="$GITHUB_WORKSPACE/target/release:$PATH" | ||
| # Symlink git-ai as "git" so the proxy intercepts commits and fires hooks | ||
| ln -sf "$GITHUB_WORKSPACE/target/release/git-ai" "$GITHUB_WORKSPACE/target/release/git" | ||
| cd /tmp/test-repo | ||
| git-ai install | ||
|
|
||
| - name: Run live agent test (with retry) | ||
| uses: nick-fields/retry@v2 | ||
| with: | ||
| timeout_minutes: 20 | ||
| max_attempts: 2 | ||
| command: | | ||
| export PATH="$GITHUB_WORKSPACE/target/release:$PATH" | ||
| export ${{ matrix.api_key_var }}="${{ secrets[matrix.api_key_var] }}" | ||
| bash "$GITHUB_WORKSPACE/scripts/nightly/test-live-agent.sh" "${{ matrix.agent }}" | ||
| continue-on-error: ${{ matrix.channel == 'latest' }} | ||
| env: | ||
| ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} | ||
| OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} | ||
| GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} | ||
| FACTORY_API_KEY: ${{ secrets.FACTORY_API_KEY }} | ||
|
|
||
| - name: Verify attribution pipeline | ||
| run: | | ||
| export PATH="$GITHUB_WORKSPACE/target/release:$PATH" | ||
| bash "$GITHUB_WORKSPACE/scripts/nightly/verify-attribution.sh" "${{ matrix.agent }}" | ||
| continue-on-error: ${{ matrix.channel == 'latest' }} | ||
|
|
||
| - name: Upload test results | ||
| if: always() | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: tier2-${{ matrix.agent }}-${{ matrix.channel }} | ||
| path: /tmp/test-results/ | ||
| retention-days: 7 | ||
| if-no-files-found: warn | ||
|
|
||
| # ── Failure Notification ────────────────────────────────────────────────── | ||
| notify-on-failure: | ||
| name: Notify on failure | ||
| needs: [tier1-hook-wiring, tier2-live-integration] | ||
| if: ${{ always() && (needs.tier1-hook-wiring.result == 'failure' || needs.tier2-live-integration.result == 'failure') }} | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - name: Notify Slack | ||
| uses: slackapi/slack-github-action@v1 | ||
| with: | ||
| channel-id: ${{ secrets.SLACK_CHANNEL_ID }} | ||
| payload: | | ||
| { | ||
| "text": ":red_circle: Nightly agent integration tests FAILED", | ||
| "blocks": [ | ||
| { | ||
| "type": "section", | ||
| "text": { | ||
| "type": "mrkdwn", | ||
| "text": "*Nightly Agent CLI Integration* failed on `${{ github.ref_name }}`\n<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View run>" | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| env: | ||
| SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }} | ||
|
|
||
| - name: Create tracking issue | ||
| uses: actions/github-script@v7 | ||
| with: | ||
| script: | | ||
| const date = new Date().toISOString().split('T')[0]; | ||
| await github.rest.issues.create({ | ||
| owner: context.repo.owner, | ||
| repo: context.repo.repo, | ||
| title: `Nightly agent integration failure: ${date}`, | ||
| labels: ['nightly', 'integration', 'triage'], | ||
| body: [ | ||
| '## Nightly Agent CLI Integration Test Failure', | ||
| '', | ||
| `[View workflow run](${process.env.GITHUB_SERVER_URL}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId})`, | ||
| '', | ||
| '### Checklist', | ||
| '- [ ] Identify which agent(s) failed', | ||
| '- [ ] Check if agent CLI released a new version', | ||
| '- [ ] Reproduce locally', | ||
| '- [ ] Determine if git-ai needs a fix or if it is an agent regression', | ||
| ].join('\n'), | ||
| }); | ||
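
The version-resolution step in the workflow above shells out to `npm view`, which makes its channel logic awkward to exercise locally. Below is a minimal sketch of the same decision rules as a pure function. `build_matrix` and its `resolved` argument are hypothetical names introduced here, the version strings in the usage section are made up, and the droid entry and the `api_key_var` / `headless_cmd` fields are omitted for brevity. It mirrors three behaviours from the workflow: an empty or `undefined` `next` dist-tag falls back to stable, a `latest` entry equal to stable is skipped, and the `latest` channel installs `@next` rather than a pinned version.

```python
def build_matrix(agents, resolved, agents_filter="all"):
    """Build {"include": [...]} from versions that were resolved beforehand.

    agents:   {agent: npm package name}
    resolved: {agent: (stable_version, raw output of `npm view <pkg> dist-tags.next`)}
    """
    # An empty filter (scheduled runs have no inputs) means "all agents".
    agents_filter = (agents_filter or "").strip() or "all"
    if agents_filter != "all":
        allowed = {a.strip() for a in agents_filter.split(",")}
        agents = {k: v for k, v in agents.items() if k in allowed}

    include = []
    for agent, pkg in agents.items():
        stable, nxt = resolved[agent]
        nxt = (nxt or "").strip()
        # A missing dist-tag can show up as empty output or the string "undefined".
        if not nxt or nxt == "undefined":
            nxt = stable
        include.append({"agent": agent, "channel": "stable",
                        "npm_pkg": f"{pkg}@{stable}", "version": stable})
        # Only add the latest channel when it gives extra coverage.
        if nxt != stable:
            include.append({"agent": agent, "channel": "latest",
                            "npm_pkg": f"{pkg}@next", "version": nxt})
    return {"include": include}


if __name__ == "__main__":
    agents = {"claude": "@anthropic-ai/claude-code", "codex": "@openai/codex"}
    # Illustrative versions only, not real releases.
    resolved = {"claude": ("1.0.0", "1.1.0-beta.1"), "codex": ("2.0.0", "undefined")}
    for entry in build_matrix(agents, resolved)["include"]:
        print(entry["agent"], entry["channel"], entry["npm_pkg"])
```

With the illustrative inputs, claude gets both a stable and a latest entry, while codex gets only a stable one because its `next` tag resolved to `undefined`.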