entireio · khaong · Feb 25, 2026 · Feb 24, 2026
@@ -0,0 +1,93 @@
+---
+name: debug-e2e
+description: Use when investigating E2E test failures from artifacts to diagnose bugs in the Entire CLI, or when pointed at an artifact path for root cause analysis
+---
+
+# Debug Entire CLI via E2E Artifacts
+
+Diagnose Entire CLI bugs using captured artifacts from the E2E test suite. Artifacts are written to `e2e/artifacts/` locally or downloaded from CI via GitHub Actions.
+
+## Inputs
+
+The user provides either:
+- **A test run directory:** `e2e/artifacts/{timestamp}/` — triage all failures
+- **A specific test directory:** `e2e/artifacts/{timestamp}/{TestName}-{agent}/` — debug one test
+
+## Artifact Layout
+
+```
+e2e/artifacts/{timestamp}/
+├── report.nocolor.txt          # Pass/fail/skip summary with error lines
+├── test-events.json            # Raw Go test events (NDJSON)
+├── entire-version.txt          # CLI version under test
+└── {TestName}-{agent}/
+    ├── PASS or FAIL            # Status marker
+    ├── console.log             # Full operation transcript
+    ├── git-log.txt             # git log --decorate --graph --all
+    ├── git-tree.txt            # ls-tree HEAD + checkpoint branch
+    ├── entire-logs/entire.log  # CLI structured JSON logs
+    ├── checkpoint-metadata/    # Checkpoint + session metadata
+    └── repo -> /tmp/...        # Symlink to preserved repo (E2E_KEEP_REPOS=1 only)
+```
+
+## Preserved Repo
+
+When the test run was executed with `E2E_KEEP_REPOS=1`, each test's artifact directory contains a `repo` symlink pointing to the preserved temporary git repository. This is the actual repo the test operated on — you can inspect it directly.
+
+**Navigate via the symlink** (e.g., `{artifact-dir}/repo/`) rather than resolving the `/tmp/...` path. The symlink lives inside the artifact directory so permissions and paths stay consistent.
+
+The preserved repo contains:
+- Full git history with all branches (main, `entire/checkpoints/v1`)
+- The `.entire/` directory with CLI state, config, and raw logs
+- The `.claude/` directory (if Claude Code was the agent)
+- All files the agent created or modified, in their final state
+
+This is the most powerful debugging tool — you can run `git log`, `git diff`, `git show`, inspect `.entire/` internals, and see exactly what the CLI left behind.
+
+## Debugging Workflow
+
+### 1. Triage (if given a run directory)
+
+Read `report.nocolor.txt` to identify failures and their error messages. Each entry shows the test name, agent, duration, and failure output with file:line references.
+
+### 2. Read console.log (most important)
+
+`console.log` is a full transcript of every operation:
+- `> claude -p "..." ...` — agent prompts with stdout/stderr
+- `> git add/commit/...` — git commands
+- `> send: ...` — interactive session inputs
+
+This tells you what happened chronologically.
+
+### 3. Read test source code
+
+Use the file:line from the report to find the test in `e2e/tests/`. Compare what the test expected to happen with what `console.log` shows actually happened.
+
+### 4. Diagnose the CLI behavior
+
+Cross-reference console.log (what happened) with the test (what should have happened). Focus on CLI-level issues:
+
+| Symptom | CLI Investigation |
+|---------|-------------------|
+| Checkpoint not created / timeout | Check `entire-logs/entire.log` for hook invocations, phase transitions, errors |
+| Wrong checkpoint content | Check `git-tree.txt` for checkpoint branch files, `checkpoint-metadata/` for session info |
+| Hooks didn't fire | Check `entire.log` for missing hook entries (session-start, user-prompt-submit, stop, post-commit) |
+| Stash/unstash problems | Check `entire.log` for stash-related log lines, `git-log.txt` for commit ordering |
+| Attribution issues | Check `checkpoint-metadata/` for `files_touched`, session metadata for attribution data |
+| Strategy mismatch | Check `entire.log` for `strategy` field, verify auto-commit vs manual-commit behavior |
+
+### 5. Deep dive files
+
+- **entire-logs/entire.log**: Structured JSON logs — hook lifecycle, session phases (`active` → `idle` → `ended`), warnings, errors. Key fields: `component`, `hook`, `strategy`, `session_id`.
+- **git-log.txt**: Commit graph showing main branch, `entire/checkpoints/v1`, checkpoint initialization.
+- **git-tree.txt**: Files at HEAD vs checkpoint branch (separated by `--- entire/checkpoints/v1 ---`).
+- **checkpoint-metadata/**: `metadata.json` has `checkpoint_id`, `strategy`, `files_touched`, `token_usage`, and `sessions` array. Session subdirs have per-session details.
+
+### 6. Report findings
+
+Identify whether the issue is in:
+- **CLI hooks** (prepare-commit-msg, commit-msg, post-commit)
+- **Session management** (phase transitions, session tracking)
+- **Checkpoint creation** (branch management, metadata writing)
+- **Attribution** (file tracking, prompt correlation)
+- **Strategy logic** (auto-commit vs manual-commit behavior)
@@ -41,11 +41,18 @@ jobs:
           esac
           echo "$HOME/.local/bin" >> $GITHUB_PATH
 
+      - name: Bootstrap agent
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
+        run: go run ./e2e/bootstrap
+
       - name: Run isolated test
         env:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
           GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
           E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts
+          E2E_ENTIRE_BIN: /usr/local/bin/entire
         run: |
           mkdir -p "$E2E_ARTIFACT_DIR"
           mise run test:e2e:${{ inputs.agent }} "${{ inputs.test }}"

@@ -7,7 +7,6 @@ on:
       - main
 
 # Concurrency: only one E2E job runs at a time
-# Cancel previous runs when new one starts
 concurrency:
   group: e2e-tests
   cancel-in-progress: true
@@ -19,7 +18,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        agent: [claude, opencode]
+        agent: [claude, opencode, gemini]
 
     steps:
       - name: Checkout repository
@@ -28,16 +27,38 @@ jobs:
       - name: Setup mise
         uses: jdx/mise-action@v3
 
+      - name: Install system dependencies
+        run: sudo apt-get update && sudo apt-get install -y tmux
+
+      - name: Build entire CLI
+        run: go build -o /usr/local/bin/entire ./cmd/entire
+
       - name: Install agent CLI
         run: |
           case "${{ matrix.agent }}" in
             claude)   curl -fsSL https://claude.ai/install.sh | bash ;;
             opencode) curl -fsSL https://opencode.ai/install | bash ;;
+            gemini)   npm install -g @google/gemini-cli ;;
           esac
           echo "$HOME/.local/bin" >> $GITHUB_PATH
 
+      - name: Bootstrap agent
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
+        run: go run ./e2e/bootstrap
+
       - name: Run E2E Tests
         env:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-        run: |
-          mise run test:e2e:${{ matrix.agent }}
+          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
+          E2E_ENTIRE_BIN: /usr/local/bin/entire
+        run: mise run test:e2e:${{ matrix.agent }}
+
+      - name: Upload artifacts
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: e2e-artifacts-${{ matrix.agent }}
+          path: e2e/artifacts/
+          retention-days: 7
@@ -36,12 +36,16 @@ go.work.sum
 
 # Binary output (only in root)
 /entire
+/testreport
 
 # Build output directory
 /dist/
 completions
 
 
+# E2E test artifacts
+e2e/artifacts/
+
 # worktrees
 .worktrees/
 test-gemini.txt

@@ -108,6 +108,7 @@ linters:
         - github.com/go-git/go-git/v6/plumbing.EncodedObject
         - github.com/go-git/go-git/v6/storage.Storer
         - github.com/go-git/go-git/v6/plumbing/storer.EncodedObjectIter
+        - github.com/entireio/cli/e2e/agents.Session
         - github.com/go-git/go-billy/v6.Filesystem
     nolintlint:
       require-explanation: true
@@ -128,6 +129,16 @@ linters:
           - gosec
           - wrapcheck
           - forbidigo
+      - path: ^e2e/
+        linters:
+          - errcheck
+          - gochecknoinits
+          - goconst
+          - gosec
+          - noctx
+          - revive
+          - usetesting
+          - wrapcheck
       - path: ^test/workloads/
         linters:
           - wrapcheck

@@ -18,7 +18,7 @@ This repo contains the CLI for Entire.
 - `entire/cli/checkpoint`: checkpoint storage abstractions (temporary and committed)
 - `entire/cli/session`: session state management
 - `entire/cli/integration_test`: integration tests (simulated hooks)
-- `entire/cli/e2e_test`: E2E tests with real agent calls (see E2E Tests section)
+- `e2e/`: E2E tests with real agent calls (see [e2e/README.md](e2e/README.md))
 
 ## Tech Stack
 
@@ -50,29 +50,22 @@ Integration tests use the `//go:build integration` build tag and are located in
 
 ### Running E2E Tests (Only When Explicitly Requested)
 
-**IMPORTANT: Do NOT run E2E tests proactively.** E2E tests make real API calls through AI agents, which consume tokens and cost money. Only run them when the user explicitly asks for E2E testing.
+**IMPORTANT: Do NOT run E2E tests proactively.** E2E tests make real API calls to agents, which consume tokens and cost money. Only run them when the user explicitly asks for E2E testing.
 
 ```bash
-# Requires the agent to be installed and authenticated
-E2E_AGENT=claude-code go test -tags=e2e ./cmd/entire/cli/e2e_test/...
-
-# Run a specific test
-E2E_AGENT=claude-code go test -tags=e2e -run TestE2E_BasicWorkflow ./cmd/entire/cli/e2e_test/...
+mise run test:e2e TestFoo           # All agents, filtered
+mise run test:e2e:claude TestFoo    # Claude Code only
+mise run test:e2e:gemini TestFoo    # Gemini CLI only
+mise run test:e2e:opencode TestFoo  # OpenCode only
 ```
 
 E2E tests:
 
 - Use the `//go:build e2e` build tag
-- Located in `cmd/entire/cli/e2e_test/`
-- Test real agent interactions (Claude Code, Gemini CLI, or OpenCode creating files, committing, etc.)
-- Validate checkpoint scenarios documented in `docs/architecture/checkpoint-scenarios.md`
-- Support multiple agents via `E2E_AGENT` env var (`claude-code`, `gemini`, `opencode`)
-
-**Environment variables:**
-
-- `E2E_AGENT` - Agent to test with (default: `claude-code`)
-- `E2E_CLAUDE_MODEL` - Claude model to use (default: `haiku` for cost efficiency)
-- `E2E_TIMEOUT` - Timeout per prompt (default: `2m`)
+- Located in `e2e/tests/`
+- Test real agent interactions (creating files, committing, checkpoints, rewind, resume)
+- Support multiple agents: Claude Code, Gemini CLI, OpenCode
+- See [`e2e/README.md`](e2e/README.md) for full documentation (structure, debugging, adding agents)
 
 ### Test Parallelization