Merged
51 commits
dafc1b1
feat: add consolidated E2E test suite
khaong Feb 24, 2026
bef9e36
fix: gitignore e2e artifacts, remove accidentally committed ones
khaong Feb 24, 2026
00a9940
refactor: remove old shadow-hook E2E test suite
khaong Feb 24, 2026
b4f3c4a
chore: update mise E2E tasks to use new e2e/ directory
khaong Feb 24, 2026
427c34d
ci: add tmux and entire binary build to E2E workflow
khaong Feb 24, 2026
e123b06
fix: configure git identity in test repos for CI runners
khaong Feb 24, 2026
553642e
ci: pre-seed Claude Code API key auth for interactive tests
khaong Feb 24, 2026
cf1bc68
fix: set CLAUDE_CONFIG_DIR for interactive sessions on CI
khaong Feb 24, 2026
e77e68a
fix: harden flaky tests and upload CI artifacts
khaong Feb 24, 2026
9a4e3fa
fix: resolve lint failures in e2e/ test infrastructure
khaong Feb 24, 2026
20ea180
ci: warm up opencode to avoid first-run initialization race
khaong Feb 24, 2026
4b5f569
fix: remove noisy per-line JSONL validation from transcript check
khaong Feb 24, 2026
e84122f
fix: retry opencode interactive session on empty-pane startup failure
khaong Feb 24, 2026
a8a32b3
ci: use full opencode run for warmup to prevent init races
khaong Feb 24, 2026
b0b105d
fix: close tmux session on StartSession error in Claude and Gemini
khaong Feb 24, 2026
55623cf
fix: harden multi-commit attribution prompt to prevent amend
khaong Feb 24, 2026
56555b3
fix: poll for new commits in AssertNewCommits to handle async agents
khaong Feb 24, 2026
bbfed95
ci: add Gemini CLI to E2E test matrix
khaong Feb 24, 2026
c6437ca
fix: upgrade gemini E2E model to gemini-3-flash-preview
khaong Feb 24, 2026
3245645
fix: pass --model flag in gemini interactive sessions
khaong Feb 24, 2026
5e66337
fix: increase gemini StartSession timeout to 30s for CI
khaong Feb 24, 2026
fac3272
ci: queue E2E runs instead of cancelling in-progress
khaong Feb 24, 2026
0ce1d1d
fix: unset CI env var for gemini interactive sessions
khaong Feb 24, 2026
c727723
fix: also unset GITHUB_ACTIONS for gemini interactive sessions
khaong Feb 24, 2026
4d64d21
fix: dismiss gemini auth and trust dialogs in StartSession
khaong Feb 24, 2026
372eed0
ci: pre-configure Gemini CLI auth to skip onboarding dialog
khaong Feb 24, 2026
8475b98
refactor: extract agent bootstrap from CI YAML into Go code
khaong Feb 24, 2026
e67b297
test: add integration test for resume in relocated repo
khaong Feb 24, 2026
c05bc55
docs: add e2e/README.md and update CLAUDE.md references
khaong Feb 24, 2026
c6d78a1
docs: add debug-e2e skill and reference in e2e README
khaong Feb 24, 2026
c706fe5
feat: add transient error retry for E2E agent prompts
khaong Feb 24, 2026
f80dc8c
fix: don't override CLAUDE_CONFIG_DIR locally in E2E RunPrompt
dvydra Feb 24, 2026
22a11a5
chore: remove e2e auth debug logging
dvydra Feb 25, 2026
c667f11
feat: add preflight dependency checks in TestMain
khaong Feb 25, 2026
6828755
Set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC.
toothbrush Feb 25, 2026
814e4cc
Point GIT_CONFIG_GLOBAL to /dev/null.
toothbrush Feb 25, 2026
5cef603
fix: poll for file existence in TestInteractiveMultiStep
khaong Feb 25, 2026
20fdbee
feat: add test report generation for e2e runs
khaong Feb 25, 2026
dd77333
fix: detect opencode exit-0 failures via stderr inspection
khaong Feb 25, 2026
d82064f
chore: print entire version in e2e task output
khaong Feb 25, 2026
38f2694
feat: capture tmux pane content for interactive E2E test debugging
khaong Feb 25, 2026
e4736c2
fix: capture pane before closing tmux session
khaong Feb 25, 2026
78c9a6f
fix: adapt TestResume_RelocatedRepo for manual-commit strategy
khaong Feb 25, 2026
cec2562
chore: remove auto-commit E2E test
khaong Feb 25, 2026
94cc998
chore: remove --strategy flag from e2e Enable calls
khaong Feb 25, 2026
e40db6e
feat: assert no lingering shadow branches after commits in all E2E tests
khaong Feb 25, 2026
80a19b4
feat: build entire binary from source by default in E2E tests
khaong Feb 25, 2026
c920e46
fix: harden E2E test infrastructure
khaong Feb 25, 2026
c51f1d0
fix: address bot review feedback on E2E agents
khaong Feb 25, 2026
6646753
fix: make AssertCheckpointInLastN resilient to multi-commit agent turns
khaong Feb 25, 2026
c555bc9
fix(e2e): remove incorrect shadow branch assertion from trailer remov…
khaong Feb 25, 2026
93 changes: 93 additions & 0 deletions .claude/skills/debug-e2e/SKILL.md
@@ -0,0 +1,93 @@
---
name: debug-e2e
description: Use when investigating E2E test failures from artifacts to diagnose bugs in the Entire CLI, or when pointed at an artifact path for root cause analysis
---

# Debug Entire CLI via E2E Artifacts

Diagnose Entire CLI bugs using captured artifacts from the E2E test suite. Local runs write artifacts to `e2e/artifacts/`; for CI runs, download them from the GitHub Actions run, where the E2E workflow uploads them as `e2e-artifacts-{agent}`.

## Inputs

The user provides either:
- **A test run directory:** `e2e/artifacts/{timestamp}/` — triage all failures
- **A specific test directory:** `e2e/artifacts/{timestamp}/{TestName}-{agent}/` — debug one test

## Artifact Layout

```
e2e/artifacts/{timestamp}/
├── report.nocolor.txt # Pass/fail/skip summary with error lines
├── test-events.json # Raw Go test events (NDJSON)
├── entire-version.txt # CLI version under test
└── {TestName}-{agent}/
├── PASS or FAIL # Status marker
├── console.log # Full operation transcript
├── git-log.txt # git log --decorate --graph --all
├── git-tree.txt # ls-tree HEAD + checkpoint branch
├── entire-logs/entire.log # CLI structured JSON logs
├── checkpoint-metadata/ # Checkpoint + session metadata
└── repo -> /tmp/... # Symlink to preserved repo (E2E_KEEP_REPOS=1 only)
```

## Preserved Repo

When the test run was executed with `E2E_KEEP_REPOS=1`, each test's artifact directory contains a `repo` symlink pointing to the preserved temporary git repository. This is the actual repo the test operated on — you can inspect it directly.

**Navigate via the symlink** (e.g., `{artifact-dir}/repo/`) rather than resolving the `/tmp/...` path. The symlink lives inside the artifact directory so permissions and paths stay consistent.

The preserved repo contains:
- Full git history with all branches (main, `entire/checkpoints/v1`)
- The `.entire/` directory with CLI state, config, and raw logs
- The `.claude/` directory (if Claude Code was the agent)
- All files the agent created or modified, in their final state

This is the most powerful debugging tool — you can run `git log`, `git diff`, `git show`, inspect `.entire/` internals, and see exactly what the CLI left behind.

## Debugging Workflow

### 1. Triage (if given a run directory)

Read `report.nocolor.txt` to identify failures and their error messages. Each entry shows the test name, agent, duration, and failure output with file:line references.

### 2. Read console.log (most important)

Full transcript of every operation:
- `> claude -p "..." ...` — agent prompts with stdout/stderr
- `> git add/commit/...` — git commands
- `> send: ...` — interactive session inputs

This tells you what happened chronologically.

### 3. Read test source code

Use the file:line reference from the report to find the test in `e2e/tests/`. Compare what the test expected to happen with what console.log shows actually happened.

### 4. Diagnose the CLI behavior

Cross-reference console.log (what happened) with the test (what should have happened). Focus on CLI-level issues:

| Symptom | CLI Investigation |
|---------|-------------------|
| Checkpoint not created / timeout | Check `entire-logs/entire.log` for hook invocations, phase transitions, errors |
| Wrong checkpoint content | Check `git-tree.txt` for checkpoint branch files, `checkpoint-metadata/` for session info |
| Hooks didn't fire | Check `entire.log` for missing hook entries (session-start, user-prompt-submit, stop, post-commit) |
| Stash/unstash problems | Check `entire.log` for stash-related log lines, `git-log.txt` for commit ordering |
| Attribution issues | Check `checkpoint-metadata/` for `files_touched`, session metadata for attribution data |
| Strategy mismatch | Check `entire.log` for `strategy` field, verify auto-commit vs manual-commit behavior |

### 5. Deep dive files

- **entire-logs/entire.log**: Structured JSON logs — hook lifecycle, session phases (`active` → `idle` → `ended`), warnings, errors. Key fields: `component`, `hook`, `strategy`, `session_id`.
- **git-log.txt**: Commit graph showing the main branch, the `entire/checkpoints/v1` branch, and checkpoint initialization.
- **git-tree.txt**: Files at HEAD vs checkpoint branch (separated by `--- entire/checkpoints/v1 ---`).
- **checkpoint-metadata/**: `metadata.json` has `checkpoint_id`, `strategy`, `files_touched`, `token_usage`, and `sessions` array. Session subdirs have per-session details.

### 6. Report findings

Identify whether the issue is in:
- **CLI hooks** (prepare-commit-msg, commit-msg, post-commit)
- **Session management** (phase transitions, session tracking)
- **Checkpoint creation** (branch management, metadata writing)
- **Attribution** (file tracking, prompt correlation)
- **Strategy logic** (auto-commit vs manual-commit behavior)
7 changes: 7 additions & 0 deletions .github/workflows/e2e-isolated.yml
@@ -41,11 +41,18 @@ jobs:
esac
echo "$HOME/.local/bin" >> $GITHUB_PATH

- name: Bootstrap agent
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
run: go run ./e2e/bootstrap

- name: Run isolated test
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts
E2E_ENTIRE_BIN: /usr/local/bin/entire
run: |
mkdir -p "$E2E_ARTIFACT_DIR"
mise run test:e2e:${{ inputs.agent }} "${{ inputs.test }}"
29 changes: 25 additions & 4 deletions .github/workflows/e2e.yml
@@ -7,7 +7,6 @@ on:
- main

# Concurrency: only one E2E job runs at a time
# Cancel previous runs when new one starts
concurrency:
group: e2e-tests
cancel-in-progress: true
@@ -19,7 +18,7 @@ jobs:
strategy:
fail-fast: false
matrix:
agent: [claude, opencode]
agent: [claude, opencode, gemini]

steps:
- name: Checkout repository
@@ -28,16 +27,38 @@ jobs:
- name: Setup mise
uses: jdx/mise-action@v3

- name: Install system dependencies
run: sudo apt-get update && sudo apt-get install -y tmux

- name: Build entire CLI
run: go build -o /usr/local/bin/entire ./cmd/entire

- name: Install agent CLI
run: |
case "${{ matrix.agent }}" in
claude) curl -fsSL https://claude.ai/install.sh | bash ;;
opencode) curl -fsSL https://opencode.ai/install | bash ;;
gemini) npm install -g @google/gemini-cli ;;
esac
echo "$HOME/.local/bin" >> $GITHUB_PATH

- name: Bootstrap agent
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
run: go run ./e2e/bootstrap

- name: Run E2E Tests
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
mise run test:e2e:${{ matrix.agent }}
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
E2E_ENTIRE_BIN: /usr/local/bin/entire
run: mise run test:e2e:${{ matrix.agent }}

- name: Upload artifacts
if: always()
uses: actions/upload-artifact@v4
with:
name: e2e-artifacts-${{ matrix.agent }}
path: e2e/artifacts/
retention-days: 7
4 changes: 4 additions & 0 deletions .gitignore
@@ -36,12 +36,16 @@ go.work.sum

# Binary output (only in root)
/entire
/testreport

# Build output directory
/dist/
completions


# E2E test artifacts
e2e/artifacts/

# worktrees
.worktrees/
test-gemini.txt
11 changes: 11 additions & 0 deletions .golangci.yaml
@@ -108,6 +108,7 @@ linters:
- github.com/go-git/go-git/v6/plumbing.EncodedObject
- github.com/go-git/go-git/v6/storage.Storer
- github.com/go-git/go-git/v6/plumbing/storer.EncodedObjectIter
- github.com/entireio/cli/e2e/agents.Session
- github.com/go-git/go-billy/v6.Filesystem
nolintlint:
require-explanation: true
@@ -128,6 +129,16 @@ linters:
- gosec
- wrapcheck
- forbidigo
- path: ^e2e/
linters:
- errcheck
- gochecknoinits
- goconst
- gosec
- noctx
- revive
- usetesting
- wrapcheck
- path: ^test/workloads/
linters:
- wrapcheck
27 changes: 10 additions & 17 deletions CLAUDE.md
@@ -18,7 +18,7 @@ This repo contains the CLI for Entire.
- `entire/cli/checkpoint`: checkpoint storage abstractions (temporary and committed)
- `entire/cli/session`: session state management
- `entire/cli/integration_test`: integration tests (simulated hooks)
- `entire/cli/e2e_test`: E2E tests with real agent calls (see E2E Tests section)
- `e2e/`: E2E tests with real agent calls (see [e2e/README.md](e2e/README.md))

## Tech Stack

@@ -50,29 +50,22 @@ Integration tests use the `//go:build integration` build tag and are located in

### Running E2E Tests (Only When Explicitly Requested)

**IMPORTANT: Do NOT run E2E tests proactively.** E2E tests make real API calls through AI agents, which consume tokens and cost money. Only run them when the user explicitly asks for E2E testing.
**IMPORTANT: Do NOT run E2E tests proactively.** E2E tests make real API calls to agents, which consume tokens and cost money. Only run them when the user explicitly asks for E2E testing.

```bash
# Requires the agent to be installed and authenticated
E2E_AGENT=claude-code go test -tags=e2e ./cmd/entire/cli/e2e_test/...

# Run a specific test
E2E_AGENT=claude-code go test -tags=e2e -run TestE2E_BasicWorkflow ./cmd/entire/cli/e2e_test/...
mise run test:e2e TestFoo # All agents, filtered
mise run test:e2e:claude TestFoo # Claude Code only
mise run test:e2e:gemini TestFoo # Gemini CLI only
mise run test:e2e:opencode TestFoo # OpenCode only
```

E2E tests:

- Use the `//go:build e2e` build tag
- Located in `cmd/entire/cli/e2e_test/`
- Test real agent interactions (Claude Code, Gemini CLI, or OpenCode creating files, committing, etc.)
- Validate checkpoint scenarios documented in `docs/architecture/checkpoint-scenarios.md`
- Support multiple agents via `E2E_AGENT` env var (`claude-code`, `gemini`, `opencode`)

**Environment variables:**

- `E2E_AGENT` - Agent to test with (default: `claude-code`)
- `E2E_CLAUDE_MODEL` - Claude model to use (default: `haiku` for cost efficiency)
- `E2E_TIMEOUT` - Timeout per prompt (default: `2m`)
- Located in `e2e/tests/`
- Test real agent interactions (creating files, committing, checkpoints, rewind, resume)
- Support multiple agents: Claude Code, Gemini CLI, OpenCode
- See [`e2e/README.md`](e2e/README.md) for full documentation (structure, debugging, adding agents)

### Test Parallelization
