garrytan · Mar 15, 2026
diff --git a/.gitignore b/.gitignore
@@ -11,3 +11,4 @@ bun.lock
 .env.local
 .env.*
 !.env.example
+.gstack-sync.json
diff --git a/.gstack-sync.json.example b/.gstack-sync.json.example
@@ -0,0 +1,5 @@
+{
+  "supabase_url": "https://YOUR_PROJECT.supabase.co",
+  "supabase_anon_key": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.YOUR_ANON_KEY_HERE",
+  "team_slug": "your-team-name"
+}
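The example file above holds three keys. Here is a minimal sketch of how a loader for it might behave. It assumes all three keys are required, and it follows the 0.3.10 changelog in treating a missing file as "sync disabled". `loadSyncConfig` and `parseSyncConfig` are hypothetical names, not gstack's actual API.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Shape of .gstack-sync.json, mirroring the example file.
interface SyncConfig {
  supabase_url: string;
  supabase_anon_key: string;
  team_slug: string;
}

// Hypothetical loader: null means "no config, stay local-only".
function loadSyncConfig(path: string): SyncConfig | null {
  if (!existsSync(path)) return null;
  const raw: unknown = JSON.parse(readFileSync(path, "utf8"));
  return parseSyncConfig(raw);
}

// Hypothetical validator: throws if the file exists but is malformed.
function parseSyncConfig(raw: unknown): SyncConfig {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("sync config must be a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  for (const key of ["supabase_url", "supabase_anon_key", "team_slug"]) {
    if (typeof obj[key] !== "string" || obj[key] === "") {
      throw new Error(`sync config: missing or empty "${key}"`);
    }
  }
  new URL(obj.supabase_url as string); // throws a TypeError on a malformed URL
  return {
    supabase_url: obj.supabase_url as string,
    supabase_anon_key: obj.supabase_anon_key as string,
    team_slug: obj.team_slug as string,
  };
}
```

Returning `null` rather than throwing keeps the "zero impact when not configured" behavior. A config that is present but broken still fails loudly.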
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,42 @@
 # Changelog
 
+## 0.3.11 — 2026-03-15
+
+### Added
+- **Contributor mode** — set `gstack_contributor: true` in `~/.gstack/config.yaml` and Claude Code automatically files field reports to `~/.gstack/contributor-logs/` when gstack itself misbehaves. Reports include what you were doing, what went wrong, annoyance level (1-5), repro steps, and raw output. Opens the report for review. Max 3 per session, deduped by slug.
+- **Concurrent session tracking** — gstack detects how many sessions are active in a 2-hour window. When 3+ sessions are running simultaneously, all skills enter "ELI16 mode": every AskUserQuestion re-grounds the user on project, branch, current task, and the specific question — because context-switching is real.
+- **Universal RECOMMENDATION format** — every AskUserQuestion across all skills now follows: context → question → `RECOMMENDATION: Choose X because ___` → options. Consistent everywhere. Plan-review skills reference this baseline and add their own rules on top.
+- **Enum & Value Completeness** review category — new CRITICAL check in `/review` that traces new enum values, status strings, and type constants through every consumer outside the diff. Catches the class of bugs where a value is added but not handled in all case/switch chains, allowlists, or frontend-backend contracts.
+
+### Changed
+- Renamed `{{UPDATE_CHECK}}` placeholder to `{{PREAMBLE}}` across all 10 skill templates. The preamble now includes update check, session tracking, contributor mode, and AskUserQuestion format in a single startup block.
+- DRY'd plan-ceo-review and plan-eng-review AskUserQuestion formatting rules to reference the preamble baseline instead of duplicating instructions.
+- Rewrote CONTRIBUTING.md with contributor workflow, cross-project testing guide, and Conductor workspace docs.
+- Added vendored symlink awareness section to CLAUDE.md.
+
+## 0.3.10 — 2026-03-15
+
+### Added
+- **Team sync via Supabase (optional)** — shared data store for eval results, retro snapshots, QA reports, ship logs, and Greptile triage across team members. All sync operations are non-fatal and non-blocking — skills never wait on network. Offline queue with automatic retry (up to 5 attempts). Zero impact when not configured: without `.gstack-sync.json`, everything works locally as before. See `docs/designs/TEAM_COORDINATION_STORE.md` for architecture and setup.
+- **Supabase migration SQL** — 4 migration files in `supabase/migrations/` for teams, eval_runs, data tables (retros, QA, ships, Greptile), and eval costs. Row-level security policies ensure team members can only access their own team's data.
+- **Sync config + auth** — `.gstack-sync.json` for project-level config (Supabase URL, anon key, team slug). `~/.gstack/auth.json` for user-level tokens (keyed by Supabase URL for multi-team support). `GSTACK_SUPABASE_ACCESS_TOKEN` env var for CI/automation. Token refresh built in.
+- **`gstack sync` CLI** — `status`, `push`, `pull`, `drain`, `login`, `logout` subcommands for managing team sync.
+- **Universal eval format** — `StandardEvalResult` schema with validation, normalization, and bidirectional legacy conversion. Any language can produce JSON matching this format and push via `gstack eval push`.
+- **Unified eval CLI** — `gstack eval list|compare|summary|trend|push|cost|cache|watch` consolidating all eval tools into one entry point.
+- **Per-model cost tracking** — eval results now include `costs[]` with exact per-model token usage (input, output, cache read, cache creation) and API-reported cost. Extracted from `resultLine.modelUsage` in the `claude -p` NDJSON stream. `computeCosts()` prefers exact `cost_usd` over MODEL_PRICING estimates (~4x more accurate with prompt caching).
+- **LLM judge caching** — SHA-based caching for LLM-as-judge eval calls via `eval-cache.ts`. Cache keyed by `model:prompt`, so unchanged SKILL.md content skips API calls entirely. ~$0.18/run savings. Set `EVAL_CACHE=0` to force re-run.
+- **Dynamic model selection** — `EVAL_JUDGE_TIER` env var controls which Claude model runs judge evals (haiku/sonnet/opus, default: sonnet). `EVAL_TIER` pins the E2E test model via `--model` flag to `claude -p`.
+- **`bun run eval:trend`** — per-test pass rate tracking over last N runs. Classifies tests as stable-pass, stable-fail, flaky, improving, or degrading. Sparkline table with `--limit`, `--tier`, `--test` filters.
+- **Shared utilities** — `lib/util.ts` extracted with `atomicWriteJSON`, `readJSON`, `getGitInfo`, `getRemoteSlug`, `listEvalFiles`, `loadEvalResults`, `formatTimestamp`, and path constants.
+- 52+ new tests across eval cache, cost, format, tier, trend, sync config, sync client, and LLM judge integration.
+
+### Changed
+- `callJudge()` and `judge()` now return `{ result, meta }` with `JudgeMeta` (model, tokens, cached flag). `outcomeJudge()` retains simple return type for E2E callers.
+- `EvalCollector.finalize()` aggregates per-test `costs[]` into result-level cost breakdown and attempts team sync (non-blocking).
+- `cli-eval.ts` main block guarded with `import.meta.main` to prevent execution on import.
+- `eval:summary` now hints to run `eval:trend` when flaky tests are detected.
+- All 8 LLM eval test sites updated from hard-coded `cost_usd: 0.02` to real API-reported costs.
+
 ## 0.3.9 — 2026-03-15
 
 ### Added

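According to the 0.3.10 entry, `computeCosts()` uses the API-reported `cost_usd` when one is present and otherwise falls back to `MODEL_PRICING` estimates. The sketch below shows that preference. It assumes a per-million-token price table covering the four token categories the entry lists. The interfaces and `computeCostsSketch` are hypothetical, and the prices in the test are placeholders, not real rates.

```typescript
// Per-model usage with the four token categories named in the changelog.
interface ModelUsage {
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_creation_tokens: number;
  cost_usd?: number; // exact cost, when the API reports it
}

// Hypothetical price table entry: USD per million tokens per category.
interface Pricing {
  input: number;
  output: number;
  cache_read: number;
  cache_creation: number;
}

interface CostEntry {
  model: string;
  cost_usd: number;
  source: "reported" | "estimated" | "unknown";
}

function computeCostsSketch(usages: ModelUsage[], pricing: Record<string, Pricing>): CostEntry[] {
  return usages.map((u): CostEntry => {
    // Prefer the exact figure: it already reflects prompt-cache pricing.
    if (typeof u.cost_usd === "number") {
      return { model: u.model, cost_usd: u.cost_usd, source: "reported" };
    }
    const p = pricing[u.model];
    if (!p) return { model: u.model, cost_usd: 0, source: "unknown" };
    const estimate =
      (u.input_tokens * p.input +
        u.output_tokens * p.output +
        u.cache_read_tokens * p.cache_read +
        u.cache_creation_tokens * p.cache_creation) / 1e6;
    return { model: u.model, cost_usd: estimate, source: "estimated" };
  });
}
```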
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -15,6 +15,7 @@ bun run dev:skill    # watch mode: auto-regen + validate on change
 bun run eval:list    # list all eval runs from ~/.gstack-dev/evals/
 bun run eval:compare # compare two eval runs (auto-picks most recent)
 bun run eval:summary # aggregate stats across all eval runs
+bun run eval:trend   # per-test pass rate trends (flaky detection)
 ```
 
 `test:evals` requires `ANTHROPIC_API_KEY`. E2E tests stream progress in real-time
@@ -71,6 +72,24 @@ When you need to interact with a browser (QA, dogfooding, cookie setup), use the
 `mcp__claude-in-chrome__*` tools — they are slow, unreliable, and not what this
 project uses.
 
+## Vendored symlink awareness
+
+When developing gstack, `.claude/skills/gstack` may be a symlink back to this
+working directory (gitignored). This means skill changes are **live immediately** —
+great for rapid iteration, risky during big refactors where half-written skills
+could break other Claude Code sessions using gstack concurrently.
+
+**Check once per session:** Run `ls -la .claude/skills/gstack` to see if it's a
+symlink or a real copy. If it's a symlink to your working directory, be aware that:
+- Template changes + `bun run gen:skill-docs` immediately affect all gstack invocations
+- Breaking changes to SKILL.md.tmpl files can break concurrent gstack sessions
+- During large refactors, remove the symlink (`rm .claude/skills/gstack`) so the
+  global install at `~/.claude/skills/gstack/` is used instead
+
+**For plan reviews:** When reviewing plans that modify skill templates or the
+gen-skill-docs pipeline, consider whether the changes should be tested in isolation
+before going live (especially if the user is actively using gstack in other windows).
+
 ## Deploying to the active skill
 
 The active skill lives at `~/.claude/skills/gstack/`. After making changes:
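You can also script the `ls -la .claude/skills/gstack` check above. The Node/TypeScript sketch below assumes all you need is to tell "symlink", "real copy" and "absent" apart. `describeSkillInstall` is a hypothetical helper, not part of gstack.

```typescript
import { lstatSync, readlinkSync } from "node:fs";

type SkillInstall =
  | { kind: "missing" }                  // Claude Code falls back to the global install
  | { kind: "symlink"; target: string }  // live: edits in the target apply immediately
  | { kind: "copy" };                    // vendored copy or other real directory

// Hypothetical helper for the once-per-session check.
function describeSkillInstall(path: string): SkillInstall {
  // lstat inspects the link itself instead of following it.
  const stat = lstatSync(path, { throwIfNoEntry: false });
  if (!stat) return { kind: "missing" };
  if (stat.isSymbolicLink()) return { kind: "symlink", target: readlinkSync(path) };
  return { kind: "copy" };
}
```

Using `readlinkSync` instead of `realpathSync` means a dangling symlink is still reported as a symlink rather than throwing.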

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -20,9 +20,44 @@ Now edit any `SKILL.md`, invoke it in Claude Code (e.g. `/review`), and see your
 bin/dev-teardown               # deactivate — back to your global install
 ```
 
-## How dev mode works
+## Contributor mode
 
-`bin/dev-setup` creates a `.claude/skills/` directory inside the repo (gitignored) and fills it with symlinks pointing back to your working tree. Claude Code sees the local `skills/` first, so your edits win over the global install.
+Contributor mode is for people who want to fix gstack when it annoys them. Enable it
+and Claude Code will automatically log issues to `~/.gstack/contributor-logs/` as you
+work — what you were doing, what went wrong, repro steps, raw output.
+
+```bash
+~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true
+```
+
+The logs are for **you**. When something bugs you enough to fix, the report is
+already written. Fork gstack, symlink your fork into the project where you hit
+the issue, fix it, and open a PR.
+
+### The contributor workflow
+
+1. **Hit friction while using gstack** — contributor mode logs it automatically
+2. **Check your logs:** `ls ~/.gstack/contributor-logs/`
+3. **Fork and clone gstack** (if you haven't already)
+4. **Symlink your fork into the project where you hit the bug:**
+   ```bash
+   # In your core project (the one where gstack annoyed you)
+   mkdir -p .claude/skills   # the project may not have local skills yet
+   ln -sfn /path/to/your/gstack-fork .claude/skills/gstack
+   cd .claude/skills/gstack && bun install && bun run build
+   ```
+5. **Fix the issue** — your changes are live immediately in this project
+6. **Test by actually using gstack** — do the thing that annoyed you, verify it's fixed
+7. **Open a PR from your fork**
+
+This is the best way to contribute: fix gstack while doing your real work, in the
+project where you actually felt the pain.
+
+## Working on gstack inside the gstack repo
+
+When you're editing gstack skills and want to test them by actually using gstack
+in the same repo, `bin/dev-setup` wires this up. It creates `.claude/skills/`
+symlinks (gitignored) pointing back to your working tree, so Claude Code uses
+your local edits instead of the global install.
 
 ```
 gstack/                          <- your working tree
@@ -134,6 +169,8 @@ When E2E tests run, they produce machine-readable artifacts in `~/.gstack-dev/`:
 bun run eval:list            # list all eval runs
 bun run eval:compare         # compare two runs (auto-picks most recent)
 bun run eval:summary         # aggregate stats across all runs
+bun run eval:trend           # per-test pass rate over last N runs (flaky detection)
+bun run eval:cache stats     # check LLM judge cache hit rate
 ```
 
 Artifacts are never cleaned up — they accumulate in `~/.gstack-dev/` for post-mortem debugging and trend analysis.
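`eval:trend` puts each test into one of five buckets: stable-pass, stable-fail, flaky, improving or degrading. The diff does not spell out the rules, so here is one plausible classifier. It assumes runs are ordered oldest to newest and compares the older half against the newer half. The split rule and `classifyTrend` are illustrative, not gstack's implementation.

```typescript
type Trend = "stable-pass" | "stable-fail" | "flaky" | "improving" | "degrading";

// Hypothetical classifier over pass/fail results, oldest run first.
function classifyTrend(results: boolean[]): Trend {
  if (results.length === 0) throw new Error("need at least one run");
  if (results.every(Boolean)) return "stable-pass";
  if (results.every((r) => !r)) return "stable-fail";

  const mid = Math.floor(results.length / 2);
  const passRate = (xs: boolean[]) => xs.filter(Boolean).length / Math.max(xs.length, 1);
  const older = passRate(results.slice(0, mid));
  const newer = passRate(results.slice(mid));

  // Count it as a trend only when one half is entirely green.
  if (newer === 1 && older < 1) return "improving";
  if (older === 1 && newer < 1) return "degrading";
  return "flaky"; // failures scattered across both halves
}
```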
@@ -152,7 +189,8 @@ Each dimension is scored 1-5. Threshold: every dimension must score **≥ 4**. T
 # Needs ANTHROPIC_API_KEY in .env — included in bun run test:evals
 ```
 
-- Uses `claude-sonnet-4-6` for scoring stability
+- Model defaults to `claude-sonnet-4-6`; override with `EVAL_JUDGE_TIER=haiku|opus`
+- Results are SHA-cached — unchanged SKILL.md content skips API calls ($0 on repeat runs). Set `EVAL_CACHE=0` to force re-run.
 - Tests live in `test/skill-llm-eval.test.ts`
 - Calls the Anthropic API directly (not `claude -p`), so it works from anywhere including inside Claude Code
 
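The caching bullet above says unchanged SKILL.md content skips the API call. Here is a minimal sketch of that idea. It assumes the key is SHA-256 over the literal string `model:prompt` and that each result is stored as one JSON file per key. The directory layout, `judgeCacheKey` and `cachedJudge` are hypothetical. Only the `EVAL_CACHE=0` switch comes from the docs.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Same model + same prompt (i.e. unchanged SKILL.md content) => same key.
function judgeCacheKey(model: string, prompt: string): string {
  return createHash("sha256").update(`${model}:${prompt}`).digest("hex");
}

// Hypothetical wrapper: return a cached judgment or run the judge and store it.
async function cachedJudge<T>(
  cacheDir: string,
  model: string,
  prompt: string,
  runJudge: () => Promise<T>,
): Promise<{ result: T; cached: boolean }> {
  const file = join(cacheDir, `${judgeCacheKey(model, prompt)}.json`);
  const bypass = process.env.EVAL_CACHE === "0"; // force a fresh API call
  if (!bypass && existsSync(file)) {
    return { result: JSON.parse(readFileSync(file, "utf8")) as T, cached: true };
  }
  const result = await runJudge();
  mkdirSync(cacheDir, { recursive: true });
  writeFileSync(file, JSON.stringify(result));
  return { result, cached: false };
}
```

The model name is part of the key, so switching `EVAL_JUDGE_TIER` gives a cache miss rather than reusing another model's verdict.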
@@ -205,69 +243,42 @@ When Conductor creates a new workspace, `bin/dev-setup` runs automatically. It d
 - **`.env` propagates across worktrees.** Set it once in the main repo, all Conductor workspaces get it.
 - **`.claude/skills/` is gitignored.** The symlinks never get committed.
 
-## Testing a branch in another repo
-
-When you're developing gstack in one workspace and want to test your branch in a
-different project (e.g. testing browse changes against your real app), there are
-two cases depending on how gstack is installed in that project.
+## Testing your changes in a real project
 
-### Global install only (no `.claude/skills/gstack/` in the project)
-
-Point your global install at the branch:
+**This is the recommended way to develop gstack.** Symlink your gstack checkout
+into the project where you actually use it, so your changes are live while you
+do real work:
 
 ```bash
-cd ~/.claude/skills/gstack
-git fetch origin
-git checkout origin/<branch>        # e.g. origin/v0.3.2
-bun install                         # in case deps changed
-bun run build                       # rebuild the binary
+# In your core project
+mkdir -p .claude/skills     # the project may not have local skills yet
+ln -sfn /path/to/your/gstack-checkout .claude/skills/gstack
+cd .claude/skills/gstack && bun install && bun run build
 ```
 
-Now open Claude Code in the other project — it picks up skills from
-`~/.claude/skills/` automatically. To go back to main when you're done:
+Now every gstack skill invocation in this project uses your working tree. Edit a
+template, run `bun run gen:skill-docs`, and the next `/review` or `/qa` call picks
+it up immediately.
+
+**To go back to the stable global install**, just remove the symlink:
 
 ```bash
-cd ~/.claude/skills/gstack
-git checkout main && git pull
-bun run build
+rm .claude/skills/gstack
 ```
 
-### Vendored project copy (`.claude/skills/gstack/` checked into the project)
-
-Some projects vendor gstack by copying it into the repo (no `.git` inside the
-copy). Project-local skills take priority over global, so you need to update
-the vendored copy too. This is a three-step process:
+Claude Code falls back to `~/.claude/skills/gstack/` automatically.
 
-1. **Update your global install to the branch** (so you have the source):
-   ```bash
-   cd ~/.claude/skills/gstack
-   git fetch origin
-   git checkout origin/<branch>      # e.g. origin/v0.3.2
-   bun install && bun run build
-   ```
-
-2. **Replace the vendored copy** in the other project:
-   ```bash
-   cd /path/to/other-project
+### Alternative: point your global install at a branch
 
-   # Remove old skill symlinks and vendored copy
-   for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do
-     rm -f .claude/skills/$s
-   done
-   rm -rf .claude/skills/gstack
+If you don't want per-project symlinks, you can switch the global install:
 
-   # Copy from global install (strips .git so it stays vendored)
-   cp -Rf ~/.claude/skills/gstack .claude/skills/gstack
-   rm -rf .claude/skills/gstack/.git
-
-   # Rebuild binary and re-create skill symlinks
-   cd .claude/skills/gstack && ./setup
-   ```
-
-3. **Test your changes** — open Claude Code in that project and use the skills.
+```bash
+cd ~/.claude/skills/gstack
+git fetch origin
+git checkout origin/<branch>
+bun install && bun run build
+```
 
-To revert to main later, repeat steps 1-2 with `git checkout main && git pull`
-instead of `git checkout origin/<branch>`.
+This affects all projects. To revert: `git checkout main && git pull && bun run build`.
 
 ## Shipping your changes
 

diff --git a/README.md b/README.md
@@ -629,6 +629,12 @@ bun run eval:watch            # live dashboard during E2E runs
 
 E2E tests stream real-time progress, write machine-readable diagnostics, and persist partial results that survive kills. See CONTRIBUTING.md for the full eval infrastructure.
 
+### Team sync (optional)
+
+For teams, gstack can sync eval results, retro snapshots, QA reports, and ship logs to a shared Supabase store. Without this, everything works locally as before — sync is purely additive.
+
+To set up: copy `.gstack-sync.json.example` to `.gstack-sync.json`, create a Supabase project, run the migrations in `supabase/migrations/`, and fill in your credentials. See `docs/designs/TEAM_COORDINATION_STORE.md` for the full guide.
+
 ## License
 
 MIT
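The 0.3.10 changelog describes sync as non-fatal and non-blocking, backed by an offline queue that retries up to 5 times. The in-memory sketch below models that retry policy. It assumes a failed push stays queued until it has failed five times and is then dropped. `SyncQueue` and its methods are hypothetical, and a real queue would need to persist to disk between sessions.

```typescript
interface QueuedItem {
  table: string;     // e.g. "eval_runs"
  payload: unknown;
  attempts: number;  // failed push attempts so far
}

const MAX_ATTEMPTS = 5;

// Hypothetical queue illustrating the retry policy only.
class SyncQueue {
  private items: QueuedItem[] = [];

  enqueue(table: string, payload: unknown): void {
    this.items.push({ table, payload, attempts: 0 });
  }

  get size(): number {
    return this.items.length;
  }

  // Try every queued item once. Never throws: a sync failure must not break a skill.
  async drain(push: (item: QueuedItem) => Promise<void>): Promise<{ sent: number; dropped: number }> {
    const remaining: QueuedItem[] = [];
    let sent = 0;
    let dropped = 0;
    for (const item of this.items) {
      try {
        await push(item);
        sent++;
      } catch {
        item.attempts++;
        if (item.attempts >= MAX_ATTEMPTS) dropped++;
        else remaining.push(item); // retry on the next drain
      }
    }
    this.items = remaining;
    return { sent, dropped };
  }
}
```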
diff --git a/SKILL.md b/SKILL.md
@@ -16,15 +16,63 @@ allowed-tools:
 <!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
 <!-- Regenerate: bun run gen:skill-docs -->
 
-## Update Check (run first)
+## Preamble (run first)
 
 ```bash
 _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
 [ -n "$_UPD" ] && echo "$_UPD" || true
+mkdir -p ~/.gstack/sessions
+touch ~/.gstack/sessions/"$PPID"
+_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
+find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
+_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
 ```
 
 If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
 
+## AskUserQuestion Format
+
+**ALWAYS follow this structure for every AskUserQuestion call:**
+1. Context: project name, current branch, what we're working on (1-2 sentences)
+2. The specific question or decision point
+3. `RECOMMENDATION: Choose [X] because [one-line reason]`
+4. Lettered options: `A) ... B) ... C) ...`
+
+If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes.
+
+Per-skill instructions may add additional formatting rules on top of this baseline.
+
+## Contributor Mode
+
+If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
+
+**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff.
+**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
+
+**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
+
+```
+# {Title}
+
+Hey gstack team — ran into this while using /{skill-name}:
+
+**What I was trying to do:** {what the user/agent was attempting}
+**What happened instead:** {what actually happened}
+**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
+
+## Steps to reproduce
+1. {step}
+
+## Raw output
+(wrap any error messages or unexpected output in a markdown code block)
+
+**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
+```
+
+Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`
+
+Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
+
 # gstack browse: QA Testing & Dogfooding
 
 Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command.
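The preamble counts sessions with `find -mmin -120` over one marker file per `$PPID`. Below is the same logic in TypeScript, as a sketch for reasoning about the ELI16 threshold. The 120-minute window and the threshold of 3 come from the preamble. `countActiveSessions` and `isEli16` are hypothetical names.

```typescript
import { mkdirSync, readdirSync, statSync, unlinkSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const WINDOW_MS = 120 * 60 * 1000; // roughly `find -mmin -120`
const ELI16_THRESHOLD = 3;

// Hypothetical: touch this session's marker, prune stale markers, count live ones.
function countActiveSessions(dir: string, sessionId: string, now = Date.now()): number {
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, sessionId), ""); // like `touch`: refreshes mtime
  let active = 0;
  for (const name of readdirSync(dir)) {
    const file = join(dir, name);
    const st = statSync(file, { throwIfNoEntry: false }); // another session may delete it
    if (!st || !st.isFile()) continue;
    if (now - st.mtimeMs < WINDOW_MS) active++;
    else unlinkSync(file); // like `find -mmin +120 -delete`
  }
  return active;
}

function isEli16(activeSessions: number): boolean {
  return activeSessions >= ELI16_THRESHOLD;
}
```

Each marker counts as "active" for two hours after its last touch. So three windows opened within that span trigger ELI16 mode even if only one of them is in use right now.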

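The contributor-mode filing rules are: a lowercase hyphenated slug of at most 60 characters, skip the report if the file already exists, and file at most 3 per session. They fit in a few lines. This sketch assumes a plain character filter for slugification. `slugify` and `shouldFileReport` are hypothetical, since the skill leaves these steps to the agent.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

const MAX_SLUG_LEN = 60;
const MAX_REPORTS_PER_SESSION = 3;

// Hypothetical: "Browse: snapshot ref gap!" -> "browse-snapshot-ref-gap"
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else into single hyphens
    .replace(/^-+|-+$/g, "")     // trim leading/trailing hyphens
    .slice(0, MAX_SLUG_LEN)
    .replace(/-+$/, "");         // re-trim if the cut landed on a hyphen
}

// Hypothetical: enforce the per-session cap and dedupe by slug.
function shouldFileReport(logDir: string, slug: string, filedThisSession: number): boolean {
  if (filedThisSession >= MAX_REPORTS_PER_SESSION) return false;
  return !existsSync(join(logDir, `${slug}.md`));
}
```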
diff --git a/SKILL.md.tmpl b/SKILL.md.tmpl
@@ -14,7 +14,7 @@ allowed-tools:
 
 ---
 
-{{UPDATE_CHECK}}
+{{PREAMBLE}}
 
 # gstack browse: QA Testing & Dogfooding