From 91df5026c9002cf1839eba017865ca0093ab094e Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sun, 15 Mar 2026 17:29:34 -0500 Subject: [PATCH 1/8] feat: contributor mode, session awareness, universal RECOMMENDATION format MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Rename {{UPDATE_CHECK}} → {{PREAMBLE}} across all 10 skill templates - Add session tracking (touch ~/.gstack/sessions/$PPID, count active sessions) - ELI16 mode when 3+ concurrent sessions detected (re-ground user on context) - Contributor mode: auto-file field reports to ~/.gstack/contributor-logs/ - Universal AskUserQuestion format: context → question → RECOMMENDATION → options - Update plan-ceo-review and plan-eng-review to reference preamble baseline - Add vendored symlink awareness section to CLAUDE.md - Rewrite CONTRIBUTING.md with contributor workflow and cross-project testing - Add tests for contributor mode and session awareness in generated output - Add E2E eval for contributor mode report filing Co-Authored-By: Claude Opus 4.6 --- CLAUDE.md | 18 +++++ CONTRIBUTING.md | 112 +++++++++++++++------------- SKILL.md | 50 ++++++++++++- SKILL.md.tmpl | 2 +- browse/SKILL.md | 50 ++++++++++++- browse/SKILL.md.tmpl | 2 +- plan-ceo-review/SKILL.md | 58 ++++++++++++-- plan-ceo-review/SKILL.md.tmpl | 10 +-- plan-eng-review/SKILL.md | 68 ++++++++++++++--- plan-eng-review/SKILL.md.tmpl | 20 ++--- qa/SKILL.md | 50 ++++++++++++- qa/SKILL.md.tmpl | 2 +- retro/SKILL.md | 50 ++++++++++++- retro/SKILL.md.tmpl | 2 +- review/SKILL.md | 52 ++++++++++++- review/SKILL.md.tmpl | 4 +- scripts/gen-skill-docs.ts | 56 +++++++++++++- setup-browser-cookies/SKILL.md | 50 ++++++++++++- setup-browser-cookies/SKILL.md.tmpl | 2 +- ship/SKILL.md | 58 ++++++++++++-- ship/SKILL.md.tmpl | 10 +-- test/gen-skill-docs.test.ts | 16 ++++ test/skill-e2e.test.ts | 46 ++++++++++++ 23 files changed, 670 insertions(+), 118 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 6546233..4ff9941 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -72,6 +72,24 @@ When you need to interact with a browser (QA, dogfooding, cookie setup), use the `mcp__claude-in-chrome__*` tools — they are slow, unreliable, and not what this project uses. +## Vendored symlink awareness + +When developing gstack, `.claude/skills/gstack` may be a symlink back to this +working directory (gitignored). This means skill changes are **live immediately** — +great for rapid iteration, risky during big refactors where half-written skills +could break other Claude Code sessions using gstack concurrently. + +**Check once per session:** Run `ls -la .claude/skills/gstack` to see if it's a +symlink or a real copy. If it's a symlink to your working directory, be aware that: +- Template changes + `bun run gen:skill-docs` immediately affect all gstack invocations +- Breaking changes to SKILL.md.tmpl files can break concurrent gstack sessions +- During large refactors, remove the symlink (`rm .claude/skills/gstack`) so the + global install at `~/.claude/skills/gstack/` is used instead + +**For plan reviews:** When reviewing plans that modify skill templates or the +gen-skill-docs pipeline, consider whether the changes should be tested in isolation +before going live (especially if the user is actively using gstack in other windows). + ## Deploying to the active skill The active skill lives at `~/.claude/skills/gstack/`. After making changes: diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9b590c8..abe1cf1 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -20,9 +20,44 @@ Now edit any `SKILL.md`, invoke it in Claude Code (e.g. `/review`), and see your bin/dev-teardown # deactivate — back to your global install ``` -## How dev mode works +## Contributor mode -`bin/dev-setup` creates a `.claude/skills/` directory inside the repo (gitignored) and fills it with symlinks pointing back to your working tree. Claude Code sees the local `skills/` first, so your edits win over the global install. +Contributor mode is for people who want to fix gstack when it annoys them. Enable it +and Claude Code will automatically log issues to `~/.gstack/contributor-logs/` as you +work — what you were doing, what went wrong, repro steps, raw output. + +```bash +~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true +``` + +The logs are for **you**. When something bugs you enough to fix, the report is +already written. Fork gstack, symlink your fork into the project where you hit +the issue, fix it, and open a PR. + +### The contributor workflow + +1. **Hit friction while using gstack** — contributor mode logs it automatically +2. **Check your logs:** `ls ~/.gstack/contributor-logs/` +3. **Fork and clone gstack** (if you haven't already) +4. **Symlink your fork into the project where you hit the bug:** + ```bash + # In your core project (the one where gstack annoyed you) + ln -sfn /path/to/your/gstack-fork .claude/skills/gstack + cd .claude/skills/gstack && bun install && bun run build + ``` +5. **Fix the issue** — your changes are live immediately in this project +6. **Test by actually using gstack** — do the thing that annoyed you, verify it's fixed +7. **Open a PR from your fork** + +This is the best way to contribute: fix gstack while doing your real work, in the +project where you actually felt the pain. + +## Working on gstack inside the gstack repo + +When you're editing gstack skills and want to test them by actually using gstack +in the same repo, `bin/dev-setup` wires this up. It creates `.claude/skills/` +symlinks (gitignored) pointing back to your working tree, so Claude Code uses +your local edits instead of the global install. ``` gstack/ <- your working tree @@ -207,69 +242,42 @@ When Conductor creates a new workspace, `bin/dev-setup` runs automatically. It d - **`.env` propagates across worktrees.** Set it once in the main repo, all Conductor workspaces get it. - **`.claude/skills/` is gitignored.** The symlinks never get committed. -## Testing a branch in another repo - -When you're developing gstack in one workspace and want to test your branch in a -different project (e.g. testing browse changes against your real app), there are -two cases depending on how gstack is installed in that project. +## Testing your changes in a real project -### Global install only (no `.claude/skills/gstack/` in the project) - -Point your global install at the branch: +**This is the recommended way to develop gstack.** Symlink your gstack checkout +into the project where you actually use it, so your changes are live while you +do real work: ```bash -cd ~/.claude/skills/gstack -git fetch origin -git checkout origin/ # e.g. origin/v0.3.2 -bun install # in case deps changed -bun run build # rebuild the binary +# In your core project +ln -sfn /path/to/your/gstack-checkout .claude/skills/gstack +cd .claude/skills/gstack && bun install && bun run build ``` -Now open Claude Code in the other project — it picks up skills from -`~/.claude/skills/` automatically. To go back to main when you're done: +Now every gstack skill invocation in this project uses your working tree. Edit a +template, run `bun run gen:skill-docs`, and the next `/review` or `/qa` call picks +it up immediately. + +**To go back to the stable global install**, just remove the symlink: ```bash -cd ~/.claude/skills/gstack -git checkout main && git pull -bun run build +rm .claude/skills/gstack ``` -### Vendored project copy (`.claude/skills/gstack/` checked into the project) - -Some projects vendor gstack by copying it into the repo (no `.git` inside the -copy). Project-local skills take priority over global, so you need to update -the vendored copy too. This is a three-step process: +Claude Code falls back to `~/.claude/skills/gstack/` automatically. -1. **Update your global install to the branch** (so you have the source): - ```bash - cd ~/.claude/skills/gstack - git fetch origin - git checkout origin/ # e.g. origin/v0.3.2 - bun install && bun run build - ``` - -2. **Replace the vendored copy** in the other project: - ```bash - cd /path/to/other-project +### Alternative: point your global install at a branch - # Remove old skill symlinks and vendored copy - for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do - rm -f .claude/skills/$s - done - rm -rf .claude/skills/gstack +If you don't want per-project symlinks, you can switch the global install: - # Copy from global install (strips .git so it stays vendored) - cp -Rf ~/.claude/skills/gstack .claude/skills/gstack - rm -rf .claude/skills/gstack/.git - - # Rebuild binary and re-create skill symlinks - cd .claude/skills/gstack && ./setup - ``` - -3. **Test your changes** — open Claude Code in that project and use the skills. +```bash +cd ~/.claude/skills/gstack +git fetch origin +git checkout origin/ +bun install && bun run build +``` -To revert to main later, repeat steps 1-2 with `git checkout main && git pull` -instead of `git checkout origin/`. +This affects all projects. To revert: `git checkout main && git pull && bun run build`. ## Shipping your changes diff --git a/SKILL.md b/SKILL.md index 9cc9acd..b362e82 100644 --- a/SKILL.md +++ b/SKILL.md @@ -16,15 +16,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # gstack browse: QA Testing & Dogfooding Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command. diff --git a/SKILL.md.tmpl b/SKILL.md.tmpl index c827f5f..7f2e11d 100644 --- a/SKILL.md.tmpl +++ b/SKILL.md.tmpl @@ -14,7 +14,7 @@ allowed-tools: --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # gstack browse: QA Testing & Dogfooding diff --git a/browse/SKILL.md b/browse/SKILL.md index 22c5d88..28e976d 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -16,15 +16,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # browse: QA Testing & Dogfooding Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command. diff --git a/browse/SKILL.md.tmpl b/browse/SKILL.md.tmpl index 62e52cd..6ce2063 100644 --- a/browse/SKILL.md.tmpl +++ b/browse/SKILL.md.tmpl @@ -14,7 +14,7 @@ allowed-tools: --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # browse: QA Testing & Dogfooding diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index 7bb5dad..c82753a 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -16,15 +16,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # Mega Plan Review Mode ## Philosophy @@ -365,16 +413,13 @@ Evaluate: **STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. ## CRITICAL RULE — How to ask questions -Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY that option over the others, mapping to engineering preferences. No batching multiple issues into one question. No yes/no questions. Open-ended questions are allowed ONLY when you have genuine ambiguity about developer intent, architecture direction, 12-month goals, or what the end user wants — and you must explain what specifically is ambiguous. - -## For Each Issue You Find +Follow the AskUserQuestion format from the Preamble above. Additional rules for plan reviews: * **One issue = one AskUserQuestion call.** Never combine multiple issues into one question. * Describe the problem concretely, with file and line references. * Present 2-3 options, including "do nothing" where reasonable. * For each option: effort, risk, and maintenance burden in one line. -* **Lead with your recommendation.** State it as a directive: "Do B. Here's why:" — not "Option B might be worth considering." Be opinionated. I'm paying for your judgment, not a menu. * **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference. -* **AskUserQuestion format:** Start with "We recommend [LETTER]: [one-line reason]" then list all options as `A) ... B) ... C) ...`. Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). +* Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). * **Escape hatch:** If a section has no issues, say so and move on. If an issue has an obvious fix with no real alternatives, state what you'll do and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine decision with meaningful tradeoffs. ## Required Outputs @@ -465,7 +510,6 @@ If any AskUserQuestion goes unanswered, note it here. Never silently default. ## Formatting Rules * NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...). * Label with NUMBER + LETTER (e.g., "3A", "3B"). -* Recommended option always listed first. * One sentence max per option. * After each section, pause and wait for feedback. * Use **CRITICAL GAP** / **WARNING** / **OK** for scannability. diff --git a/plan-ceo-review/SKILL.md.tmpl b/plan-ceo-review/SKILL.md.tmpl index 09fa627..ef14a28 100644 --- a/plan-ceo-review/SKILL.md.tmpl +++ b/plan-ceo-review/SKILL.md.tmpl @@ -14,7 +14,7 @@ allowed-tools: - AskUserQuestion --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # Mega Plan Review Mode @@ -356,16 +356,13 @@ Evaluate: **STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. ## CRITICAL RULE — How to ask questions -Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY that option over the others, mapping to engineering preferences. No batching multiple issues into one question. No yes/no questions. Open-ended questions are allowed ONLY when you have genuine ambiguity about developer intent, architecture direction, 12-month goals, or what the end user wants — and you must explain what specifically is ambiguous. - -## For Each Issue You Find +Follow the AskUserQuestion format from the Preamble above. Additional rules for plan reviews: * **One issue = one AskUserQuestion call.** Never combine multiple issues into one question. * Describe the problem concretely, with file and line references. * Present 2-3 options, including "do nothing" where reasonable. * For each option: effort, risk, and maintenance burden in one line. -* **Lead with your recommendation.** State it as a directive: "Do B. Here's why:" — not "Option B might be worth considering." Be opinionated. I'm paying for your judgment, not a menu. * **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference. -* **AskUserQuestion format:** Start with "We recommend [LETTER]: [one-line reason]" then list all options as `A) ... B) ... C) ...`. Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). +* Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). * **Escape hatch:** If a section has no issues, say so and move on. If an issue has an obvious fix with no real alternatives, state what you'll do and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine decision with meaningful tradeoffs. ## Required Outputs @@ -456,7 +453,6 @@ If any AskUserQuestion goes unanswered, note it here. Never silently default. ## Formatting Rules * NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...). * Label with NUMBER + LETTER (e.g., "3A", "3B"). -* Recommended option always listed first. * One sentence max per option. * After each section, pause and wait for feedback. * Use **CRITICAL GAP** / **WARNING** / **OK** for scannability. diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 6f4b74b..819ef07 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -16,15 +16,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # Plan Review Mode Review this plan thoroughly before making any code changes. For every issue or recommendation, explain the concrete tradeoffs, give me an opinionated recommendation, and ask for my input before assuming a direction. @@ -138,18 +186,15 @@ Evaluate: **STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved. ## CRITICAL RULE — How to ask questions -Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY that option over the others, mapping to engineering preferences. No batching multiple issues into one question. No yes/no questions. Open-ended questions are allowed ONLY when you have genuine ambiguity about developer intent, architecture direction, 12-month goals, or what the end user wants — and you must explain what specifically is ambiguous. **Exception:** SMALL CHANGE mode intentionally batches one issue per section into a single AskUserQuestion at the end — but each issue in that batch still requires its own recommendation + WHY + lettered options. - -## For each issue you find -For every specific issue (bug, smell, design concern, or risk): +Follow the AskUserQuestion format from the Preamble above. Additional rules for plan reviews: * **One issue = one AskUserQuestion call.** Never combine multiple issues into one question. * Describe the problem concretely, with file and line references. -* Present 2–3 options, including "do nothing" where that's reasonable. +* Present 2-3 options, including "do nothing" where that's reasonable. * For each option, specify in one line: effort, risk, and maintenance burden. -* **Lead with your recommendation.** State it as a directive: "Do B. Here's why:" — not "Option B might be worth considering." Be opinionated. I'm paying for your judgment, not a menu. * **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference (DRY, explicit > clever, minimal diff, etc.). -* **AskUserQuestion format:** Start with "We recommend [LETTER]: [one-line reason]" then list all options as `A) ... B) ... C) ...`. Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). +* Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). * **Escape hatch:** If a section has no issues, say so and move on. If an issue has an obvious fix with no real alternatives, state what you'll do and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine decision with meaningful tradeoffs. +* **Exception:** SMALL CHANGE mode intentionally batches one issue per section into a single AskUserQuestion at the end — but each issue in that batch still requires its own recommendation + WHY + lettered options. ## Required outputs @@ -201,10 +246,9 @@ At the end of the review, fill in and display this summary so the user can see a Check the git log for this branch. If there are prior commits suggesting a previous review cycle (e.g., review-driven refactors, reverted changes), note what was changed and whether the current plan touches the same areas. Be more aggressive reviewing areas that were previously problematic. ## Formatting rules -* NUMBER issues (1, 2, 3...) and give LETTERS for options (A, B, C...). -* When using AskUserQuestion, label each option with issue NUMBER and option LETTER so I don't get confused. -* Recommended option is always listed first. -* Keep each option to one sentence max. I should be able to pick in under 5 seconds. +* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...). +* Label with NUMBER + LETTER (e.g., "3A", "3B"). +* One sentence max per option. Pick in under 5 seconds. * After each review section, pause and ask for feedback before moving on. ## Unresolved decisions diff --git a/plan-eng-review/SKILL.md.tmpl b/plan-eng-review/SKILL.md.tmpl index 006d184..410b072 100644 --- a/plan-eng-review/SKILL.md.tmpl +++ b/plan-eng-review/SKILL.md.tmpl @@ -14,7 +14,7 @@ allowed-tools: - Bash --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # Plan Review Mode @@ -129,18 +129,15 @@ Evaluate: **STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved. ## CRITICAL RULE — How to ask questions -Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY that option over the others, mapping to engineering preferences. No batching multiple issues into one question. No yes/no questions. Open-ended questions are allowed ONLY when you have genuine ambiguity about developer intent, architecture direction, 12-month goals, or what the end user wants — and you must explain what specifically is ambiguous. **Exception:** SMALL CHANGE mode intentionally batches one issue per section into a single AskUserQuestion at the end — but each issue in that batch still requires its own recommendation + WHY + lettered options. - -## For each issue you find -For every specific issue (bug, smell, design concern, or risk): +Follow the AskUserQuestion format from the Preamble above. Additional rules for plan reviews: * **One issue = one AskUserQuestion call.** Never combine multiple issues into one question. * Describe the problem concretely, with file and line references. -* Present 2–3 options, including "do nothing" where that's reasonable. +* Present 2-3 options, including "do nothing" where that's reasonable. * For each option, specify in one line: effort, risk, and maintenance burden. -* **Lead with your recommendation.** State it as a directive: "Do B. Here's why:" — not "Option B might be worth considering." Be opinionated. I'm paying for your judgment, not a menu. * **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference (DRY, explicit > clever, minimal diff, etc.). -* **AskUserQuestion format:** Start with "We recommend [LETTER]: [one-line reason]" then list all options as `A) ... B) ... C) ...`. Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). +* Label with issue NUMBER + option LETTER (e.g., "3A", "3B"). * **Escape hatch:** If a section has no issues, say so and move on. If an issue has an obvious fix with no real alternatives, state what you'll do and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine decision with meaningful tradeoffs. +* **Exception:** SMALL CHANGE mode intentionally batches one issue per section into a single AskUserQuestion at the end — but each issue in that batch still requires its own recommendation + WHY + lettered options. ## Required outputs @@ -192,10 +189,9 @@ At the end of the review, fill in and display this summary so the user can see a Check the git log for this branch. If there are prior commits suggesting a previous review cycle (e.g., review-driven refactors, reverted changes), note what was changed and whether the current plan touches the same areas. Be more aggressive reviewing areas that were previously problematic. ## Formatting rules -* NUMBER issues (1, 2, 3...) and give LETTERS for options (A, B, C...). -* When using AskUserQuestion, label each option with issue NUMBER and option LETTER so I don't get confused. -* Recommended option is always listed first. -* Keep each option to one sentence max. I should be able to pick in under 5 seconds. +* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...). +* Label with NUMBER + LETTER (e.g., "3A", "3B"). +* One sentence max per option. Pick in under 5 seconds. * After each review section, pause and ask for feedback before moving on. ## Unresolved decisions diff --git a/qa/SKILL.md b/qa/SKILL.md index d6acc6a..c11f8a6 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -20,15 +20,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # /qa: Test → Fix → Verify You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence. diff --git a/qa/SKILL.md.tmpl b/qa/SKILL.md.tmpl index 2fa05d8..a3e5a9f 100644 --- a/qa/SKILL.md.tmpl +++ b/qa/SKILL.md.tmpl @@ -18,7 +18,7 @@ allowed-tools: - AskUserQuestion --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # /qa: Test → Fix → Verify diff --git a/retro/SKILL.md b/retro/SKILL.md index f1e92c2..28280c9 100644 --- a/retro/SKILL.md +++ b/retro/SKILL.md @@ -15,15 +15,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # /retro — Weekly Engineering Retrospective Generates a comprehensive engineering retrospective analyzing commit history, work patterns, and code quality metrics. Team-aware: identifies the user running the command, then analyzes every contributor with per-person praise and growth opportunities. Designed for a senior IC/CTO-level builder using Claude Code as a force multiplier. diff --git a/retro/SKILL.md.tmpl b/retro/SKILL.md.tmpl index 2de8d5c..07e0888 100644 --- a/retro/SKILL.md.tmpl +++ b/retro/SKILL.md.tmpl @@ -13,7 +13,7 @@ allowed-tools: - AskUserQuestion --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # /retro — Weekly Engineering Retrospective diff --git a/review/SKILL.md b/review/SKILL.md index b572283..740795a 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -16,15 +16,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # Pre-Landing PR Review You are running the `/review` workflow. Analyze the current branch's diff against main for structural issues that tests don't catch. @@ -84,7 +132,7 @@ Follow the output format specified in the checklist. Respect the suppressions **Always output ALL findings** — both critical and informational. The user must see every issue. -- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, your recommended fix, and options (A: Fix it now, B: Acknowledge, C: False positive — skip). +- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip). After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed. - If only non-critical issues found: output findings. No further action needed. - If no issues found: output `Pre-Landing Review: No issues found.` diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl index edc1e6c..68d8a1d 100644 --- a/review/SKILL.md.tmpl +++ b/review/SKILL.md.tmpl @@ -14,7 +14,7 @@ allowed-tools: - AskUserQuestion --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # Pre-Landing PR Review @@ -75,7 +75,7 @@ Follow the output format specified in the checklist. Respect the suppressions **Always output ALL findings** — both critical and informational. The user must see every issue. -- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, your recommended fix, and options (A: Fix it now, B: Acknowledge, C: False positive — skip). +- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip). After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed. - If only non-critical issues found: output findings. No further action needed. - If no issues found: output `Pre-Landing Review: No issues found.` diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 52c3042..bafed64 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -94,15 +94,63 @@ function generateSnapshotFlags(): string { return lines.join('\n'); } -function generateUpdateCheck(): string { - return `## Update Check (run first) +function generatePreamble(): string { + return `## Preamble (run first) \`\`\`bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) \`\`\` -If output shows \`UPGRADE_AVAILABLE \`: read \`~/.claude/skills/gstack/gstack-upgrade/SKILL.md\` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If \`JUST_UPGRADED \`: tell user "Running gstack v{to} (just updated!)" and continue.`; +If output shows \`UPGRADE_AVAILABLE \`: read \`~/.claude/skills/gstack/gstack-upgrade/SKILL.md\` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If \`JUST_UPGRADED \`: tell user "Running gstack v{to} (just updated!)" and continue. + +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. \`RECOMMENDATION: Choose [X] because [one-line reason]\` +4. Lettered options: \`A) ... B) ... C) ...\` + +If \`_SESSIONS\` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If \`_CONTRIB\` is \`true\`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write \`~/.gstack/contributor-logs/{slug}.md\` with this structure: + +\`\`\` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +\`\`\` + +Then run: \`mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md\` + +Slug: lowercase, hyphens, max 60 chars (e.g. \`browse-snapshot-ref-gap\`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"`; } function generateBrowseSetup(): string { @@ -405,7 +453,7 @@ Minimum 0 per category. const RESOLVERS: Record string> = { COMMAND_REFERENCE: generateCommandReference, SNAPSHOT_FLAGS: generateSnapshotFlags, - UPDATE_CHECK: generateUpdateCheck, + PREAMBLE: generatePreamble, BROWSE_SETUP: generateBrowseSetup, QA_METHODOLOGY: generateQAMethodology, }; diff --git a/setup-browser-cookies/SKILL.md b/setup-browser-cookies/SKILL.md index b2f8fc6..0623024 100644 --- a/setup-browser-cookies/SKILL.md +++ b/setup-browser-cookies/SKILL.md @@ -13,15 +13,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # Setup Browser Cookies Import logged-in sessions from your real Chromium browser into the headless browse session. diff --git a/setup-browser-cookies/SKILL.md.tmpl b/setup-browser-cookies/SKILL.md.tmpl index e0d2f42..dcab374 100644 --- a/setup-browser-cookies/SKILL.md.tmpl +++ b/setup-browser-cookies/SKILL.md.tmpl @@ -11,7 +11,7 @@ allowed-tools: - AskUserQuestion --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # Setup Browser Cookies diff --git a/ship/SKILL.md b/ship/SKILL.md index 386299b..e023816 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -15,15 +15,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # Ship: Fully Automated Ship Workflow You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end. @@ -174,8 +222,8 @@ Review the diff for structural issues that tests don't catch. 6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with: - The problem (`file:line` + description) - - Your recommended fix - - Options: A) Fix it now (recommend), B) Acknowledge and ship anyway, C) It's a false positive — skip + - `RECOMMENDATION: Choose A because [one-line reason]` + - Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4. 7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8. @@ -202,8 +250,8 @@ For each classified comment: **VALID & ACTIONABLE:** Use AskUserQuestion with: - The comment (file:line or [top-level] + body summary + permalink URL) -- Your recommended fix -- Options: A) Fix now (recommended), B) Acknowledge and ship anyway, C) It's a false positive +- `RECOMMENDATION: Choose A because [one-line reason]` +- Options: A) Fix now, B) Acknowledge and ship anyway, C) It's a false positive - If user chooses A: apply the fix, commit the fixed files (`git add && git commit -m "fix: address Greptile review — "`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix). - If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp). diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl index 81bd7e3..06ff5a0 100644 --- a/ship/SKILL.md.tmpl +++ b/ship/SKILL.md.tmpl @@ -13,7 +13,7 @@ allowed-tools: - AskUserQuestion --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # Ship: Fully Automated Ship Workflow @@ -165,8 +165,8 @@ Review the diff for structural issues that tests don't catch. 6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with: - The problem (`file:line` + description) - - Your recommended fix - - Options: A) Fix it now (recommend), B) Acknowledge and ship anyway, C) It's a false positive — skip + - `RECOMMENDATION: Choose A because [one-line reason]` + - Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4. 7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8. @@ -193,8 +193,8 @@ For each classified comment: **VALID & ACTIONABLE:** Use AskUserQuestion with: - The comment (file:line or [top-level] + body summary + permalink URL) -- Your recommended fix -- Options: A) Fix now (recommended), B) Acknowledge and ship anyway, C) It's a false positive +- `RECOMMENDATION: Choose A because [one-line reason]` +- Options: A) Fix now, B) Acknowledge and ship anyway, C) It's a false positive - If user chooses A: apply the fix, commit the fixed files (`git add && git commit -m "fix: address Greptile review — "`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix). - If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp). diff --git a/test/gen-skill-docs.test.ts b/test/gen-skill-docs.test.ts index fc85579..a2499af 100644 --- a/test/gen-skill-docs.test.ts +++ b/test/gen-skill-docs.test.ts @@ -125,10 +125,26 @@ describe('gen-skill-docs', () => { const rootTmpl = fs.readFileSync(path.join(ROOT, 'SKILL.md.tmpl'), 'utf-8'); expect(rootTmpl).toContain('{{COMMAND_REFERENCE}}'); expect(rootTmpl).toContain('{{SNAPSHOT_FLAGS}}'); + expect(rootTmpl).toContain('{{PREAMBLE}}'); const browseTmpl = fs.readFileSync(path.join(ROOT, 'browse', 'SKILL.md.tmpl'), 'utf-8'); expect(browseTmpl).toContain('{{COMMAND_REFERENCE}}'); expect(browseTmpl).toContain('{{SNAPSHOT_FLAGS}}'); + expect(browseTmpl).toContain('{{PREAMBLE}}'); + }); + + test('generated SKILL.md contains contributor mode check', () => { + const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8'); + expect(content).toContain('Contributor Mode'); + expect(content).toContain('gstack_contributor'); + expect(content).toContain('contributor-logs'); + }); + + test('generated SKILL.md contains session awareness', () => { + const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8'); + expect(content).toContain('_SESSIONS'); + expect(content).toContain('RECOMMENDATION'); + expect(content).toContain('ELI16'); }); test('qa and qa-only templates use QA_METHODOLOGY placeholder', () => { diff --git a/test/skill-e2e.test.ts b/test/skill-e2e.test.ts index f0949c9..df6f50b 100644 --- a/test/skill-e2e.test.ts +++ b/test/skill-e2e.test.ts @@ -280,6 +280,52 @@ Report the exact output — either "READY: " or "NEEDS_SETUP".`, // Clean up try { fs.rmSync(nonGitDir, { recursive: true, force: true }); } catch {} }, 60_000); + + test('contributor mode files a report on gstack error', async () => { + const contribDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-contrib-')); + const logsDir = path.join(contribDir, 'contributor-logs'); + fs.mkdirSync(logsDir, { recursive: true }); + + // Extract contributor mode instructions from generated SKILL.md + const skillMd = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8'); + const contribStart = skillMd.indexOf('## Contributor Mode'); + const contribEnd = skillMd.indexOf('\n## ', contribStart + 1); + const contribBlock = skillMd.slice(contribStart, contribEnd > 0 ? contribEnd : undefined); + + const result = await runSkillTest({ + prompt: `You are in contributor mode (_CONTRIB=true). + +${contribBlock} + +OVERRIDE: Write contributor logs to ${logsDir}/ instead of ~/.gstack/contributor-logs/ + +Now try this browse command (it will fail — there is no binary at this path): +/nonexistent/path/browse goto https://example.com + +This is a gstack issue (the browse binary is missing/misconfigured). +File a contributor report about this issue. Then tell me what you filed.`, + workingDirectory: contribDir, + maxTurns: 8, + timeout: 60_000, + testName: 'contributor-mode', + runId, + }); + + logCost('contributor mode', result); + recordE2E('contributor mode report', 'Skill E2E tests', result); + + // Verify a contributor log was created with expected format + const logFiles = fs.readdirSync(logsDir).filter(f => f.endsWith('.md')); + expect(logFiles.length).toBeGreaterThan(0); + + const logContent = fs.readFileSync(path.join(logsDir, logFiles[0]), 'utf-8'); + expect(logContent).toContain('Hey gstack team'); + expect(logContent).toContain('What I was trying to do'); + expect(logContent).toContain('What happened instead'); + + // Clean up + try { fs.rmSync(contribDir, { recursive: true, force: true }); } catch {} + }, 90_000); }); // --- B4: QA skill E2E --- From f9014c5d828953f59f0f96e47273d2191928b22a Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sun, 15 Mar 2026 20:41:17 -0500 Subject: [PATCH 2/8] feat: add Enum & Value Completeness to /review critical checklist New CRITICAL review category that traces new enum values, status strings, and type constants through every consumer outside the diff. Catches the class of bugs where a new value is added but not handled in all switch/case chains, allowlists, or frontend-backend contracts. Co-Authored-By: Claude Opus 4.6 --- review/SKILL.md | 4 +++- review/SKILL.md.tmpl | 4 +++- review/checklist.md | 11 +++++++++-- 3 files changed, 15 insertions(+), 4 deletions(-) diff --git a/review/SKILL.md b/review/SKILL.md index 740795a..32c597a 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -121,9 +121,11 @@ Run `git diff origin/main` to get the full diff. This includes both committed an Apply the checklist against the diff in two passes: -1. **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary +1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness 2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend +**Enum & Value Completeness requires reading code OUTSIDE the diff.** When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient. + Follow the output format specified in the checklist. Respect the suppressions — do NOT flag items listed in the "DO NOT flag" section. --- diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl index 68d8a1d..124a539 100644 --- a/review/SKILL.md.tmpl +++ b/review/SKILL.md.tmpl @@ -64,9 +64,11 @@ Run `git diff origin/main` to get the full diff. This includes both committed an Apply the checklist against the diff in two passes: -1. **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary +1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness 2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend +**Enum & Value Completeness requires reading code OUTSIDE the diff.** When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient. + Follow the output format specified in the checklist. Respect the suppressions — do NOT flag items listed in the "DO NOT flag" section. --- diff --git a/review/checklist.md b/review/checklist.md index e321890..6052c33 100644 --- a/review/checklist.md +++ b/review/checklist.md @@ -48,6 +48,13 @@ Be terse. For each issue: one line describing the problem, one line with the fix - LLM-generated values (emails, URLs, names) written to DB or passed to mailers without format validation. Add lightweight guards (`EMAIL_REGEXP`, `URI.parse`, `.strip`) before persisting. - Structured tool output (arrays, hashes) accepted without type/shape checks before database writes. +#### Enum & Value Completeness +When the diff introduces a new enum value, status string, tier name, or type constant: +- **Trace it through every consumer.** Read (don't just grep — READ) each file that switches on, filters by, or displays that value. If any consumer doesn't handle the new value, flag it. Common miss: adding a value to the frontend dropdown but the backend model/compute method doesn't persist it. +- **Check allowlists/filter arrays.** Search for arrays or `%w[]` lists containing sibling values (e.g., if adding "revise" to tiers, find every `%w[quick lfg mega]` and verify "revise" is included where needed). +- **Check `case`/`if-elsif` chains.** If existing code branches on the enum, does the new value fall through to a wrong default? +To do this: use Grep to find all references to the sibling values (e.g., grep for "lfg" or "mega" to find all tier consumers). Read each match. This step requires reading code OUTSIDE the diff. + ### Pass 2 — INFORMATIONAL #### Conditional Side Effects @@ -101,8 +108,8 @@ Be terse. For each issue: one line describing the problem, one line with the fix CRITICAL (blocks /ship): INFORMATIONAL (in PR body): ├─ SQL & Data Safety ├─ Conditional Side Effects ├─ Race Conditions & Concurrency ├─ Magic Numbers & String Coupling -└─ LLM Output Trust Boundary ├─ Dead Code & Consistency - ├─ LLM Prompt Issues +├─ LLM Output Trust Boundary ├─ Dead Code & Consistency +└─ Enum & Value Completeness ├─ LLM Prompt Issues ├─ Test Gaps ├─ Crypto & Entropy ├─ Time Window Safety From 10ae4cf378c6047deb1c35a2cf7d2f316f19cf9d Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 16 Mar 2026 00:34:05 -0500 Subject: [PATCH 3/8] chore: bump v0.4.1, user-facing changelog, update qa-only template and architecture docs Co-Authored-By: Claude Opus 4.6 --- ARCHITECTURE.md | 20 ++++++++++++++++- CHANGELOG.md | 15 +++++++++++++ VERSION | 2 +- qa-only/SKILL.md | 50 ++++++++++++++++++++++++++++++++++++++++++- qa-only/SKILL.md.tmpl | 2 +- 5 files changed, 85 insertions(+), 4 deletions(-) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 2b2ef36..5311c2c 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -192,7 +192,25 @@ gen-skill-docs.ts (reads source code metadata) SKILL.md (committed, auto-generated sections) ``` -Templates contain the workflows, tips, and examples that require human judgment. The `{{COMMAND_REFERENCE}}` and `{{SNAPSHOT_FLAGS}}` placeholders are filled from `commands.ts` and `snapshot.ts` at build time. This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear. +Templates contain the workflows, tips, and examples that require human judgment. Placeholders are filled from source code at build time: + +| Placeholder | Source | What it generates | +|-------------|--------|-------------------| +| `{{COMMAND_REFERENCE}}` | `commands.ts` | Categorized command table | +| `{{SNAPSHOT_FLAGS}}` | `snapshot.ts` | Flag reference with examples | +| `{{PREAMBLE}}` | `gen-skill-docs.ts` | Startup block: update check, session tracking, contributor mode, AskUserQuestion format | +| `{{BROWSE_SETUP}}` | `gen-skill-docs.ts` | Binary discovery + setup instructions | + +This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear. + +### The preamble + +Every skill starts with a `{{PREAMBLE}}` block that runs before the skill's own logic. It handles four things in a single bash command: + +1. **Update check** — calls `gstack-update-check`, reports if an upgrade is available. +2. **Session tracking** — touches `~/.gstack/sessions/$PPID` and counts active sessions (files modified in the last 2 hours). When 3+ sessions are running, all skills enter "ELI16 mode" — every question re-grounds the user on context because they're juggling windows. +3. **Contributor mode** — reads `gstack_contributor` from config. When true, the agent files casual field reports to `~/.gstack/contributor-logs/` when gstack itself misbehaves. +4. **AskUserQuestion format** — universal format: context, question, `RECOMMENDATION: Choose X because ___`, lettered options. Consistent across all skills. ### Why committed, not generated at runtime? diff --git a/CHANGELOG.md b/CHANGELOG.md index 9ae6b4f..44980dc 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,20 @@ # Changelog +## 0.4.1 — 2026-03-16 + +### For users + +- **Contributor mode.** Run `gstack-config set gstack_contributor true` and gstack starts filing field reports when it hits its own bugs. You'll see a casual "hey gstack team, this didn't work" note appear in `~/.gstack/contributor-logs/` — open your fork and fix it. +- **Session awareness.** If you're running 3+ gstack sessions at once, every question now re-grounds you: project, branch, what you're working on, what it's asking. No more "wait, which window is this?" +- **Consistent recommendations.** Every question gstack asks you now ends with a clear `RECOMMENDATION: Choose X because ___` before the options. Same format everywhere. +- **Enum completeness checks in /review.** New critical review category that traces new enum values, status strings, and type constants through every consumer outside the diff. Catches the "added a value but forgot to handle it in the switch statement" class of bugs. + +### For contributors + +- Renamed `{{UPDATE_CHECK}}` placeholder to `{{PREAMBLE}}` across all skill templates — single startup block now handles update check, session tracking, contributor mode, and question formatting. +- DRY'd plan-ceo-review and plan-eng-review question formatting to reference the preamble baseline. +- Added vendored symlink awareness docs to CLAUDE.md. + ## 0.4.0 — 2026-03-16 ### Added diff --git a/VERSION b/VERSION index 1d0ba9e..267577d 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.4.0 +0.4.1 diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md index 8d6f2b1..438b782 100644 --- a/qa-only/SKILL.md +++ b/qa-only/SKILL.md @@ -15,15 +15,63 @@ allowed-tools: -## Update Check (run first) +## Preamble (run first) ```bash _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) [ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) ``` If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + # /qa-only: Report-Only QA Testing You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence. **NEVER fix anything.** diff --git a/qa-only/SKILL.md.tmpl b/qa-only/SKILL.md.tmpl index 7ad936b..5d49e20 100644 --- a/qa-only/SKILL.md.tmpl +++ b/qa-only/SKILL.md.tmpl @@ -13,7 +13,7 @@ allowed-tools: - AskUserQuestion --- -{{UPDATE_CHECK}} +{{PREAMBLE}} # /qa-only: Report-Only QA Testing From 064b074c71557877025f6c5dbc45c008fff7a036 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 16 Mar 2026 00:39:56 -0500 Subject: [PATCH 4/8] =?UTF-8?q?docs:=20add=20CHANGELOG=20style=20guide=20?= =?UTF-8?q?=E2=80=94=20user-facing,=20sell=20the=20feature?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.6 (1M context) --- CLAUDE.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index 4ff9941..e724b82 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -90,6 +90,17 @@ symlink or a real copy. If it's a symlink to your working directory, be aware th gen-skill-docs pipeline, consider whether the changes should be tested in isolation before going live (especially if the user is actively using gstack in other windows). +## CHANGELOG style + +CHANGELOG.md is **for users**, not contributors. Write it like product release notes: + +- Lead with what the user can now **do** that they couldn't before. Sell the feature. +- Use plain language, not implementation details. "You can now..." not "Refactored the..." +- Put contributor/internal changes in a separate "For contributors" section at the bottom. +- Every entry should make someone think "oh nice, I want to try that." +- No jargon: say "every question now tells you which project and branch you're in" not + "AskUserQuestion format standardized across skill templates via preamble resolver." + ## Deploying to the active skill The active skill lives at `~/.claude/skills/gstack/`. After making changes: From 8a032ccc80db80ae468f3c03e7c864c0635022d9 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 16 Mar 2026 00:40:48 -0500 Subject: [PATCH 5/8] docs: rewrite v0.4.1 changelog to be user-facing and sell the features Co-Authored-By: Claude Opus 4.6 (1M context) --- CHANGELOG.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 44980dc..57c2c1a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,18 +2,16 @@ ## 0.4.1 — 2026-03-16 -### For users - -- **Contributor mode.** Run `gstack-config set gstack_contributor true` and gstack starts filing field reports when it hits its own bugs. You'll see a casual "hey gstack team, this didn't work" note appear in `~/.gstack/contributor-logs/` — open your fork and fix it. -- **Session awareness.** If you're running 3+ gstack sessions at once, every question now re-grounds you: project, branch, what you're working on, what it's asking. No more "wait, which window is this?" -- **Consistent recommendations.** Every question gstack asks you now ends with a clear `RECOMMENDATION: Choose X because ___` before the options. Same format everywhere. -- **Enum completeness checks in /review.** New critical review category that traces new enum values, status strings, and type constants through every consumer outside the diff. Catches the "added a value but forgot to handle it in the switch statement" class of bugs. +- **gstack now notices when it screws up.** Turn on contributor mode (`gstack-config set gstack_contributor true`) and gstack automatically writes up what went wrong — what you were doing, what broke, repro steps. Next time something annoys you, the bug report is already written. Fork gstack and fix it yourself. +- **Juggling multiple sessions? gstack keeps up.** When you have 3+ gstack windows open, every question now tells you which project, which branch, and what you were working on. No more staring at a question thinking "wait, which window is this?" +- **Every question now comes with a recommendation.** Instead of dumping options on you and making you think, gstack tells you what it would pick and why. Same clear format across every skill. +- **/review now catches forgotten enum handlers.** Add a new status, tier, or type constant? /review traces it through every switch statement, allowlist, and filter in your codebase — not just the files you changed. Catches the "added the value but forgot to handle it" class of bugs before they ship. ### For contributors -- Renamed `{{UPDATE_CHECK}}` placeholder to `{{PREAMBLE}}` across all skill templates — single startup block now handles update check, session tracking, contributor mode, and question formatting. -- DRY'd plan-ceo-review and plan-eng-review question formatting to reference the preamble baseline. -- Added vendored symlink awareness docs to CLAUDE.md. +- Renamed `{{UPDATE_CHECK}}` to `{{PREAMBLE}}` across all 11 skill templates — one startup block now handles update check, session tracking, contributor mode, and question formatting. +- DRY'd plan-ceo-review and plan-eng-review question formatting to reference the preamble baseline instead of duplicating rules. +- Added CHANGELOG style guide and vendored symlink awareness docs to CLAUDE.md. ## 0.4.0 — 2026-03-16 From 17d9c7fec02e518f0c38856dc6b5556e433dbf7d Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 16 Mar 2026 01:15:45 -0500 Subject: [PATCH 6/8] feat: add evals for RECOMMENDATION format, session awareness, and enum completeness Free tests (Tier 1): RECOMMENDATION format + session awareness in all preamble SKILL.md files, enum completeness checklist structure and CRITICAL classification. E2E eval: /review catches missed enum handlers when a new status value is added but not handled in case/switch and notify methods. Co-Authored-By: Claude Opus 4.6 (1M context) --- test/fixtures/review-eval-enum-diff.rb | 30 +++++++++++ test/fixtures/review-eval-enum.rb | 27 ++++++++++ test/skill-e2e.test.ts | 72 ++++++++++++++++++++++++++ test/skill-validation.test.ts | 60 +++++++++++++++++++++ 4 files changed, 189 insertions(+) create mode 100644 test/fixtures/review-eval-enum-diff.rb create mode 100644 test/fixtures/review-eval-enum.rb diff --git a/test/fixtures/review-eval-enum-diff.rb b/test/fixtures/review-eval-enum-diff.rb new file mode 100644 index 0000000..9b87c2a --- /dev/null +++ b/test/fixtures/review-eval-enum-diff.rb @@ -0,0 +1,30 @@ +# Feature branch version: adds "returned" status but misses consumers +class Order < ApplicationRecord + STATUSES = %w[pending processing shipped delivered returned].freeze + + validates :status, inclusion: { in: STATUSES } + + def display_status + case status + when 'pending' then 'Awaiting processing' + when 'processing' then 'Being prepared' + when 'shipped' then 'On the way' + when 'delivered' then 'Delivered' + # BUG: 'returned' not handled — falls through to nil + end + end + + def can_cancel? + # BUG: should 'returned' be cancellable? Not considered. + %w[pending processing].include?(status) + end + + def notify_customer + case status + when 'pending' then OrderMailer.confirmation(self).deliver_later + when 'shipped' then OrderMailer.shipped(self).deliver_later + when 'delivered' then OrderMailer.delivered(self).deliver_later + # BUG: 'returned' has no notification — customer won't know return was received + end + end +end diff --git a/test/fixtures/review-eval-enum.rb b/test/fixtures/review-eval-enum.rb new file mode 100644 index 0000000..79960a3 --- /dev/null +++ b/test/fixtures/review-eval-enum.rb @@ -0,0 +1,27 @@ +# Existing file on main: order model with status handling +class Order < ApplicationRecord + STATUSES = %w[pending processing shipped delivered].freeze + + validates :status, inclusion: { in: STATUSES } + + def display_status + case status + when 'pending' then 'Awaiting processing' + when 'processing' then 'Being prepared' + when 'shipped' then 'On the way' + when 'delivered' then 'Delivered' + end + end + + def can_cancel? + %w[pending processing].include?(status) + end + + def notify_customer + case status + when 'pending' then OrderMailer.confirmation(self).deliver_later + when 'shipped' then OrderMailer.shipped(self).deliver_later + when 'delivered' then OrderMailer.delivered(self).deliver_later + end + end +end diff --git a/test/skill-e2e.test.ts b/test/skill-e2e.test.ts index df6f50b..d36e011 100644 --- a/test/skill-e2e.test.ts +++ b/test/skill-e2e.test.ts @@ -438,6 +438,78 @@ Write your review findings to ${reviewDir}/review-output.md`, }, 120_000); }); +// --- Review: Enum completeness E2E --- + +describeE2E('Review enum completeness E2E', () => { + let enumDir: string; + + beforeAll(() => { + enumDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-enum-')); + + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: enumDir, stdio: 'pipe', timeout: 5000 }); + + run('git', ['init']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); + + // Commit baseline on main — order model with 4 statuses + const baseContent = fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'review-eval-enum.rb'), 'utf-8'); + fs.writeFileSync(path.join(enumDir, 'order.rb'), baseContent); + run('git', ['add', 'order.rb']); + run('git', ['commit', '-m', 'initial order model']); + + // Feature branch adds "returned" status but misses handlers + run('git', ['checkout', '-b', 'feature/add-returned-status']); + const diffContent = fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'review-eval-enum-diff.rb'), 'utf-8'); + fs.writeFileSync(path.join(enumDir, 'order.rb'), diffContent); + run('git', ['add', 'order.rb']); + run('git', ['commit', '-m', 'add returned status']); + + // Copy review skill files + fs.copyFileSync(path.join(ROOT, 'review', 'SKILL.md'), path.join(enumDir, 'review-SKILL.md')); + fs.copyFileSync(path.join(ROOT, 'review', 'checklist.md'), path.join(enumDir, 'review-checklist.md')); + fs.copyFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), path.join(enumDir, 'review-greptile-triage.md')); + }); + + afterAll(() => { + try { fs.rmSync(enumDir, { recursive: true, force: true }); } catch {} + }); + + test('/review catches missing enum handlers for new status value', async () => { + const result = await runSkillTest({ + prompt: `You are in a git repo on branch feature/add-returned-status with changes against main. +Read review-SKILL.md for the review workflow instructions. +Also read review-checklist.md and apply it — pay special attention to the Enum & Value Completeness section. +Run /review on the current diff (git diff main...HEAD). +Write your review findings to ${enumDir}/review-output.md + +The diff adds a new "returned" status to the Order model. Your job is to check if all consumers handle it.`, + workingDirectory: enumDir, + maxTurns: 15, + timeout: 90_000, + testName: 'review-enum-completeness', + runId, + }); + + logCost('/review enum', result); + recordE2E('/review enum completeness', 'Review enum completeness E2E', result); + expect(result.exitReason).toBe('success'); + + // Verify the review caught the missing enum handlers + const reviewPath = path.join(enumDir, 'review-output.md'); + if (fs.existsSync(reviewPath)) { + const review = fs.readFileSync(reviewPath, 'utf-8'); + // Should mention the missing "returned" handling in at least one of the methods + const mentionsReturned = review.toLowerCase().includes('returned'); + const mentionsEnum = review.toLowerCase().includes('enum') || review.toLowerCase().includes('status'); + const mentionsCritical = review.toLowerCase().includes('critical'); + expect(mentionsReturned).toBe(true); + expect(mentionsEnum || mentionsCritical).toBe(true); + } + }, 120_000); +}); + // --- B6/B7/B8: Planted-bug outcome evals --- // Outcome evals also need ANTHROPIC_API_KEY for the LLM judge diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index a2ce421..88e9893 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -411,6 +411,66 @@ describe('TODOS-format.md reference consistency', () => { }); }); +// --- v0.4.1 feature coverage: RECOMMENDATION format, session awareness, enum completeness --- + +describe('v0.4.1 preamble features', () => { + const skillsWithPreamble = [ + 'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md', + 'qa-only/SKILL.md', + 'setup-browser-cookies/SKILL.md', + 'ship/SKILL.md', 'review/SKILL.md', + 'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md', + 'retro/SKILL.md', + ]; + + for (const skill of skillsWithPreamble) { + test(`${skill} contains RECOMMENDATION format`, () => { + const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8'); + expect(content).toContain('RECOMMENDATION: Choose'); + expect(content).toContain('AskUserQuestion'); + }); + + test(`${skill} contains session awareness`, () => { + const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8'); + expect(content).toContain('_SESSIONS'); + expect(content).toContain('ELI16'); + }); + } +}); + +describe('Enum & Value Completeness in review checklist', () => { + const checklist = fs.readFileSync(path.join(ROOT, 'review', 'checklist.md'), 'utf-8'); + + test('checklist has Enum & Value Completeness section', () => { + expect(checklist).toContain('Enum & Value Completeness'); + }); + + test('Enum & Value Completeness is classified as CRITICAL', () => { + // It should appear under Pass 1 — CRITICAL, not Pass 2 + const pass1Start = checklist.indexOf('### Pass 1'); + const pass2Start = checklist.indexOf('### Pass 2'); + const enumStart = checklist.indexOf('Enum & Value Completeness'); + expect(enumStart).toBeGreaterThan(pass1Start); + expect(enumStart).toBeLessThan(pass2Start); + }); + + test('Enum & Value Completeness mentions tracing through consumers', () => { + expect(checklist).toContain('Trace it through every consumer'); + expect(checklist).toContain('case'); + expect(checklist).toContain('allowlist'); + }); + + test('Enum & Value Completeness is in the gate classification as CRITICAL', () => { + const gateSection = checklist.slice(checklist.indexOf('## Gate Classification')); + // The ASCII art has CRITICAL on the left and INFORMATIONAL on the right + // Enum & Value Completeness should appear on a line with the CRITICAL tree (├─ or └─) + const enumLine = gateSection.split('\n').find(l => l.includes('Enum & Value Completeness')); + expect(enumLine).toBeDefined(); + // It's on the left (CRITICAL) side — starts with ├─ or └─ + expect(enumLine!.trimStart().startsWith('├─') || enumLine!.trimStart().startsWith('└─')).toBe(true); + }); +}); + // --- Part 7: Planted-bug fixture validation (A4) --- describe('Planted-bug fixture validation', () => { From f9c834f1f97f4f301c2ebcf5dc5adf608bd177a1 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 16 Mar 2026 01:19:35 -0500 Subject: [PATCH 7/8] feat: add E2E eval for session awareness ELI16 mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Stubs _SESSIONS=4, gives agent a decision point on feature/add-payments branch, verifies the output re-grounds the user with project, branch, context, and RECOMMENDATION — the ELI16 mode behavior for 3+ sessions. Co-Authored-By: Claude Opus 4.6 (1M context) --- test/skill-e2e.test.ts | 68 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) diff --git a/test/skill-e2e.test.ts b/test/skill-e2e.test.ts index d36e011..23cc81e 100644 --- a/test/skill-e2e.test.ts +++ b/test/skill-e2e.test.ts @@ -326,6 +326,74 @@ File a contributor report about this issue. Then tell me what you filed.`, // Clean up try { fs.rmSync(contribDir, { recursive: true, force: true }); } catch {} }, 90_000); + + test('session awareness adds ELI16 context when _SESSIONS >= 3', async () => { + const sessionDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-session-')); + + // Set up a git repo so there's project/branch context to reference + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: sessionDir, stdio: 'pipe', timeout: 5000 }); + run('git', ['init']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); + fs.writeFileSync(path.join(sessionDir, 'app.rb'), '# my app\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'init']); + run('git', ['checkout', '-b', 'feature/add-payments']); + // Add a remote so the agent can derive a project name + run('git', ['remote', 'add', 'origin', 'https://github.com/acme/billing-app.git']); + + // Extract AskUserQuestion format instructions from generated SKILL.md + const skillMd = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8'); + const aqStart = skillMd.indexOf('## AskUserQuestion Format'); + const aqEnd = skillMd.indexOf('\n## ', aqStart + 1); + const aqBlock = skillMd.slice(aqStart, aqEnd > 0 ? aqEnd : undefined); + + const outputPath = path.join(sessionDir, 'question-output.md'); + + const result = await runSkillTest({ + prompt: `You are running a gstack skill. The session preamble detected _SESSIONS=4 (the user has 4 gstack windows open). + +${aqBlock} + +You are on branch feature/add-payments in the billing-app project. You were reviewing a plan to add Stripe integration. + +You've hit a decision point: the plan doesn't specify whether to use Stripe Checkout (hosted) or Stripe Elements (embedded). You need to ask the user which approach to use. + +Since this is non-interactive, DO NOT actually call AskUserQuestion. Instead, write the EXACT text you would display to the user (the full AskUserQuestion content) to the file: ${outputPath} + +Remember: _SESSIONS=4, so ELI16 mode is active. The user is juggling multiple windows and may not remember what this conversation is about. Re-ground them.`, + workingDirectory: sessionDir, + maxTurns: 8, + timeout: 60_000, + testName: 'session-awareness', + runId, + }); + + logCost('session awareness', result); + recordE2E('session awareness ELI16', 'Skill E2E tests', result); + + // Verify the output contains ELI16 re-grounding context + if (fs.existsSync(outputPath)) { + const output = fs.readFileSync(outputPath, 'utf-8'); + const lower = output.toLowerCase(); + // Must mention project name + expect(lower.includes('billing') || lower.includes('acme')).toBe(true); + // Must mention branch + expect(lower.includes('payment') || lower.includes('feature')).toBe(true); + // Must mention what we're working on + expect(lower.includes('stripe') || lower.includes('checkout') || lower.includes('payment')).toBe(true); + // Must have a RECOMMENDATION + expect(output).toContain('RECOMMENDATION'); + } else { + // Check agent output as fallback + const output = result.output || ''; + expect(output).toContain('RECOMMENDATION'); + } + + // Clean up + try { fs.rmSync(sessionDir, { recursive: true, force: true }); } catch {} + }, 90_000); }); // --- B4: QA skill E2E --- From 7bbfd6ca04c921046d8f95f973e64ffa2777452d Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 16 Mar 2026 01:32:24 -0500 Subject: [PATCH 8/8] fix: contributor mode eval marked FAIL due to expected browse error The test intentionally runs a nonexistent binary to trigger contributor mode. The session runner's browse error detection catches "no such file or directory...browse" and sets browseErrors, causing recordE2E to mark passed=false. Override passed to check only exitReason since the browse error is the expected scenario. Co-Authored-By: Claude Opus 4.6 (1M context) --- test/skill-e2e.test.ts | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/test/skill-e2e.test.ts b/test/skill-e2e.test.ts index 23cc81e..8ce1f40 100644 --- a/test/skill-e2e.test.ts +++ b/test/skill-e2e.test.ts @@ -312,7 +312,11 @@ File a contributor report about this issue. Then tell me what you filed.`, }); logCost('contributor mode', result); - recordE2E('contributor mode report', 'Skill E2E tests', result); + // Override passed: this test intentionally triggers a browse error (nonexistent binary) + // so browseErrors will be non-empty — that's expected, not a failure + recordE2E('contributor mode report', 'Skill E2E tests', result, { + passed: result.exitReason === 'success', + }); // Verify a contributor log was created with expected format const logFiles = fs.readdirSync(logsDir).filter(f => f.endsWith('.md'));