Thanks for wanting to make gstack better. Whether you're fixing a typo in a skill prompt or building an entirely new workflow, this guide will get you up and running fast.
gstack skills are Markdown files that Claude Code discovers from a skills/ directory. Normally they live at ~/.claude/skills/gstack/ (your global install). But when you're developing gstack itself, you want Claude Code to use the skills in your working tree — so edits take effect instantly without copying or deploying anything.
That's what dev mode does. It symlinks your repo into the local .claude/skills/ directory so Claude Code reads skills straight from your checkout.
git clone <repo> && cd gstack
bun install # install dependencies
bin/dev-setup # activate dev mode
Now edit any SKILL.md, invoke it in Claude Code (e.g. /review), and see your changes live. When you're done developing:
bin/dev-teardown # deactivate: back to your global install
bin/dev-setup creates a .claude/skills/ directory inside the repo (gitignored) and fills it with symlinks pointing back to your working tree. Claude Code sees the local skills/ first, so your edits win over the global install.
gstack/ <- your working tree
├── .claude/skills/ <- created by dev-setup (gitignored)
│ ├── gstack -> ../../ <- symlink back to repo root
│ ├── review -> gstack/review
│ ├── ship -> gstack/ship
│ └── ... <- one symlink per skill
├── review/
│ └── SKILL.md <- edit this, test with /review
├── ship/
│ └── SKILL.md
├── browse/
│ ├── src/ <- TypeScript source
│ └── dist/ <- compiled binary (gitignored)
└── ...
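The layout above can be reproduced with a few lines of shell. This is only a sketch of the idea, not the contents of bin/dev-setup: the function name link_skills and the rule "any top-level directory containing a SKILL.md is a skill" are assumptions made for illustration, and the demo runs in a throwaway directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of dev-mode symlinking; not the real bin/dev-setup.
set -euo pipefail

# link_skills is an illustrative name. It builds <repo>/.claude/skills/.
link_skills() {
  local repo="$1"
  local skills="$repo/.claude/skills"
  local dir name
  mkdir -p "$skills"
  # gstack -> ../../ points back at the repo root
  ln -sfn ../../ "$skills/gstack"
  # Assumption: every top-level directory with a SKILL.md is a skill
  for dir in "$repo"/*/; do
    [ -f "${dir}SKILL.md" ] || continue
    name="$(basename "$dir")"
    ln -sfn "gstack/$name" "$skills/$name"
  done
}

# Demo in a temp directory so no real checkout is touched
demo="$(mktemp -d)"
mkdir -p "$demo/review" "$demo/ship" "$demo/browse/src"
touch "$demo/review/SKILL.md" "$demo/ship/SKILL.md"
link_skills "$demo"
ls -l "$demo/.claude/skills"
```

Because each skill link goes through the gstack link, everything resolves relative to the repo root, so the checkout can be moved without re-running the setup.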
# 1. Enter dev mode
bin/dev-setup
# 2. Edit a skill
vim review/SKILL.md
# 3. Test it in Claude Code — changes are live
# > /review
# 4. Editing browse source? Rebuild the binary
bun run build
# 5. Done for the day? Tear down
bin/dev-teardown
# 1. Copy .env.example and add your API key
cp .env.example .env
# Edit .env → set ANTHROPIC_API_KEY=sk-ant-...
# 2. Install deps (if you haven't already)
bun install
Bun auto-loads .env, so no extra config is needed. Conductor workspaces inherit .env from the main worktree automatically (see "Conductor workspaces" below).
| Tier | Command | Cost | What it tests |
|---|---|---|---|
| 1 (Static) | bun test | Free | Command validation, snapshot flags, SKILL.md correctness |
| 2 (E2E) | bun run test:e2e | ~$0.50 | Full skill execution via Agent SDK |
| 3 (LLM eval) | bun run test:eval | ~$0.03 | Doc quality scoring via LLM-as-judge |
bun test # Tier 1 only (runs on every commit, <5s)
bun run test:eval # Tier 3: LLM-as-judge (needs ANTHROPIC_API_KEY in .env)
bun run test:e2e # Tier 2: E2E (needs SKILL_E2E=1, can't run inside Claude Code)
bun run test:all # Tier 1 + Tier 2
Tier 1 runs automatically with bun test. No API keys needed.
- Skill parser tests (test/skill-parser.test.ts): Extracts every $B command from SKILL.md bash code blocks and validates it against the command registry in browse/src/commands.ts. Catches typos, removed commands, and invalid snapshot flags.
- Skill validation tests (test/skill-validation.test.ts): Validates that SKILL.md files reference only real commands and flags, and that command descriptions meet quality thresholds.
- Generator tests (test/gen-skill-docs.test.ts): Tests the template system by verifying that placeholders resolve correctly and that the output includes value hints for flags (e.g. -d <N>, not just -d) and enriched descriptions for key commands (e.g. is lists valid states, press lists key examples).
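To make the parser test's idea concrete, here is a rough shell sketch of "extract every $B command from bash code blocks and check it against a registry". It is not how test/skill-parser.test.ts is written: the registry here is a hard-coded list (the real one lives in browse/src/commands.ts), and the function names and the sample document are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "validate $B commands" idea.
set -euo pipefail

# Stand-in registry (assumption); the real one is browse/src/commands.ts
KNOWN_COMMANDS="snapshot press is"

# Three backticks, built via octal escapes so this example nests in Markdown
fence="$(printf '\140\140\140')"

# Print the word following each $B inside bash-fenced blocks of file $1
extract_commands() {
  awk -v fence="$fence" '
    index($0, fence "bash") == 1 { in_block = 1; next }
    index($0, fence) == 1        { in_block = 0; next }
    in_block {
      n = split($0, words, /[ \t]+/)
      for (i = 1; i < n; i++) if (words[i] == "$B") print words[i + 1]
    }
  ' "$1"
}

# Print every extracted command that is missing from the registry
unknown_commands() {
  local cmd
  extract_commands "$1" | sort -u | while read -r cmd; do
    case " $KNOWN_COMMANDS " in
      *" $cmd "*) ;;
      *) echo "$cmd" ;;
    esac
  done
}

# Demo document with one deliberate typo (snapshto)
demo_md="$(mktemp)"
{
  printf '%s\n' "${fence}bash"
  printf '%s\n' '$B snapshot -d 3' '$B press Enter' '$B snapshto -d 3'
  printf '%s\n' "$fence"
  printf '%s\n' 'Prose that mentions $B bogus is ignored.'
} > "$demo_md"
unknown_commands "$demo_md"
```

Only lines inside bash fences are scanned, which is why the stray mention in prose does not count as a command.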
Tier 2 spawns a real Claude Code session, invokes /qa or /browse, and scans tool results for errors. This is the closest thing to "does this skill actually work end-to-end?"
# Must run from a plain terminal — can't nest inside Claude Code or Conductor
SKILL_E2E=1 bun test test/skill-e2e.test.ts
- Gated by the SKILL_E2E=1 env var (prevents accidental expensive runs)
- Auto-skips if it detects it's running inside Claude Code (Agent SDK can't nest)
- Saves full conversation transcripts on failure for debugging
- Tests live in test/skill-e2e.test.ts, runner logic in test/helpers/session-runner.ts
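The gating described above follows a common opt-in pattern for expensive tests. A minimal sketch, assuming a made-up INSIDE_CLAUDE_CODE variable as the nesting signal (the real detection lives in the test code and may work differently):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an opt-in gate for expensive tests.
# INSIDE_CLAUDE_CODE is an invented variable, not a real Claude Code signal.
set -euo pipefail

e2e_gate() {
  if [ "${SKILL_E2E:-}" != "1" ]; then
    echo "skip: SKILL_E2E=1 not set"
  elif [ -n "${INSIDE_CLAUDE_CODE:-}" ]; then
    echo "skip: nested session"
  else
    echo "run"
  fi
}

e2e_gate
```

Checking the opt-in flag first means a plain bun test never reaches the nesting check, let alone a paid session.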
Tier 3 uses Claude Haiku to score generated SKILL.md docs on three dimensions:
- Clarity — Can an AI agent understand the instructions without ambiguity?
- Completeness — Are all commands, flags, and usage patterns documented?
- Actionability — Can the agent execute tasks using only the information in the doc?
Each dimension is scored 1-5. Threshold: every dimension must score ≥ 4. There's also a regression test that compares generated docs against the hand-maintained baseline from origin/main — generated must score equal or higher.
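The pass rule is "every dimension at or above the threshold", not an average. As a small sketch of that rule (the function name and the sample scores are invented for illustration; the actual check is in the eval test):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-dimension threshold rule.
set -euo pipefail

MIN_SCORE=4

# Succeeds only if every score passed as an argument is >= MIN_SCORE
passes_threshold() {
  local score
  for score in "$@"; do
    [ "$score" -ge "$MIN_SCORE" ] || return 1
  done
}

# Sample scores in the order clarity, completeness, actionability (made up)
if passes_threshold 5 4 4; then echo "5 4 4: pass"; fi
if ! passes_threshold 5 3 5; then echo "5 3 5: fail"; fi
```

Note that 5 3 5 fails even though its average is higher than 4: one weak dimension is enough to fail the doc.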
# Needs ANTHROPIC_API_KEY in .env
bun run test:eval
- Uses claude-haiku-4-5 for cost efficiency
- Tests live in test/skill-llm-eval.test.ts
- Calls the Anthropic API directly (not Agent SDK), so it works from anywhere, including inside Claude Code
A GitHub Action (.github/workflows/skill-docs.yml) runs bun run gen:skill-docs --dry-run on every push and PR. If the generated SKILL.md files differ from what's committed, CI fails. This catches stale docs before they merge.
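The freshness check boils down to "regenerate, then compare with what is committed". A generic sketch of that comparison follows; check_fresh and the file names are illustrative and not taken from the workflow file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "generated file is stale" check.
set -euo pipefail

# Compare a committed file ($1) with a freshly generated one ($2)
check_fresh() {
  local committed="$1" generated="$2"
  if cmp -s "$committed" "$generated"; then
    echo "fresh: $committed"
  else
    echo "stale: $committed (regenerate and commit)"
    return 1
  fi
}

# Demo with throwaway files
work="$(mktemp -d)"
printf 'v1\n' > "$work/SKILL.md"
printf 'v1\n' > "$work/fresh.out"
printf 'v2\n' > "$work/stale.out"
check_fresh "$work/SKILL.md" "$work/fresh.out"
check_fresh "$work/SKILL.md" "$work/stale.out" || echo "CI would fail here"
```

Locally, running bun run gen:skill-docs followed by git diff --exit-code should surface the same kind of drift before you push, assuming the generator writes the SKILL.md files in place.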
Tests run against the browse binary directly — they don't require dev mode.
SKILL.md files are generated from .tmpl templates. Don't edit the .md directly — your changes will be overwritten on the next build.
# 1. Edit the template
vim SKILL.md.tmpl # or browse/SKILL.md.tmpl
# 2. Regenerate
bun run gen:skill-docs
# 3. Check health
bun run skill:check
# Or use watch mode — auto-regenerates on save
bun run dev:skill
To add a browse command, add it to browse/src/commands.ts. To add a snapshot flag, add it to SNAPSHOT_FLAGS in browse/src/snapshot.ts. Then rebuild with bun run build.
If you're using Conductor to run multiple Claude Code sessions in parallel, conductor.json wires up workspace lifecycle automatically:
| Hook | Script | What it does |
|---|---|---|
| setup | bin/dev-setup | Copies .env from main worktree, installs deps, symlinks skills |
| archive | bin/dev-teardown | Removes skill symlinks, cleans up .claude/ directory |
When Conductor creates a new workspace, bin/dev-setup runs automatically. It detects the main worktree (via git worktree list), copies your .env so API keys carry over, and sets up dev mode — no manual steps needed.
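For the curious, the worktree detection can be sketched in a few lines. git worktree list --porcelain prints the main worktree first, so taking the first "worktree" line is enough. The function names below are illustrative and the copy rules (never overwrite an existing .env) are an assumption, not a description of bin/dev-setup.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of .env inheritance across git worktrees.
set -euo pipefail

# Read `git worktree list --porcelain` output on stdin and print the
# first worktree path, which git lists as the main worktree.
first_worktree() {
  awk '/^worktree / && !seen++ { print substr($0, 10) }'
}

# Copy .env from the main worktree ($1) into a workspace ($2).
# Assumption: an existing .env in the workspace is left alone.
inherit_env() {
  local main="$1" workspace="$2"
  [ "$main" != "$workspace" ] || return 0
  [ -f "$main/.env" ] || return 0
  [ -f "$workspace/.env" ] || cp "$main/.env" "$workspace/.env"
}
```

Inside a workspace this could be used as inherit_env "$(git worktree list --porcelain | first_worktree)" "$PWD".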
First-time setup: Put your ANTHROPIC_API_KEY in .env in the main repo (see .env.example). Every Conductor workspace inherits it automatically.
- SKILL.md files are generated. Edit the .tmpl template, not the .md. Run bun run gen:skill-docs to regenerate.
- Browse source changes need a rebuild. If you touch browse/src/*.ts, run bun run build.
- Dev mode shadows your global install. Project-local skills take priority over ~/.claude/skills/gstack. bin/dev-teardown restores the global one.
- Conductor workspaces are independent. Each workspace is its own git worktree. bin/dev-setup runs automatically via conductor.json.
- .env propagates across worktrees. Set it once in the main repo, and all Conductor workspaces get it.
- .claude/skills/ is gitignored. The symlinks never get committed.
When you're developing gstack in one workspace and want to test your branch in a different project (e.g. testing browse changes against your real app), there are two cases depending on how gstack is installed in that project.
Point your global install at the branch:
cd ~/.claude/skills/gstack
git fetch origin
git checkout origin/<branch> # e.g. origin/v0.3.2
bun install # in case deps changed
bun run build # rebuild the binary
Now open Claude Code in the other project. It picks up skills from ~/.claude/skills/ automatically. To go back to main when you're done:
cd ~/.claude/skills/gstack
git checkout main && git pull
bun run build
Some projects vendor gstack by copying it into the repo (no .git inside the copy). Project-local skills take priority over global, so you need to update the vendored copy too. This is a three-step process:
1. Update your global install to the branch (so you have the source):
   cd ~/.claude/skills/gstack
   git fetch origin
   git checkout origin/<branch> # e.g. origin/v0.3.2
   bun install && bun run build
2. Replace the vendored copy in the other project:
   cd /path/to/other-project
   # Remove old skill symlinks and vendored copy
   for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do
     rm -f ".claude/skills/$s"
   done
   rm -rf .claude/skills/gstack
   # Copy from global install (strips .git so it stays vendored)
   cp -Rf ~/.claude/skills/gstack .claude/skills/gstack
   rm -rf .claude/skills/gstack/.git
   # Rebuild binary and re-create skill symlinks
   cd .claude/skills/gstack && ./setup
3. Test your changes: open Claude Code in that project and use the skills.
To revert to main later, repeat steps 1-2 with git checkout main && git pull
instead of git checkout origin/<branch>.
When you're happy with your skill edits:
/ship
This runs tests, reviews the diff, bumps the version, and opens a PR. See ship/SKILL.md for the full workflow.