Add /backdate-program and /review-program commands #84
Merged
New orchestration command that coordinates multi-agent workflows to:
- Research historical regulatory sources (parallel PDF discovery, prep, extraction)
- Audit reference quality (broken URLs, generic statutes, session law migration)
- Review formula correctness (unused params, zero-sentinel anti-patterns)
- Implement parameter backdating (YAML date entries, reference fixes)
- Run built-in /review-pr and /audit-state-tax as validation phases
- Generate comprehensive tests (transition boundaries, all dimensions)

Key design decisions:
- Main Claude only orchestrates; all work delegated to agents (context protection)
- Agent-to-agent communication via SendMessage (no Main Claude relay)
- Data flows through files on disk; Main Claude reads only short summaries
- Works for any state program (TANF, SNAP, Medicaid, etc.), not just TANF
- Incorporates lessons learned from Utah and Connecticut backdating runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…age rule

- Replace 14 generic agents with 10 specialized plugin agents: document-collector, reference-validator, program-reviewer, parameter-architect, rules-engineer, test-creator, edge-case-generator, implementation-validator, ci-fixer, pr-pusher
- Only 6 general-purpose agents remain (PDF rendering, research, consolidation)
- Add agent summary table documenting why each type was chosen
- Add towncrier changelog format (changelog.d/<branch>.<type>.md)
- Add global PDF page number rule (#page=XX required on all PDF refs, except single-page PDFs)
- Integrate /review-pr and /audit-state-tax as Phase 6 built-in review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Default remains 300 DPI. Use --600dpi for scanned docs, poor-quality PDFs, or dense tables that agents struggle to read at 300 DPI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Combines code validation (4 plugin agents) and PDF audit (2-5 agents) into a single command with PDF acquisition always on by default. Updates /backdate-program Phase 6 to invoke /review-program instead of two separate commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both commands now split large PDFs across multiple parallel agents (~40 pages max per agent). Main Claude decides agent count using only the page count number from the manifest/prep agent — never reads PDF content itself. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
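The splitting rule above can be sketched as a small helper. This is only an illustration: the function name `split_pages` and the choice to spread pages evenly across agents are assumptions, since the commit states only the ~40-page ceiling and that the agent count is derived from the page count alone.

```python
import math


def split_pages(page_count: int, max_pages: int = 40) -> list[tuple[int, int]]:
    """Return 1-indexed, inclusive (start, end) page ranges, one per agent.

    Hypothetical helper: only the page count is needed, which mirrors the
    rule that the orchestrator never reads PDF content itself.
    """
    if page_count <= 0:
        return []
    # Smallest number of agents such that no agent exceeds max_pages.
    agents = math.ceil(page_count / max_pages)
    # Spread pages evenly (an assumption) so no agent gets a tiny remainder.
    base, extra = divmod(page_count, agents)
    ranges = []
    start = 1
    for i in range(agents):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

Under these assumptions a 100-page PDF would go to three agents with 34, 33, and 33 pages each, rather than 40, 40, and 20.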
- Add ORCHESTRATOR ONLY section with explicit MUST NOT / DO rules
- Phase 1: delegate diff analysis to general-purpose agent (writes context summary to disk); Main Claude only runs gh commands + saves diff to file
- Phase 3: Main Claude reads only two short summaries (context + manifest)
- Phase 5C: delegate 600 DPI mismatch verification to agents
- Phase 5D: delegate page number verification to agents
- Phase 7: use gh pr comment --body-file (no file read into context); local mode uses display-agent to present report
- Fix Explore → general-purpose for agents that need Write tool
- Fix stale task table reference in backdate-program

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 6 now runs /review-program and fixes critical issues in a loop until zero critical issues remain (max 3 rounds). Round 2+ asks user before continuing. Each round: full review → fix criticals → run tests → re-review. Catches regressions from fixes and cascading issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
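A minimal sketch of the loop's control flow, assuming the review, fix, and user-confirmation steps are available as callables. The names `review_fix_loop`, `review`, `fix`, and `confirm` are hypothetical; the real workflow is driven by agent prompts, not Python.

```python
from typing import Callable


def review_fix_loop(
    review: Callable[[], int],       # runs a full review, returns critical-issue count
    fix: Callable[[], None],         # fixes criticals and runs tests
    confirm: Callable[[int], bool],  # asks the user before starting round 2 or later
    max_rounds: int = 3,
) -> tuple[int, int]:
    """Return (fix_rounds_run, remaining_critical_issues)."""
    remaining = review()  # initial full review
    rounds = 0
    while remaining > 0 and rounds < max_rounds:
        # Round 1 starts automatically; later rounds need user approval.
        if rounds >= 1 and not confirm(rounds + 1):
            break
        fix()
        rounds += 1
        # Re-review so regressions introduced by the fixes are caught.
        remaining = review()
    return rounds, remaining
```

The re-review after every fix is what lets the loop catch cascading issues; the round cap keeps it from running indefinitely when fixes keep introducing new criticals.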
gh pr diff fetches from GitHub remote API, so local-only commits are invisible. Each fix round now commits AND pushes so the next review round sees the updated code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Phase 0B: issue-manager finds/creates tracking issue + draft PR (runs in parallel with inventory)
- Phase 6: review-fix loop now commits + pushes between rounds (gh pr diff reads from remote, needs pushed code)
- Phase 7B: reporter writes PR description with unresolved items section for human decision-making
- Phase 7C: gh pr edit --body-file updates PR description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Group both agents under a single step with explicit "spawn in one message" instruction. Move the results collection to after both agents complete. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Main Claude reads only a 10-line summary with counts and the program path. The full inventory with all file paths stays on disk for agents. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skills have 0% coverage for these critical patterns. Embed concrete examples from CT TFA directly into the parameter-architect (Tier A/B) and rules-engineer (Tier B/C) prompts:
- Pattern 1: in_effect boolean for provisions with a start date (parameter side + variable side with if p.flag:)
- Pattern 2: regional_in_effect for region-based variation (parameter side + variable side with select())
- Explains when to use if p.flag: (scalar) vs where() (vectorized)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…le skills

Both skills previously only covered the flat_applies transition pattern. Now include provision gating (in_effect) and regional variation (regional_in_effect) patterns with real CT TFA production code examples and a comparison table showing when to use each of the three boolean toggle approaches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Incorporates the mismatch verification approach from PR #71 (audit-state-tax Phase 5.5). Before 600 DPI visual verification, a code-path verifier traces whether the flagged parameter is actually reachable in the target year's computation. This filters false positives from parameters gated by in_effect booleans, deprecated branches, or overriding parameters.

Phase 5 now has two-stage verification:
- Step 5C: Code-path tracing (CONFIRMED/REJECTED/INCONCLUSIVE)
- Step 5D: 600 DPI visual verification (only for CONFIRMED/INCONCLUSIVE)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
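The hand-off between the two stages can be sketched as a filter over the code-path verdicts. The `Verdict` enum and the `needs_visual_check` function are hypothetical names used for illustration; only the three verdict labels come from the commit message.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "CONFIRMED"        # parameter is reachable in the target year
    REJECTED = "REJECTED"          # unreachable, e.g. gated off by in_effect
    INCONCLUSIVE = "INCONCLUSIVE"  # tracing could not decide either way


def needs_visual_check(verdicts: dict[str, Verdict]) -> list[str]:
    """Return flagged parameters that proceed to 600 DPI visual verification.

    Only REJECTED mismatches are dropped; INCONCLUSIVE ones are kept so a
    real issue is not lost when tracing is uncertain.
    """
    return [name for name, v in verdicts.items() if v is not Verdict.REJECTED]
```

Running the cheaper code-path stage first means the more expensive 600 DPI rendering is spent only on mismatches that could actually affect the computation.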
Three-layer learning system:
Layer 1 (session): Fix agents in the review-fix loop append to a session
checklist (/tmp/{st}-{prog}-checklist.md). Subsequent fix agents read it
to avoid repeating the same mistakes within a single run.
Layer 2 (persistent): After the workflow completes, a lesson-extractor
agent generalizes session fixes into reusable rules and appends to
~/.claude/projects/.../memory/agent-lessons.md (max 50 entries, pruned).
Layer 3 (shared): New lessons are proposed as a PR to policyengine-claude
repo (lessons/agent-lessons.md). Only one open lessons PR at a time —
multiple runs append to the same PR until a maintainer merges.
Implementation agents (parameter-architect, rules-engineer) now load
lessons files on startup to prevent known mistakes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
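The Layer 2 cap can be illustrated with a small append-and-prune helper. The commit says only that the file holds at most 50 entries and is pruned; dropping the oldest entries first and skipping exact duplicates are assumptions made for this sketch, and `append_lessons` is a hypothetical name.

```python
def append_lessons(
    existing: list[str], new: list[str], max_entries: int = 50
) -> list[str]:
    """Append new lessons, then keep only the most recent max_entries.

    Assumptions (not stated in the source): exact duplicates are skipped,
    and pruning removes the oldest entries first.
    """
    merged = list(existing)
    for lesson in new:
        if lesson not in merged:
            merged.append(lesson)
    # Negative slice keeps the tail, i.e. the newest entries.
    return merged[-max_entries:]
```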
…m, and skill patterns Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added to both country-models and complete plugin command arrays. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rogram

- reference-validator: add Write (needs to write findings to /tmp/)
- program-reviewer: add Write (needs to write audit reports to /tmp/)
- edge-case-generator: add Edit (needs to edit existing test files)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
review-program:
- Fix Global Rules numbering gap (missing rule 6)
- Fix agent count in summary (no Explore agents used)

backdate-program:
- Fix Phase 5 Quick Audit: Explore agent → general-purpose (needs Write)
- Fix Files on Disk table reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Step 5C push-to-remote before Phase 6 review-fix loop
- Add --local-diff flag to /review-program for unpushed work
- Add --skip-pdf flag for infrastructure/refactoring PRs
- Add /tmp cleanup at start of both commands
- Make /review-program work for any PR type (scope-aware agent selection)
- Use temporary clone in Phase 8C instead of modifying plugin directory
- Add fork-based fallback when user lacks push access to plugin repo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a boolean parameter (in_effect, regional_in_effect, flat_applies) changes value at date D, the gated parameters must have entries covering that date. Without this, PolicyEngine silently backward-extrapolates a later value, producing incorrect historical amounts (e.g., CT TFA FY2023 gap where regional_in_effect flipped but statewide amount.yaml started 15 months later). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
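The coverage rule can be expressed as a simple date check. This is a sketch under stated assumptions: `uncovered_toggle_dates` is a hypothetical helper, "covering" is taken to mean the gated parameter has at least one entry dated on or before the toggle date, and the dates in the usage example are invented to mimic a 15-month gap, not taken from the CT TFA files.

```python
from datetime import date


def uncovered_toggle_dates(
    toggle_dates: list[date], gated_dates: list[date]
) -> list[date]:
    """Return toggle change dates that precede every entry of the gated parameter.

    For such dates no value has started yet, so a later value would be
    backward-extrapolated, which is the silent failure described above.
    """
    if not gated_dates:
        return sorted(toggle_dates)
    earliest = min(gated_dates)
    return sorted(d for d in toggle_dates if d < earliest)


# Illustrative dates only: the toggle flips 15 months before the first amount entry.
gaps = uncovered_toggle_dates([date(2022, 7, 1)], [date(2023, 10, 1)])
```

Here `gaps` contains the toggle date, signalling that the gated parameter needs an entry at or before 2022-07-01.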
Plugin agents have the Skill tool but workflow prompts never told them to load relevant skills. Added explicit Load skills instructions to:

review-program:
- Validator 1 (program-reviewer): variable-patterns, parameter-patterns
- Validator 2 (reference-validator): parameter-patterns
- Validator 3 (implementation-validator): variable-patterns, parameter-patterns, code-style, period-patterns
- Validator 4 (edge-case-generator): testing-patterns, period-patterns

backdate-program:
- edge-case-generator (Phase 4B): testing-patterns, period-patterns
- implementation-validator (Phase 5A): variable-patterns, parameter-patterns, code-style, period-patterns
- review-fixer rules-engineer (Phase 6C): explicit skills replacing vague "Load appropriate skills"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PDF audit agents, code-path verifiers, and visual mismatch verifiers
are general-purpose agents with inline prompts. Without skills loaded,
they lack knowledge of PolicyEngine parameter structure, variable
patterns, period handling, and boolean toggle patterns — making them
more likely to produce false positives or miss real issues.
- pdf-audit-{topic}: parameter-patterns, period-patterns
- verifier-codepath-{N}: variable-patterns, parameter-patterns, period-patterns, code-style
- verifier-mismatch-{N}: parameter-patterns, period-patterns
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PolicyEngine-US has hundreds of existing variables for common concepts (fpg, smi, tanf_fpg, is_tanf_enrolled, ssi, etc.). Agents should search the codebase before creating new non-program-specific variables.

Implementation side (backdate-program):
- parameter-architect, rules-engineer, review-fixer: "Grep the codebase before creating ANY non-program-specific variable"

Validation side (both commands):
- program-reviewer: flag reinvented variables as CRITICAL
- implementation-validator: duplicate variable detection via Grep

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
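The "search before creating" check could look roughly like the following. The agents themselves use the Grep tool; this Python equivalent is only an illustration, `find_existing_variable` is a hypothetical name, and it assumes variables are defined as Python classes named after the variable (for example `class fpg(Variable):`).

```python
import re
from pathlib import Path


def find_existing_variable(root: Path, name: str) -> list[Path]:
    """Return the .py files under root that define a class called `name`.

    Assumption: each variable is declared as `class <name>(...)` at the
    start of a line. A non-empty result means the variable already exists
    and should be reused rather than reinvented.
    """
    pattern = re.compile(rf"^class\s+{re.escape(name)}\s*\(", re.MULTILINE)
    return sorted(
        path
        for path in root.rglob("*.py")
        if pattern.search(path.read_text(encoding="utf-8"))
    )
```

Anchoring the pattern on the full class name keeps a search for `fpg` from matching `tanf_fpg`, so a hit is a true duplicate and not a related variable.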
Summary
Adds two new multi-agent commands and updates skills with new patterns. Version bump to 3.11.0.
New Commands
/review-program: Consolidated PR Review
Replaces the need to run /review-pr and /audit-state-tax separately. Single command that runs code validation + PDF audit in one pass.
- … in_effect gates, deprecated branches)
- … --body-file or display locally
Main Claude is a pure orchestrator: it reads only short summary files (≤30 lines) and never touches diffs, PDFs, or agent reports.
/backdate-program: Historical Parameter Backdating
Multi-agent workflow to add historical date entries, fix reference quality, review formula correctness, and improve test coverage.
- … in_effect/regional_in_effect patterns)
- … /review-program --local --full, fixes criticals, commit+push, re-review (max 3 rounds)
- … ~/.claude/projects/.../memory/agent-lessons.md (max 50 entries)
- … lessons/agent-lessons.md, one open PR at a time
Skill Updates
- … in_effect boolean (provision gating) and regional_in_effect boolean (regional variation) patterns with CT TFA production code examples
- … if p.in_effect:, if p.regional_in_effect: with select()) and comparison table of all three boolean toggle approaches
Agent Tool Fixes
- reference-validator: Added Write tool (needs to write findings to /tmp/)
- program-reviewer: Added Write tool (needs to write audit reports to /tmp/)
- edge-case-generator: Added Edit tool (needs to edit existing test files)
Other Changes
Files Changed
- commands/review-program.md
- commands/backdate-program.md
- skills/.../policyengine-parameter-patterns-skill/SKILL.md
- skills/.../policyengine-variable-patterns-skill/SKILL.md
- agents/reference-validator.md
- agents/country-models/program-reviewer.md
- agents/country-models/edge-case-generator.md
- .claude-plugin/marketplace.json
- CHANGELOG.md

🤖 Generated with Claude Code