From 4efe1ffc4c6c57bdd69fb049e46f4311250cb84c Mon Sep 17 00:00:00 2001
From: PavelMakarchuk
Date: Wed, 25 Feb 2026 11:31:52 -0500
Subject: [PATCH 2/5] Fix timeline order, update dates, and polish copy
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Reorder milestones chronologically: Feb → Apr → Aug → Oct → Dec → Feb
- Move first experiments to Apr 2025, skills/agents to Aug 2025
- "rough" → "mixed", "state regulation" → "primary sources"
- Remove "—not existing code", rephrase plugin discovery line

Co-Authored-By: Claude Opus 4.6
Anthropic releases Claude Code, a CLI coding agent. We adopt it
immediately and discover it’s remarkably useful for
- research—like having the Claude web chat but right in your
- terminal. We stop switching to the browser and start doing everything
- from the command line.
+ research—having the Claude web chat but right in your
+ terminal. We slowly migrate to the command line from the browser.
- For coding, the early wins are the repetitive tasks nobody wants to
- do by hand: renaming files, updating import paths, bulk reformatting.
+ For coding, the early wins are the repetitive tasks: renaming
+ files, updating import paths, bulk reformatting.
Claude handles them instantly. It’s not writing policy logic
yet—but it’s already saving hours a week.
- Anthropic adds an open plugin architecture to Claude Code. Plugins
- let you package domain knowledge, custom agents, and automated
- workflows into something Claude loads at runtime. No fine-tuning, no
- training data.
-
- We see the potential immediately. PolicyEngine models tax and benefit
- policy across 40+ repositories, thousands of parameters, and dozens
- of government programs. Every implementation has to match real
- legislation. We need an AI that can navigate all of it.
-
We point Claude at our codebase and start asking it to implement
- government programs. The results are rough. It hardcodes dollar
+ government programs. The results are mixed. It hardcodes dollar
amounts that should live in parameter files. It mixes up federal and
state logic. It skips regulatory sources and guesses at eligibility
rules.
@@ -160,19 +139,18 @@ const milestones: Milestone[] = [
A typical failure: Claude implements a state TANF program by copying
patterns from another state, changing a few numbers, and calling it
done. The code compiles, the tests pass—but the income
- thresholds are wrong because it never read the actual state
- regulation.
+ thresholds are wrong because it never read the primary sources.
The model is capable. The problem is context. Claude doesn’t
know how PolicyEngine is structured, what our conventions are, or
- that the law is the source of truth—not existing code.
+ that the law is the source of truth.
+ Anthropic adds an open plugin architecture to Claude Code. Plugins
+ let you package domain knowledge, custom agents, and automated
+ workflows into something Claude loads at runtime. No fine-tuning, no
+ training data.
+
+ This changes how we work. PolicyEngine models tax and benefit
+ policy across 40+ repositories, thousands of parameters, and dozens
+ of government programs. Every implementation has to match real
+ legislation. We need an AI that can navigate all of it.
+ The story of turning a general-purpose AI into a policy expert
+ The story of turning a general-purpose AI into a policy expert
+
See how you can use what we built →
diff --git a/src/components/plugin-blog/TimelineSection.tsx b/src/components/plugin-blog/TimelineSection.tsx
index c36e1b1..bf539a6 100644
--- a/src/components/plugin-blog/TimelineSection.tsx
+++ b/src/components/plugin-blog/TimelineSection.tsx
@@ -1,5 +1,5 @@
-import type { ReactNode } from 'react';
-import { AnimatedSection } from '../common/AnimatedSection';
+import type { ReactNode } from "react";
+import { AnimatedSection } from "../common/AnimatedSection";
/* ---------- sub-components ---------- */
@@ -12,7 +12,7 @@ const PipelineStep = ({
}) => (
<>
{label}
@@ -24,26 +24,26 @@ const PipelineStep = ({
const ideas = [
{
- title: 'Documentation Pointers, Not Stale Examples',
+ title: "Documentation Pointers, Not Stale Examples",
paragraphs: [
"Most knowledge bases rot. You write example code, the codebase evolves, and now your examples are wrong.",
- 'We solved this by having skills point to live code in the active repository instead of hardcoding examples. When Claude reads a skill, it gets instructions like \u201clook at the current implementation in the relevant country model repo\u201d\u2014always fresh, always branch-aware, zero maintenance.',
+ "We solved this by having skills point to live code in the active repository instead of hardcoding examples. When Claude reads a skill, it gets instructions like \u201clook at the current implementation in the relevant country model repo\u201d\u2014always fresh, always branch-aware, zero maintenance.",
],
},
{
- title: 'Legal Code Is the Source of Truth',
+ title: "Legal Code Is the Source of Truth",
paragraphs: [
"When implementing a government benefit program, the temptation is to copy what another jurisdiction does and tweak it. Our agents are instructed differently: read the actual regulation first, understand what the law says, then implement exactly that. Pattern-matching across jurisdictions is a tool, not a shortcut.",
],
},
{
- title: 'Zero Hard-Coding',
+ title: "Zero Hard-Coding",
paragraphs: [
- 'Every dollar amount, every threshold, every phase-out rate lives in a parameter file\u2014never as a magic number in code. This is what makes PolicyEngine work: you can simulate policy reforms by changing parameters alone. Our agents enforce this automatically.',
+ "Every dollar amount, every threshold, every phase-out rate lives in a parameter file\u2014never as a magic number in code. This is what makes PolicyEngine work: you can simulate policy reforms by changing parameters alone. Our agents enforce this automatically.",
],
},
{
- title: 'Claude Policing Claude',
+ title: "Claude Policing Claude",
paragraphs: [
"We use Claude Code hooks\u2014prompts that run before or after tool calls\u2014to enforce architectural rules. When Claude writes a file, another Claude prompt checks whether tax logic ended up somewhere it shouldn\u2019t be. If it did, the write gets blocked.",
"The same mechanism auto-detects which PolicyEngine repo you\u2019re in and routes you to the right specialized agents. Open Claude Code in any PolicyEngine repository and it knows which agents to load\u2014country models get the rules-engineer, the API repos get the api-reviewer, the frontend gets the app-reviewer.",
@@ -53,43 +53,43 @@ const ideas = [
const lessons = [
{
- bold: 'Structure beats volume.',
- text: ' A well-organized 200-line skill file is worth more than a 2,000-line knowledge dump. Claude works best when knowledge is modular and clearly scoped.',
+ bold: "Structure beats volume.",
+ text: " A well-organized 200-line skill file is worth more than a 2,000-line knowledge dump. Claude works best when knowledge is modular and clearly scoped.",
},
{
- bold: 'Agents need constraints, not just capabilities.',
+ bold: "Agents need constraints, not just capabilities.",
text: " The most impactful additions weren\u2019t new features\u2014they were guardrails. Regulatory checkpoints. Architecture enforcement. The rule that agents must read the law before writing code.",
},
{
- bold: 'Plugins are prompt engineering at scale.',
+ bold: "Plugins are prompt engineering at scale.",
text: " You\u2019re not training a model. You\u2019re building a structured context that makes a general-purpose model behave like a domain expert. That\u2019s powerful and accessible\u2014anyone can do it.",
},
];
const researcherCapabilities = [
{
- title: 'Population-Level Impact Analysis',
+ title: "Population-Level Impact Analysis",
paragraphs: [
- 'The analysis-tools plugin turns Claude into a microsimulation analyst. Point it at any tax or benefit reform and it runs population-level analysis using PolicyEngine\u2019s weighted survey data\u2014covering income, demographics, and household structure for the entire US population.',
- 'The result: cost estimates, revenue projections, and counts of who wins and who loses under a proposed change\u2014all generated from a plain-English description of a policy reform.',
+ "The analysis-tools plugin turns Claude into a microsimulation analyst. Point it at any tax or benefit reform and it runs population-level analysis using PolicyEngine\u2019s weighted survey data\u2014covering income, demographics, and household structure for the entire US population.",
+ "The result: cost estimates, revenue projections, and counts of who wins and who loses under a proposed change\u2014all generated from a plain-English description of a policy reform.",
],
},
{
- title: 'Distributional and Inequality Analysis',
+ title: "Distributional and Inequality Analysis",
paragraphs: [
- 'Beyond aggregate numbers, Claude breaks down impacts by income decile, calculates changes to the Gini coefficient, and measures effects on poverty rates. You get the full distributional picture\u2014who bears the cost and who receives the benefit\u2014without writing a single line of analysis code.',
+ "Beyond aggregate numbers, Claude breaks down impacts by income decile, calculates changes to the Gini coefficient, and measures effects on poverty rates. You get the full distributional picture\u2014who bears the cost and who receives the benefit\u2014without writing a single line of analysis code.",
],
},
{
- title: 'Congressional District Analysis',
+ title: "Congressional District Analysis",
paragraphs: [
- 'Using geographic microdata from HuggingFace datasets, Claude can map reform impacts to every congressional district. This turns abstract national estimates into localized numbers that matter for legislative strategy and constituent communication.',
+ "Using geographic microdata from HuggingFace datasets, Claude can map reform impacts to every congressional district. This turns abstract national estimates into localized numbers that matter for legislative strategy and constituent communication.",
],
},
{
- title: 'Dashboards and Visualizations',
+ title: "Dashboards and Visualizations",
paragraphs: [
- 'Claude doesn\u2019t just compute numbers\u2014it builds interactive tools. Streamlit dashboards, Plotly charts, and household calculators that let stakeholders explore reform scenarios themselves. The analysis becomes a shareable, interactive product.',
+ "Claude doesn\u2019t just compute numbers\u2014it builds interactive tools. Streamlit dashboards, Plotly charts, and household calculators that let stakeholders explore reform scenarios themselves. The analysis becomes a shareable, interactive product.",
],
},
];
@@ -104,28 +104,28 @@ interface Milestone {
const milestones: Milestone[] = [
{
- date: 'Feb 2025',
- title: 'Claude Code first release \u2014 we start using it',
+ date: "Feb 2025",
+ title: "Claude Code first release \u2014 we start using it",
body: (
<>
Anthropic releases Claude Code, a CLI coding agent. We adopt it
immediately and discover it’s remarkably useful for
- research—having the Claude web chat but right in your
- terminal. We slowly migrate to the command line from the browser.
+ research—having the Claude web chat but right in your terminal.
+ We slowly migrate to the command line from the browser.
- For coding, the early wins are the repetitive tasks: renaming
- files, updating import paths, bulk reformatting.
- Claude handles them instantly. It’s not writing policy logic
- yet—but it’s already saving hours a week.
+ For coding, the early wins are the repetitive tasks: renaming files,
+ updating import paths, bulk reformatting. Claude handles them
+ instantly. It’s not writing policy logic yet—but
+ it’s already saving hours a week.
@@ -143,78 +143,79 @@ const milestones: Milestone[] = [
The model is capable. The problem is context. Claude doesn’t
- know how PolicyEngine is structured, what our conventions are, or
- that the law is the source of truth.
+ know how PolicyEngine is structured, what our conventions are, or that
+ the law is the source of truth.
We start writing skill files—structured
documents Claude reads at runtime. Variable naming conventions.
- Parameter file structures. How PolicyEngine’s tax-benefit
- logic is organized. How to properly use
- Then come specialized agents. A{' '}
+ Then come specialized agents. A{" "}
- The difference is immediate. Claude stops guessing. It reads the
- skill file, understands the convention, and follows the pattern.
- Error rates on parameter structure drop to near zero.
+ The difference is immediate. Claude stops guessing. It reads the skill
+ file, understands the convention, and follows the pattern. Error rates
+ on parameter structure drop dramatically.
- Anthropic adds an open plugin architecture to Claude Code. Plugins
- let you package domain knowledge, custom agents, and automated
- workflows into something Claude loads at runtime. No fine-tuning, no
- training data.
+ Anthropic adds an open plugin architecture to Claude Code. Plugins let
+ you package domain knowledge, custom agents, and automated workflows
+ into something Claude loads at runtime. No fine-tuning, no training
+ data.
- This changes how we work. PolicyEngine models tax and benefit
- policy across 40+ repositories, thousands of parameters, and dozens
- of government programs. Every implementation has to match real
- legislation. We need an AI that can navigate all of it.
+ This changes everything. The skills and agents we’ve been
+ building since August can now be packaged as a proper
+ plugin—portable, versioned, and shareable. We consolidate our
+ scattered skill files into a single plugin that loads automatically in
+ any PolicyEngine repository.
Individual agents are useful, but the real power comes from chaining
- them. We build orchestrated commands—multi-agent
- pipelines that run end-to-end workflows.{' '}
+ them. We build orchestrated commands
+ —multi-agent pipelines that run end-to-end workflows.{" "}
- We also build
- Opus 4.6 launched with agent teams—multiple agents
- collaborating in parallel within a single session. With{' '}
-
- Agent teams solved this by splitting the work: a discovery agent
- finds historical PDFs, prep agents download and render them, and
- multiple research agents read different documents in
- parallel—communicating directly with each other, not through
- a central coordinator. This made
- These numbers keep growing. We’re still building new skills,
- agents, and commands every week—and not just for coding
- policy. New workflows cover writing policy analysis, building
- interactive dashboards, and generating content. The plugin is
- expanding beyond implementation into every part of how we work.
+ We’re still building new skills, agents, and commands every
+ week—and not just for coding policy. New workflows cover writing
+ policy analysis, building interactive dashboards, and generating
+ content. The plugin is expanding beyond implementation into every part
+ of how we work.
- Our most complex workflow chains 6+ agents across
- 8 phases—from reading the law to pushing a validated PR.{' '}
+ Our most complex workflow chains 6+ agents across 8
+ phases—from reading the law to pushing a validated PR.{" "}
@@ -346,14 +348,14 @@ export const TimelineSection = () => {
Beyond building the plugin, we focused on what policy researchers
- actually need from an AI assistant. See our{' '}
+ actually need from an AI assistant. See our{" "}
multi-agent AI workflow post
- {' '}
+
- We built this for ourselves and we’re making it public.
- See what the plugin can do, or explore the source code.
+
+ We built this for ourselves and we’re making it public. See
+ what the plugin can do, or explore the source code.
- Beyond building the plugin, we focused on what policy researchers
- actually need from an AI assistant. See our{" "}
-
- multi-agent AI workflow post
- {" "}
- for a deep dive into how these capabilities work in practice.
- {p} {p}
- {l.bold}
- {l.text}
-
See how you can use what we built →
diff --git a/src/components/plugin-blog/TimelineSection.tsx b/src/components/plugin-blog/TimelineSection.tsx
index 9161d8e..885d687 100644
--- a/src/components/plugin-blog/TimelineSection.tsx
+++ b/src/components/plugin-blog/TimelineSection.tsx
@@ -337,7 +337,7 @@ export const TimelineSection = () => {
}}
>
See what it can do
diff --git a/src/pages/PluginBlog.tsx b/src/pages/PluginBlog.tsx
index d0f40d4..3445882 100644
--- a/src/pages/PluginBlog.tsx
+++ b/src/pages/PluginBlog.tsx
@@ -6,6 +6,6 @@ export const PluginBlog = () => (
How We Built AI-Powered Policy Analysis
adds and{' '}
- subtracts annotations. Which patterns to follow and
- which to avoid.
+ Parameter file structures. How PolicyEngine’s tax-benefit logic
+ is organized. How to properly use adds and{" "}
+ subtracts annotations. Which patterns to follow and which
+ to avoid.
rules-engineer that knows how to implement tax
- variables—it reads the regulation, creates parameter files
- with proper metadata, and writes vectorized formulas. A{' '}
+ variables—it reads the regulation, creates parameter files with
+ proper metadata, and writes vectorized formulas. A{" "}
test-creator that builds integration tests from real
- household scenarios. A document-collector that
- researches government regulations and extracts the specific
- provisions needed before anyone writes a line of code.
+ household scenarios. A document-collector that researches
+ government regulations and extracts the specific provisions needed
+ before anyone writes a line of code.
/encode-policy is the flagship: it takes a government
program from legal text to working, tested code.
/review-pr for automated code review,{' '}
- /fix-pr to resolve CI failures, and{' '}
+ We also build /review-pr for automated code review,{" "}
+ /fix-pr to resolve CI failures, and{" "}
/audit-state-tax for tax implementation audits. Each
command encodes a workflow our team runs daily—now automated
with guardrails built in.
@@ -223,39 +224,38 @@ const milestones: Milestone[] = [
),
},
{
- date: 'Feb 2026',
- title: 'A new level',
+ date: "Feb 2026",
+ title: "Agent teams unlock parallel research",
body: (
<>
/encode-policy already working well, agent teams
- unlocked workflows that weren’t feasible before. The biggest:
- backdating historical policy.
+ Opus 4.6 launched with agent teams—multiple agents collaborating
+ in parallel within a single session. With /encode-policy{" "}
+ already working well, agent teams unlocked workflows that
+ weren’t feasible before. The biggest: backdating historical
+ policy.
/backdate-policy{' '}
- possible and dramatically expanded what a single session could
- accomplish.
+ Agent teams solved this by splitting the work: a discovery agent finds
+ historical PDFs, prep agents download and render them, and multiple
+ research agents read different documents in
+ parallel—communicating directly with each other, not through a
+ central coordinator. This made /backdate-policy possible
+ and dramatically expanded what a single session could accomplish.
What We Built
+ The Pipeline
What Researchers Can Do
Try it yourself
- What Researchers Can Do
- What Made It Work
{cap.title}
- {cap.paragraphs.map((p, i) => (
- The Ideas That Made It Work
- {idea.title}
- {idea.paragraphs.map((p, i) => (
- What We Learned
+