TheWorkshop Open Source Edition is a skill for Codex and Claude Code that runs mixed coding and non-coding work in a structured, auditable way.
It turns ambiguous requests into a living execution workflow:
Project -> Workstreams -> Jobs
with explicit gates, orchestration, monitoring, lessons learned, and spend visibility.
When this repo says "project OS", it means:
- a repeatable workflow system for the agent
- not an operating system
- implemented as a skill the agent runs inside Codex/Claude Code
The diagram below reflects the current OSS baseline:
- planning metadata and agreement live in the control plane
- native Codex subagents are the execution runtime
- telemetry, staged learning, and explicit closeout keep delegation truthful
The repo now ships with a self-contained explainer page at:
It is designed to open locally after install and covers:
- the broad Codex subagent model
- when delegation is worth the coordination cost
- built-in roles versus repo-scoped TheWorkshop roles
- dispatch, manual, and loop execution paths
- staged lessons and durable memory promotion
- explicit manual closeout through `theworkshop agent-closeout`
The repo also includes an optional outer-loop harness in autoresearch/ for refining the skill surface without opening up the whole repository to autonomous mutation.
- control file: `autoresearch/program.md`
- benchmark packs: `autoresearch/benchmark-pack.fast.json`, `autoresearch/benchmark-pack.full.json`
- evaluator: `python3 scripts/skill_autoresearch_eval.py`
- scored contract benchmark: `python3 scripts/skill_surface_contract_score.py --repo .`
The default writable surface is limited to operator-facing docs and repo-local agent definitions, and results can be logged to the ignored state/autoresearch/results.tsv path. Benchmark-maintenance commits should be kept separate from scored skill-surface experiments, because the scope guard intentionally rejects changes outside that writable surface.
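The scope-guard idea described above can be sketched as a simple path filter. This is a minimal illustration, not the repository's guard: the prefix list and the function name `out_of_scope` are assumptions made for the example, and the real writable surface is whatever the autoresearch control file defines.

```python
import posixpath
from typing import Iterable, List, Tuple

# Hypothetical writable prefixes for illustration only; the real list
# is defined by the repository, not by this sketch.
WRITABLE_PREFIXES: Tuple[str, ...] = ("docs/", ".codex/agents/")


def out_of_scope(changed_paths: Iterable[str],
                 writable_prefixes: Tuple[str, ...] = WRITABLE_PREFIXES) -> List[str]:
    """Return the changed paths that fall outside the writable surface."""
    rejected = []
    for path in changed_paths:
        # Normalize first so "docs/../scripts/x.py" cannot slip through.
        normalized = posixpath.normpath(path)
        if not normalized.startswith(writable_prefixes):
            rejected.append(path)
    return rejected


changes = [
    "docs/subagents.html",
    ".codex/agents/example-role.toml",  # hypothetical file name
    "scripts/job_start.py",
    "docs/../scripts/x.py",
]
print(out_of_scope(changes))
```

An empty result would mean the experiment stayed inside the writable surface; any returned path is a reason to reject the change or move it into a separate benchmark-maintenance commit.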
Current scored seams include:
- delegated-role grounding to the current job plan and verification path
- durable blocker evidence instead of hidden status narration
- truthful manual/external delegation telemetry and exactly-once closeout
- staged learning with curator-only durable promotion
- context-lock propagation so delegated and looped work reopen `context_ref`, honor locked decisions, and keep deferred ideas out of scope
What TheWorkshop is:
- A Codex/Claude Code skill, not a standalone app
- A structured runtime for mixed coding and non-coding projects
- Agreement-gated before execution starts
- Truth-gated and reward-gated before completion claims
- Parallel-orchestration aware (native Codex subagents for independent bounded jobs)
- Dashboard-first monitoring with token/spend telemetry
- A repo-owned `WORKFLOW.md` contract for unattended local execution
What TheWorkshop is not:
- A replacement for human strategic ownership
- A generic code framework or web app product
- A system that marks work complete on artifact presence alone
This repository is the public OSS baseline for TheWorkshop.
- It defines the portable local framework and the optional adapters that ship in the public repo.
- It does not claim to contain or standardize private/custom operator workflows.
- Richer local/custom versions can exist separately, but they are outside the contract of this repository.
The repo remains Codex-first in top-level docs, but the public package is intentionally usable as a portable local framework.
Portable/core path:
- project/workstream/job lifecycle and gates
- dashboard build/projector and local monitoring
- workflow contract and workflow runner
- schemas, examples, and regression suite
Optional adapters:
- Codex session telemetry / CodexBar spend
- Gemini / OpenAI council planning
- Apple Keychain credential path
- imagegen skill bridge
- GitHub mirroring
Capability matrix:
| Capability | Portable core | Codex-enhanced | Optional adapter |
|---|---|---|---|
| Lifecycle / gates | Yes | Yes | No |
| Dashboard / runtime | Yes | Yes | No |
| Workflow runner | Yes | Yes | No |
| Billing / spend telemetry | Fallback / unknown | Yes | Codex session logs / CodexBar |
| Council planning | Dry-run only | Yes | Gemini / OpenAI |
| Image generation | No | Yes | imagegen skill / keychain |
| GitHub mirror | No | Yes | gh |
- `Project`: top-level outcome and success definition
- `Workstream`: coherent thread in support of the project goal
- `Job` (Work Item): smallest executable/verifiable unit
- `Wave` (optional): timeboxed grouping across workstreams
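The hierarchy above can be pictured as a small data model. The sketch below is illustrative only: the class and field names are assumptions, and the actual on-disk representation used by TheWorkshop is not shown in this README.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """Smallest executable/verifiable unit (a Work Item)."""
    work_item_id: str
    title: str
    status: str = "planned"      # statuses named in this README: planned, in_progress, done
    wave: Optional[str] = None   # optional timeboxed grouping across workstreams


@dataclass
class Workstream:
    """Coherent thread of jobs in support of the project goal."""
    workstream_id: str
    title: str
    jobs: List[Job] = field(default_factory=list)


@dataclass
class Project:
    """Top-level outcome and success definition."""
    name: str
    success_definition: str
    workstreams: List[Workstream] = field(default_factory=list)

    def all_jobs(self) -> List[Job]:
        # Flatten Project -> Workstreams -> Jobs into one list.
        return [job for ws in self.workstreams for job in ws.jobs]


project = Project(name="Workshop Demo", success_definition="Options memo accepted")
research = Workstream(workstream_id="WS-YYYYMMDD-001", title="Research")
research.jobs.append(Job(work_item_id="WI-YYYYMMDD-001", title="Draft options memo"))
project.workstreams.append(research)
print([job.title for job in project.all_jobs()])
```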
Completion promises are explicit:
<promise>{ID}-DONE</promise>
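As an illustration of how such a promise could be recognized in agent output, here is a minimal sketch using a regular expression. The helper name `extract_promised_id` and the sample ID are hypothetical; the sketch only assumes the `<promise>{ID}-DONE</promise>` shape shown above.

```python
import re
from typing import Optional

# Matches <promise>{ID}-DONE</promise> and captures the {ID} part.
PROMISE_RE = re.compile(r"<promise>(?P<id>[^<>]+)-DONE</promise>")


def extract_promised_id(text: str) -> Optional[str]:
    """Return the ID from the first completion promise in text, or None."""
    match = PROMISE_RE.search(text)
    return match.group("id") if match else None


# The ID below is a made-up example following the WI-YYYYMMDD-NNN pattern.
sample = "Verification passed. <promise>WI-20250101-001-DONE</promise>"
print(extract_promised_id(sample))
```

A promise on its own is only a claim; per the gate list in this README, it would still need every gate to pass before the job counts as done.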
A job can only transition to done when all gates pass:
- Agreement gate (scope accepted before execution)
- Dependency/freshness gate (inputs are current)
- TruthGate (verification of correctness)
- Reward gate (meets `reward_target`)
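The all-gates-must-pass rule can be sketched as follows. This is an assumption-laden illustration: the `GateResults` fields and the numeric comparison for the reward gate are invented for the example and may not match how the real scripts record gate outcomes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GateResults:
    """Hypothetical snapshot of gate outcomes for one job."""
    agreement: bool             # scope accepted before execution
    dependency_freshness: bool  # inputs are current
    truth: bool                 # verification of correctness
    reward_score: float
    reward_target: float


def failing_gates(results: GateResults) -> List[str]:
    """List the gates that currently block a transition to done."""
    failures = []
    if not results.agreement:
        failures.append("agreement")
    if not results.dependency_freshness:
        failures.append("dependency_freshness")
    if not results.truth:
        failures.append("truth")
    # Assumption: "meets reward_target" means score >= target.
    if results.reward_score < results.reward_target:
        failures.append("reward")
    return failures


def can_mark_done(results: GateResults) -> bool:
    # A job may move to done only when no gate is failing.
    return not failing_gates(results)


stale = GateResults(True, False, True, reward_score=0.9, reward_target=0.8)
print(failing_gates(stale), can_mark_done(stale))
```

Returning the list of failing gates, rather than a bare boolean, keeps the blocker visible instead of hiding it behind a status flag.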
Execution-context discipline:
- ambiguous work should be locked with `theworkshop discuss` before execution
- jobs that require that lock should carry `context_required: true` and `context_ref`
- delegated and looped work should reopen the context file, treat locked decisions as binding, and keep deferred ideas out of scope until the lock is refreshed
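A pre-execution check for the context lock might look like the sketch below. It assumes job metadata is available as a mapping with `context_required` and `context_ref` keys, and that `context_ref` is a path relative to the project root; both are assumptions for illustration, as is the function name.

```python
import tempfile
from pathlib import Path
from typing import List, Mapping


def context_lock_problems(job: Mapping[str, object], project_root: Path) -> List[str]:
    """Return reasons a job should not start because its context lock is unusable."""
    if not job.get("context_required"):
        return []  # no lock demanded for this job
    context_ref = job.get("context_ref")
    if not context_ref:
        return ["context_required is true but context_ref is missing"]
    if not (project_root / str(context_ref)).is_file():
        return [f"context file not found: {context_ref}"]
    return []


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    ref = "notes/context/WI-EXAMPLE-CONTEXT.md"  # hypothetical file name
    (root / ref).parent.mkdir(parents=True)
    (root / ref).write_text("Locked decision: use concise format\n", encoding="utf-8")

    locked = context_lock_problems({"context_required": True, "context_ref": ref}, root)
    unlocked = context_lock_problems({"context_required": True}, root)
    print(locked, unlocked)
```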
Execution quality defaults:
- `job_start.py` auto-applies ranked lessons into `# Relevant Lessons Learned` (override: `--no-apply-lessons`).
- `plan_check.py` warns on weak placeholder content for `planned` jobs and hard-fails weak content for `in_progress`/`done` jobs.
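The status-dependent strictness described for `plan_check.py` can be summarized in a few lines. This sketch is not the script's code; the function name and the `"ok"`/`"warn"`/`"fail"` return values are assumptions chosen to make the rule concrete.

```python
def placeholder_severity(status: str, has_weak_content: bool) -> str:
    """Map a job status and a weak-content finding to a check outcome."""
    if not has_weak_content:
        return "ok"
    if status == "planned":
        return "warn"   # still early: flag it, do not block
    if status in ("in_progress", "done"):
        return "fail"   # work has started or is claimed complete: block
    # Only the three statuses named in this README are handled here.
    raise ValueError(f"unknown status: {status}")


for status in ("planned", "in_progress", "done"):
    print(status, placeholder_severity(status, has_weak_content=True))
```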
# One command from the repo root
git clone https://github.com/CongressionalInsights/theworkshop.git
mkdir -p "$CODEX_HOME/skills"
cp -R theworkshop "$CODEX_HOME/skills/theworkshop"

Typical destination:
- `$CODEX_HOME/skills/theworkshop`
- usually `~/.codex/skills/theworkshop`
To update later:
cd "$CODEX_HOME/skills/theworkshop" && git pull origin main

# create project
python3 scripts/project_new.py --name "Workshop Demo"
# inspect the generated execution contract
python3 scripts/workflow_check.py --project /path/to/project
# add workstream + job
python3 scripts/workstream_add.py --project /path/to/project --title "Research"
python3 scripts/job_add.py --project /path/to/project --workstream WS-YYYYMMDD-001 --title "Draft options memo"
python3 scripts/job_add.py --project /path/to/project --workstream WS-YYYYMMDD-001 --title "Attribution sweep" --job-profile investigation_attribution
python3 scripts/job_add.py --project /path/to/project --workstream WS-YYYYMMDD-001 --title "Entity resolution" --job-profile identity_resolution
python3 scripts/discuss.py --project /path/to/project --work-item-id WI-YYYYMMDD-001 --decision "Use concise format" --required --no-interactive
# validate and orchestrate
python3 scripts/plan_check.py --project /path/to/project
python3 scripts/schema_validate.py --project /path/to/project
python3 scripts/optimize_plan.py --project /path/to/project
python3 scripts/orchestrate_plan.py --project /path/to/project
python3 scripts/dispatch_orchestration.py --project /path/to/project --dry-run
python3 scripts/council_plan.py --project /path/to/project --dry-run
python3 scripts/workflow_runner.py --project /path/to/project --once
python3 scripts/workflow_runner.py --project /path/to/project --detach
# execute one job
python3 scripts/job_start.py --project /path/to/project --work-item-id WI-YYYYMMDD-001
python3 scripts/job_start.py --project /path/to/project --work-item-id WI-YYYYMMDD-001 --lessons-limit 5 --lessons-include-global
python3 scripts/verify_work.py --project /path/to/project --work-item-id WI-YYYYMMDD-001
python3 scripts/job_complete.py --project /path/to/project --work-item-id WI-YYYYMMDD-001 --cascade
# optional utility lanes
python3 scripts/health.py --project /path/to/project --repair
python3 scripts/quick.py --project /path/to/project --title "One-off patch" --command "echo done"
python3 scripts/dashboard_server.py --project /path/to/project --open

Every project now gets a `WORKFLOW.md` file at the project root. It is the repo-owned execution contract for unattended runs: polling cadence, dispatch defaults, pre/post cycle hooks, and the shared execution-policy prompt prepended to delegated work-item prompts.
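To make the "prepended to delegated work-item prompts" behavior concrete, here is a minimal sketch. The `WorkflowContract` fields and the blank-line separator are assumptions for illustration; the actual `WORKFLOW.md` format and its parser are not described in this README.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowContract:
    """Hypothetical in-memory view of a project's WORKFLOW.md contract."""
    poll_interval_seconds: int
    execution_policy_prompt: str
    pre_cycle_hooks: List[str] = field(default_factory=list)
    post_cycle_hooks: List[str] = field(default_factory=list)


def build_delegated_prompt(contract: WorkflowContract, work_item_prompt: str) -> str:
    """Prepend the shared execution policy to one work item's prompt."""
    policy = contract.execution_policy_prompt.strip()
    return f"{policy}\n\n{work_item_prompt.strip()}"


contract = WorkflowContract(
    poll_interval_seconds=60,
    execution_policy_prompt="Stay within the agreed scope. Report blockers with evidence.",
)
print(build_delegated_prompt(contract, "Draft the options memo for WI-YYYYMMDD-001."))
```

Keeping the policy in one place means every delegated job sees the same rules, and a policy change takes effect on the next cycle without editing individual job prompts.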
When parallel work is justified, TheWorkshop treats native Codex subagents as the default delegation runtime. TheWorkshop orchestration decides which jobs are safe to delegate; the parent thread stays responsible for planning, integration, and final synthesis.
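One plausible reading of "safe to delegate" is that a job's dependencies are all complete, so jobs selected together cannot depend on each other. The sketch below shows that selection rule only; it is an assumption about the criterion, not a description of what `orchestrate_plan.py` actually computes, and the work-item IDs are placeholders.

```python
from typing import Dict, List, Set


def delegable_jobs(depends_on: Dict[str, Set[str]], done: Set[str]) -> List[str]:
    """Return unfinished jobs whose dependencies are all done, in sorted order."""
    return sorted(
        job
        for job, deps in depends_on.items()
        if job not in done and deps <= done
    )


# Placeholder IDs: WI-B needs WI-A; WI-A and WI-C are independent.
plan = {"WI-A": set(), "WI-B": {"WI-A"}, "WI-C": set()}
print(delegable_jobs(plan, done=set()))
print(delegable_jobs(plan, done={"WI-A"}))
```

The parent thread would still own integration: delegated results come back to it for synthesis rather than being merged by the subagents themselves.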
TheWorkshop also treats learning capture as a first-class runtime concern:
- shared cross-repo agents can live in `~/.codex/agents/`
- repo-specific workshop agents live in `.codex/agents/`
- delegated and looped work may read durable memory, but should stage new durable findings in:
  - `.theworkshop/memory-proposals/*.json`
  - `.theworkshop/lessons-candidates/*.json`
- only curator agents or the parent thread should promote those staged findings into:
  - `$CODEX_HOME/memories/projects/*.md`
  - `notes/lessons-learned.md`
- manual/external delegated runs should use `theworkshop agent-log` for intermediate telemetry and `theworkshop agent-closeout` once for terminal closeout plus staged learning promotion
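The stage-then-promote split can be illustrated with a small helper that writes a candidate file and never touches durable memory. The JSON fields, the `stage_lesson` name, and the assumption that the staging directory sits under the project root are all hypothetical; only the `.theworkshop/lessons-candidates/*.json` location comes from the list above.

```python
import json
import tempfile
from pathlib import Path

STAGING_DIR = Path(".theworkshop") / "lessons-candidates"


def stage_lesson(project_root: Path, candidate_id: str, lesson: str,
                 source_work_item: str) -> Path:
    """Write a lesson candidate for later review; does not promote it."""
    staging = project_root / STAGING_DIR
    staging.mkdir(parents=True, exist_ok=True)
    path = staging / f"{candidate_id}.json"
    # Hypothetical payload shape; the real schema is not given in this README.
    payload = {
        "lesson": lesson,
        "source_work_item": source_work_item,
        "status": "staged",
    }
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    staged = stage_lesson(Path(tmp), "candidate-001",
                          "Verify inputs before drafting", "WI-YYYYMMDD-001")
    staged_payload = json.loads(staged.read_text(encoding="utf-8"))
    staged_parent = staged.parent.name
    print(staged.name, staged_payload["status"])
```

Because delegated work only writes to the staging area, a curator or the parent thread can review each candidate before anything reaches `notes/lessons-learned.md`.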
Expected core outputs:
- `outputs/dashboard.html`
- `outputs/dashboard.json`
- `outputs/dashboard.md`
- `outputs/<date>-task-tracker.csv`
- `logs/execution.jsonl`
- `artifacts/truth-report.json`
- `notes/context/<WS-or-WI>-CONTEXT.md`
- `outputs/uat/<run-id>-UAT.md`
- `outputs/uat/<run-id>-UAT.json`
- `outputs/health.json`
- `quick/<id>-<slug>/plan.md`
- `quick/<id>-<slug>/summary.md`
- Dashboard auto-opens best-effort once per session at execution start (unless disabled)
- Auto-refresh supports stale detection and pause/resume
- Optional local live transport: `python3 scripts/dashboard_server.py --project /path/to/project`
  - serves `dashboard.html` over `http://127.0.0.1:*`
  - publishes `/events` SSE updates so the page can switch from poll mode to live mode
- `monitor_runtime.py` owns dashboard open/watch/serve/stop/cleanup so repeated lifecycle events reuse the same runtime instead of spawning fresh browser/server state
- Project terminal closeout prunes transient runtime artifacts while preserving canonical outputs, logs, and dashboard artifacts
- Cost display is billing-aware when the Codex telemetry adapter is available:
  - `subscription_auth`: billed cost shown as `$0` marginal, API-equivalent shown secondarily
  - `metered_api`: billed cost from exact telemetry when available
  - `unknown`: estimate-first fallback
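The three billing modes can be read as a small display rule. The sketch below is an interpretation: the function name, its parameters, and the label wording are assumptions, and the dashboard's real formatting may differ.

```python
from typing import Optional


def cost_label(billing_mode: str, exact_cost_usd: Optional[float],
               estimated_cost_usd: float) -> str:
    """Choose what cost text to show for a given billing mode."""
    if billing_mode == "subscription_auth":
        # No marginal charge; show the API-equivalent figure secondarily.
        return f"$0 marginal (API-equivalent ${estimated_cost_usd:.2f})"
    if billing_mode == "metered_api" and exact_cost_usd is not None:
        # Exact telemetry is available, so show the billed amount.
        return f"${exact_cost_usd:.2f} billed"
    # Unknown mode, or metered without exact telemetry: estimate first.
    return f"~${estimated_cost_usd:.2f} estimated"


print(cost_label("subscription_auth", None, 1.5))
print(cost_label("metered_api", 2.25, 2.0))
print(cost_label("unknown", None, 0.75))
```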
Use work-item scoped image generation through the optional imagegen adapter:
python3 scripts/imagegen_job.py --project /path/to/project --work-item-id WI-YYYYMMDD-002
python3 scripts/imagegen_job.py --project /path/to/project --work-item-id WI-YYYYMMDD-002 --credential-provider env
python3 scripts/imagegen_job.py --project /path/to/project --work-item-id WI-YYYYMMDD-002 --credential-provider keychain --approve ttl:1h

Set one provider before first run:
export THEWORKSHOP_IMAGEGEN_API_KEY=...

Compatibility for existing local setups:
export OPENAI_API_KEY=...

Optional legacy keychain flow:
export THEWORKSHOP_IMAGEGEN_CREDENTIAL_SOURCE=keychain
export THEWORKSHOP_KEYCHAIN_SERVICE=OPENAI_KEY

The apple-keychain skill remains optional, and cross-platform fallback behavior is env-first.
python3 scripts/doctor.py --profile codex
python3 scripts/doctor.py --profile portable
cd scripts && for t in *_test.py; do python3 "$t"; done

./scripts/install_skill.sh --force

Use `--link` for a symlinked dev install.
- Contribution guidelines: CONTRIBUTING.md
- Support boundaries: SUPPORT.md
- Security reporting: SECURITY.md
- Stable `v0.1.0` baseline for Project -> Workstreams -> Jobs control plane
- TruthGate + stale invalidation + orchestration artifacts
- Billing-aware spend in dashboard
- More robust synthetic scenario suite for document-quality outcomes
- Additional dashboard drilldowns for truth/reward failure analysis
- GitHub mirror ergonomics and dry-run diagnostics
- Optional docs site for a deeper operator's manual
- Broader template library for non-coding domains
- Extended export/report bundles for stakeholder handoff
theworkshop/
README.md
SKILL.md
CHANGELOG.md
scripts/
references/
examples/
docs/subagents.html
docs/assets/
.github/
MIT. See LICENSE.

