feat(ce-work): add Codex delegation mode #328

Open
mvanhorn wants to merge 5 commits into EveryInc:main from mvanhorn:feat/codex-delegation-mode
Conversation

@mvanhorn
Contributor

Summary

Adds an optional "Codex Delegation" section to ce:work, parallel to the existing Swarm Mode section. When the user says "use codex for this work", implementation tasks are delegated to codex exec while planning, review, and git operations stay in the current agent.

Why

Users on Max20 plans burn significant context on code implementation. Codex Pro plans include free compute that can handle the writing work. This keeps Claude focused on planning and review.

Changes

One file changed: plugins/compound-engineering/skills/ce-work/SKILL.md (+53 lines)

The new section follows the same structure as Swarm Mode:

  • Decision table (when to use vs standard mode)
  • Conversational trigger ("use codex for this work")
  • 7-step workflow: availability check, credential extraction, prompt assembly, delegation via stdin pipe, diff review, commit, per-task error fallback

Technical details from a battle-tested implementation (open-source-contributor runs this pattern in production):

  • command -v codex for health check (graceful fallback if not installed)
  • echo "$PROMPT" | codex exec ... - to pipe long prompts via stdin (avoids shell quoting issues)
  • model_reasoning_effort config key (verified from ~/.codex/config.toml)
  • GH token extraction needed because sandbox blocks keychain access
  • Per-task fallback (one Codex failure doesn't disable it for remaining tasks)
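
The health check and stdin hand-off listed above might look roughly like the sketch below. It is written in Python for illustration rather than the shell used in SKILL.md. `delegate_or_fallback` and its `which`/`run` parameters are hypothetical names, and the only CLI surface assumed is `codex exec -` reading the prompt from stdin, as quoted in the list (the elided flags are left out).

```python
import shutil
import subprocess
from typing import Callable, Optional, Sequence


def delegate_or_fallback(
    prompt: str,
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    argv: Sequence[str] = ("codex", "exec", "-"),  # assumed invocation, flags omitted
) -> Optional[str]:
    """Return the delegate's stdout, or None to signal 'use standard mode'."""
    # Equivalent of `command -v codex`: None means the CLI is not on PATH.
    if which(argv[0]) is None:
        return None
    # The prompt travels on stdin, so it never passes through shell quoting.
    result = run(list(argv), input=prompt, capture_output=True, text=True)
    if result.returncode != 0:
        return None  # per-task fallback: the caller handles this one task itself
    return result.stdout
```

Returning `None` instead of raising keeps a single failed task from disabling delegation for the tasks that follow.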

Relation to PR #223

PR #223 (lfg-ext) replaces the entire work pipeline with external delegates in parallel worktrees. This PR adds a mode within the existing ce:work flow - same relationship as Swarm Mode has to standard mode. The two are complementary.

Testing

  • bun run release:validate passes (29 agents, 45 skills, 1 MCP server - no count changes)
  • Section is gated behind conversational trigger - standard ce:work is untouched
  • Verified codex exec invocation pattern works in production via open-source-contributor commit 588cf99

This contribution was developed with AI assistance (Claude Code).

Add an optional "Codex Delegation" section to ce:work, parallel to the
existing Swarm Mode section. When triggered by saying "use codex for this
work", ce:work delegates implementation tasks to codex exec while keeping
planning, review, and git operations in the current agent.

Uses stdin piping for long prompts, per-task fallback on failure, and
graceful degradation when Codex CLI is not installed.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5b9194975

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

- Remove GH token injection from prompt (not needed for code writing,
  avoids credential exposure in model input)
- Replace shell chaining (cd && echo | codex) with discrete steps
  using a temp file, compliant with AGENTS.md shell rules

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a70a23fd7b

@tmchow
Collaborator

tmchow commented Mar 20, 2026

@mvanhorn love this. Is it possible to refactor into a work-beta skill for the time being while we test it?

@mvanhorn
Contributor Author

@tmchow funny enough I had it in beta, but to simplify my PR I took it out. let me get it in there

@tmchow
Collaborator

tmchow commented Mar 20, 2026

Another interesting thing I wonder is how we might add a config for this so it would use Claude vs codex by default for this phase. But it would have to be careful which coding env you're in and respect the flag. Just thinking out loud.

mvanhorn and others added 2 commits March 20, 2026 16:15
… and enforcement to ce:work

- ce:plan-beta Phase 1.1b: detect Codex delegation intent from user signals
- ce:plan: add Execution Posture step for Codex delegation detection
- ce:work: recognize plan-level Execution target signals as delegation trigger
- ce:work: add anti-rationalization enforcement to Codex Delegation Workflow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ction

- Use mktemp for unique prompt files (prevents cross-task races)
- Pass prompt as argument instead of stdin redirection (AGENTS.md compliance)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71bce5cc15

@mvanhorn
Contributor Author

> Another interesting thing I wonder is how we might add a config for this so it would use Claude vs codex by default for this phase. But it would have to be careful which coding env you're in and respect the flag. Just thinking out loud.

interesting! I've only used CE in CC.

@tmchow
Collaborator

tmchow commented Mar 20, 2026

> interesting! I've only used CE in CC.

Even just in CC, it might make sense to have a setting you configure with /setup so that Codex is used by default if you want; otherwise it's on demand. Either way, it's probably fine as is, and we can add that later. Let me look at the skill work you've done.

@mvanhorn
Contributor Author

Sounds good - I'll keep it on-demand for now and we can add a /setup config option in a follow-up. Let me know what you think after looking at the skill.

@tmchow
Collaborator

tmchow commented Mar 22, 2026

@mvanhorn some thoughts after spending time reviewing and thinking about this.

Environment Detection

The skill runs across harnesses (Claude Code, Codex, Gemini CLI, etc.). If someone runs ce:work from inside Codex, this will attempt to shell out to codex exec from within Codex's sandbox — likely failing silently or recursing. We need a guard that detects "am I already running inside Codex?" and falls back to standard mode with a clear message. Check for something like a CODEX_SANDBOX env var or equivalent before attempting delegation.
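
A guard along these lines could be sketched as follows. This is only an illustration in Python: `CODEX_SANDBOX` and `CODEX_SESSION_ID` are the variable names mentioned in this thread, not names verified against the Codex CLI, and `running_inside_codex` and `choose_mode` are hypothetical helpers.

```python
import os
from typing import Mapping

# Assumed marker variables, taken from this thread; not verified against Codex itself.
CODEX_MARKERS = ("CODEX_SANDBOX", "CODEX_SESSION_ID")


def running_inside_codex(env: Mapping[str, str] = os.environ) -> bool:
    """True when any assumed Codex marker variable is set to a non-empty value."""
    return any(env.get(name) for name in CODEX_MARKERS)


def choose_mode(env: Mapping[str, str] = os.environ) -> str:
    """Fall back to standard mode, with a clear message, when already inside Codex."""
    if running_inside_codex(env):
        print("Already running inside Codex; using standard mode instead of delegating.")
        return "standard"
    return "delegate"
```

Passing the environment in as a mapping keeps the guard easy to exercise without touching the real process environment.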

Strategy Integration

The Codex delegation section is bolted on after Swarm Mode as a standalone workflow, but ce:work already has a strategy selection step (Phase 1 Step 4) with three modes: inline, serial subagents, parallel subagents. The delegation section doesn't explain how it interacts with these.

Key questions the skill needs to answer:

  • If using subagent dispatch, does each subagent shell out to Codex for tagged tasks? (That's Claude Code → subagent → codex exec → GPT — three layers deep)
  • If using inline, does the Codex workflow replace the Phase 2 task loop for tagged tasks?
  • The 7-step Codex workflow has its own review + commit steps that duplicate Phase 2's incremental commit logic — two parallel "how to execute a task" code paths will drift

Suggestion: wire delegation into the existing strategy selection as a task-level modifier rather than a separate section. The strategy (inline/serial/parallel) still applies, but the implementation step within each task delegates to Codex instead of executing directly.
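
As a rough sketch of what "task-level modifier" could mean, the loop below keeps one execution path and swaps only the implementation step. The skill itself is prose instructions rather than code, so `Task`, `run_tasks`, and the callables are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    delegate: bool = False  # hypothetical per-task tag set by the plan or the user


def run_tasks(
    tasks: List[Task],
    implement_directly: Callable[[Task], str],
    implement_via_delegate: Callable[[Task], str],
    review_and_commit: Callable[[Task, str], None],
) -> None:
    """One task loop: only the implementation step varies per task."""
    for task in tasks:
        implement = implement_via_delegate if task.delegate else implement_directly
        diff = implement(task)
        # Review and commit stay on the single shared path for every task.
        review_and_commit(task, diff)
```

The chosen strategy (inline, serial, or parallel) would decide how this loop is scheduled; the delegate flag only changes who writes the code for a given task.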

Plan Abstraction

The ce:plan and ce:plan-beta changes bake Execution target: codex directly into the planning abstraction. Plans should be tool-agnostic: today it's Codex, tomorrow it could be a different delegate. I wonder if we could use something like Execution target: external-delegate and resolve the specific tool at execution time based on config? This also sets up the /setup config option we discussed earlier.
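
One way to picture resolving the tool at execution time is the small lookup below. The `DELEGATES` registry, the `delegate` config key, and `resolve_delegate` are all hypothetical; only the `external-delegate` marker and the `codex exec` command come from this thread.

```python
from typing import Dict, List, Optional

# Hypothetical registry: delegate name -> argv used at execution time.
DELEGATES: Dict[str, List[str]] = {"codex": ["codex", "exec", "-"]}


def resolve_delegate(execution_target: str, config: Dict[str, str]) -> Optional[List[str]]:
    """Map a tool-agnostic plan marker to a concrete command, or None for standard mode."""
    if execution_target != "external-delegate":
        return None
    # "delegate" is an assumed config key; codex is the only delegate in this PR.
    name = config.get("delegate", "codex")
    return DELEGATES.get(name)
```

The plan only ever records `external-delegate`, so adding a second tool later would mean adding a registry entry rather than rewriting existing plans.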

Hardcoded Model

You use -m "gpt-5.3-codex", but that will go stale. This should either come from config (compound-engineering.local.md) or, I think, be omitted entirely so that Codex's default model is used.

Mixed Attribution

Phase 4 requires attribution like Generated with [MODEL] via [HARNESS]. When half the tasks are written by Codex and half by Claude, what should the attribution say? Needs guidance for mixed-model work... or should we just omit it?

Error Threshold

"One failure does not disable Codex for remaining tasks" is fine for transient errors, but not for systemic issues (API down, auth expired, rate-limited). Consider a consecutive-failure threshold (e.g., 3 failures in a row → disable delegation for remaining tasks)?

Beta Skill

Per our earlier discussion, refactor this into a ce:work-beta skill with disable-model-invocation: true so we can test it safely before promoting.

Already flagged by automated review (still open)

  • $(cat $CODEX_PROMPT_FILE) expands into argv, not stdin. ARG_MAX risk on larger prompts. Pipe via stdin as described in the step text
  • Shell chaining/redirection in examples violates AGENTS.md rules (no &&, pipes, >/dev/null 2>&1 in skill shell examples)
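
To illustrate why stdin is the safer channel for large prompts, the sketch below pipes a multi-megabyte string to a child process. A stand-in Python child is used instead of `codex` so the example runs anywhere; `CHILD` and `send_via_stdin` are hypothetical names. The same payload passed as a single argv entry would exceed the argument-size limits on many systems.

```python
import subprocess
import sys

# Stand-in child that reports how many characters arrived on stdin.
CHILD = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]


def send_via_stdin(prompt: str) -> int:
    """Pipe the prompt to the child's stdin and return the length it received."""
    result = subprocess.run(CHILD, input=prompt, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


big_prompt = "x" * 3_000_000  # larger than the argv limits on many systems
received = send_via_stdin(big_prompt)
```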

…, env guard, error threshold

- Rename "Codex Delegation" to "External Delegate Mode" for tool-agnostic plans
- Replace `Execution target: codex` with `Execution target: external-delegate` in plans
- Add environment guard to detect sandbox recursion (CODEX_SANDBOX, CODEX_SESSION_ID)
- Wire delegation into existing strategy selection as task-level modifier
- Remove hardcoded model flag - use delegate's default
- Add consecutive failure threshold (3 failures disables delegation)
- Add mixed-model attribution guidance for Phase 4
- Pipe prompt via stdin instead of argv expansion (ARG_MAX safety)
- Remove shell chaining/redirection from examples (AGENTS.md compliance)
- Add ce:work-beta skill wrapper with disable-model-invocation for safe testing
- Remove credential embedding from prompt (pass auth via env vars)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 289d14658b

@mvanhorn
Contributor Author

Took a stab at this - I like your thoughts. Heads up: the description for Compound Engineering on GH says it's a Claude Code skill.

Addressed all points in 289d146:

Environment Detection - Added sandbox guard that checks CODEX_SANDBOX, CODEX_SESSION_ID, and .git/ write access before attempting delegation. Falls back to standard mode with a message.

Strategy Integration - Rewrote the section to frame delegation as a task-level modifier on the existing strategy (inline/serial/parallel), not a parallel workflow. The strategy still applies - delegation just changes how individual tagged tasks execute.

Plan Abstraction - Replaced Execution target: codex with Execution target: external-delegate across ce:plan, ce:plan-beta, and ce:work. The specific tool resolves at execution time.

Hardcoded Model - Removed -m "gpt-5.3-codex" flag. Delegates use their own default model.

Mixed Attribution - Added guidance for single-delegate, single-standard, and mixed-model scenarios in Phase 4.

Error Threshold - Added 3-consecutive-failure circuit breaker that disables delegation for remaining tasks.
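
A consecutive-failure breaker of this kind could be sketched as below. This is an illustration under stated assumptions, not the text added to the skill: `DelegationBreaker` is a hypothetical name, and the latch-once-tripped behavior is one reading of "disables delegation for remaining tasks".

```python
class DelegationBreaker:
    """Disable delegation after `threshold` failures in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.tripped = False

    @property
    def enabled(self) -> bool:
        return not self.tripped

    def record(self, success: bool) -> None:
        if success:
            # A success clears the streak, so isolated failures never trip the breaker.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.tripped = True  # stays off for the remaining tasks
```

Transient errors reset on the next success, while a systemic problem such as expired auth trips the breaker after three failed tasks in a row.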

Beta Skill - Created ce:work-beta with disable-model-invocation: true.

Shell compliance + stdin - Removed all shell chaining/redirection from examples. Prompt goes via stdin, not argv. Credentials via env vars, not prompt text.

@tmchow
Collaborator

tmchow commented Mar 22, 2026

@codex provide a final review, focused on the latest changes. also check if the repo and various READMEs need to be updated.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 289d14658b

Comment on lines +10 to +14
This is the beta variant of `ce:work` that includes the External Delegate Mode feature for testing. Once stable, the delegation section will be promoted to `ce:work` and this beta skill will be removed.

**All behavior is identical to `ce:work`** - this skill simply re-includes the full `ce:work` content with the External Delegate Mode section active.

To use: run `/ce:work-beta [plan file]` instead of `/ce:work [plan file]`.

P1: Replace the ce:work-beta stub with the full beta workflow

All of our loaders/targets only register and copy each skill’s own skills/<name>/SKILL.md (src/parsers/claude.ts, src/targets/codex.ts), so invoking /ce:work-beta will load only this 20-line wrapper. Because this file does not actually contain or link in the ce:work instructions in a supported way, anyone testing the beta command will miss the real execution workflow and the new delegation steps entirely.

Collaborator

@mvanhorn in order to make the beta skill work, you have to duplicate the entire skill which incorporates your changes. You can't just do what you did. Also by having an entire encapsulated skill, it makes it easier to promote the skill to stable.

Comment on lines +439 to +441
External delegation activates when any of these conditions are met:
- The user says "use codex for this work", "delegate to codex", or "delegate mode"
- A plan implementation unit contains `Execution target: external-delegate` in its Execution note (set by ce:plan-beta or ce:plan)

P2: Keep delegate triggers out of the stable ce:work path

The beta rollout framework in docs/solutions/skill-design/beta-skills-framework.md only works if the stable skills stay unchanged while users opt into the -beta variant, but these trigger phrases now enable delegation from normal /ce:work sessions, and stable ce:plan also stamps Execution target: external-delegate into plans (plugins/compound-engineering/skills/ce-plan/SKILL.md:154-156). That means a regular /ce:plan -> /ce:work flow can hit the experimental delegate path without ever choosing /ce:work-beta, which defeats the safe side-by-side trial and can surprise users who merely mention Codex or token conservation.
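
The gating this comment asks for can be pictured as the check below. Skills are prose instructions rather than code, so this is only a model of the intended behavior: `delegation_active` is a hypothetical function, while the trigger phrases and the plan marker are the ones quoted in the diff above.

```python
TRIGGER_PHRASES = ("use codex for this work", "delegate to codex", "delegate mode")
PLAN_MARKER = "Execution target: external-delegate"


def delegation_active(skill: str, user_message: str, execution_note: str) -> bool:
    """Only the beta skill honors delegate triggers; stable ce:work ignores them."""
    if skill != "ce:work-beta":
        return False
    text = user_message.lower()
    return any(phrase in text for phrase in TRIGGER_PHRASES) or PLAN_MARKER in execution_note
```

With this shape, a regular /ce:plan then /ce:work flow cannot reach the experimental path, even if the user mentions Codex or the plan carries the marker.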

Collaborator

@mvanhorn address this one too. see docs/solutions/skill-design/beta-skills-framework.md if you want to see how i've structured how beta skills work.
