Skip to content

Supervisor agent: one-shot workflows#363

Merged
josephjclark merged 36 commits intorelease/nextfrom
supervisor-agent
Mar 19, 2026
Merged

Supervisor agent: one-shot workflows#363
josephjclark merged 36 commits intorelease/nextfrom
supervisor-agent

Conversation

@hanna-paasivirta
Copy link
Copy Markdown
Contributor

@hanna-paasivirta hanna-paasivirta commented Jan 29, 2026

Short Description

Adds a global agent — a supervisor-style orchestration layer that sits in front of workflow_chat and job_chat. It accepts a single unified payload from the frontend and intelligently routes requests to the right subagent(s), or escalates to a multi-step planner when the task requires coordination across both.

Current testing focus: One-shot workflow generation from scratch (when the user doesn't have an existing workflow). Other scenarios (editing existing workflows, job-level chat, multi-turn conversations) have not been tested end-to-end and may not work correctly.

To test use

pytest global_agent/tests/test_planner_multistep.py -v -s

Fixes #333 (done without this code being merged)
Fixes #398
Fixes #404

Implementation Details

Architecture

The global agent uses a two-tier dispatch model:

  1. Router (router.py) — A fast, cheap Claude Haiku call classifies the user request into one of three destinations: workflow_agent, job_code_agent, or planner. Uses a constrained JSON generation trick (pre-filled assistant turn '{"destination": "') to force deterministic structured output. On any routing failure, defaults to planner.

  2. Planner (planner.py) — A Claude Sonnet tool-calling loop (up to 20 iterations) that can orchestrate multiple subagent calls in sequence. The planner sees a redacted version of the workflow YAML (job bodies replaced with # [use inspect_job_code to view]) to keep context small, and maintains a live current_yaml state that gets stitched after each subagent call.

  3. Direct routes — For simple requests (e.g., "edit this job's code"), the router bypasses the planner entirely and calls workflow_chat.main() or job_chat.main() directly as in-process Python function calls.

Pasted Graphic

Key Design Decisions

  • YAML as shared state: The full workflow_yaml string is the single state carrier between turns and between agents. The planner mutates a local copy during its loop, stitching code in after each subagent call, and returns the final state as an attachment.
  • Stateless subagents: Both call_workflow_agent and call_job_agent always pass history: []. The planner encodes all necessary context in the message field of each tool call, treating subagents as stateless specialists.
  • Direct Python invocation: All subagent calls are synchronous in-process function calls, not HTTP requests.
  • Context management: The planner uses Anthropic's context-management beta to prune old tool uses in multi-turn conversations (triggers at 20, keeps 10), preserving search_documentation results.

New Files

File Purpose
global_agent/global_agent.py Entry point — validates payload, creates router, returns structured envelope
global_agent/router.py Haiku-based routing + direct dispatch to workflow_chat/job_chat
global_agent/planner.py Sonnet tool-calling loop for multi-step tasks
global_agent/subagent_caller.py Thin wrappers around workflow_chat.main() and job_chat.main()
global_agent/config.yaml LLM model configs (Haiku for router, Sonnet for planner)
global_agent/prompts.yaml System prompts for router and planner
global_agent/tools/tool_definitions.py Claude API tool schemas (search_documentation, call_workflow_agent, call_job_code_agent, inspect_job_code)
global_agent/yaml_utils.py YAML parsing, job lookup, code stitching, body redaction
global_agent/PAYLOAD_SPEC.md Public API contract documentation
search_documentation/search_documentation.py Extracted doc search into standalone service (used by planner as a tool)

Changes to Existing Services

  • streaming_util.py: Simplified streaming utilities
  • util.py: Added sum_usage() for token aggregation across agents, search_documentation_tool() helper
  • load_adaptor_docs: Minor adjustments

Tests

Tests in global_agent/tests/ covering:

  • Planner multi-step scenarios
  • Planner-to-subagent clarification flows
  • End-to-end "good morning workflow" generation

Changes needed in Lightning

This is a new service with a new API. Changes are needed in Lightning if we want to experiment with this. See: OpenFn/lightning#4532

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark
Copy link
Copy Markdown
Collaborator

Some requests!

  1. A simple test that I (or someone else) can run which asks the same question to a) the existing workflow chat and b) the global assistant, and allow me to compare answers
  2. A readout here of the approximate token overhead of of the global assistant. Like if the workflow service today costs 10k tokens per query, how many tokens for the same query in the global assistant? This before any optimisation - I want a baseline number that we can work with

@hanna-paasivirta
Copy link
Copy Markdown
Contributor Author

At this stage, we've plugged in the workflow agent and can get a similar answer via the supervisor and the original standalone workflow_chat service. For a simple task that only requires one pass through the workflow agent, the token consumption increase in the agentic version is 2x input tokens and 3x output tokens (maybe an underestimate, given the basic prompt of the supervisor). This is our baseline cost without any optimisations.

@josephjclark
Copy link
Copy Markdown
Collaborator

josephjclark commented Mar 16, 2026

TODOs:

  1. Take the print_response_details test util and promote it to a main server util. When I'm debugging apollo responses, I want this to be available to me!
  2. Later later later compile yaml and js code with the CLI to ensure it's valid
  3. The planner does an ok job of taking the project.yaml and passing the right parts to the subagents. But if the planner is bypassed, and we call the workflow or job chat directly, the chat agent will likely perform too badly because it is given the whole workflow.yaml as an input, and not just the stuff it wants. There's no filter. We might want to add a filter. Ie in workflow chat, remove the job code. In job chat, extract the relevant step from the workflow yaml and just send that.

The weaknesses of this implementation right now are:

  1. Anything unrelated to workflow or chat code generation may perform poorly
  2. when going through the global endpoint, any "simple" workflow generation or job code generation is likely to perform poorly
  3. The planner can be very very slow

Basically planning works well for the cases we've tested, but we don't know about other cases, and because of changes to the input, non planning steps won't well.

Ways forward:

  1. Test carefully for a few days to understand the behaviour
  2. Start implementing fixes for anticipated problems
  3. Release without testing and let AI be AI
  4. Limit the capability to strictly only do what workflow chat does today.

@josephjclark
Copy link
Copy Markdown
Collaborator

josephjclark commented Mar 16, 2026

Here's the next steps:

  1. Get a release ready of this global_assistant endpoint (fix signatures and tidy up)
  2. Release!
  3. Start testing and improving here(Including better system logging and user event logging)
  4. Kick off lightning work for a minimal opt-in experimental integration (@hanna-paasivirta to kick off off with Product)

@hanna-paasivirta hanna-paasivirta changed the title Supervisor agent Supervisor agent: one-shot workflows Mar 16, 2026
@hanna-paasivirta hanna-paasivirta changed the base branch from main to release/next March 17, 2026 17:15
@hanna-paasivirta hanna-paasivirta marked this pull request as ready for review March 17, 2026 17:19
@hanna-paasivirta
Copy link
Copy Markdown
Contributor Author

Structured outputs are deprecated now, so I reverted back to Sonnet 4.5 as some changes are needed across services.

@josephjclark josephjclark merged commit 8788793 into release/next Mar 19, 2026
@josephjclark josephjclark deleted the supervisor-agent branch March 19, 2026 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

One shot workflow generation via API (no usage in app) Create global agent

2 participants