
fix: LSP tools broken in multi-agent setups (process lifecycle, code mode resilience, sub-agent identity)#1975

Draft
aheritier wants to merge 4 commits into docker:main from aheritier:fix/lsp-tools-multi-agent

Conversation

@aheritier
Contributor

Summary

Fixes three interrelated bugs that prevented LSP tools from working in multi-agent configurations, discovered while testing with a real-world project (Devoxx Call-for-Papers) using 4 agents with Java (jdtls) and TypeScript (typescript-language-server) LSP servers.

Stacked on top of #1970 (LSP multiplexer).

Bugs fixed

1. Sub-agents execute with root's toolset and model (runtime.go)

RunStream correctly resolves the session's agent at the top of its goroutine (via sess.AgentName or r.CurrentAgent()). However, the iteration loop immediately shadows the resolved agent with a := r.CurrentAgent(), which always returns root for background agent tasks because run_background_agent never calls setCurrentAgent.

Impact: All sub-agents dispatched via run_background_agent executed with root's toolset (no LSP tools), root's model (wrong tier), and root's instructions.

2. LSP process killed by context cancellation (lsp.go)

The LSP server process was started with exec.CommandContext(ctx) where ctx was the caller's request or sub-session context. When that context ended, the process was killed — but StartableToolSet.started remained true, so the server was never restarted.

Also fixes a race condition in ensureInitialized() where the mutex unlock/re-lock window allowed concurrent callers to get spurious errors.

3. codeModeTool.Start() fails fast and can't find wrapped Startable (codemode.go)

Start() returned on the first error, so one broken toolset (e.g., an unreachable MCP server) killed all tools for the agent. It also used a direct type assertion instead of tools.As, so it could not find Startable through wrapper chains.

Commits

  • d0d0b4c fix(lsp): detach LSP process from caller context, fix startup race, add done channel for goroutine cleanup
  • 7e11e4d fix(codemode): make Start() resilient (continue on failure) and unwrap-aware (tools.As)
  • baf531b fix(runtime): use session-resolved agent in RunStream iteration loop

Testing

  • All existing tests pass (task test, task lint)
  • New tests added for code mode resilience (TestCodeModeTool_Start_ContinuesOnFailure, TestCodeModeTool_Start_UnwrapsToFindStartable)
  • Validated with a real multi-agent project (4 agents, 7 LSP server instances): 12/14 LSP tools working across all agents

Ref #1969

dgageot and others added 4 commits March 6, 2026 18:13
When multiple LSP toolsets are configured (e.g., gopls for Go and
pyright for Python), each produces tools with identical names
(lsp_hover, lsp_definition, etc.), causing LLM APIs to reject the
request because tool names must be unique.

Add an LSPMultiplexer that combines multiple LSP backends into a single
toolset. File-based tools (hover, definition, references, etc.) are
routed to the correct backend by matching the file extension.
Non-file tools (workspace, workspace_symbols) are broadcast to all
backends and results are merged.

The teamloader detects multiple LSP toolsets during agent loading and
automatically wraps them in a multiplexer. Per-toolset config (tool
filters, instructions, toon) is preserved through wrapping.

Fixes docker#1969

Assisted-By: cagent
The LSP server process was started with exec.CommandContext(ctx) where
ctx was the caller's request or sub-session context. When that context
was cancelled (e.g., sub-agent session ends), the LSP process was
killed. Meanwhile, the StartableToolSet.started flag remained true, so
subsequent sessions never restarted the server — ensureInitialized()
returned early on a dead process.

Use context.WithoutCancel(ctx) so the LSP process outlives the caller,
matching how MCP tools handle process lifecycle. The process is now
explicitly terminated only by stop().

Also fix a race condition in ensureInitialized(): the method unlocked
h.mu before calling start() (which re-locks it), creating a window
where concurrent goroutines could race and get a spurious 'LSP server
already running' error. Extract startLocked() that assumes the caller
holds the lock and is idempotent (returns nil if already running).

Replace the context-based readNotifications goroutine signal with an
explicit done channel closed by stop().

Assisted-By: docker-agent
codeModeTool.Start() had two issues:

1. It failed fast on the first toolset error, returning immediately.
   Since code_mode_tools wraps ALL toolsets into a single codeModeTool,
   one broken toolset (e.g., an unreachable MCP server) would prevent
   every other toolset from starting. The runtime then skips the entire
   code mode wrapper, leaving the agent with zero tools.

2. Both Start() and Stop() used direct type assertion t.(tools.Startable)
   instead of tools.As[tools.Startable](t). This cannot find Startable
   through the Unwrapper chain. When an LSP toolset has tool filters,
   instructions, or toon configured, the wrapping layers hide the
   Startable interface and the LSP server is silently never started.

Fix Start() to log a warning and continue on failure, matching the
runtime's own resilient behavior. Fix both Start() and Stop() to use
tools.As for proper interface discovery through wrappers.

Assisted-By: docker-agent
RunStream resolves the correct agent at the top of its goroutine using
sess.AgentName (for background agents) or r.CurrentAgent() (for
transfer_task). However, the iteration loop immediately shadowed this
with 'a := r.CurrentAgent()', which always returns the root agent for
background agent tasks since run_background_agent never calls
setCurrentAgent.

This caused all sub-agents dispatched via run_background_agent to
execute with root's toolset and model — LSP tools, sub-agent-specific
MCP servers, and configured models never reached the LLM.

Remove the re-assignment so the loop uses the already-resolved agent.
Add session_agent to the RunStream debug log for easier diagnosis.

Assisted-By: docker-agent
@aheritier
Contributor Author

/review


@docker-agent docker-agent bot left a comment


Review Summary

Assessment: 🟢 APPROVE

This PR correctly fixes three critical bugs in multi-agent LSP setups:

  1. Runtime agent resolution (runtime.go): Removes the variable shadowing that caused sub-agents to execute with root's toolset and model. The removed a := r.CurrentAgent() inside the loop was overwriting the correctly-resolved session agent.

  2. LSP process lifecycle (lsp.go): Detaches LSP server process from caller context using context.WithoutCancel, preventing premature termination when sub-session contexts end. Also fixes race condition in ensureInitialized().

  3. Code mode resilience (codemode.go): Makes Start() continue on individual toolset failures and uses tools.As to properly unwrap and find Startable implementations through wrapper chains.

The implementation is clean, well-tested (new tests added), and the changes are minimal and focused on the stated bug fixes. All modified code is correct.

Findings

No issues found in the changed code. The drafter initially flagged a potential undefined variable issue in runtime.go, but verification confirmed this is the correct fix — the variable a is properly defined earlier in the function scope (line 997) and removing the shadowing assignment is intentional.


Automated review by cagent PR reviewer

