
fix: LSP tools broken in multi-agent setups (process lifecycle, code mode resilience, sub-agent identity)#1975

Draft
aheritier wants to merge 4 commits into docker:main from aheritier:fix/lsp-tools-multi-agent

Conversation

@aheritier
Contributor

Summary

Fixes three interrelated bugs that prevented LSP tools from working in multi-agent configurations, discovered while testing with a real-world project (Devoxx Call-for-Papers) using 4 agents with Java (jdtls) and TypeScript (typescript-language-server) LSP servers.

Stacked on top of #1970 (LSP multiplexer).

Bugs fixed

1. Sub-agents execute with root's toolset and model (runtime.go)

RunStream correctly resolves the session's agent at the top of its goroutine (via sess.AgentName or r.CurrentAgent()). However, the iteration loop immediately shadows the resolved agent with a := r.CurrentAgent(), which always returns root for background agent tasks because run_background_agent never calls setCurrentAgent.

Impact: All sub-agents dispatched via run_background_agent executed with root's toolset (no LSP tools), root's model (wrong tier), and root's instructions.

2. LSP process killed by context cancellation (lsp.go)

The LSP server process was started with exec.CommandContext(ctx) where ctx was the caller's request or sub-session context. When that context ended, the process was killed — but StartableToolSet.started remained true, so the server was never restarted.

Also fixes a race condition in ensureInitialized() where the mutex unlock/re-lock window allowed concurrent callers to get spurious errors.

3. codeModeTool.Start() fails fast and can't find wrapped Startable (codemode.go)

Start() returned on the first error, so one broken toolset (e.g., an unreachable MCP server) killed all tools for the agent. It also used a direct type assertion instead of tools.As, so it could not find Startable through wrapper chains.

Commits

  • d0d0b4c fix(lsp): detach LSP process from caller context, fix startup race, add done channel for goroutine cleanup
  • 7e11e4d fix(codemode): make Start() resilient (continue on failure) and unwrap-aware (tools.As)
  • baf531b fix(runtime): use session-resolved agent in RunStream iteration loop

Testing

  • All existing tests pass (task test, task lint)
  • New tests added for code mode resilience (TestCodeModeTool_Start_ContinuesOnFailure, TestCodeModeTool_Start_UnwrapsToFindStartable)
  • Validated with a real multi-agent project (4 agents, 7 LSP server instances): 12/14 LSP tools working across all agents

Ref #1969

dgageot and others added 4 commits March 6, 2026 18:13
When multiple LSP toolsets are configured (e.g., gopls for Go and
pyright for Python), each produces tools with identical names
(lsp_hover, lsp_definition, etc.), causing LLM APIs to reject the
request because tool names must be unique.

Add an LSPMultiplexer that combines multiple LSP backends into a single
toolset. File-based tools (hover, definition, references, etc.) are
routed to the correct backend by matching the file extension.
Non-file tools (workspace, workspace_symbols) are broadcast to all
backends and results are merged.

The teamloader detects multiple LSP toolsets during agent loading and
automatically wraps them in a multiplexer. Per-toolset config (tool
filters, instructions, toon) is preserved through wrapping.

Fixes docker#1969

Assisted-By: cagent
The LSP server process was started with exec.CommandContext(ctx) where
ctx was the caller's request or sub-session context. When that context
was cancelled (e.g., sub-agent session ends), the LSP process was
killed. Meanwhile, the StartableToolSet.started flag remained true, so
subsequent sessions never restarted the server — ensureInitialized()
returned early on a dead process.

Use context.WithoutCancel(ctx) so the LSP process outlives the caller,
matching how MCP tools handle process lifecycle. The process is now
explicitly terminated only by stop().

Also fix a race condition in ensureInitialized(): the method unlocked
h.mu before calling start() (which re-locks it), creating a window
where concurrent goroutines could race and get a spurious 'LSP server
already running' error. Extract startLocked() that assumes the caller
holds the lock and is idempotent (returns nil if already running).

Replace the context-based readNotifications goroutine signal with an
explicit done channel closed by stop().

Assisted-By: docker-agent
codeModeTool.Start() had two issues:

1. It failed fast on the first toolset error, returning immediately.
   Since code_mode_tools wraps ALL toolsets into a single codeModeTool,
   one broken toolset (e.g., an unreachable MCP server) would prevent
   every other toolset from starting. The runtime then skips the entire
   code mode wrapper, leaving the agent with zero tools.

2. Both Start() and Stop() used direct type assertion t.(tools.Startable)
   instead of tools.As[tools.Startable](t). This cannot find Startable
   through the Unwrapper chain. When an LSP toolset has tool filters,
   instructions, or toon configured, the wrapping layers hide the
   Startable interface and the LSP server is silently never started.

Fix Start() to log a warning and continue on failure, matching the
runtime's own resilient behavior. Fix both Start() and Stop() to use
tools.As for proper interface discovery through wrappers.

Assisted-By: docker-agent
RunStream resolves the correct agent at the top of its goroutine using
sess.AgentName (for background agents) or r.CurrentAgent() (for
transfer_task). However, the iteration loop immediately shadowed this
with 'a := r.CurrentAgent()', which always returns the root agent for
background agent tasks since run_background_agent never calls
setCurrentAgent.

This caused all sub-agents dispatched via run_background_agent to
execute with root's toolset and model — LSP tools, sub-agent-specific
MCP servers, and configured models never reached the LLM.

Remove the re-assignment so the loop uses the already-resolved agent.
Add session_agent to the RunStream debug log for easier diagnosis.

Assisted-By: docker-agent
@aheritier
Contributor Author

/review


@docker-agent docker-agent bot left a comment


Review Summary

Assessment: 🟢 APPROVE

This PR correctly fixes three critical bugs in multi-agent LSP setups:

  1. Runtime agent resolution (runtime.go): Removes the variable shadowing that caused sub-agents to execute with root's toolset and model. The removed a := r.CurrentAgent() inside the loop was overwriting the correctly-resolved session agent.

  2. LSP process lifecycle (lsp.go): Detaches LSP server process from caller context using context.WithoutCancel, preventing premature termination when sub-session contexts end. Also fixes race condition in ensureInitialized().

  3. Code mode resilience (codemode.go): Makes Start() continue on individual toolset failures and uses tools.As to properly unwrap and find Startable implementations through wrapper chains.

The implementation is clean, well-tested (new tests added), and the changes are minimal and focused on the stated bug fixes. All modified code is correct.

Findings

No issues found in the changed code. The drafter initially flagged a potential undefined variable issue in runtime.go, but verification confirmed this is the correct fix — the variable a is properly defined earlier in the function scope (line 997) and removing the shadowing assignment is intentional.


Automated review by cagent PR reviewer

