fix: LSP tools broken in multi-agent setups (process lifecycle, code mode resilience, sub-agent identity)#1975
fix: LSP tools broken in multi-agent setups (process lifecycle, code mode resilience, sub-agent identity)#1975aheritier wants to merge 4 commits intodocker:mainfrom
Conversation
When multiple LSP toolsets are configured (e.g., gopls for Go and pyright for Python), each produces tools with identical names (lsp_hover, lsp_definition, etc.), causing LLM APIs to reject the request because tool names must be unique. Add an LSPMultiplexer that combines multiple LSP backends into a single toolset. File-based tools (hover, definition, references, etc.) are routed to the correct backend by matching the file extension. Non-file tools (workspace, workspace_symbols) are broadcast to all backends and results are merged. The teamloader detects multiple LSP toolsets during agent loading and automatically wraps them in a multiplexer. Per-toolset config (tool filters, instructions, toon) is preserved through wrapping. Fixes docker#1969 Assisted-By: cagent
The LSP server process was started with exec.CommandContext(ctx) where ctx was the caller's request or sub-session context. When that context was cancelled (e.g., sub-agent session ends), the LSP process was killed. Meanwhile, the StartableToolSet.started flag remained true, so subsequent sessions never restarted the server — ensureInitialized() returned early on a dead process. Use context.WithoutCancel(ctx) so the LSP process outlives the caller, matching how MCP tools handle process lifecycle. The process is now explicitly terminated only by stop(). Also fix a race condition in ensureInitialized(): the method unlocked h.mu before calling start() (which re-locks it), creating a window where concurrent goroutines could race and get a spurious 'LSP server already running' error. Extract startLocked() that assumes the caller holds the lock and is idempotent (returns nil if already running). Replace the context-based readNotifications goroutine signal with an explicit done channel closed by stop(). Assisted-By: docker-agent
codeModeTool.Start() had two issues: 1. It failed fast on the first toolset error, returning immediately. Since code_mode_tools wraps ALL toolsets into a single codeModeTool, one broken toolset (e.g., an unreachable MCP server) would prevent every other toolset from starting. The runtime then skips the entire code mode wrapper, leaving the agent with zero tools. 2. Both Start() and Stop() used direct type assertion t.(tools.Startable) instead of tools.As[tools.Startable](t). This cannot find Startable through the Unwrapper chain. When an LSP toolset has tool filters, instructions, or toon configured, the wrapping layers hide the Startable interface and the LSP server is silently never started. Fix Start() to log a warning and continue on failure, matching the runtime's own resilient behavior. Fix both Start() and Stop() to use tools.As for proper interface discovery through wrappers. Assisted-By: docker-agent
RunStream resolves the correct agent at the top of its goroutine using sess.AgentName (for background agents) or r.CurrentAgent() (for transfer_task). However, the iteration loop immediately shadowed this with 'a := r.CurrentAgent()', which always returns the root agent for background agent tasks since run_background_agent never calls setCurrentAgent. This caused all sub-agents dispatched via run_background_agent to execute with root's toolset and model — LSP tools, sub-agent-specific MCP servers, and configured models never reached the LLM. Remove the re-assignment so the loop uses the already-resolved agent. Add session_agent to the RunStream debug log for easier diagnosis. Assisted-By: docker-agent
|
/review |
There was a problem hiding this comment.
Review Summary
Assessment: 🟢 APPROVE
This PR correctly fixes three critical bugs in multi-agent LSP setups:
-
Runtime agent resolution (
runtime.go): Removes the variable shadowing that caused sub-agents to execute with root's toolset and model. The removeda := r.CurrentAgent()inside the loop was overwriting the correctly-resolved session agent. -
LSP process lifecycle (
lsp.go): Detaches LSP server process from caller context usingcontext.WithoutCancel, preventing premature termination when sub-session contexts end. Also fixes race condition inensureInitialized(). -
Code mode resilience (
codemode.go): MakesStart()continue on individual toolset failures and usestools.Asto properly unwrap and findStartableimplementations through wrapper chains.
The implementation is clean, well-tested (new tests added), and the changes are minimal and focused on the stated bug fixes. All modified code is correct.
Findings
No issues found in the changed code. The drafter initially flagged a potential undefined variable issue in runtime.go, but verification confirmed this is the correct fix — the variable a is properly defined earlier in the function scope (line 997) and removing the shadowing assignment is intentional.
Automated review by cagent PR reviewer
Summary
Fixes three interrelated bugs that prevented LSP tools from working in multi-agent configurations, discovered while testing with a real-world project (Devoxx Call-for-Papers) using 4 agents with Java (
jdtls) and TypeScript (typescript-language-server) LSP servers.Stacked on top of #1970 (LSP multiplexer).
Bugs fixed
1. Sub-agents execute with root's toolset and model (
runtime.go)RunStreamcorrectly resolves the session's agent at the top of its goroutine (viasess.AgentNameorr.CurrentAgent()). However, the iteration loop immediately shadows the resolved agent witha := r.CurrentAgent(), which always returnsrootfor background agent tasks sincerun_background_agentnever callssetCurrentAgent.Impact: All sub-agents dispatched via
run_background_agentexecuted with root's toolset (no LSP tools), root's model (wrong tier), and root's instructions.2. LSP process killed by context cancellation (
lsp.go)The LSP server process was started with
exec.CommandContext(ctx)wherectxwas the caller's request or sub-session context. When that context ended, the process was killed — butStartableToolSet.startedremained true, so the server was never restarted.Also fixes a race condition in
ensureInitialized()where the mutex unlock/re-lock window allowed concurrent callers to get spurious errors.3.
codeModeTool.Start()fails fast and can't find wrapped Startable (codemode.go)Start()returned on the first error, so one broken toolset (e.g., unreachable MCP server) killed all tools for the agent. Also used direct type assertion instead oftools.As, unable to findStartablethrough wrapper chains.Commits
d0d0b4c—fix(lsp): detach LSP process from caller context, fix startup race, adddonechannel for goroutine cleanup7e11e4d—fix(codemode): makeStart()resilient (continue on failure) and unwrap-aware (tools.As)baf531b—fix(runtime): use session-resolved agent in RunStream iteration loopTesting
task test,task lint)TestCodeModeTool_Start_ContinuesOnFailure,TestCodeModeTool_Start_UnwrapsToFindStartable)Ref #1969