
Fix SSH agent forwarding goroutine leak via missing half-close #49

Merged
JAORMX merged 2 commits into main from fix/agent-forwarding-half-close
Mar 20, 2026

@JAORMX JAORMX commented Mar 20, 2026

Summary

  • Root cause: proxyAgentConnection doesn't call channel.CloseWrite() after the guest process disconnects from the agent socket. Both the guest proxy and host handler goroutines deadlock waiting on reads that will never complete, leaking a maxAgentConns semaphore slot per connection.
  • After 8 leaked connections, all new agent socket connections are rejected, breaking SSH agent forwarding for the rest of the session. Symptom: git operations succeed for the first few minutes then fail with Permission denied (publickey).
  • Fix: one-line channel.CloseWrite() after the guest→host io.Copy returns, signaling EOF to the host so ServeAgent returns and the full cleanup chain completes.
  • Tests: new TestAgentProxyCleanup opens more sessions than maxAgentConns on a single connection — fails without the fix, passes with it. TestAgentProxyConcurrent covers multiple agent queries within a single session.
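The half-close described above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `proxy`, `proxyFinishes`, `halfCloser`, and `waitFor` are hypothetical names, a loopback TCP connection stands in for the SSH channel (both expose `CloseWrite() error`), and a goroutine that reads until EOF stands in for the host's ServeAgent.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

// waitFor bounds how long the demo waits for the proxy to wind down.
const waitFor = 500 * time.Millisecond

// halfCloser is the subset of an SSH channel this sketch relies on.
// *net.TCPConn satisfies it too, which keeps the example stdlib-only.
type halfCloser interface {
	io.ReadWriter
	CloseWrite() error
}

// proxy (hypothetical) copies both ways between the guest's socket
// connection (local) and the channel to the host.
func proxy(channel halfCloser, local net.Conn, halfClose bool) {
	defer local.Close()
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// host -> guest: ends only once the host closes its side.
		_, _ = io.Copy(local, channel)
	}()
	// guest -> host: returns when the guest process disconnects.
	_, _ = io.Copy(channel, local)
	if halfClose {
		// Signal EOF to the host while still allowing reads.
		_ = channel.CloseWrite()
	}
	wg.Wait()
}

// proxyFinishes reports whether proxy returns within waitFor after the
// guest disconnects.
func proxyFinishes(halfClose bool) (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	// Host stand-in: read requests until EOF, then close the connection.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		_, _ = io.Copy(io.Discard, conn)
	}()

	raw, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer raw.Close() // also unblocks a stuck proxy once the demo is over

	client, local := net.Pipe()
	done := make(chan struct{})
	go func() {
		proxy(raw.(*net.TCPConn), local, halfClose)
		close(done)
	}()

	client.Close() // the guest process disconnects from the agent socket

	select {
	case <-done:
		return true, nil
	case <-time.After(waitFor):
		return false, nil
	}
}

func main() {
	for _, hc := range []bool{true, false} {
		ok, err := proxyFinishes(hc)
		fmt.Printf("half-close=%v finished=%v err=%v\n", hc, ok, err)
	}
}
```

Without the `CloseWrite` call, the host stand-in never sees EOF and the reverse copy never returns, which mirrors the deadlock described in the summary.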

Test plan

  • TestAgentProxyCleanup — opens 10 sequential sessions (> maxAgentConns=8), verifying semaphore release
  • TestAgentProxyConcurrent — multiple agent socket checks within a single session
  • Full test suite passes (task test)
  • Lint clean (task lint)
  • End-to-end soak test with brood-box: git fetch every 2 minutes for 15 minutes; 8/8 fetches succeeded (2/8 before the fix)
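Why 10 sequential sessions are enough to expose the leak can be shown with a small model of the connection limit. The sketch below assumes the limit is a non-blocking counting semaphore built on a buffered channel; the PR does not show how maxAgentConns is enforced, so `connLimiter`, `tryAcquire`, `release`, and `runSessions` are hypothetical names. Only the value 8 comes from the description.

```go
package main

import "fmt"

// maxAgentConns mirrors the limit stated in the PR description.
const maxAgentConns = 8

// connLimiter is a hypothetical counting semaphore; the real
// implementation in the repository may differ.
type connLimiter struct {
	slots chan struct{}
}

func newConnLimiter(n int) *connLimiter {
	return &connLimiter{slots: make(chan struct{}, n)}
}

// tryAcquire takes a slot if one is free and reports whether it did.
func (l *connLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release returns a previously acquired slot.
func (l *connLimiter) release() {
	<-l.slots
}

// runSessions simulates n sequential agent connections and returns how
// many were accepted. With leak set, a handler never returns its slot,
// modelling the behaviour before the fix.
func runSessions(n int, leak bool) int {
	lim := newConnLimiter(maxAgentConns)
	accepted := 0
	for i := 0; i < n; i++ {
		if !lim.tryAcquire() {
			continue // connection rejected: no free slot
		}
		accepted++
		if !leak {
			lim.release()
		}
	}
	return accepted
}

func main() {
	fmt.Println(runSessions(10, false)) // prints 10
	fmt.Println(runSessions(10, true))  // prints 8
}
```

Any session count above maxAgentConns would do; with leaked slots the ninth connection is the first to be rejected.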

🤖 Generated with Claude Code

JAORMX and others added 2 commits March 20, 2026 11:33
When a guest process disconnects from the agent socket,
io.Copy(channel, unixConn) returns but the channel write side
stays open. The host's ServeAgent blocks reading the channel
waiting for a request that never comes, and the guest's reverse
copy blocks waiting for a response. Neither side ever closes,
leaking both goroutines and a maxAgentConns semaphore slot.

After 8 leaked connections (maxAgentConns), all new agent socket
connections are rejected, breaking SSH agent forwarding for the
rest of the session. Symptoms: git operations succeed for the
first few minutes then fail with "Permission denied (publickey)".

Fix: call channel.CloseWrite() after the guest->host copy
finishes, signaling EOF to the host so ServeAgent returns and
the full cleanup chain completes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verify that the agent proxy goroutines and semaphore slots are
released after each session, preventing exhaustion of
maxAgentConns. The TestAgentProxyCleanup test opens more sessions
than maxAgentConns on a single connection, which would fail
without the CloseWrite half-close fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX merged commit f740f50 into main Mar 20, 2026
7 checks passed
@JAORMX JAORMX deleted the fix/agent-forwarding-half-close branch March 20, 2026 10:32