fix(mcp): resolve poll/getline FILE* buffering mismatch causing tools/list hang #99
halindrome wants to merge 5 commits into DeusData:main
- Use O_NONBLOCK + clearerr() in the Phase 2 fgetc probe to preserve the 60s idle eviction timeout when both the kernel fd and the FILE* buffer are empty (fgetc on a blocking fd would otherwise block indefinitely, bypassing the Phase 3 poll timeout and preventing cbm_mcp_server_evict_idle)
- Add #include <fcntl.h> for fcntl()/O_NONBLOCK
- Fix comment: "two-phase" → "three-phase" (implementation has 3 phases)
- Improve Python integration test: verify id:1 (initialize) and id:2 (tools/list) response IDs are both present, not just the "tools" substring
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
QA Round 1, Opus 4.6
Verdict: PASS WITH MINOR FINDINGS
Findings addressed in
- Add explicit fallback path when fcntl(F_GETFL) fails: skip the FILE* peek and fall through directly to the blocking poll so idle eviction still fires on timeout (Finding 1)
- Strengthen C unit test: verify id:1 (initialize) and id:2 (tools/list) response IDs are both present, not just a substring match on "tools" (Finding 2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
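The stricter check from Finding 2 could look roughly like the helper below. This is a minimal sketch, not the code in tests/test_mcp.c: the name has_response_id is hypothetical, and it assumes the captured server output is JSON-RPC text with numeric "id" members.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `out` contains a JSON member "id" whose
 * integer value equals `id`, else 0. This is a textual scan, not a JSON
 * parser, so it would also match an "id" key nested in params or result. */
int has_response_id(const char *out, long id) {
    assert(out != NULL);
    const char *p = out;
    while ((p = strstr(p, "\"id\"")) != NULL) {
        p += 4;                                /* skip past the "id" key */
        while (isspace((unsigned char)*p)) p++;
        if (*p != ':') continue;               /* not a key/value pair */
        p++;
        char *end;
        long v = strtol(p, &end, 10);          /* strtol skips leading spaces */
        if (end != p && v == id) return 1;     /* whole number parsed, so 10 != 1 */
    }
    return 0;
}
```

Requiring both has_response_id(out, 1) and has_response_id(out, 2) fails if either response is missing, whereas a plain search for "tools" only proves that some part of the tools/list reply arrived.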
QA Round 2, Opus 4.6
Verdict: PASS WITH MINOR FINDINGS
Findings addressed in
QA Round 3 (Final), Opus 4.6
Verdict: PASS (no new findings)
All findings from rounds 1 and 2 have been resolved. The three-phase poll loop is correct and well-structured. No further issues identified. This PR is ready for review.
Closes #98
Root Cause
cbm_mcp_server_run mixes poll() on the raw fd with getline() on a buffered FILE*. When a client sends multiple messages in rapid succession, getline() over-reads the kernel buffer into libc's FILE* buffer on the first call. Subsequent poll() calls see an empty kernel fd and block for STORE_IDLE_TIMEOUT_S (60 seconds) even though the next messages are already in the FILE* buffer.
Triggered reliably by Claude Code 2.1.80, which sends initialize + notifications/initialized + tools/list as a single burst with no inter-message delays. The MCP spec does not require delays, so this is a server-side bug.
The comment at the original call site claimed "mixing poll() on the raw fd with getline() on the buffered FILE* is safe in practice". This is incorrect when multiple messages arrive in one kernel receive event.
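The mismatch can be reproduced in isolation with a pipe. The following is a minimal sketch rather than code from this PR: demo_buffer_mismatch is a hypothetical name, the two JSON lines are placeholders, and it assumes the FILE* is fully buffered with a buffer larger than the burst (the usual libc default for pipes).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo, not code from the PR. Returns 1 if the hazard is
 * reproduced, 0 if not, -1 if the setup itself failed. */
int demo_buffer_mismatch(void) {
    static const char burst[] = "{\"id\":1}\n{\"id\":2}\n";
    const size_t len = sizeof burst - 1;
    int fds[2];
    if (pipe(fds) != 0) return -1;

    /* Two newline-delimited messages arrive in one write, i.e. one burst. */
    if (write(fds[1], burst, len) != (ssize_t)len) return -1;

    FILE *in = fdopen(fds[0], "r");
    if (in == NULL) return -1;

    char *line = NULL;
    size_t cap = 0;

    /* First getline(): libc's read() pulls the whole burst into its buffer. */
    if (getline(&line, &cap, in) < 0) return -1;
    assert(strcmp(line, "{\"id\":1}\n") == 0);

    /* The kernel pipe is now empty, so a zero-timeout poll() reports nothing. */
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int ready = poll(&pfd, 1, 0);

    /* Yet the second message is served straight from the FILE* buffer. */
    ssize_t n = getline(&line, &cap, in);
    int second_ok = n > 0 && strcmp(line, "{\"id\":2}\n") == 0;

    free(line);
    fclose(in);      /* also closes fds[0] */
    close(fds[1]);
    return ready == 0 && second_ok;
}
```

With a 60-second timeout in place of the zero-timeout poll(), the same sequence would stall for the full timeout before the second message is read, which matches the hang described above.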
Fix
Three-phase event loop in cbm_mcp_server_run:
1. poll(timeout=0): fast path for data in the kernel fd
2. fgetc(in) + ungetc() peek: detects data already in the FILE* buffer from a prior over-reading getline(); if found, skips the blocking poll entirely
3. poll(STORE_IDLE_TIMEOUT_S * 1000) for idle eviction
POSIX-portable. Does not require non-blocking fd (avoids EAGAIN complexity in getline()) or GNU-only __fpending().
The inaccurate comment is corrected to document the actual hazard.
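The three phases can be sketched as a standalone helper. This is an illustration under stated assumptions, not the code in src/mcp/mcp.c: wait_for_input and demo_three_phase are hypothetical names, and the sketch folds in the temporary O_NONBLOCK probe and the fcntl(F_GETFL) fallback described in the follow-up commits on this PR. The fd is back in blocking mode before the caller reads, so getline() itself never sees EAGAIN.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if a read on `in` can make progress (data
 * or EOF), 0 if `timeout_ms` elapsed with no input, -1 on poll() error. */
int wait_for_input(FILE *in, int timeout_ms) {
    assert(in != NULL);
    int fd = fileno(in);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    /* Phase 1: zero-timeout poll, the fast path for data in the kernel fd. */
    int rc = poll(&pfd, 1, 0);
    if (rc != 0) return rc > 0 ? 1 : -1;

    /* Phase 2: peek at the FILE* buffer. The fd is non-blocking only for the
     * probe, so fgetc() cannot stall when both buffers are empty. */
    int flags = fcntl(fd, F_GETFL);
    if (flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1) {
        int c = fgetc(in);
        int at_eof = (c == EOF) && feof(in);
        fcntl(fd, F_SETFL, flags);   /* restore the original flags */
        if (c != EOF) {
            ungetc(c, in);           /* put the peeked byte back */
            return 1;
        }
        if (at_eof) return 1;        /* let the caller observe EOF */
        clearerr(in);                /* the failed read set the error flag */
    }
    /* If fcntl() failed, the peek is skipped and phase 3 runs directly. */

    /* Phase 3: blocking poll; a 0 return lets the caller run idle eviction. */
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    return rc > 0 ? 1 : rc;
}

/* Usage sketch: a burst of two lines, then an idle pipe. Returns 1 when the
 * helper behaves as described, 0 otherwise, -1 if setup failed. */
int demo_three_phase(void) {
    static const char burst[] = "first\nsecond\n";
    const size_t len = sizeof burst - 1;
    int fds[2];
    if (pipe(fds) != 0) return -1;
    if (write(fds[1], burst, len) != (ssize_t)len) return -1;
    FILE *in = fdopen(fds[0], "r");
    if (in == NULL) return -1;

    char buf[64];
    /* Phase 1 fires: the burst is still in the kernel pipe. */
    int first = wait_for_input(in, 1000) == 1 && fgets(buf, sizeof buf, in) != NULL;
    /* Phase 2 fires: "second\n" now sits only in the FILE* buffer. */
    int second = wait_for_input(in, 1000) == 1
                 && fgets(buf, sizeof buf, in) != NULL
                 && strcmp(buf, "second\n") == 0;
    /* Nothing is left anywhere: phase 3 times out after 50 ms. */
    int idle = wait_for_input(in, 50) == 0;

    fclose(in);      /* also closes fds[0] */
    close(fds[1]);
    return first && second && idle;
}
```

One design caveat: O_NONBLOCK is a property of the open file description, so toggling it is briefly visible to any other process sharing the same stdin. Keeping the window limited to a single fgetc() call keeps that exposure small.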
Files Changed
- src/mcp/mcp.c: three-phase poll loop (+41 / -6 lines)
- tests/test_mcp.c: mcp_server_run_rapid_messages unit test (pipe + alarm(5))
- scripts/test_mcp_rapid_init.py: Python integration test (simultaneous send, 5s deadline)
Test Results
- scripts/test.sh: 2043/2043 pass
- scripts/test_mcp_rapid_init.py: PASS against built binary and installed binary
- tools/list now responds immediately on startup
QA
Minimum 3 QA rounds per CONTRIBUTING.md guidelines. Reports will be posted as PR comments.