bug(mcp): tools/list hangs 60s when client sends initialize + notifications/initialized + tools/list in rapid succession #98

@halindrome

Description

Summary

codebase-memory-mcp hangs for 60 seconds before responding to tools/list when an MCP client sends the three standard initialization messages without artificial delays between them. This manifests as a "connecting..." state in Claude Code that resolves only after STORE_IDLE_TIMEOUT_S (60s) elapses.

Root Cause

The MCP event loop in cbm_mcp_server_run (src/mcp/mcp.c) mixes poll() on the raw file descriptor with getline() on a buffered FILE*. These two abstractions operate at different layers of the I/O stack, and the combination creates a correctness hazard:

  1. The client sends three messages back-to-back with no delay between them (all arrive in the kernel receive buffer simultaneously)
  2. poll() fires — data is available
  3. getline() reads initialize and over-reads — libc's FILE* buffer drains the entire kernel buffer, pulling all three messages into userspace
  4. cbm_mcp_server_handle() processes initialize and returns a response
  5. getline() processes notifications/initialized (a notification with no id) — cbm_mcp_server_handle() returns NULL (correct per spec), no response written
  6. The loop calls poll() again for the next message — but the tools/list payload is already in libc's FILE* buffer, not the kernel fd
  7. poll() sees an empty kernel fd and blocks for 60 seconds
  8. tools/list never receives a response within any reasonable timeout
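The over-read in steps 3 through 6 can be demonstrated in isolation. Below is a small Python analogue of the C `poll()`/`getline()` pair: `select()` on the raw pipe fd plays the role of `poll()`, and a buffered reader layered over the same fd plays the role of the `FILE*`:

```python
import os, select

r, w = os.pipe()
# Three messages arrive in one kernel write, as CC 2.1.80 sends them
os.write(w, b"initialize\ninitialized\ntools/list\n")

f = os.fdopen(r, "rb")  # buffered reader layered over the same fd

print(select.select([r], [], [], 0)[0] != [])  # True: kernel fd has data

f.readline()  # returns b"initialize\n" but drains ALL 3 lines into the buffer

# The kernel fd is now empty; a blocking poll/select here would hang even
# though two complete messages sit unread in the userspace buffer.
print(select.select([r], [], [], 0)[0] != [])  # False
print(f.readline())  # b'initialized\n', served from the buffer, not the fd
```

The same sequence in the C server leaves the tools/list payload stranded in libc's buffer while poll() blocks for the full idle timeout.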

The bug was reliably triggered by Claude Code 2.1.80, which sends all three initialization messages as a rapid burst (no inter-message delay). Earlier client versions or clients that insert delays between messages may never observe the bug.

Reproduction:

import subprocess, json, time

binary = "codebase-memory-mcp"
proc = subprocess.Popen([binary], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

msgs = [
    {"method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}},"jsonrpc":"2.0","id":0},
    {"method":"notifications/initialized","jsonrpc":"2.0"},
    {"method":"tools/list","jsonrpc":"2.0","id":1},
]

# Send all three with NO delay — triggers the hang
for m in msgs:
    proc.stdin.write(json.dumps(m) + "\n")
proc.stdin.flush()

start = time.time()
for _ in range(2):  # expect initialize response + tools/list response
    line = proc.stdout.readline()
    print(f"{time.time()-start:.2f}s: {line[:80]}")
proc.terminate()

Expected: both responses arrive within ~1 second.
Observed (before fix): initialize response arrives immediately; tools/list response arrives after ~60 seconds.

The comment at the original poll() call site stated: "MCP is request-response (one line at a time), so mixing poll() on the raw fd with getline() on the buffered FILE* is safe in practice." This assumption does not hold when multiple messages arrive in a single kernel receive event.

Trigger Context: Claude Code 2.1.80

Claude Code 2.1.80 changed its MCP client startup to send the three initialization messages (initialize, notifications/initialized, tools/list) in rapid succession as part of a single write burst. This is legal behavior under the MCP specification — the protocol does not require delays between messages. The server bug was latent before this client change; 2.1.80 made it reliably reproducible.

The three messages CC 2.1.80 sends on startup (captured via spy):

{"method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{"roots":{},"elicitation":{"form":{},"url":{}}},"clientInfo":{"name":"claude-code","version":"2.1.80"}},"jsonrpc":"2.0","id":0}
{"method":"notifications/initialized","jsonrpc":"2.0"}
{"method":"tools/list","jsonrpc":"2.0","id":1}

Fix

Replace the single blocking poll() call with a three-phase approach that correctly handles data already buffered in the FILE* layer:

Phase 1: Non-blocking poll(timeout=0) — fast path, catches data already in the kernel fd.

Phase 2: If Phase 1 returns 0 (no kernel data), peek one byte from the FILE* buffer using fgetc(in) + ungetc(). This detects data that a prior getline() over-read pulled into libc's buffer. If data is found, skip the blocking poll and fall through to getline().

Phase 3: Only if both Phase 1 and Phase 2 confirm no data — call blocking poll(STORE_IDLE_TIMEOUT_S * 1000) for idle eviction.

This approach is fully POSIX-portable and does not require making the fd non-blocking (which would complicate getline() error handling for EAGAIN), nor does it rely on GNU-only extensions like __fpending().
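The three phases can be sketched in Python against the same buffered-reader analogue used above (illustrative only; the actual fix is in C, where Phase 2 peeks the opaque FILE* buffer with fgetc() plus ungetc() because libc does not portably expose a count of buffered bytes, while this sketch tracks its userspace buffer explicitly, making the Phase 2 check a plain length test):

```python
import os, select

class LineReader:
    """Python analogue of the three-phase wait (illustrative sketch, not
    the C source). The userspace buffer is tracked explicitly, so the
    Phase 2 check is a plain length test instead of fgetc()/ungetc()."""

    def __init__(self, fd):
        self.fd = fd
        self.buf = b""

    def wait_for_line(self, idle_timeout_s):
        # Phase 1: non-blocking poll. Is data already in the kernel fd?
        if select.select([self.fd], [], [], 0)[0]:
            return True
        # Phase 2: did a prior readline() over-read bytes into the buffer?
        if self.buf:
            return True
        # Phase 3: truly idle. Block up to the idle-eviction timeout.
        return bool(select.select([self.fd], [], [], idle_timeout_s)[0])

    def readline(self):
        # Like getline() on a FILE*: may pull more than one line off the fd.
        while b"\n" not in self.buf:
            chunk = os.read(self.fd, 4096)
            if not chunk:
                break
            self.buf += chunk
        line, _, self.buf = self.buf.partition(b"\n")
        return line

# Demonstration: a three-message burst, one read, then the wait again.
rfd, wfd = os.pipe()
os.write(wfd, b"initialize\ninitialized\ntools/list\n")
reader = LineReader(rfd)
print(reader.readline())         # b'initialize' (over-reads the rest)
print(reader.wait_for_line(60))  # True, immediately, via the Phase 2 check
print(reader.readline())         # b'initialized'
```

In this sketch the Phase 2 check is what keeps the second wait from blocking for the idle timeout, which is exactly the path the buggy single-poll loop was missing.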

The inaccurate comment at the original call site is also corrected to document the actual hazard.

Test Coverage

  • C unit test (tests/test_mcp.c): mcp_server_run_rapid_messages — uses pipe() + alarm(5) to verify all three init messages are processed without hanging
  • Python integration test (scripts/test_mcp_rapid_init.py): sends all three messages simultaneously via proc.communicate(), asserts tools/list response arrives within 5 seconds against the installed binary

Test results: 2043/2043 tests pass. Python integration test passes against built binary and installed binary.

Affected Versions

Triggered reliably by Claude Code ≥ 2.1.80. Latent with earlier client versions, or with any client that inserts inter-message delays.

Labels: bug (Something isn't working), editor/integration (Editor compatibility and CLI integration)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions