fix: disable TCP keepalive by default to avoid connection timeouts#276
fix: disable TCP keepalive by default to avoid connection timeouts#276
Conversation
Keepalive probes caused connection timeout errors in offline and high-concurrency modes. Dead connections are already handled by connection_lost/eof_received callbacks in the protocol layer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Code Review
This pull request disables TCP keepalive for the inference endpoint client to mitigate connection timeout errors observed in high-concurrency environments. A review comment points out that disabling keepalive may lead to "zombie" connections during silent network drops, as the current implementation lacks timeouts in the read methods; it is recommended to implement application-level timeouts or use TCP_USER_TIMEOUT as a safeguard.
There was a problem hiding this comment.
Pull request overview
Disables TCP keepalive probes by default in the endpoint HTTP client to reduce connection timeout errors observed in offline and high-concurrency modes (issue #202), relying instead on protocol/connection lifecycle handling in the client.
Changes:
- Set
_SocketConfig.SO_KEEPALIVEdefault to0(disabled). - Gate application of
TCP_KEEP*tuning behindSO_KEEPALIVEbeing enabled.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Reword misleading "(disabled)" comments on TCP_KEEP* fields to "only used when SO_KEEPALIVE is enabled" - Add test asserting socket SO_KEEPALIVE option on pool connections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keepalive probes caused connection timeout errors in offline and high-concurrency modes. Dead connections are already handled by connection_lost/eof_received callbacks in the protocol layer.
attempt to close #202
What does this PR do?
Type of change
Related issues
Testing
Checklist