Skip to content

Fix abort when binding response streams to a failed socket#3250

Open
amoxic wants to merge 1 commit intoapache:masterfrom
amoxic:fix-3245-stream-bind-failure
Open

Fix abort when binding response streams to a failed socket#3250
amoxic wants to merge 1 commit intoapache:masterfrom
amoxic:fix-3245-stream-bind-failure

Conversation

@amoxic
Copy link

@amoxic amoxic commented Mar 25, 2026

What problem does this PR solve?

Issue Number: resolves #3245

Problem Summary:

When a peer disconnects after StreamAccept() succeeds but before the RPC response binds the response stream to the host socket, Socket::AddStream() can fail inside Stream::SetHostSocket(). The previous code treated this as an impossible state and aborted the server with CHECK(false).

This PR turns that path into a normal runtime failure so the current RPC/stream is failed and the server process stays alive.

What is changed and the side effects?

Changed:

  • Make Stream::SetHostSocket() return -1 with errno instead of aborting when Socket::AddStream() fails.
  • Handle response stream bind failures in SendRpcResponse() by setting controller failure, failing the affected stream(s), and returning early.
  • Keep response write completion joined through a scope guard so tracing data is still finalized on early returns.
  • Add unit tests for SetHostSocket() failure and the server-side disconnect-before-response path.

Side effects:

  • Performance effects:

    No expected steady-state performance impact. Extra work is only added on the error path.

  • Breaking backward compatibility:

    No API change. The behavior changes from process abort to RPC/stream failure when the host socket is already failed.


Check List:

@hjwsm1989
Copy link

@chenBright please help review

@chenBright
Copy link
Contributor

cc @jenrryyou

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents the server process from aborting when binding response streams to a host socket that has already failed (e.g., peer disconnects between StreamAccept() and response send), turning it into a normal RPC/stream failure path.

Changes:

  • Make Stream::SetHostSocket() return an error (with errno set) instead of CHECK(false) on Socket::AddStream() failure.
  • In SendRpcResponse(), handle response-stream bind failures by failing the controller and affected stream(s), and returning early while still joining background response-write (via a scope guard).
  • Add unit tests covering SetHostSocket() failure and the disconnect-before-response scenario.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/brpc/stream.cpp Converts host-socket bind from fatal abort to error return.
src/brpc/policy/baidu_rpc_protocol.cpp Handles stream bind failures in SendRpcResponse() and ensures response-write completion is joined on early returns.
test/brpc_streaming_rpc_unittest.cpp Adds regression tests for failed-socket bind and server-side early disconnect.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +458 to +462
StreamIds remaining_stream_ids(response_stream_ids.begin() + i,
response_stream_ids.end());
Stream::SetFailed(remaining_stream_ids, errcode,
"Fail to bind response stream to %s",
sock->description().c_str());
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same varargs-forwarding issue applies here: the formatted Stream::SetFailed(remaining_stream_ids, ..., "...%s", ...) relies on undefined behavior in Stream::SetFailed/Close when the format string contains specifiers. Please address the varargs forwarding (or pre-format the message before calling Stream::SetFailed).

Copilot uses AI. Check for mistakes.
Comment on lines 643 to 647
int error_code = 0;
std::call_once(_set_host_socket_flag, [this, host_socket, &error_code]() {
SocketUniquePtr ptr;
host_socket->ReAddress(&ptr);
// TODO add *this to host socke
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stream::SetHostSocket() uses std::call_once with a stack-local error_code captured by reference. Only the thread that executes the call_once lambda can update error_code; other concurrent callers (or later callers after a failed first attempt) will see error_code == 0 and fall back to EFAILEDSOCKET, losing the real failure reason. Consider persisting the first failure code in a Stream member (e.g. _set_host_socket_ec) set inside the call_once body and using that to set errno for all callers.

Copilot uses AI. Check for mistakes.
Comment on lines +654 to +656
if (_host_socket == NULL) {
errno = error_code ? error_code : EFAILEDSOCKET;
return -1;
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that SetHostSocket() can return -1 instead of aborting, it’s important that all call sites check the return value to avoid later CHECK(_host_socket != NULL)/null-deref paths (e.g. WriteToHostSocket, SetConnected). Please audit/update remaining callers that still ignore the return value.

Copilot uses AI. Check for mistakes.
Comment on lines +363 to +370
if (s->SetHostSocket(sock) != 0) {
const int errcode = sock->non_zero_error_code();
LOG_IF(WARNING, errcode != EPIPE)
<< "Fail to bind response stream=" << response_stream_id
<< " to " << sock->description() << ": "
<< berror(errcode);
cntl->SetFailed(errcode, "Fail to bind response stream to %s",
sock->description().c_str());
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On SetHostSocket() failure you’re logging/propagating sock->non_zero_error_code(), but SetHostSocket() already sets errno to the specific bind failure. Using errno from the failing call will be more accurate and avoids relying on the socket’s racy _error_code publication timing (see comment in Socket::non_zero_error_code()).

Copilot uses AI. Check for mistakes.
Comment on lines +372 to +373
"Fail to bind response stream to %s",
sock->description().c_str());
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new Stream::SetFailed(..., "...%s", ...) calls assume Stream::SetFailed correctly forwards printf-style varargs. In src/brpc/stream.cpp, Stream::SetFailed currently passes a va_list into Stream::Close(...) via ..., which is undefined behavior when the format string has specifiers. This can corrupt the failure reason or crash on some ABIs. Please fix Stream::SetFailed/Close to properly accept/forward a va_list (or pre-format the message into a string and pass it as "%s").

Suggested change
"Fail to bind response stream to %s",
sock->description().c_str());
"Fail to bind response stream to host socket");

Copilot uses AI. Check for mistakes.
@amoxic amoxic force-pushed the fix-3245-stream-bind-failure branch from 88a7a19 to b567af3 Compare March 25, 2026 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] 对端在 StreamAccept() 之后、响应发送之前断开时,Stream::SetHostSocket() 会触发 CHECK(false) 导致进程崩溃

4 participants