sd: defer SUBSCRIBE_NACK when TCP connection not yet committed (#971)#1025
Open
aki1770-del wants to merge 1 commit intoCOVESA:masterfrom
Open
sd: defer SUBSCRIBE_NACK when TCP connection not yet committed (#971)#1025aki1770-del wants to merge 1 commit intoCOVESA:masterfrom
aki1770-del wants to merge 1 commit intoCOVESA:masterfrom
Conversation
…OVESA#971) With >=2 io_context threads, accept_cbk (which writes the new TCP connection into connections_) and handle_eventgroup_subscription (which reads connections_ via is_tcp_connected) can execute concurrently with no ordering guarantee. If the SUBSCRIBE arrives over UDP before accept_cbk commits the connection, is_tcp_connected returns false and a spurious SUBSCRIBE_NACK is emitted even though the TCP handshake already completed. Fix: when is_tcp_connected returns false for a TTL>0 subscription, park the subscription in pending_tcp_subscriptions_ with a 100 ms steady_timer instead of immediately NACKing. When the timer fires, retry is_tcp_connected: if the connection is now present, complete the subscription normally; if still absent after 100 ms, send the definitive NACK. The 100 ms window covers the TCP handshake + kernel accept + accept_cbk dispatch latency by a large margin. connections_mutex_ already prevents map corruption; this fix closes the logical TOCTOU ordering gap without adding cross-layer dependencies. The change is self-contained in the SD layer (2 files). Fixes COVESA#971.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
With ≥2
io_contextthreads,accept_cbk(which writes the new TCP connection intoconnections_) andhandle_eventgroup_subscription(which readsconnections_viais_tcp_connected) run on different threads with no ordering guarantee. If the UDP SUBSCRIBE arrives beforeaccept_cbkfinishes committing the connection,is_tcp_connectedreturns false and a spurious SUBSCRIBE_NACK is sent — even though the TCP three-way handshake already completed.Maintainer confirmed: "It rarely happened."
Race window:
connections_mutex_prevents map corruption but enforces no happens-before between the two async paths.Fix
When
is_tcp_connectedreturns false for aTTL > 0subscription, park the subscription inpending_tcp_subscriptions_with a 100 mssteady_timerinstead of immediately NACKing.When the timer fires,
retry_pending_tcp_subscriptionre-checksis_tcp_connected:handle_eventgroup_subscriptionnormally → ACK sentThe 100 ms window covers TCP handshake + kernel accept +
accept_cbkdispatch latency by a large margin. A duplicate SUBSCRIBE for the same (service, instance, eventgroup, address, port) tuple cancels the previous timer before arming a new one.Scope
2 files, SD layer only — no cross-layer dependency introduced:
implementation/service_discovery/include/service_discovery_impl.hpp—pending_tcp_sub_tstruct +retry_pending_tcp_subscriptiondeclaration +pending_tcp_subscriptions_map + muteximplementation/service_discovery/src/service_discovery_impl.cpp— deferred-NACK path at bothis_tcp_connectedcall sites +retry_pending_tcp_subscriptionimplementationBehavior
accept_cbkalways runs before SUBSCRIBE on one threadFixes #971.