Avoid spamming server if it's closing connections right after they were opened#195
Avoid spamming server if it's closing connections right after they were opened#195paulocsanz wants to merge 3 commits intoyjs:masterfrom
Conversation
…ere opened This adds a exponential backoff for reconnects when the server closes the connections too fast, as opposed to only having it when the connection doesn't open at all. We just had an incident because of this, something thrashed the database, making new ws connections close right after they opened since they couldn't access auth/initial doc data. But since the connection was technically opened for a brief moment it resets the unsucessfull connections exponential backoff. Making it try to reconnect immediately, indefinitely, causing even more strain on our database that was already struggling. Spreading to more and more users. The only caveat is that we only reset this backoff if a connection lives for at least `maxBackoffIntervalOnSuccessfulConnects * 2` ms, since the default is 2500ms seems pretty reasonable.
|
Hi @dmonad could I get a review on this? It fixes a issue that took our platform down (railway.com), I would love to stop forking this repo! And I'm sure other users will benefit from it! |
|
Happy to make changes if you see fit, I understand it's not the best variable naming, but wanted to be specific on it. |
There was a problem hiding this comment.
Pull request overview
Adds an additional exponential backoff path for reconnects when a WebSocket connection technically opens but is closed shortly thereafter, to reduce rapid reconnect loops during backend instability.
Changes:
- Introduce a “successful-connect” reconnect backoff in
setupWSbased on time since last successfulonopen. - Add new provider option
maxBackoffIntervalOnSuccessfulConnects(defaulting to 2500ms) and trackwsLastConnectAt/wsSubsequentConnects. - Document the new option in
README.md.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/y-websocket.js | Implements the new reconnect backoff logic and adds provider configuration/state for it. |
| README.md | Documents the new reconnect-backoff option in the client configuration example. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Specify the maximum amount to wait between reconnects (we use exponential backoff). | ||
| maxBackoffTime: 2500 | ||
| // Specify the maximum amount to wait between reconnects if previous connection succeeded (we use exponential backoff). | ||
| // This prevents connections that close right after their successful creation from being retried immediately. | ||
| // Backoff only resets when a connection has been open for `maxBackoffIntervalOnSuccessfulConnects * 2` ms | ||
| maxBackoffIntervalOnSuccessfulConnects: 2500 |
There was a problem hiding this comment.
The README wsOpts example is no longer valid JS because maxBackoffTime: 2500 is missing a trailing comma now that another property follows. Add the comma so users can copy/paste the snippet.
| const setupWS = (provider) => { | ||
| // Start with 100ms interval between successful connects and increase with exponential backoff | ||
| const backoff = math.min(math.pow(2, provider.wsSubsequentConnects) * 50, provider.maxBackoffIntervalOnSuccessfulConnects) | ||
| const timeout = provider.wsLastConnectAt ? Date.now() - provider.wsLastConnectAt.getTime() : backoff | ||
| if (timeout < backoff) { | ||
| setTimeout(setupWS, backoff - timeout, provider) | ||
| return | ||
| } else if (timeout > provider.maxBackoffIntervalOnSuccessfulConnects * 2) { |
There was a problem hiding this comment.
setupWS applies/schedules the successful-connect backoff before checking provider.shouldConnect and provider.ws === null. This can keep scheduling timers even after disconnect() sets shouldConnect = false (and/or while a socket already exists). Consider moving the shouldConnect/ws===null guard before the backoff scheduling so disconnect reliably stops all reconnect scheduling.
| const setupWS = (provider) => { | ||
| // Start with 100ms interval between successful connects and increase with exponential backoff | ||
| const backoff = math.min(math.pow(2, provider.wsSubsequentConnects) * 50, provider.maxBackoffIntervalOnSuccessfulConnects) | ||
| const timeout = provider.wsLastConnectAt ? Date.now() - provider.wsLastConnectAt.getTime() : backoff | ||
| if (timeout < backoff) { | ||
| setTimeout(setupWS, backoff - timeout, provider) | ||
| return | ||
| } else if (timeout > provider.maxBackoffIntervalOnSuccessfulConnects * 2) { | ||
| provider.wsSubsequentConnects = 0 | ||
| } |
There was a problem hiding this comment.
The new reconnect behavior (backoff when connections open but immediately close) isn’t covered by existing tests (current suite only checks a successful connection/sync-status). Please add a test that simulates a WebSocket that opens and then closes quickly, and asserts that reconnect attempts are delayed with exponential backoff and that the counter resets after a sufficiently long-lived connection.
| * @param {WebsocketProvider} provider | ||
| */ | ||
| const setupWS = (provider) => { | ||
| // Start with 100ms interval between successful connects and increase with exponential backoff |
There was a problem hiding this comment.
The comment says "Start with 100ms interval between successful connects" but the calculation uses math.pow(2, provider.wsSubsequentConnects) * 50 (so the first backoff after one successful connect is 100ms, but the base constant is 50ms). Please clarify the comment (or adjust the constant/counter) so it matches the actual timing behavior.
| // Start with 100ms interval between successful connects and increase with exponential backoff | |
| // Start with a 50ms interval and double it after each successful connect |
| maxBackoffTime = 2500, | ||
| disableBc = false, | ||
| maxBackoffIntervalOnSuccessfulConnects = 2500, | ||
| disableBc = false |
There was a problem hiding this comment.
Syntax error in the constructor options destructuring: there is no comma after disableBc = false before socketTimeout = .... This will prevent the module from parsing/building.
| disableBc = false | |
| disableBc = false, |
This adds a exponential backoff for reconnects when the server closes connections right after they are opened, as currently there is only a backoff when the connection doesn't open at all.
We just had an incident because of this, something thrashed the database, making new ws connections close right after they opened since they couldn't access auth/initial doc data. But since the connection was technically opened for a brief moment it resets the unsucessfull connections exponential backoff. Making it try to reconnect immediately, indefinitely, causing even more strain on our database that was already struggling. Spreading to more and more users.
The only caveat is that we only reset this backoff if a connection lives for at least
maxBackoffIntervalOnSuccessfulConnects * 2ms, since the default is 2500ms seems pretty reasonable.We could also default
maxBackoffIntervalOnSuccessfulConnectsto 0 to ensure current behavior is not altered, but it seems that everybody would benefit from this being on by default.