
Conversation

@nathan-miller23
Contributor

Behavior

Whenever our client host experiences transient memory contention, we observe that a portion of our ConnectionMultiplexer clients get 'stuck' in a state where the backlog writer is in the Activating state. Example error message below:

[screenshot of error message]

Further analysis showed that the client reported IsConnected=true with the PhysicalBridge in the ConnectedEstablished state, yet ProcDump revealed that no StackExchange.Redis Backlog thread was present.

Bug

When StartBacklogProcessor fails to start the backlog processor thread (for example, when Thread.Start throws System.OutOfMemoryException), the system gets stuck in a state where _backlogProcessorIsRunning is true but no backlog thread is doing any work.
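A minimal sketch of that failure mode, assuming a dedicated-thread startup sequence (the field name, method name, and thread name come from the description above; everything else is illustrative, not the actual PhysicalBridge source):

```csharp
using System.Threading;

internal sealed class BacklogSketch
{
    private volatile bool _backlogProcessorIsRunning;

    private void StartBacklogProcessor()
    {
        _backlogProcessorIsRunning = true;        // 1. the flag is set first

        var thread = new Thread(ProcessBacklog)   // 2. the worker thread is created
        {
            IsBackground = true,
            Name = "StackExchange.Redis Backlog",
        };
        thread.Start();                           // 3. Thread.Start can throw
                                                  //    OutOfMemoryException under memory
                                                  //    pressure, leaving the flag true
                                                  //    with no backlog thread running
    }

    private void ProcessBacklog()
    {
        // drains queued commands; irrelevant to the bug, stubbed for the sketch
    }
}
```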

Fix

Use a try-catch-finally pattern to ensure we reset the _backlogProcessorIsRunning state when we fail to start the thread (see the sketch below).
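A minimal sketch of that reset, replacing StartBacklogProcessor from the sketch above; it uses catch-and-rethrow as one way to realize the pattern, and again only the field name and exception type come from this PR:

```csharp
private void StartBacklogProcessor()
{
    _backlogProcessorIsRunning = true;
    try
    {
        var thread = new Thread(ProcessBacklog)
        {
            IsBackground = true,
            Name = "StackExchange.Redis Backlog",
        };
        thread.Start();    // may still throw OutOfMemoryException under memory pressure
    }
    catch
    {
        // The thread never started: clear the flag so a later write can retry
        // starting the backlog processor instead of waiting on a worker that
        // will never exist.
        _backlogProcessorIsRunning = false;
        throw;
    }
}
```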

@nathan-miller23 nathan-miller23 changed the title Handle Backlog Process Startup Failures Handle Backlog Processor Startup Failures Feb 3, 2026
Collaborator

@mgravell mgravell left a comment


This looks like a great catch, and a great bit of investigation, thanks. It seems obvious, once stated, that this could fail badly in OOM scenarios. Of course, in the failing scenarios the process is already on its knees and crying out in pain, but if the process survives that moment it would be great to eventually get back into a stable state, so: great!

Thanks

@mgravell mgravell merged commit c8e3152 into StackExchange:main Feb 3, 2026
6 of 7 checks passed
@nathan-miller23
Contributor Author

Thanks for the quick review @mgravell !

Can you provide an ETA on when you'll publish a new NuGet patch version with this fix? My team has a critical dependency on the StackExchange.Redis library, and we need to know whether we should fund a band-aid workaround for this bug on our end to stabilize our PROD environment, or whether we can wait to consume the updated NuGet package.

@mgravell
Collaborator

mgravell commented Feb 3, 2026

I was working on that as I left today - fighting GitHub Actions. My plan is "tomorrow am, UK time". It would have been today if Actions hadn't had a bad day.

@mgravell
Collaborator

mgravell commented Feb 3, 2026

It looks like 2.10.13 made it to MyGet (our staging feed); I need to do a little book-keeping (validation, tagging, etc), but I'm confident we should have a build tomorrow.
