fix(ci): sequence mongodb sharded deploy to prevent mongos hang (#2361)
Conversation
Delay mongos startup until configsvr is fully ready to avoid a race condition in the Bitnami mongodb-sharded entrypoint that causes a ~20% CI failure rate on ctst-end2end-sharded. Issue: ZENKO-5229
I have successfully merged the changeset of this pull request. Please check the status of the associated issue ZENKO-5229.
Summary
Fixes a ~20% flaky failure rate on `ctst-end2end-sharded` caused by a race condition in the Bitnami mongodb-sharded entrypoint during mongos startup.
Problem
When `kubectl apply` deploys all MongoDB sharded StatefulSets at once (configsvr, mongos, shards), mongos sometimes starts before configsvr's replica set is fully initialized. The Bitnami mongos entrypoint does:

1. `wait-for-port` on configsvr:27017, which succeeds as soon as the port is open
2. Logs `"Found MongoDB server listening at configsvr:27017 !"`
3. Runs `mongosh --host configsvr -u root -p <pw> admin` with `db.getUsers()` to verify the node is available
Step 3 is the problem: configsvr's port may be open while the replica set is still initializing (primary election, auth user creation). The `mongosh` call has no timeout in the entrypoint (`mongodb_execute_print_output` in `libmongodb-sharded.sh`), so it blocks forever waiting for a usable authenticated session.
Meanwhile, shard0-data-0 completes its own replica set init, stops `mongod`, and tries to connect to mongos to register itself as a shard. Since mongos never started, shard0 loops on `"timeout reached before the port went into state inuse"`. The liveness probe (`pgrep mongod`, `initialDelaySeconds: 60`, `failureThreshold: 2`) kills shard0 every ~2 minutes because `mongod` was stopped for reconfiguration. The 5-minute rollout timeout on mongos expires, and the job fails.
This was confirmed across 3 consecutive CI attempts (run 23301276674, attempts 2-4),
all showing identical behavior: configsvr healthy, mongos hung after configsvr
connection, shard0 crash-looping from liveness kills.
Why it's flaky (not deterministic): the race window is narrow. 80% of the time, configsvr completes replica set init before mongos reaches the `mongosh` auth check. 20% of the time, mongos wins the race and hangs.
Solution
Sequence the deploy so mongos only starts after configsvr is fully ready:

1. `kubectl apply` the full manifest (all StatefulSets created)
2. Scale the mongos StatefulSet down to 0 replicas
3. Wait for the configsvr rollout to complete (readiness means the `mongosh` auth succeeds = replica set fully initialized)
4. Scale mongos back up and wait for its rollout

This ensures that when mongos starts, configsvr is guaranteed to be fully initialized, so the `mongosh` auth check succeeds immediately.
Alternatives considered
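The sequencing can be sketched as a few `kubectl` commands. The StatefulSet names, manifest path, and timeout values below are illustrative assumptions, not taken from the actual CI job:

```shell
# Assumed resource names -- substitute the real chart's StatefulSets.
MANIFEST=mongodb-sharded.yaml
MONGOS=statefulset/mongodb-sharded-mongos
CONFIGSVR=statefulset/mongodb-sharded-configsvr

# 1. Create everything (configsvr, shards, mongos).
kubectl apply -f "$MANIFEST"

# 2. Immediately park mongos so its entrypoint cannot race configsvr init.
kubectl scale "$MONGOS" --replicas=0

# 3. Configsvr readiness runs the mongosh auth check, so a completed
#    rollout implies the replica set is fully initialized.
kubectl rollout status "$CONFIGSVR" --timeout=300s

# 4. Start mongos; its configsvr auth check now succeeds immediately.
kubectl scale "$MONGOS" --replicas=1
kubectl rollout status "$MONGOS" --timeout=300s
```

This stays entirely within `kubectl`, which is why no manifest parsing is needed.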
Enable `startupProbe` on mongos and shard data nodes: The Bitnami chart ships with `startupProbe.enabled: false` for mongos and shards (only configsvr has it enabled). Enabling it would prevent the liveness probe from killing shard0 during init, giving the deadlock more time to self-resolve. However, this only treats the symptom: it buys time but doesn't prevent the `mongosh` hang on mongos. If configsvr is slow enough, mongos would still hang indefinitely. We may still want to enable startupProbes separately as a defense-in-depth measure.
Wrap `mongosh` with `timeout` in the entrypoint: Would fix the root cause (the missing timeout), but requires patching the Bitnami container image or injecting a custom entrypoint script. Higher maintenance burden for a vendored chart.
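As a sketch of that alternative: the wrapper shape and the 60-second budget are assumptions, and the `mongosh` invocation just mirrors the call quoted above, not a tested command line. Coreutils `timeout` turns the infinite hang into a retryable failure with exit code 124.

```shell
# Hypothetical wrapper around the entrypoint's auth check; host and
# credentials are placeholders from the PR description.
check_configsvr() {
    timeout 60 mongosh --host configsvr -u root -p "$MONGODB_ROOT_PASSWORD" \
        admin --eval 'db.getUsers()'
}

# `timeout` semantics: when the deadline passes, the child is killed and
# the exit status is 124, so an outer retry loop can actually retry.
rc=0
timeout 1 sleep 30 || rc=$?
echo "exit: $rc"   # prints "exit: 124"
```

An exit code of 124 would let the existing retry machinery distinguish "hung" from "succeeded".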
Increase `MONGODB_INIT_RETRY_ATTEMPTS` / `MONGODB_INIT_RETRY_DELAY`: These control the `retry_while` wrapper around the `mongosh` call. However, since `mongosh` itself blocks indefinitely (the first attempt never returns), `retry_while` never gets a chance to retry. These settings would have no effect on the hang.
Split the manifest and apply resources separately: Would also work, but requires `yq` to filter multi-document YAML by resource kind. The scale-to-zero approach is simpler: it uses only `kubectl` and doesn't require parsing the manifest.