SolrBackup may never finish if REQUESTSTATUS returns notfound for an accepted backup #824
Open
Description
What happened
A SolrBackup can remain stuck in InProgress forever if:
- the backup request is accepted (operator backup submit, stable backup async ID),
- the operator suddenly becomes unavailable,
- the Solr async tracking entry for that backup is deleted (DELETESTATUS API, tracker deleteSingleAsyncId),
- and the operator becomes available again.
After that:
- the backup files already exist in Solr,
- REQUESTSTATUS for the backup request ID returns notfound (REQUESTSTATUS API, tracker getAsyncTaskRequestStatus),
- but the SolrBackup CR still stays in inProgress=true,
- and the operator keeps polling instead of reaching a terminal state (backup reconcile loop, requeue after 5s).
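To make the notfound condition concrete, here is a minimal Go sketch that reads the async state out of a REQUESTSTATUS JSON reply. It is not the operator's code: the type and function names are hypothetical, the sample body and request ID are illustrative, and the status.state layout is assumed from how the Collections API normally answers with wt=json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requestStatusResponse models only the fields of a REQUESTSTATUS reply
// that this sketch needs (assumed layout: {"status":{"state":...,"msg":...}}).
type requestStatusResponse struct {
	Status struct {
		State string `json:"state"`
		Msg   string `json:"msg"`
	} `json:"status"`
}

// parseAsyncState extracts the async task state (for example "running",
// "completed", "failed" or "notfound") from a REQUESTSTATUS JSON body.
// It returns an error if the body is not JSON or has no status.state.
func parseAsyncState(body []byte) (string, error) {
	var resp requestStatusResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decoding REQUESTSTATUS response: %w", err)
	}
	if resp.Status.State == "" {
		return "", fmt.Errorf("REQUESTSTATUS response has no status.state")
	}
	return resp.Status.State, nil
}

func main() {
	// Illustrative body for a request ID whose tracking entry was deleted.
	body := []byte(`{"responseHeader":{"status":0},"status":{"state":"notfound","msg":"Did not find [example-id] in any tasks queue"}}`)
	state, err := parseAsyncState(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(state)
}
```

The point of the sketch is that notfound arrives as an ordinary, successfully parsed state, so a caller that only treats completed and failed as terminal will keep requeueing.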
Environment
- macOS
- local kind cluster
- Kubernetes / kind node version: v1.32.1
- solr-operator version: v0.10.0-prerelease
- solr-operator built from master on March 22, 2026 (ca9d3c5c37a59f29570a6b49a8da5dc614aba75e)
- Solr version: 9.10.0
Steps to reproduce
- Deploy solr-operator on a local kind cluster.
- Create a 1-node SolrCloud with a local backup repository, then create a collection and start a SolrBackup for it.
- As soon as the backup first shows inProgress=true, scale the solr-operator deployment down to 0.
- While the operator is down, wait for the Solr async request to finish, then delete only that async status entry with DELETESTATUS.
- Confirm the backup data still exists, but REQUESTSTATUS for that same request ID now returns notfound.
- Scale the operator back up to 1 and observe that the SolrBackup CR never reaches a terminal state.
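For anyone repeating the DELETESTATUS and REQUESTSTATUS steps, the two Collections API calls can be built as in the following Go sketch. The base URL (a port-forward to the Solr pod), the request ID and the helper name are assumptions made for illustration. The real request ID is whatever async ID the operator submitted for the backup.

```go
package main

import (
	"fmt"
	"net/url"
)

// asyncStatusURL builds a Collections API URL for an async-status action
// (REQUESTSTATUS or DELETESTATUS) on a single request ID.
func asyncStatusURL(baseURL, action, requestID string) string {
	q := url.Values{}
	q.Set("action", action)
	q.Set("requestid", requestID)
	q.Set("wt", "json")
	// Encode sorts parameters by key and escapes the values.
	return baseURL + "/solr/admin/collections?" + q.Encode()
}

func main() {
	base := "http://localhost:8983" // assumed: port-forward to the Solr pod
	id := "example-backup-id"       // hypothetical async request ID

	// Step 4: remove only this request's tracking entry.
	fmt.Println(asyncStatusURL(base, "DELETESTATUS", id))
	// Step 5: the same ID should now report notfound.
	fmt.Println(asyncStatusURL(base, "REQUESTSTATUS", id))
}
```

Issuing a GET against the first URL and then the second should reproduce the notfound answer described in the steps, provided the async request had already finished before it was deleted.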
The stuck status looks like:
  status:
    collectionBackupStatuses:
    - asyncBackupStatus: notfound
      inProgress: true

and it never sets finished: true or successful: true.
Expected behavior
Once an accepted backup later becomes notfound, the operator should not leave the CR in InProgress forever.
It should eventually either:
- recover, or
- mark the backup failed with a clear reason.
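One possible shape for the "mark the backup failed" option is sketched below in Go. It is only an illustration of the decision, not the operator's reconcile code: the outcome type, the decide function, the accepted flag and the grace period are all hypothetical names and parameters introduced here.

```go
package main

import (
	"fmt"
	"time"
)

// outcome is a hypothetical result of one reconcile pass over a backup.
type outcome string

const (
	keepPolling outcome = "keepPolling"
	succeeded   outcome = "succeeded"
	failed      outcome = "failed"
)

// decide maps the async state to an outcome and a reason.
// accepted:    the operator already recorded that Solr accepted the backup.
// notFoundFor: how long the state has been reported as notfound.
// grace:       assumed timeout before notfound is treated as terminal.
func decide(state string, accepted bool, notFoundFor, grace time.Duration) (outcome, string) {
	switch state {
	case "completed":
		return succeeded, ""
	case "failed":
		return failed, "Solr reported the async backup request as failed"
	case "notfound":
		if !accepted {
			// Nothing was submitted yet, so notfound is expected here.
			return keepPolling, ""
		}
		if notFoundFor >= grace {
			return failed, "async status for an accepted backup is no longer tracked by Solr"
		}
		return keepPolling, ""
	default:
		// "submitted", "running" and unknown states: poll again later.
		return keepPolling, ""
	}
}

func main() {
	o, reason := decide("notfound", true, 2*time.Minute, time.Minute)
	fmt.Println(o, reason)
}
```

The "recover" option would replace the failed branch with a check that the backup files exist (for example through the Collections API LISTBACKUP action) and mark the backup successful if they do. That check is left out here because it depends on the repository configuration.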