Fix metrics queue starvation on partial upload failures#636
    user_log!(
        " ! batch {} - uploaded {} events, {} event(s) failed and were kept for retry",
        total_batches,
        successful_count,
        failed_count
    );
🟡 Log message says failed events are "kept for retry" but they are immediately deleted
The user-facing log message at line 164 says "{} event(s) failed and were kept for retry", but the code at lines 178-191 deletes both successful_ids and failed_ids from the database. The comment at lines 180-181 confirms this explicitly: "Rejected events are validation failures and will not succeed on retry." The total_discarded counter correctly counts these events as discarded (line 154), and the final summary at line 222 correctly reports "discarded {} rejected events". The per-batch message, however, tells the user the opposite of what actually happens (that the events were kept for retry), which misleads anyone debugging queue behavior.
Suggested change
-    user_log!(
-        " ! batch {} - uploaded {} events, {} event(s) failed and were kept for retry",
-        total_batches,
-        successful_count,
-        failed_count
-    );
+    user_log!(
+        " ! batch {} - uploaded {} events, {} event(s) failed and were discarded (validation errors)",
+        total_batches,
+        successful_count,
+        failed_count
+    );
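The point of the review is that the per-batch message should be derived from the action actually taken. A minimal sketch of that resolution step follows, assuming an in-memory Vec of (record_id, payload) rows stands in for the SQLite table; resolve_batch and the Queue alias are hypothetical names, not code from the PR. Both resolved sets are removed, and the log line (using the suggested wording) is only produced when some events were rejected.

```rust
/// Hypothetical in-memory stand-in for the SQLite metrics queue:
/// each row is (record_id, payload). The PR deletes rows from the
/// database instead; this Vec only models that effect.
type Queue = Vec<(i64, String)>;

/// Removes every resolved row of one batch and, when some events were
/// rejected, returns a per-batch log line that matches what happened.
fn resolve_batch(
    queue: &mut Queue,
    batch_number: usize,
    successful_ids: &[i64],
    failed_ids: &[i64],
) -> Option<String> {
    // Rejected events are validation failures, so they are deleted along
    // with the successful ones rather than being left in the queue.
    queue.retain(|(id, _)| !successful_ids.contains(id) && !failed_ids.contains(id));

    if failed_ids.is_empty() {
        return None;
    }
    Some(format!(
        " ! batch {} - uploaded {} events, {} event(s) failed and were discarded (validation errors)",
        batch_number,
        successful_ids.len(),
        failed_ids.len()
    ))
}

fn main() {
    let mut queue: Queue = vec![
        (1, "ok".to_string()),
        (2, "rejected".to_string()),
        (3, "ok".to_string()),
        (4, "next batch".to_string()),
    ];
    if let Some(line) = resolve_batch(&mut queue, 1, &[1, 3], &[2]) {
        println!("{line}");
    }
    // Only the row outside this batch is left.
    println!("remaining: {:?}", queue);
}
```

Because the message text and the deletion live in the same function in this sketch, the wording cannot drift from the behavior the way the original "kept for retry" text did.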
Summary
- Return MetricsUploadResponse from upload_metrics_with_retry so callers can handle partial failures explicitly.
- Update flush-metrics-db to split successful and rejected record IDs and delete both resolved sets, preventing poison rows from blocking the queue.
- Call flush-metrics-db from flush-logs so queued SQLite metrics are drained during regular flush runs.

Validation
- cargo fmt
- cargo test --lib test_split_record_ids_by_response -- --nocapture
- cargo test --lib test_failed_metrics_events_for_retry -- --nocapture
- cargo test
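The split-and-delete approach from the summary, and why it prevents starvation, can be sketched as follows. The real fields of MetricsUploadResponse are not shown in this PR, so the struct here is a hypothetical stand-in that is assumed to report rejected events by their index in the submitted batch. split_record_ids_by_response borrows its name from the test listed under Validation, but its signature is an assumption. fake_upload and drain_queue are purely illustrative, and the sketch models only validation rejections, not transport-level failures.

```rust
use std::collections::HashSet;

/// Hypothetical response shape (the real MetricsUploadResponse fields are not
/// shown in this PR): rejected events are reported by their index in the batch.
#[derive(Debug)]
struct MetricsUploadResponse {
    rejected_indices: Vec<usize>,
}

/// Splits one batch's record IDs into (successful, rejected), assuming
/// record_ids[i] is the queue row behind the i-th uploaded event.
fn split_record_ids_by_response(
    record_ids: &[i64],
    response: &MetricsUploadResponse,
) -> (Vec<i64>, Vec<i64>) {
    let rejected: HashSet<usize> = response.rejected_indices.iter().copied().collect();
    let mut successful = Vec::new();
    let mut failed = Vec::new();
    for (i, id) in record_ids.iter().enumerate() {
        if rejected.contains(&i) {
            failed.push(*id);
        } else {
            successful.push(*id);
        }
    }
    (successful, failed)
}

/// Illustrative upload stub: any row whose payload is "bad" is rejected.
fn fake_upload(batch: &[(i64, String)]) -> MetricsUploadResponse {
    let rejected_indices: Vec<usize> = batch
        .iter()
        .enumerate()
        .filter(|(_, row)| row.1 == "bad")
        .map(|(i, _)| i)
        .collect();
    MetricsUploadResponse { rejected_indices }
}

/// Drains the queue oldest-first in fixed-size batches and deletes both
/// resolved sets after each upload. Returns (uploaded, discarded).
fn drain_queue(queue: &mut Vec<(i64, String)>, batch_size: usize, max_rounds: usize) -> (usize, usize) {
    let (mut uploaded, mut discarded) = (0, 0);
    for _ in 0..max_rounds {
        if queue.is_empty() {
            break;
        }
        let batch: Vec<(i64, String)> = queue.iter().take(batch_size).cloned().collect();
        let ids: Vec<i64> = batch.iter().map(|(id, _)| *id).collect();
        let (ok, failed) = split_record_ids_by_response(&ids, &fake_upload(&batch));
        uploaded += ok.len();
        discarded += failed.len();
        // If `failed` were kept, the next round would select the same
        // poison rows again and later events would never be reached.
        queue.retain(|(id, _)| !ok.contains(id) && !failed.contains(id));
    }
    (uploaded, discarded)
}

fn main() {
    // Two poison rows sit at the head of the queue, ahead of valid events.
    let mut queue: Vec<(i64, String)> = vec![
        (1, "bad".to_string()),
        (2, "bad".to_string()),
        (3, "ok".to_string()),
        (4, "ok".to_string()),
        (5, "ok".to_string()),
    ];
    let (uploaded, discarded) = drain_queue(&mut queue, 2, 10);
    println!("uploaded {uploaded} events, discarded {discarded} rejected events");
    println!("rows left: {}", queue.len());
}
```

With a batch size of 2, the first round consists entirely of poison rows. Deleting only successful IDs would reselect those same two rows on every round, which is the starvation this PR fixes; deleting both resolved sets lets the remaining valid events drain on the next rounds.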