Guard against SQLite corruption and empty analytics cache #210
- Increase busy_timeout from 5s to 30s for large databases
- Set synchronous=NORMAL explicitly (safe with WAL mode)
- Add WAL checkpoint on Store.Close() and after sync completion
- Fix build-cache writing _last_sync.json before verifying Parquet files exist (prevented cache from ever rebuilding after a failed export)
- Log count query errors in build-cache instead of silently swallowing
- Add SQLite integrity check to verify command (PRAGMA integrity_check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
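The connection settings in this commit can be sketched as a short list of PRAGMA statements. This is a minimal illustration, not the code from the PR: `connectionPragmas` and `applyPragmas` are hypothetical names, and `journal_mode = WAL` is an assumption (the commit only says the database runs in WAL mode).

```go
package main

import (
	"database/sql"
	"fmt"
)

// connectionPragmas lists the settings named in the commit message.
// Hypothetical helper; journal_mode = WAL is assumed, not taken from the PR.
func connectionPragmas() []string {
	return []string{
		"PRAGMA journal_mode = WAL",
		"PRAGMA busy_timeout = 30000", // milliseconds: 30s, up from 5s
		"PRAGMA synchronous = NORMAL", // explicit, rather than relying on a default
	}
}

// applyPragmas runs each statement on the given handle and reports
// which statement failed, if any.
func applyPragmas(db *sql.DB) error {
	for _, stmt := range connectionPragmas() {
		if _, err := db.Exec(stmt); err != nil {
			return fmt.Errorf("%s: %w", stmt, err)
		}
	}
	return nil
}
```

Note that `busy_timeout` and `synchronous` are per-connection settings, so with a `database/sql` pool they would need to be applied to every connection (for example through the driver's connection string) rather than once per handle.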
The buildCache function now returns a non-nil error when the database has messages but the Parquet export produces zero rows, so callers (CLI, serve scheduler, TUI auto-build) can fail fast or retry instead of silently treating an empty cache as success.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
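The guard described here, combined with the state-file ordering fix from the first commit, might look roughly like the sketch below. Everything in it is illustrative: `finalizeCache`, `syncState`, the JSON field name, and the flat `*.parquet` layout are assumptions, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// syncState is a hypothetical shape for _last_sync.json.
type syncState struct {
	LastMessageID int64 `json:"last_message_id"`
}

var errEmptyExport = errors.New("export produced zero rows for a non-empty database")

// finalizeCache writes the state file only after the export has been checked.
// Assumes Parquet files sit directly in cacheDir; the real layout may differ.
func finalizeCache(cacheDir string, dbMessages, exportedRows, maxMessageID int64) error {
	// Fail fast: the database has messages but nothing was exported.
	if dbMessages > 0 && exportedRows == 0 {
		return errEmptyExport
	}
	// Verify that Parquet files exist before recording progress.
	files, err := filepath.Glob(filepath.Join(cacheDir, "*.parquet"))
	if err != nil {
		return fmt.Errorf("list parquet files: %w", err)
	}
	if dbMessages > 0 && len(files) == 0 {
		return fmt.Errorf("no Parquet files found in %s", cacheDir)
	}
	data, err := json.Marshal(syncState{LastMessageID: maxMessageID})
	if err != nil {
		return fmt.Errorf("encode sync state: %w", err)
	}
	// Only now is it safe to record the max message ID.
	return os.WriteFile(filepath.Join(cacheDir, "_last_sync.json"), data, 0o644)
}
```

Because both checks run before the write, a failed export leaves no state file behind, so the next incremental build does not skip the missing rows.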
verify: track corruption state across all checks and return a non-zero exit code at the end when integrity errors are found.

CheckpointWAL: read the (busy, log, checkpointed) result columns from PRAGMA wal_checkpoint instead of discarding them via Exec, so callers get an honest error when the checkpoint is incomplete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
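Reading the three result columns could be sketched as follows. This is an assumption-laden illustration rather than the PR's `CheckpointWAL`: the names `checkpointStatus` and `checkpointWAL` are hypothetical, and the `TRUNCATE` mode is chosen only for the example.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

var errCheckpointIncomplete = errors.New("wal checkpoint incomplete")

// checkpointStatus interprets the row returned by PRAGMA wal_checkpoint:
// busy is non-zero when the checkpoint was blocked, logFrames is the number
// of pages in the WAL, and checkpointed is how many were moved back into
// the database. Both counts are -1 when the database is not in WAL mode.
func checkpointStatus(busy, logFrames, checkpointed int) error {
	switch {
	case busy != 0:
		return fmt.Errorf("%w: blocked by another connection", errCheckpointIncomplete)
	case logFrames < 0:
		return nil // not in WAL mode, nothing to checkpoint
	case checkpointed < logFrames:
		return fmt.Errorf("%w: %d of %d pages", errCheckpointIncomplete, checkpointed, logFrames)
	}
	return nil
}

// checkpointWAL scans the result row instead of discarding it via Exec.
// The TRUNCATE mode is an assumption for this sketch.
func checkpointWAL(db *sql.DB) error {
	var busy, logFrames, checkpointed int
	row := db.QueryRow("PRAGMA wal_checkpoint(TRUNCATE)")
	if err := row.Scan(&busy, &logFrames, &checkpointed); err != nil {
		return fmt.Errorf("wal checkpoint: %w", err)
	}
	return checkpointStatus(busy, logFrames, checkpointed)
}
```

Keeping the interpretation in a separate pure function makes the incomplete-checkpoint cases easy to test without a live database.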
Explicitly prohibit asking for commit permission and reframe committing as non-destructive to counter the system prompt's general caution about git operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The early return for a missing Gmail source bypassed the final dbCorrupt check, causing verify to exit 0 despite finding integrity errors. Check dbCorrupt before returning nil in that branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
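The control-flow fix can be illustrated with a small sketch in which every exit path consults the corruption flag. Only `dbCorrupt` comes from the commit message; `verifyOutcome`, `errDBCorrupt`, and the `checkSource` callback are hypothetical stand-ins for the real verify command.

```go
package main

import "errors"

var errDBCorrupt = errors.New("database integrity check failed")

// verifyOutcome decides the final result of a verify run.
// dbCorrupt is set earlier by the integrity check; checkSource stands in
// for the remaining source checks and is only called when a source exists.
func verifyOutcome(dbCorrupt, sourceFound bool, checkSource func() error) error {
	if !sourceFound {
		// Early return: previously returned nil unconditionally here.
		if dbCorrupt {
			return errDBCorrupt
		}
		return nil
	}
	if err := checkSource(); err != nil {
		return err
	}
	// Final check at the end of the normal path.
	if dbCorrupt {
		return errDBCorrupt
	}
	return nil
}
```

A caller would map a non-nil result to a non-zero exit code, so a corrupt database is reported whether or not the source is present.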
ccd07b9 to e488f07
Acknowledged. Merging
Summary
- Increase `busy_timeout` from 5s to 30s: prevents timeout-induced corruption on large (27GB+) databases where lock contention during batch inserts can exceed 5 seconds
- Set `synchronous=NORMAL` explicitly: safe with WAL mode, avoids I/O bottlenecks that compound lock contention
- Add WAL checkpoint on `Store.Close()` and after each sync completion: folds the WAL back into the main database, preventing accumulation across sessions and stale WAL entries
- Fix build-cache writing `_last_sync.json` before verifying Parquet files: a failed export would write the state file with the current max message ID, causing future incremental builds to skip and leaving the cache permanently empty
- Add SQLite integrity check (`PRAGMA integrity_check`) to the `verify` command with `--skip-db-check` opt-out

Context
Discovered two issues on a production 27GB database:
- `_last_sync.json` was written unconditionally even when the Parquet export produced zero files

Test plan
- `make test`
- `msgvault verify <email>` runs integrity check and reports results
- `msgvault build-cache` with a corrupted/empty source DB does not write `_last_sync.json`
- `msgvault build-cache --full-rebuild` works correctly after the guard prevents state file write

🤖 Generated with Claude Code