ThemisDB's Backup & Recovery system provides comprehensive data protection capabilities with support for multiple backup strategies, integrity verification, and point-in-time recovery.
- Complete database snapshot using RocksDB checkpoint API
- Includes all data files and Write-Ahead Log (WAL) files
- Foundation for incremental and differential backups
- Duration: Medium (depends on database size)
- Storage: Full database size
```cpp
auto result = backup_mgr->createFullBackup("/backups");
if (result) {
    std::cout << "Backup created at: " << *result << std::endl;
} else {
    std::cerr << "Backup failed: " << result.error().message() << std::endl;
}
```

- Captures only changes since the last backup (full or incremental)
- Small storage footprint
- Fast execution
- Duration: Fast
- Storage: Small (only changed data)
```cpp
auto result = backup_mgr->createIncrementalBackup("/backups");
```

- Captures changes since the last full backup
- Medium storage footprint
- Faster restore than incremental chain
- Duration: Fast
- Storage: Medium (accumulated changes since full backup)
```cpp
auto result = backup_mgr->createDifferentialBackup("/backups");
```

All backups include integrity verification using:
- SHA-256 checksums for data validation
- Manifest file validation
- Structure verification (checkpoint, WAL, metadata)
- RAID5/6 shard completeness checks
```cpp
auto result = backup_mgr->verifyBackup(backup_path);
if (result) {
    std::cout << "Backup integrity verified" << std::endl;
}
```

Backups can be compressed to save storage space:
```cpp
// Compress a backup
auto compressed = backup_mgr->compressBackup(backup_path);
if (compressed) {
    std::cout << "Compressed to: " << *compressed << std::endl;
}

// Decompress for restore
auto decompressed = backup_mgr->decompressBackup(compressed_file, dest_dir);
```

Continuous archiving of Write-Ahead Log files for point-in-time recovery:
```cpp
auto result = backup_mgr->archiveWAL("/wal_archive");
```

- Weekly full backup
- Daily incremental backups
- Pros: Minimal backup time, small storage
- Cons: Longer restore time (need to apply all incrementals)
- Weekly full backup
- Daily differential backups
- Pros: Faster restore (only need full + latest differential)
- Cons: Larger backup sizes over time
- Daily full backups
- Pros: Simplest restore process
- Cons: Highest storage and time requirements
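The trade-offs above come down to how the restore chain is assembled: a full backup stands alone, a differential needs only the latest full, and incrementals must be replayed in order. The sketch below illustrates that logic; the `BackupInfo` struct and `buildRestoreChain` helper are illustrative, not part of the BackupManager API.

```cpp
#include <string>
#include <vector>

// Illustrative types; not part of the BackupManager API.
enum class BackupType { Full, Incremental, Differential };
struct BackupInfo {
    std::string name;
    BackupType type;
};

// Given backups sorted oldest-to-newest, return the ordered list of
// backups a restore must apply.
std::vector<std::string> buildRestoreChain(const std::vector<BackupInfo>& backups) {
    std::vector<std::string> chain;
    size_t base = backups.size();
    // The most recent full backup is the base of every chain.
    for (size_t i = backups.size(); i-- > 0;)
        if (backups[i].type == BackupType::Full) { base = i; break; }
    if (base == backups.size()) return chain;  // no full backup: cannot restore
    chain.push_back(backups[base].name);

    // The latest differential (if any) supersedes everything between
    // it and the full backup.
    for (size_t i = backups.size(); i-- > base + 1;)
        if (backups[i].type == BackupType::Differential) { base = i; break; }
    if (backups[base].type == BackupType::Differential)
        chain.push_back(backups[base].name);

    // Incrementals taken after the base are applied in order.
    for (size_t i = base + 1; i < backups.size(); ++i)
        if (backups[i].type == BackupType::Incremental)
            chain.push_back(backups[i].name);
    return chain;
}
```

This is why a long incremental chain restores slowly (every link is applied) while the differential strategy touches at most two backups after the full.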
For RAID5/6 configurations, the backup system ensures all shards (data + parity) are included:

```cpp
// Automatic RAID detection
auto raid_config = BackupManager::detectRAIDConfiguration();

// Verify RAID backup completeness
auto result = backup_mgr->isBackupComplete(backup_path, raid_config);
```

Important: For RAID5/6, ALL shards must be backed up together to ensure complete data recovery.
```cpp
auto result = backup_mgr->restoreFromBackup(backup_path);
if (result) {
    std::cout << "Database restored successfully" << std::endl;
}
```

PITR is handled by the PITRManager class (see include/storage/pitr_manager.h):
```cpp
PITRManager pitr(db, changefeed, snapshot_mgr);

// Restore to specific sequence number
auto seq_result = pitr.restoreToSequence(target_seq);

// Restore to timestamp
auto ts_result = pitr.restoreToTimestamp(timestamp_ms);

// Restore to named snapshot
auto tag_result = pitr.restoreToTag("before_migration");
```

```
backup_dir/
├── full_20260122_120000/
│   ├── checkpoint/                  # RocksDB checkpoint data
│   ├── wal/                         # WAL files at checkpoint time
│   ├── raid_topology/               # RAID5/6: shard topology info
│   │   ├── shard_0/
│   │   ├── shard_1/
│   │   └── shard_parity/
│   └── MANIFEST.json                # Backup metadata
├── incr_20260123_120000/
│   ├── wal/                         # Incremental WAL files
│   └── MANIFEST.json
├── diff_20260124_120000/
│   ├── wal/                         # Differential WAL files
│   └── MANIFEST.json
└── latest -> full_20260122_120000/  # Symlink to latest backup
```
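Each backup directory carries a MANIFEST.json describing its contents. The exact schema is defined by BackupManager; the fields shown below (type, creation time, parent backup, checksum) are a hypothetical illustration based on the metadata the system records, not the actual format.

```json
{
  "backup_type": "incremental",
  "created_at": "2026-01-23T12:00:00Z",
  "parent_backup": "full_20260122_120000",
  "checksum_sha256": "d2c1e0…",
  "wal_files": ["000042.log", "000043.log"]
}
```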
The backup system uses the Result<T> pattern for type-safe error handling:

```cpp
auto result = backup_mgr->createFullBackup(dest_dir);
if (!result) {
    auto error = result.error();

    // Check error code
    if (error.code() == ErrorCode::ERR_STORAGE_DISK_FULL) {
        // Handle disk full
    }

    // Get error message with context
    std::cerr << "Error: " << error.message() << std::endl;

    // Get metadata for user guidance
    auto metadata = error.metadata();
    std::cerr << "Solution: " << metadata.solution << std::endl;
}
```

| Error Code | Description | Severity |
|---|---|---|
| ERR_BACKUP_CREATION_FAILED | Backup creation failed | Error |
| ERR_BACKUP_RESTORATION_FAILED | Backup restoration failed | Error |
| ERR_BACKUP_VERIFICATION_FAILED | Backup integrity check failed | Warning |
| ERR_BACKUP_NOT_FOUND | Backup does not exist | Error |
| ERR_BACKUP_INVALID_TYPE | Unsupported backup type | Error |
| ERR_BACKUP_INCOMPLETE | Missing backup components | Critical |
| ERR_BACKUP_COMPRESSION_FAILED | Compression failed | Error |
| ERR_BACKUP_DECOMPRESSION_FAILED | Decompression failed | Error |
| ERR_BACKUP_CHECKSUM_MISMATCH | Checksum verification failed | Critical |
| ERR_BACKUP_MANIFEST_CORRUPT | Manifest file corrupted | Error |
| ERR_BACKUP_WAL_ARCHIVE_FAILED | WAL archiving failed | Error |
- Test restore procedures regularly
- Verify backup integrity after creation
- Practice recovery scenarios
- Keep multiple generations of backups
- Implement retention policies based on RPO/RTO requirements
- Archive old backups offsite
- Monitor backup success/failure
- Track backup sizes and durations
- Alert on backup verification failures
- Encrypt backups for sensitive data
- Secure backup storage locations
- Implement access controls
- For RAID5/6, coordinate backup across all shards
- Verify all shards are present in backups
- Test cross-shard restoration
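Automatic retention policy enforcement is still a planned enhancement, so pruning old backups is currently the operator's job. A minimal keep-newest-N sketch over backup directory names (it assumes names sort chronologically, as the timestamped naming scheme does; `selectForDeletion` is illustrative, not part of the BackupManager API):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Given backup directory names that sort chronologically
// (e.g. "full_20260122_120000"), return the names an operator
// should delete to keep only the newest `keep` backups.
std::vector<std::string> selectForDeletion(std::vector<std::string> names,
                                           size_t keep) {
    std::sort(names.begin(), names.end());  // oldest first
    if (names.size() <= keep) return {};
    names.resize(names.size() - keep);      // drop the newest `keep`
    return names;                            // what remains is deletable
}
```

Before deleting anything a full chain check is still needed: an old full backup must outlive every incremental or differential that depends on it.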
- Full backup: ~10-50 MB/s (depends on I/O)
- Incremental/Differential: Very fast (only WAL files)
- Compression: Adds CPU overhead but saves storage
- Full restore: Similar to backup speed
- Incremental chain: Slower (need to apply all changes)
- Differential: Faster than incremental chain
- Schedule backups during low-traffic periods
- Use differential backups for faster restore
- Consider parallel backup/restore (future feature)
- Compress backups if storage is limited
- Use local SSD storage for backup destination
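Backup and restore durations vary with I/O as noted above, so it pays to measure them. A small timing wrapper (a sketch, not part of ThemisDB) that works with any callable backup step:

```cpp
#include <chrono>

// Run a backup step and report how long it took, so durations
// can be logged and alerted on.
template <typename Fn>
std::chrono::milliseconds timeStep(Fn&& step) {
    auto start = std::chrono::steady_clock::now();
    step();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
}
```

For example, `auto ms = timeStep([&] { backup_mgr->createFullBackup("/backups"); });` yields a duration suitable for the monitoring metrics recommended above.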
```cpp
#include "storage/backup_manager.h"
#include "storage/rocksdb_wrapper.h"

// Initialize
auto db = std::make_shared<RocksDBWrapper>(config);
db->open();
auto backup_mgr = std::make_unique<BackupManager>(db);

// Weekly full backup
auto full_result = backup_mgr->createFullBackup("/backups");
if (!full_result) {
    std::cerr << "Full backup failed: " << full_result.error().message() << std::endl;
    return;
}

// Verify backup
auto verify_result = backup_mgr->verifyBackup(*full_result);
if (!verify_result) {
    std::cerr << "Backup verification failed: " << verify_result.error().message() << std::endl;
    return;
}

// Compress for storage
auto compress_result = backup_mgr->compressBackup(*full_result);
if (compress_result) {
    std::cout << "Backup compressed: " << *compress_result << std::endl;
}

// Daily incremental backup
auto incr_result = backup_mgr->createIncrementalBackup("/backups");

// List all backups
auto backups = backup_mgr->listBackups("/backups");
for (const auto& backup : backups) {
    std::cout << "Backup: " << backup << std::endl;
}

// Restore if needed
if (disaster_occurred) {
    auto latest_backup = backups.back();
    auto restore_result = backup_mgr->restoreFromBackup("/backups/" + latest_backup);
    if (restore_result) {
        std::cout << "Database restored successfully" << std::endl;
    }
}
```

- Parallel backup/restore for faster performance
- Backup deduplication to reduce storage
- Cloud backup support (S3, Azure Blob, GCS)
- Backup encryption
- Automatic retention policy enforcement
- Recovery time estimation
- Backup catalog/metadata tracking
- Cross-region replication
- Snapshot-based backups
```yaml
backup:
  schedule:
    full: "0 2 * * 0"            # Weekly Sunday 2 AM
    incremental: "0 2 * * 1-6"   # Daily except Sunday
  retention:
    full: 4                      # Keep 4 full backups
    incremental: 30              # Keep 30 days of incrementals
  compression: true
  verify_after_backup: true
  destinations:
    - type: local
      path: /backup/local
    - type: s3
      bucket: themis-backups
      region: us-east-1
```

```bash
# Check disk space
df -h /backups

# Clean old backups
rm -rf /backups/old_*

# Implement retention policy
```

```bash
# Don't use this backup - it's corrupted
# Try previous backup
ls -lt /backups/

# Check for disk errors
dmesg | grep -i error
```

```bash
# Verify all shards are accessible
# Check RAID configuration
echo $THEMIS_RAID_GROUP
echo $THEMIS_SHARDS

# Ensure all shard nodes are running
# Create new coordinated backup
```

- RocksDB Checkpoint Documentation
- Write-Ahead Logging (WAL)
- PITR Manager: include/storage/pitr_manager.h
- Error Registry: include/utils/error_registry.h
- Test Examples: tests/test_backup_manager_enhanced.cpp