Build:

```bash
cmake --preset linux-ninja-release && cmake --build --preset linux-ninja-release
```
The Maintenance module provides a centralized orchestration layer for all database maintenance operations. It allows operators to define named maintenance schedules with cron-based execution, maintenance window enforcement, task sequencing with halt-on-failure semantics, and aggregated per-module health reporting.
| Interface / File | Role |
|---|---|
| `include/maintenance/database_maintenance_orchestrator.h` | Primary public API |
| `include/maintenance/maintenance_task.h` | Task types, job struct, job state enum |
| `include/maintenance/maintenance_schedule.h` | Schedule entry with JSON serialization and DAG dependency model |
| `include/maintenance/maintenance_schedule_store.h` | RocksDB-backed schedule persistence |
| `include/maintenance/maintenance_health_report.h` | Health report aggregation |
| `include/maintenance/i_maintenance_task_handler.h` | Task handler interface |
| `include/maintenance/maintenance_task_handler_impls.h` | Built-in handlers: storage compaction, replica validation, MVCC cleanup, function |
| `include/maintenance/i_distributed_lock.h` | Distributed lock interface + `InProcessDistributedLock` |
| `src/maintenance/database_maintenance_orchestrator.cpp` | Implementation |
| `src/maintenance/maintenance_registry.cpp` | Default schedule bundles |
| `src/maintenance/maintenance_schedule_store.cpp` | `MaintenanceScheduleStore` implementation |
Central coordinator for all maintenance scheduling and execution.
```cpp
#include "maintenance/database_maintenance_orchestrator.h"

// Construction (via dependency injection)
auto orchestrator = DatabaseMaintenanceOrchestrator(
    scheduler,          // TaskScheduler*
    index_maintenance,  // std::shared_ptr<IndexMaintenanceManager>
    audit_logger,       // std::shared_ptr<utils::AuditLogger>
    storage             // IStorageEngine* (nullptr = in-memory-only)
);
orchestrator.start();

// Create a schedule
MaintenanceScheduleEntry schedule;
schedule.id = "nightly-index-rebuild";
schedule.name = "Nightly Index Rebuild";
schedule.cron_expression = "0 2 * * *";  // 2:00 AM daily
schedule.window_start_hour = 1;
schedule.window_end_hour = 5;
schedule.tasks = { MaintenanceTaskType::INDEX_REBUILD, MaintenanceTaskType::STATISTICS_UPDATE };
schedule.halt_on_task_failure = true;
schedule.enabled = true;
auto result = orchestrator.createSchedule(schedule);

// List recent jobs
auto jobs = orchestrator.listJobs(50);

// Get aggregated health report
MaintenanceHealthReport health = orchestrator.getHealthReport();
```

`src/maintenance/maintenance_registry.cpp` provides standalone free functions in namespace `themis::maintenance` that return pre-built `MaintenanceScheduleEntry` values:
```cpp
#include "maintenance/database_maintenance_orchestrator.h"
#include "maintenance/maintenance_schedule.h"

// Get default schedule entries
auto daily = themis::maintenance::defaultDailySchedule();
auto weekly = themis::maintenance::defaultWeeklySchedule();
auto monthly = themis::maintenance::defaultMonthlySchedule();
auto quarterly = themis::maintenance::defaultQuarterlySchedule();

// Register all defaults (disabled by default) and the IndexMaintenance health probe:
themis::maintenance::registerDefaultMaintenanceSetup(orchestrator, index_mgr);
```

| Function | Frequency | Tasks |
|---|---|---|
| `defaultDailySchedule()` | DAILY | METRICS_COLLECTION, FRAGMENTATION_MONITORING, QUOTA_CHECK |
| `defaultWeeklySchedule()` | WEEKLY | CONSISTENCY_CHECK, REPLICA_VALIDATION, PERFORMANCE_ANALYSIS, MVCC_CLEANUP |
| `defaultMonthlySchedule()` | MONTHLY | FULL_CHECKDB, BACKUP_VERIFICATION, CAPACITY_TREND_ANALYSIS, INDEX_FRAGMENTATION_REPORT |
| `defaultQuarterlySchedule()` | QUARTERLY | DISASTER_RECOVERY_DRILL, BASELINE_UPDATE |
Note: There is no `MaintenanceRegistry` class and no `maintenance_registry.h` header. The free functions above are defined in `src/maintenance/maintenance_registry.cpp`.
In Scope:
- Schedule CRUD (create, read, update, patch, delete, enable, disable)
- Cron-based execution via `TaskScheduler`
- Maintenance window enforcement (UTC hour range)
- Sequential task execution with halt-on-failure
- Explicit per-task DAG dependency graph with Kahn's topological sort and cycle detection
- RocksDB schedule persistence via `MaintenanceScheduleStore` (survives server restarts)
- Per-module health probe registry and aggregation
- Job lifecycle management (PENDING → RUNNING → SUCCEEDED/FAILED/CANCELLED/SKIPPED)
- 24-hour job retention with automatic pruning
- Audit logging and Prometheus-compatible metrics
- `IMaintenanceTaskHandler` registry: modules wire real execution logic via `registerTaskHandler()`
- `IDistributedLock` injection: prevents two cluster nodes from running the same schedule concurrently
- Multi-tenant schedule isolation: per-tenant maintenance windows and job quotas via `TenantMaintenanceConfig`
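The DAG item above (Kahn's topological sort with cycle detection and missing-predecessor rejection) can be sketched as follows. This is an illustrative standalone version, not the module's actual code; `TaskNode` and `topoSort` are hypothetical names.

```cpp
#include <map>
#include <optional>
#include <queue>
#include <string>
#include <vector>

// Illustrative per-task dependency node: each task lists its predecessors.
struct TaskNode {
    std::string id;
    std::vector<std::string> depends_on;  // predecessor task IDs
};

// Kahn's algorithm: returns the tasks in a valid execution order, or
// std::nullopt if the graph has a cycle or references a missing predecessor.
std::optional<std::vector<std::string>> topoSort(const std::vector<TaskNode>& tasks) {
    std::map<std::string, int> indegree;
    std::map<std::string, std::vector<std::string>> successors;
    for (const auto& t : tasks) indegree[t.id];  // ensure every node is known
    for (const auto& t : tasks) {
        for (const auto& dep : t.depends_on) {
            if (!indegree.count(dep)) return std::nullopt;  // missing predecessor
            ++indegree[t.id];
            successors[dep].push_back(t.id);
        }
    }
    std::queue<std::string> ready;
    for (const auto& [id, deg] : indegree)
        if (deg == 0) ready.push(id);
    std::vector<std::string> order;
    while (!ready.empty()) {
        auto id = ready.front(); ready.pop();
        order.push_back(id);
        for (const auto& next : successors[id])
            if (--indegree[next] == 0) ready.push(next);
    }
    if (order.size() != tasks.size()) return std::nullopt;  // cycle detected
    return order;
}
```

A schedule whose dependency graph fails this check would be rejected at validation time rather than at run time.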
Out of Scope:
- Raft-backed distributed lock implementation (interface provided; Raft integration planned v2.1.0)
- REPLICA_VALIDATION wiring to sharding/replica module (handler class provided; startup wiring pending)
- Maintenance impact prediction (planned v3.0.0)
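The `IDistributedLock` seam mentioned above can be illustrated with a minimal sketch: the orchestrator tries to acquire a per-schedule lock before running and skips the run when a peer already holds it. The method names below follow the doc's `tryAcquire` mention, but the actual interface in `i_distributed_lock.h` may differ; the in-process class here only stands in for the doc's `InProcessDistributedLock` until the Raft-backed implementation lands in v2.1.0.

```cpp
#include <mutex>
#include <set>
#include <string>

// Hypothetical shape of the distributed-lock interface.
class IDistributedLock {
public:
    virtual ~IDistributedLock() = default;
    virtual bool tryAcquire(const std::string& key) = 0;
    virtual void release(const std::string& key) = 0;
};

// Single-process stand-in: a mutex-guarded set of held keys.
class InProcessLock : public IDistributedLock {
public:
    bool tryAcquire(const std::string& key) override {
        std::lock_guard<std::mutex> g(mu_);
        return held_.insert(key).second;  // false if the key is already held
    }
    void release(const std::string& key) override {
        std::lock_guard<std::mutex> g(mu_);
        held_.erase(key);
    }
private:
    std::mutex mu_;
    std::set<std::string> held_;
};
```

Injecting the lock behind an interface keeps the orchestrator unchanged when the in-process implementation is later swapped for a cluster-wide one.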
Defined in `include/maintenance/maintenance_task.h` (`enum class MaintenanceTaskType`):

```text
-- Daily --
METRICS_COLLECTION  FRAGMENTATION_MONITORING  QUOTA_CHECK

-- Weekly --
CONSISTENCY_CHECK  REPLICA_VALIDATION  PERFORMANCE_ANALYSIS  MVCC_CLEANUP

-- Monthly --
FULL_CHECKDB  BACKUP_VERIFICATION  CAPACITY_TREND_ANALYSIS  INDEX_FRAGMENTATION_REPORT

-- Quarterly --
DISASTER_RECOVERY_DRILL  BASELINE_UPDATE

-- On-demand --
INDEX_REBUILD  INDEX_REORGANIZE  STATISTICS_UPDATE
STORAGE_COMPACTION  ORPHAN_CLEANUP  VECTOR_REINDEX
```
15 endpoints under `/api/v1/maintenance/`:

- `POST /schedules` – create schedule
- `GET /schedules` – list all (accepts `?tenant_id=` filter)
- `GET /schedules/{id}` – get by ID
- `PUT /schedules/{id}` – replace
- `PATCH /schedules/{id}` – partial update
- `DELETE /schedules/{id}` – delete
- `POST /schedules/{id}/enable` – enable
- `POST /schedules/{id}/disable` – disable
- `POST /schedules/{id}/run` – trigger now (optional `{"force": true}` body)
- `GET /jobs` – list recent jobs (last 24 hours)
- `GET /jobs/{id}` – get job details
- `POST /jobs/{id}/cancel` – cancel running job
- `GET /health` – aggregated health report
- `GET /task-handlers` – list registered task handlers
- `GET /status` – orchestrator status snapshot
RBAC: `maintenance:read` · `maintenance:write` · `maintenance:admin`
Modules can register health probes to contribute to the aggregated health report:

```cpp
orchestrator.registerHealthProbe("my_module", []() -> ModuleHealthSignal {
    ModuleHealthSignal signal;
    signal.module_name = "my_module";
    signal.status = ModuleHealthStatus::OK;
    signal.message = "All systems nominal";
    return signal;
});
```

55+ unit tests in `tests/test_database_maintenance_orchestrator.cpp` covering:
- Schedule CRUD and validation
- JSON round-trips (`toJson()` / `fromJson()` / `applyPatch()`)
- Maintenance window enforcement and SKIPPED state
- Job lifecycle (SUCCEEDED, FAILED, CANCELLED)
- `halt_on_task_failure` cascading behaviour
- Health probe registration and aggregation
- Metrics collection
- RocksDB schedule persistence (restart-persistence round-trips)
- DAG dependency ordering, cycle detection, and missing predecessor rejection
- Force-run (`triggerNow(id, force=true)`)
- `IDistributedLock` integration (`tryAcquire`, skip on lock held by peer)
- Multi-tenant isolation (MT-01..MT-15): `tenant_id` round-trip, filter, window override, quota enforcement
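The window-enforcement behaviour these tests exercise (a run outside the UTC hour range is marked SKIPPED) can be sketched as a small range check. `withinWindow` is a hypothetical helper, and the midnight wrap-around branch is an assumption about the real implementation:

```cpp
// Returns true when a run at `utc_hour` falls inside the maintenance
// window [start, end) in UTC hours; outside the window the job would be
// recorded as SKIPPED. Hypothetical helper, not the module's actual code.
bool withinWindow(int utc_hour, int start, int end) {
    if (start == end) return true;                    // degenerate window: always open
    if (start < end)  return utc_hour >= start && utc_hour < end;
    return utc_hour >= start || utc_hour < end;       // window crosses midnight, e.g. 22 -> 4
}
```

Under this sketch, the example schedule's 02:00 fire time sits inside its 01:00–05:00 window, while a forced run (`triggerNow(id, force=true)`) would bypass the check entirely.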
The following peer-reviewed sources form the scientific foundation of the Maintenance module.

- Chaudhuri, S., & Weikum, G. (2000). Rethinking Database System Architecture: Towards a Self-Tuning RISC-Style Database System. Proceedings of the 26th International Conference on Very Large Data Bases (VLDB), 1–10. URL: https://dl.acm.org/doi/10.5555/645926.671577
  Introduces the concept of self-tuning database components that monitor and adapt internal parameters at runtime. Directly motivates the `MaintenanceOrchestrator` adaptive scheduling model and the health-probe feedback loop in `health_probe.cpp`.
- Agrawal, S., Chaudhuri, S., Kollar, L., Marathe, A., Narasayya, V., & Syamala, M. (2004). Database Tuning Advisor for Microsoft SQL Server 2005. Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 1110–1121. URL: https://dl.acm.org/doi/10.5555/1316689.1316803
  Describes automated index/statistics recommendation. Informs the `REINDEX_HNSW` and `REBUILD_SECONDARY_INDEXES` task types and the `halt_on_task_failure` cascading strategy.
- Liu, C. L., & Layland, J. W. (1973). Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. Journal of the ACM, 20(1), 46–61. DOI: 10.1145/321738.321743
  Rate-Monotonic Scheduling (RMS) theory for periodic task sets. Informs the maintenance-window priority model (CRITICAL > HIGH > MEDIUM > LOW) and the `max_concurrent_tasks` admission-control bound in `TaskScheduler`.
- Silberschatz, A., Galvin, P. B., & Gagne, G. (2018). Operating System Concepts (10th ed.). Wiley. ISBN: 978-1-119-32091-3.
  Chapter 5 (CPU Scheduling) motivates the multi-level feedback queue used for maintenance job priorities and the preemptive scheduling of CRITICAL tasks.
This module is built as part of ThemisDB. See the root `CMakeLists.txt` for build configuration.
The implementation files in this module are compiled into the ThemisDB library.
See `../../include/maintenance/README.md` for the public API.