Analytics Module Roadmap

Version: 1.7.0
Status: 🟢 Production-Ready
Last Updated: 2026-03-09
Module Path: src/analytics/

Current Status

Production-ready for core OLAP, data export, process mining, text analytics, LLM integration, CEP engine, streaming aggregation windows, incremental materialized views, real-time anomaly detection, model serving / online inference pipeline, and predictive analytics / time-series forecasting.

Completed ✅

In Progress 🚧

(none — all Phase 3 items completed)

Planned Features 📋

Short-term (Next 3-6 months)

[P] GPU-accelerated OLAP aggregations (CUDA) (Issue: #1469)
[P] Zero-copy Arrow data transfer optimizations (Issue: #1471)

Long-term (6-12 months)

CUDA geospatial distance and containment kernels (Target: Q3 2026)
- Inputs: WGS84 points/polygons, batch-size up to 1e6
- Outputs: distance matrix + containment bitset
- Constraints: deterministic FP tolerance ≤ 1e-6
- Errors: invalid geometry (NaN/Inf coordinates), polygon self-intersection, overflow during Haversine distance computation
- Tests: unit + property-based + GPU/CPU parity
- Perf: ≥ 8x speedup vs CPU baseline on RTX-class GPU
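
The GPU/CPU parity tests listed above need a CPU reference to compare against. The following is a minimal C++17 sketch of such a reference for the point-to-point distance case, not the module's actual implementation. The names `GeoPoint`, `isValidPoint`, and `haversineMeters` are hypothetical, and the spherical model with a mean Earth radius of 6371008.8 m is an assumption; the roadmap does not specify the Earth model.

```cpp
#include <cmath>
#include <optional>

// Hypothetical point type: WGS84 latitude/longitude in degrees.
struct GeoPoint {
    double lat_deg;
    double lon_deg;
};

// Rejects NaN/Inf coordinates and values outside the valid lat/lon range.
inline bool isValidPoint(const GeoPoint& p) {
    return std::isfinite(p.lat_deg) && std::isfinite(p.lon_deg) &&
           p.lat_deg >= -90.0 && p.lat_deg <= 90.0 &&
           p.lon_deg >= -180.0 && p.lon_deg <= 180.0;
}

// Great-circle distance in metres on a sphere (assumed mean Earth radius).
// Returns std::nullopt for invalid input instead of propagating NaN.
inline std::optional<double> haversineMeters(const GeoPoint& a, const GeoPoint& b) {
    if (!isValidPoint(a) || !isValidPoint(b)) return std::nullopt;
    constexpr double kEarthRadiusM = 6371008.8;  // assumption, not from the roadmap
    constexpr double kDegToRad = 3.14159265358979323846 / 180.0;
    const double phi1 = a.lat_deg * kDegToRad;
    const double phi2 = b.lat_deg * kDegToRad;
    const double dphi = (b.lat_deg - a.lat_deg) * kDegToRad;
    const double dlam = (b.lon_deg - a.lon_deg) * kDegToRad;
    const double s1 = std::sin(dphi / 2.0);
    const double s2 = std::sin(dlam / 2.0);
    double h = s1 * s1 + std::cos(phi1) * std::cos(phi2) * s2 * s2;
    // Clamp to [0, 1] so rounding error near antipodal points cannot push
    // the argument of sqrt/asin outside its domain.
    h = std::fmin(1.0, std::fmax(0.0, h));
    return 2.0 * kEarthRadiusM * std::asin(std::sqrt(h));
}
```

A parity test could then compare each GPU result against `haversineMeters` within the stated tolerance. How that tolerance is applied (absolute or relative) is left open here because the roadmap does not say.
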
Federated analytics query dispatch across multiple ThemisDB clusters (Target: Q3 2026)
- Affected: src/analytics/distributed_analytics.cpp, include/analytics/distributed_analytics.h
- Expected behavior: scatter-gather with partial failure tolerance; partial results are returned if fewer than 20% of shards fail
- Errors: shard unreachable → skip with warning; tenant isolation violation → reject with PERMISSION_DENIED
- Tests: unit tests for scatter/gather logic + integration tests with mock shards
- Perf: fan-out latency ≤ 200 ms for 16 shards on LAN
- Constraints: per-tenant data isolation enforced at the SourceRegistry boundary
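
The partial-failure rule for the gather step can be sketched as follows. This is an illustrative C++17 sketch only: `ShardResult`, `GatherOutcome`, and `gatherWithTolerance` are hypothetical names, the per-shard payload is reduced to a single partial count, and the 20% threshold is read as a strict "fewer than" bound.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-shard outcome: a partial count on success,
// std::nullopt if the shard was unreachable.
using ShardResult = std::optional<long long>;

struct GatherOutcome {
    bool ok = false;                    // true if a (possibly partial) result is returned
    long long total = 0;                // merged partial aggregate
    std::vector<std::string> warnings;  // one entry per skipped shard
};

// Merges shard responses. Unreachable shards are skipped with a warning, and
// the merged result is kept only while strictly fewer than 20% of shards failed.
inline GatherOutcome gatherWithTolerance(const std::vector<ShardResult>& shards) {
    GatherOutcome out;
    if (shards.empty()) return out;
    std::size_t failed = 0;
    for (std::size_t i = 0; i < shards.size(); ++i) {
        if (shards[i]) {
            out.total += *shards[i];
        } else {
            ++failed;
            out.warnings.push_back("shard " + std::to_string(i) + " unreachable, skipped");
        }
    }
    // Integer form of failed / size < 1/5, which avoids floating-point rounding.
    out.ok = failed * 5 < shards.size();
    if (!out.ok) out.total = 0;  // do not expose a result that is too incomplete
    return out;
}
```

With the 16-shard fan-out used in the performance target, this rule tolerates up to 3 unreachable shards; a fourth failure reaches 25% and the query fails.
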
SARIMA and Prophet-style forecasting models (Target: Q4 2026)
- Affected: src/analytics/forecasting.cpp, include/analytics/forecasting.h
- Expected behavior: extends ForecastMethod enum; fit()/predict() API unchanged
- Errors: insufficient data for seasonal period (< 2 × seasonality), NaN in input series → structured error
- Tests: unit tests for fit/predict/evaluate/serialize round-trip; parity vs Python statsmodels reference
- Perf: SARIMA fit ≤ 5 s for series of length 10 000
- Constraints: confidence intervals and decomposition retained
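
The input checks named under "Errors" could look roughly like the sketch below (C++17). `ForecastError` and `validateSeasonalSeries` are hypothetical names; the module's real structured error type is not shown in this roadmap. Rejecting infinities along with NaN is an added assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical structured error; the real module's error type may differ.
struct ForecastError {
    enum class Code { InvalidSeasonality, InsufficientData, NonFiniteInput } code;
    std::string message;
};

// Returns an error if the series cannot support a seasonal fit, else std::nullopt.
inline std::optional<ForecastError> validateSeasonalSeries(
        const std::vector<double>& series, std::size_t seasonality) {
    if (seasonality == 0) {
        return ForecastError{ForecastError::Code::InvalidSeasonality,
                             "seasonality must be greater than zero"};
    }
    // At least two full seasonal cycles are required (length >= 2 x seasonality).
    if (series.size() < 2 * seasonality) {
        return ForecastError{ForecastError::Code::InsufficientData,
                             "need at least " + std::to_string(2 * seasonality) +
                             " points, got " + std::to_string(series.size())};
    }
    // NaN is named in the roadmap; +/-Inf is rejected too (assumption).
    for (std::size_t i = 0; i < series.size(); ++i) {
        if (!std::isfinite(series[i])) {
            return ForecastError{ForecastError::Code::NonFiniteInput,
                                 "non-finite value at index " + std::to_string(i)};
        }
    }
    return std::nullopt;
}
```

For monthly data with a seasonality of 12, a series of 23 points is rejected and a clean series of 24 points passes.
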
AutoML ONNX export and deployment pipeline (Target: Q4 2026)
- Affected: src/analytics/automl.cpp, include/analytics/automl.h
- Expected behavior: AutoMLEngine::exportONNX(path) serializes trained model; loadable by MLServingClient
- Errors: unsupported model type → UNSUPPORTED_OPERATION; serialization failure → structured error with cause
- Tests: unit test export → load → infer round-trip; ONNX opset compatibility for all supported algorithms
- Perf: export time ≤ 500 ms for any model trained on ≤ 1M samples

Phase 1: Core Analytics Engine (Status: Completed ✅)

Phase 2: Streaming & Incremental Analytics (Status: Completed ✅)

CEP full engine implementation in analytics/cep_engine.cpp
Streaming aggregation windows (tumbling/sliding/session/hopping) in analytics/streaming_window.cpp
Incremental materialized views in analytics/incremental_view.cpp
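
For readers unfamiliar with the window types named above, the sketch below shows how an event timestamp maps to tumbling and hopping windows. It is illustrative only and does not reflect the code in analytics/streaming_window.cpp: the function names are hypothetical, timestamps are assumed to be non-negative milliseconds, and windows are assumed to be aligned to time zero. Sliding and session windows are not covered.

```cpp
#include <cstdint>
#include <vector>

// Start of the tumbling window [start, start + size_ms) that contains ts_ms.
// Assumes ts_ms >= 0 and size_ms > 0.
inline std::int64_t tumblingWindowStart(std::int64_t ts_ms, std::int64_t size_ms) {
    return ts_ms - (ts_ms % size_ms);
}

// Start times of all hopping windows [start, start + size_ms) that contain
// ts_ms, newest first. Windows begin every hop_ms; windows that would start
// before time zero are ignored. Assumes ts_ms >= 0, size_ms > 0, hop_ms > 0.
inline std::vector<std::int64_t> hoppingWindowStarts(std::int64_t ts_ms,
                                                     std::int64_t size_ms,
                                                     std::int64_t hop_ms) {
    std::vector<std::int64_t> starts;
    const std::int64_t last = ts_ms - (ts_ms % hop_ms);  // latest start <= ts_ms
    for (std::int64_t s = last; s > ts_ms - size_ms && s >= 0; s -= hop_ms) {
        starts.push_back(s);
    }
    return starts;
}
```

An event at 25 ms with a 10 ms window and a 5 ms hop belongs to the hopping windows starting at 25 ms and 20 ms, and to the single tumbling window starting at 20 ms.
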

Phase 3: Distributed & ML-Augmented Analytics (Status: Completed ✅)

Columnar execution engine with vectorized operator pipeline (analytics/columnar_execution.cpp)
LLVM-JIT compilation for hot aggregation paths (analytics/jit_aggregation.cpp): hot-path detection and template-specialised aggregation dispatch; LLVM MCJIT backend reserved behind THEMIS_HAS_LLVM_JIT compile flag (Issue: #1482)
Distributed analytics sharding across cluster nodes (Issue: #1483)
Predictive analytics and time-series forecasting integration (Issue: #1484)
AutoML integration for automated model selection
Model serving and online inference pipeline (analytics/model_serving.cpp) (Issue: #1477)

Production Readiness Checklist

Known Issues & Limitations

NLP text analyzer uses rule-based approaches — not suitable as a replacement for full NLP frameworks
LLM analyzer requires external API keys; responses are non-deterministic
Arrow-dependent formats (Parquet, Feather, IPC) require compile-time flag THEMIS_HAS_ARROW
Advanced graph analytics algorithms (betweenness centrality, Louvain community detection) are implemented as AQL functions in include/query/functions/graph_extensions.h
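
Callers that need to know whether the Arrow-dependent formats were compiled in can guard on the flag. The sketch below assumes THEMIS_HAS_ARROW is a plain preprocessor definition tested with `#ifdef`; the `ExportFormat` enum, its non-Arrow members, and `isFormatAvailable` are hypothetical and exist only for illustration.

```cpp
// Hypothetical format enum; only Parquet, Feather, and IPC are named in the
// roadmap. Csv and Json are placeholders for formats without an Arrow dependency.
enum class ExportFormat { Csv, Json, Parquet, Feather, ArrowIpc };

// Reports whether a format is usable in this build.
inline bool isFormatAvailable(ExportFormat format) {
    switch (format) {
        case ExportFormat::Parquet:
        case ExportFormat::Feather:
        case ExportFormat::ArrowIpc:
#ifdef THEMIS_HAS_ARROW
            return true;   // Arrow support was compiled in
#else
            return false;  // built without THEMIS_HAS_ARROW
#endif
        default:
            return true;   // formats with no Arrow dependency
    }
}
```

Checking availability up front lets an export request fail with a clear message rather than at link time or deep inside the writer.
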

Breaking Changes

Arrow export format options may expand in v1.7.0 (additive, non-breaking)
