Implements the Retrieval-Augmented Generation pipeline for ThemisDB, combining vector similarity search, LLM inference, and hybrid retrieval to answer queries from stored documents.
In scope: Vector retrieval from ThemisDB index, LLM integration for answer generation, context window management, hybrid search (vector + BM25), re-ranking.
Out of scope: LLM model management (handled by the llm module), full-text index construction (handled by the search module), embedding generation (handled by the llm module).
- rag_pipeline.cpp — orchestrates retrieval → augmentation → generation
- llm_integration.cpp — LLM connector for RAG
- context_manager.cpp — context window management
- retriever.cpp — vector and hybrid retrieval
Maturity: 🟡 Beta — Basic RAG pipeline with vector retrieval and LLM integration operational; hybrid search and re-ranking in progress.
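The retrieval → augmentation → generation flow above can be sketched as follows. This is a minimal illustration of the pipeline shape, not the actual rag_pipeline.cpp interfaces; ScoredDoc, buildPrompt, and answerQuery are hypothetical names, and the retriever and generator are injected as callables so the sketch stays independent of the vector index and LLM backend.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of retrieval -> augmentation -> generation;
// the real interfaces live in rag_pipeline.cpp and may differ.
struct ScoredDoc {
    std::string text;
    double score;  // combined vector/BM25 relevance
};

// Augmentation: stitch retrieved passages into a prompt for the LLM.
std::string buildPrompt(const std::string& query,
                        const std::vector<ScoredDoc>& docs) {
    std::string prompt = "Answer using only the context below.\n\nContext:\n";
    for (const auto& d : docs) {
        prompt += "- " + d.text + "\n";
    }
    prompt += "\nQuestion: " + query + "\nAnswer:";
    return prompt;
}

// Orchestration: retrieve, augment, then generate.
std::string answerQuery(
    const std::string& query,
    const std::function<std::vector<ScoredDoc>(const std::string&)>& retrieve,
    const std::function<std::string(const std::string&)>& generate) {
    return generate(buildPrompt(query, retrieve(query)));
}
```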
Implementation files for ThemisDB's Retrieval-Augmented Generation (RAG) system providing intelligent document retrieval, quality evaluation, knowledge gap detection, and ethical compliance checking.
- rag_judge.cpp - Main orchestrator for multi-dimensional evaluation
- knowledge_gap_detector.cpp - Three-level gap detection system
- llm_integration.cpp - Bridge to LLM inference engine
- streaming_retriever.cpp - Incremental context window filling with token-budget enforcement, relevance-ordered streaming, MMR deduplication, and cancellation support
- faithfulness_evaluator.cpp - Fact-checking against sources
- relevance_evaluator.cpp - Query-answer alignment
- completeness_evaluator.cpp - Query aspect coverage
- coherence_evaluator.cpp - Structure and readability
- bias_detector.cpp - Ethical compliance checking
- claim_extractor.cpp - Extract atomic claims from answers
- response_parser.cpp - Parse LLM evaluation responses
- prompt_templates.cpp - Template and few-shot management
- judge_config.cpp - Configuration validation
- rubric_evaluator.cpp - Custom rubric evaluation
- judge_ensemble.cpp - Multi-judge voting strategies
- pairwise_comparator.cpp - Head-to-head comparisons
- cot_evaluator.cpp - Chain-of-thought evaluation
- geval_evaluator.cpp - G-Eval framework (Liu et al., 2023)
- llm_judge_integration.cpp - Judge orchestration
- llm_meta_analyzer.cpp - Performance meta-analysis
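The multi-judge voting mentioned for judge_ensemble.cpp can be illustrated with a simple majority-vote strategy. This is a sketch of the idea only; the Verdict type and majorityVote function are assumed names, not the module's actual API.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical majority-vote strategy in the spirit of judge_ensemble.cpp.
enum class Verdict { Pass, Fail };

// Each judge casts a verdict; the ensemble returns the most common one.
Verdict majorityVote(const std::vector<Verdict>& votes) {
    std::map<Verdict, int> counts;
    for (Verdict v : votes) ++counts[v];
    return std::max_element(counts.begin(), counts.end(),
                            [](const auto& a, const auto& b) {
                                return a.second < b.second;
                            })->first;
}
```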
| Mode | Latency | Use Case |
|---|---|---|
| Fast | ~100ms | High-throughput production |
| Balanced | ~500ms | Standard RAG pipeline |
| Thorough | ~2s | Research, benchmarking |
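One plausible way the three modes trade latency for quality is by varying ensemble size and chain-of-thought evaluation. The mapping below is an illustrative assumption (JudgeMode, JudgeSettings, and the specific values are hypothetical), not the actual judge_config.cpp settings.

```cpp
#include <stdexcept>

// Hypothetical mapping from evaluation mode to judge settings.
enum class JudgeMode { Fast, Balanced, Thorough };

struct JudgeSettings {
    int num_judges;         // ensemble size
    bool chain_of_thought;  // enable CoT evaluation
};

JudgeSettings settingsFor(JudgeMode mode) {
    switch (mode) {
        case JudgeMode::Fast:     return {1, false}; // ~100ms: single judge, no CoT
        case JudgeMode::Balanced: return {1, true};  // ~500ms: CoT reasoning
        case JudgeMode::Thorough: return {3, true};  // ~2s: ensemble + CoT
    }
    throw std::invalid_argument("unknown mode");
}
```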
./build/tests/test_rag_judge
./build/tests/test_knowledge_gap_detector
./build/tests/test_rag_streaming_retriever
./build/tests/test_rag_pipeline_integration
./build/benchmarks/bench_rag_evaluation

The implementation is based on the following peer-reviewed research:
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … Kiela, D. (2020).
  Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
  Advances in Neural Information Processing Systems (NeurIPS), 33, 9459–9474.
  arXiv: 2005.11401
  Foundational RAG framework: combines dense retrieval (DPR) with seq2seq generation. Direct basis for the retrieval → augmentation → generation pattern in rag_judge.cpp and llm_integration.cpp.
- Jiang, Z., Xu, F. F., Gao, L., Sun, Z., Liu, Q., Dwivedi-Yu, J., … Neubig, G. (2023). Active Retrieval Augmented Generation. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 7969–7992. arXiv: 2305.06983
  FLARE approach: iterative, forward-looking retrieval instead of a single batch fetch. Motivates the incremental context window filling in streaming_retriever.cpp: documents are added step by step until the token budget is exhausted.
- Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. arXiv: 2307.03172
  Shows that LLMs process relevant information at the beginning and end of the context window better than in the middle. Justifies the relevance-sorted ordering (sort_by_relevance = true) in StreamingRetrieverConfig: highly relevant documents are loaded into the context window first.
- Carbonell, J., & Goldstein, J. (1998).
  The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries.
  Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 335–336.
  DOI: 10.1145/290941.291025
  Original formulation of Maximal Marginal Relevance (MMR): balances relevance against diversity in document selection. Direct basis for the Jaccard-based MMR deduplication (enable_mmr_deduplication, mmr_similarity_threshold) in StreamingRetriever::Impl::isDuplicate().
- Ram, O., Levine, Y., Dalmedigos, I., Muhlgay, D., Shashua, A., Leyton-Brown, K., & Shoham, Y. (2023).
  In-Context Retrieval-Augmented Language Models.
  Transactions of the Association for Computational Linguistics, 11, 1316–1331.
  arXiv: 2302.00083
  Examines optimal use of the context window for RAG: how many documents to embed and how to split the token budget. Justifies the max_context_tokens configuration and the token estimation procedure (estimateTokens()) in ContextWindowFiller.
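The techniques annotated above (relevance-first streaming, token-budget enforcement, and Jaccard-based MMR deduplication) can be sketched together in one loop. This is an illustration only: fillContext is a hypothetical name, the chars/4 token estimate and the 0.8 similarity threshold are assumptions, and the real StreamingRetriever internals may differ.

```cpp
#include <algorithm>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Rough token estimate (~4 characters per token for English text);
// the heuristic is an assumption, not the actual estimateTokens().
size_t estimateTokens(const std::string& text) {
    return text.size() / 4 + 1;
}

// Jaccard similarity over whitespace-separated token sets.
double jaccard(const std::string& a, const std::string& b) {
    auto tokens = [](const std::string& s) {
        std::set<std::string> t;
        std::istringstream in(s);
        for (std::string w; in >> w;) t.insert(w);
        return t;
    };
    std::set<std::string> ta = tokens(a), tb = tokens(b);
    std::vector<std::string> inter;
    std::set_intersection(ta.begin(), ta.end(), tb.begin(), tb.end(),
                          std::back_inserter(inter));
    size_t uni = ta.size() + tb.size() - inter.size();
    return uni == 0 ? 0.0 : static_cast<double>(inter.size()) / uni;
}

struct Doc { std::string text; double relevance; };

// Fill the context window: highest relevance first, skip near-duplicates,
// stop once the token budget is exhausted.
std::vector<std::string> fillContext(std::vector<Doc> docs,
                                     size_t max_context_tokens,
                                     double mmr_similarity_threshold = 0.8) {
    std::sort(docs.begin(), docs.end(),
              [](const Doc& x, const Doc& y) { return x.relevance > y.relevance; });
    std::vector<std::string> context;
    size_t used = 0;
    for (const auto& d : docs) {
        size_t cost = estimateTokens(d.text);
        if (used + cost > max_context_tokens) break;
        bool duplicate = std::any_of(context.begin(), context.end(),
            [&](const std::string& c) {
                return jaccard(c, d.text) >= mmr_similarity_threshold;
            });
        if (duplicate) continue;
        context.push_back(d.text);
        used += cost;
    }
    return context;
}
```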
- Headers: ../../include/rag/README.md
- Documentation: ../../docs/src/rag/
- Examples: ../../examples/rag/
19 files | ~7,600 lines | MIT License
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems (NeurIPS), 33, 9459–9474. https://arxiv.org/abs/2005.11401
- Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., … Wang, H. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv preprint. https://arxiv.org/abs/2312.10997
- Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., … Yih, W.-t. (2020). Dense Passage Retrieval for Open-Domain Question Answering. Proceedings of EMNLP 2020, 6769–6781. https://doi.org/10.18653/v1/2020.emnlp-main.550
- Ma, X., Guo, J., Zhang, R., Fan, Y., Cheng, X., & Cheng, X. (2022). Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys, 55(9), 195:1–195:35. https://doi.org/10.1145/3560815
- Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., … Sifre, L. (2022). Improving Language Models by Retrieving from Trillions of Tokens. Proceedings of the 39th International Conference on Machine Learning (ICML), 2206–2240. https://arxiv.org/abs/2112.04426