docqa-engine

Upload documents, ask questions -- get cited answers with a prompt engineering lab.


Live Demo -- try it without installing anything.

Demo Snapshot

What This Solves

  • RAG pipeline from upload to answer -- ingest documents (PDF, DOCX, TXT, MD, CSV), chunk them with pluggable strategies, embed with TF-IDF, and retrieve with hybrid BM25 + dense search fused by Reciprocal Rank Fusion (see the fusion sketch after this list)
  • Prompt engineering lab for A/B testing -- create prompt templates, run the same question through different strategies side by side, and compare the outputs
  • Citation accuracy built in -- every generated citation is scored for faithfulness, coverage, and redundancy
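
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the general idea in a few lines of Python. It is illustrative only: the function name and the constant are assumptions, not the actual retriever.py API.

```python
# Minimal Reciprocal Rank Fusion sketch (illustrative only; the real
# retriever.py implementation may differ). Each ranking is a list of doc ids
# ordered from best to worst.

def rrf_fuse(rankings, k=60):
    """Combine several ranked lists into a single fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]      # hypothetical BM25 ranking
dense_hits = ["doc1", "doc5", "doc3"]     # hypothetical dense-cosine ranking
print(rrf_fuse([bm25_hits, dense_hits]))  # ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that rank well in either list get a boosted fused score, which is why the hybrid retriever can outperform either method alone.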

Service Mapping

  • Service 3: Custom RAG Conversational Agents
  • Service 5: Prompt Engineering and System Optimization

Certification Mapping

  • IBM Generative AI Engineering with PyTorch, LangChain & Hugging Face
  • IBM RAG and Agentic AI Professional Certificate
  • Vanderbilt ChatGPT Personal Automation
  • Duke University LLMOps Specialization

Architecture

flowchart TB
    Upload["Document Upload\n(PDF, DOCX, TXT, MD, CSV)"]
    Chunk["Chunking Engine\n(semantic, fixed, sliding window)"]
    Embed["Embedding Layer\n(TF-IDF, BM25, Dense)"]
    VStore["Vector Store\n(FAISS / in-memory)"]
    Hybrid["Hybrid Retrieval\n(BM25 + Dense + RRF fusion)"]
    Rerank["Cross-Encoder Re-Ranker"]
    QExpand["Query Expansion\n(synonym, PRF, decompose)"]
    Citation["Citation Scoring\n(faithfulness, coverage, redundancy)"]
    Answer["Answer Generation"]
    Convo["Conversation Manager\n(multi-turn context)"]
    API["REST API\n(JWT auth, rate limiting, metering)"]
    UI["Streamlit Demo UI\n(4-tab interface)"]

    Upload --> Chunk --> Embed --> VStore
    QExpand --> Hybrid
    VStore --> Hybrid --> Rerank --> Answer
    Answer --> Citation
    Answer --> Convo
    API --> Answer
    UI --> API
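
The chunking engine in the diagram supports semantic, fixed, and sliding-window strategies. The sketch below illustrates a generic sliding-window chunker with overlap; it is a minimal illustration under assumed names and parameters, not the chunking.py implementation.

```python
# Generic sliding-window chunker with overlap (illustrative only;
# the actual strategies and signatures in chunking.py may differ).

def sliding_window_chunks(text, size=500, overlap=100):
    """Yield overlapping character windows so context isn't cut mid-thought."""
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        yield text[start:start + size]

doc_text = open("demo_docs/python_guide.md", encoding="utf-8").read()
chunks = list(sliding_window_chunks(doc_text, size=500, overlap=100))
```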

Key Metrics

| Metric | Value |
| --- | --- |
| Test Suite | 550+ automated tests |
| Retrieval Accuracy | Hybrid > BM25-only by 15-25% |
| Re-Ranking Boost | +8-12% relevance improvement |
| Query Latency | <100 ms for a 10K-document corpus |
| Citation Accuracy | Faithfulness + coverage scoring |
| API Rate Limit | Configurable per-user metering |
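
The citation accuracy row refers to the faithfulness, coverage, and redundancy scores produced by citation_scorer.py. A toy token-overlap version of faithfulness and coverage might look like the sketch below; these formulas are assumptions for illustration, not the module's actual scoring method.

```python
# Toy token-overlap citation scores (assumed formulas, not citation_scorer.py's).

def _tokens(text):
    return set(text.lower().split())

def faithfulness(answer_sentence, cited_passage):
    """Fraction of the answer sentence's tokens supported by the cited passage."""
    ans = _tokens(answer_sentence)
    return len(ans & _tokens(cited_passage)) / len(ans) if ans else 0.0

def coverage(answer, cited_passages):
    """Fraction of answer tokens covered by the union of all cited passages."""
    ans = _tokens(answer)
    cited = set().union(*(_tokens(p) for p in cited_passages)) if cited_passages else set()
    return len(ans & cited) / len(ans) if ans else 0.0
```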

Modules

| Module | File | Description |
| --- | --- | --- |
| Ingest | ingest.py | Multi-format document loading (PDF, DOCX, TXT, MD, CSV) |
| Chunking | chunking.py | Pluggable chunking strategies: fixed-size, sentence-boundary, semantic |
| Embedder | embedder.py | TF-IDF embedding (5,000 features, no external API calls) |
| Retriever | retriever.py | BM25 + dense cosine + hybrid RRF fusion |
| Answer | answer.py | Context-aware answer generation with source citations |
| Prompt Lab | prompt_lab.py | Prompt versioning and A/B comparison framework |
| Citation Scorer | citation_scorer.py | Citation faithfulness, coverage, and redundancy scoring |
| Evaluator | evaluator.py | Retrieval metrics: MRR, NDCG@K, Precision@K, Recall@K, Hit Rate (see the sketch below) |
| Batch | batch.py | Parallel batch ingestion and query processing |
| Exporter | exporter.py | JSON/CSV export for results and metrics |
| Cost Tracker | cost_tracker.py | Per-query token and cost tracking |
| Pipeline | pipeline.py | End-to-end DocQAPipeline class |
| REST API | api.py | FastAPI wrapper with JWT auth, rate limiting, metering |
| Vector Store | vector_store.py | Pluggable vector store backends (FAISS, in-memory) |
| Re-Ranker | reranker.py | Cross-encoder TF-IDF re-ranking with Kendall tau |
| Query Expansion | query_expansion.py | Synonym, pseudo-relevance feedback, decomposition |
| Answer Quality | answer_quality.py | Multi-axis answer quality scoring |
| Summarizer | summarizer.py | Extractive and abstractive document summarization |
| Document Graph | document_graph.py | Cross-document entity and relationship graph |
| Multi-Hop | multi_hop.py | Multi-hop reasoning across document chains |
| Conversation Manager | conversation_manager.py | Multi-turn context tracking and query rewriting |
| Context Compressor | context_compressor.py | Token-budget context window compression |
| Benchmark Runner | benchmark_runner.py | Automated retrieval and performance benchmarking |
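
As a reference for the evaluator.py metrics, the sketch below computes MRR and Precision@K from ranked result lists using the standard definitions. The function names and sample data are illustrative, not the module's actual API.

```python
# Standard-definition MRR and Precision@K (illustrative helpers, not evaluator.py's API).

def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant hit, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top-k results that are relevant."""
    return sum(doc_id in relevant_ids for doc_id in ranked_ids[:k]) / k

queries = [(["d2", "d9", "d4"], {"d9"}), (["d1", "d3"], {"d7"})]  # hypothetical data
mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)  # 0.25
```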

Quick Start

git clone https://github.com/ChunkyTortoise/docqa-engine.git
cd docqa-engine
pip install -r requirements.txt
make test
make demo

Docker Quick Start

The fastest way to run DocQA Engine with Docker:

# Clone and start
git clone https://github.com/ChunkyTortoise/docqa-engine.git
cd docqa-engine
docker-compose up -d

# Open http://localhost:8501

Docker Commands

| Command | Description |
| --- | --- |
| docker-compose up -d | Start demo in background |
| docker-compose down | Stop and remove containers |
| docker-compose logs -f | View logs |
| docker-compose build | Rebuild image |

Docker Build (Manual)

# Build the image
docker build -t docqa-engine .

# Run the container
docker run -p 8501:8501 -v ./uploads:/app/uploads docqa-engine

# Open http://localhost:8501

With API Keys (Optional)

To enable LLM-powered answer generation:

# Create .env file with your API keys
echo "ANTHROPIC_API_KEY=your_key_here" > .env

# Start with environment variables
docker-compose --env-file .env up -d
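
Inside the container, the engine would presumably pick the key up from the environment. The snippet below is an assumed pattern only; the project's actual configuration handling is not shown in this README and may differ.

```python
# Assumed pattern for reading the key at runtime -- not the project's actual code.
import os

api_key = os.getenv("ANTHROPIC_API_KEY")
if api_key is None:
    print("ANTHROPIC_API_KEY not set; LLM-powered answer generation stays disabled.")
```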

Image Size

The optimized multi-stage build produces images under 500MB:

  • Base: Python 3.11 slim (~150MB)
  • Dependencies: scikit-learn, Streamlit, etc. (~200MB)
  • Application: ~50MB

Demo Documents

| Document | Topic | Content |
| --- | --- | --- |
| python_guide.md | Python Basics | Variables, control flow, functions, classes, error handling |
| machine_learning.md | ML Concepts | Supervised/unsupervised, regression, classification, neural networks |
| startup_playbook.md | Startup Advice | Product-market fit, MVP, fundraising, team building, metrics |

Tech Stack

| Layer | Technology |
| --- | --- |
| UI | Streamlit (4 tabs) |
| Embeddings | scikit-learn (TF-IDF) |
| Retrieval | BM25 (Okapi) + Dense (cosine) + RRF |
| Document Parsing | PyPDF2, python-docx |
| Testing | pytest, pytest-asyncio (550+ tests) |
| CI | GitHub Actions (Python 3.11, 3.12) |
| Linting | Ruff |
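
The embedding layer is plain scikit-learn TF-IDF (5,000 features, no external API calls). The sketch below shows that general setup, assuming the vectorizer is fit over chunk texts; embedder.py's exact parameters and interface may differ.

```python
# Local TF-IDF embeddings with scikit-learn -- a sketch of the approach,
# not embedder.py's exact configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = ["Python functions are defined with def.", "Classes bundle data and behaviour."]
vectorizer = TfidfVectorizer(max_features=5000)
chunk_vectors = vectorizer.fit_transform(chunks)            # one sparse row per chunk

query_vector = vectorizer.transform(["how do I define a function?"])
scores = cosine_similarity(query_vector, chunk_vectors)[0]  # dense-retrieval scores
```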

Project Structure

docqa-engine/
├── app.py                          # Streamlit application (4 tabs)
├── docqa_engine/
│   ├── ingest.py                   # Document loading + parsing
│   ├── chunking.py                 # Pluggable chunking strategies
│   ├── embedder.py                 # TF-IDF embedding
│   ├── retriever.py                # BM25 + Dense + Hybrid (RRF)
│   ├── answer.py                   # LLM answer generation + citations
│   ├── prompt_lab.py               # Prompt versioning + A/B testing
│   ├── citation_scorer.py          # Citation accuracy scoring
│   ├── evaluator.py                # Retrieval metrics (MRR, NDCG, P@K)
│   ├── batch.py                    # Parallel batch processing
│   ├── exporter.py                 # JSON/CSV export
│   ├── cost_tracker.py             # Token + cost tracking
│   └── pipeline.py                 # End-to-end pipeline
├── demo_docs/                      # 3 sample documents
├── tests/                          # 26 test files, 550+ tests
├── .github/workflows/ci.yml        # CI pipeline
├── Makefile                        # demo, test, lint, setup
└── requirements.txt

Architecture Decisions

| ADR | Title | Status |
| --- | --- | --- |
| ADR-0001 | Hybrid Retrieval Strategy | Accepted |
| ADR-0002 | TF-IDF Local Embeddings | Accepted |
| ADR-0003 | Citation Scoring Framework | Accepted |
| ADR-0004 | REST API Wrapper Design | Accepted |

Testing

make test                           # Full suite (550+ tests)
python -m pytest tests/ -v          # Verbose output
python -m pytest tests/test_ingest.py  # Single module

Benchmarks

See BENCHMARKS.md for detailed performance data.

python -m benchmarks.run_all

Changelog

See CHANGELOG.md for release history.

Related Projects

  • EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
  • insight-engine -- Upload CSV/Excel, get instant dashboards, predictive models, and reports
  • ai-orchestrator -- AgentForge: unified async LLM interface (Claude, Gemini, OpenAI, Perplexity)
  • scrape-and-serve -- Web scraping, price monitoring, Excel-to-web apps, and SEO tools
  • prompt-engineering-lab -- 8 prompt patterns, A/B testing, TF-IDF evaluation
  • llm-integration-starter -- Production LLM patterns: completion, streaming, function calling, RAG, hardening
  • Portfolio -- Project showcase and services

Deploy

Open in Streamlit

License

MIT -- see LICENSE for details.
