Upload documents, ask questions -- get cited answers with a prompt engineering lab.
Live Demo -- try it without installing anything.
- RAG pipeline from upload to answer -- Ingest documents (PDF, DOCX, TXT, MD, CSV), chunk them with pluggable strategies, embed with TF-IDF, and retrieve using BM25 + dense hybrid search with Reciprocal Rank Fusion (see the RRF sketch after this list)
- Prompt engineering lab for A/B testing -- Create prompt templates, run the same question through different strategies side-by-side, compare outputs
- Citation accuracy matters -- Faithfulness, coverage, and redundancy scoring for every generated citation
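The hybrid retrieval described in the first bullet merges a lexical (BM25) ranking and a dense (cosine) ranking with Reciprocal Rank Fusion. Below is a minimal sketch of standard RRF with the usual constant k = 60; the function and variable names are illustrative, not the engine's actual API.

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch. Names are illustrative only;
# the real retriever.py may implement this differently.
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["doc3", "doc1", "doc7"]    # lexical (BM25) results
dense_ranking = ["doc1", "doc5", "doc3"]   # dense cosine results
print(rrf_fuse([bm25_ranking, dense_ranking]))
# ['doc1', 'doc3', 'doc5', 'doc7'] -- documents ranked well by both lists rise to the top
```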
- Service 3: Custom RAG Conversational Agents
- Service 5: Prompt Engineering and System Optimization
- IBM Generative AI Engineering with PyTorch, LangChain & Hugging Face
- IBM RAG and Agentic AI Professional Certificate
- Vanderbilt ChatGPT Personal Automation
- Duke University LLMOps Specialization
```mermaid
flowchart TB
    Upload["Document Upload\n(PDF, DOCX, TXT, MD, CSV)"]
    Chunk["Chunking Engine\n(semantic, fixed, sliding window)"]
    Embed["Embedding Layer\n(TF-IDF, BM25, Dense)"]
    VStore["Vector Store\n(FAISS / in-memory)"]
    Hybrid["Hybrid Retrieval\n(BM25 + Dense + RRF fusion)"]
    Rerank["Cross-Encoder Re-Ranker"]
    QExpand["Query Expansion\n(synonym, PRF, decompose)"]
    Citation["Citation Scoring\n(faithfulness, coverage, redundancy)"]
    Answer["Answer Generation"]
    Convo["Conversation Manager\n(multi-turn context)"]
    API["REST API\n(JWT auth, rate limiting, metering)"]
    UI["Streamlit Demo UI\n(4-tab interface)"]

    Upload --> Chunk --> Embed --> VStore
    QExpand --> Hybrid
    VStore --> Hybrid --> Rerank --> Answer
    Answer --> Citation
    Answer --> Convo
    API --> Answer
    UI --> API
```
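The Chunking Engine node above lists semantic, fixed, and sliding-window strategies. As a point of reference, a fixed-size sliding-window chunker can be as small as the sketch below; this is illustrative only and is not lifted from chunking.py.

```python
# Illustrative fixed-size sliding-window chunker (not the project's chunking.py).
def sliding_window_chunks(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into `size`-character chunks, each overlapping the previous
    chunk by `overlap` characters so sentences are less likely to be cut off."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[start:start + size] for start in range(0, max(len(text) - overlap, 1), step)]


chunks = sliding_window_chunks("a" * 1200, size=500, overlap=100)
print([len(c) for c in chunks])  # [500, 500, 400]
```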
| Metric | Value |
|---|---|
| Test Suite | 550+ automated tests |
| Retrieval Accuracy | Hybrid > BM25-only by 15-25% |
| Re-Ranking Boost | +8-12% relevance improvement |
| Query Latency | <100ms for 10K document corpus |
| Citation Accuracy | Faithfulness + coverage scoring |
| API Rate Limit | Configurable per-user metering |
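The Citation Accuracy row refers to scoring each generated citation against the chunk it cites. One fully local way to approximate faithfulness and coverage is TF-IDF cosine similarity; the sketch below shows that idea only and is not the project's citation_scorer.py.

```python
# Hedged sketch: citation faithfulness/coverage via TF-IDF cosine similarity.
# This illustrates the general idea, not citation_scorer.py's actual logic.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def faithfulness(cited_sentence: str, source_chunk: str) -> float:
    """Cosine similarity in [0, 1] between a cited sentence and its source chunk."""
    vectors = TfidfVectorizer().fit_transform([cited_sentence, source_chunk])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


def coverage(answer_sentences: list[str], source_chunk: str, threshold: float = 0.2) -> float:
    """Fraction of answer sentences whose faithfulness clears the threshold."""
    if not answer_sentences:
        return 0.0
    supported = sum(faithfulness(s, source_chunk) >= threshold for s in answer_sentences)
    return supported / len(answer_sentences)
```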
| Module | File | Description |
|---|---|---|
| Ingest | `ingest.py` | Multi-format document loading (PDF, DOCX, TXT, MD, CSV) |
| Chunking | `chunking.py` | Pluggable chunking strategies: fixed-size, sentence-boundary, semantic |
| Embedder | `embedder.py` | TF-IDF embedding (5,000 features, no external API calls) |
| Retriever | `retriever.py` | BM25 + dense cosine + hybrid RRF fusion |
| Answer | `answer.py` | Context-aware answer generation with source citations |
| Prompt Lab | `prompt_lab.py` | Prompt versioning and A/B comparison framework |
| Citation Scorer | `citation_scorer.py` | Citation faithfulness, coverage, and redundancy scoring |
| Evaluator | `evaluator.py` | Retrieval metrics: MRR, NDCG@K, Precision@K, Recall@K, Hit Rate |
| Batch | `batch.py` | Parallel batch ingestion and query processing |
| Exporter | `exporter.py` | JSON/CSV export for results and metrics |
| Cost Tracker | `cost_tracker.py` | Per-query token and cost tracking |
| Pipeline | `pipeline.py` | End-to-end `DocQAPipeline` class |
| REST API | `api.py` | FastAPI wrapper with JWT auth, rate limiting, metering |
| Vector Store | `vector_store.py` | Pluggable vector store backends (FAISS, in-memory) |
| Re-Ranker | `reranker.py` | Cross-encoder TF-IDF re-ranking with Kendall tau |
| Query Expansion | `query_expansion.py` | Synonym, pseudo-relevance feedback, decomposition |
| Answer Quality | `answer_quality.py` | Multi-axis answer quality scoring |
| Summarizer | `summarizer.py` | Extractive and abstractive document summarization |
| Document Graph | `document_graph.py` | Cross-document entity and relationship graph |
| Multi-Hop | `multi_hop.py` | Multi-hop reasoning across document chains |
| Conversation Manager | `conversation_manager.py` | Multi-turn context tracking and query rewriting |
| Context Compressor | `context_compressor.py` | Token-budget context window compression |
| Benchmark Runner | `benchmark_runner.py` | Automated retrieval and performance benchmarking |
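`evaluator.py` reports standard ranking metrics. For reference, two of them reduce to a few lines each; the helpers below use textbook definitions and illustrative names, not the module's actual API.

```python
# Textbook definitions of two retrieval metrics listed for evaluator.py.
# Helper names are illustrative, not evaluator.py's API.
def reciprocal_rank(ranked_ids: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant hit (MRR averages this over queries)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(doc_id in relevant for doc_id in ranked_ids[:k]) / k


print(reciprocal_rank(["d2", "d9", "d4"], {"d9", "d4"}))      # 0.5 (first hit at rank 2)
print(precision_at_k(["d2", "d9", "d4"], {"d9", "d4"}, k=3))  # 0.666...
```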
```bash
git clone https://github.com/ChunkyTortoise/docqa-engine.git
cd docqa-engine
pip install -r requirements.txt
make test
make demo
```

The fastest way to run DocQA Engine with Docker:
```bash
# Clone and start
git clone https://github.com/ChunkyTortoise/docqa-engine.git
cd docqa-engine
docker-compose up -d

# Open http://localhost:8501
```

| Command | Description |
|---|---|
| `docker-compose up -d` | Start demo in background |
| `docker-compose down` | Stop and remove containers |
| `docker-compose logs -f` | View logs |
| `docker-compose build` | Rebuild image |
```bash
# Build the image
docker build -t docqa-engine .

# Run the container
docker run -p 8501:8501 -v ./uploads:/app/uploads docqa-engine

# Open http://localhost:8501
```

To enable LLM-powered answer generation:
```bash
# Create .env file with your API keys
echo "ANTHROPIC_API_KEY=your_key_here" > .env

# Start with environment variables
docker-compose --env-file .env up -d
```

The optimized multi-stage build produces images under 500MB:
- Base: Python 3.11 slim (~150MB)
- Dependencies: scikit-learn, Streamlit, etc. (~200MB)
- Application: ~50MB
| Document | Topic | Content |
|---|---|---|
| `python_guide.md` | Python Basics | Variables, control flow, functions, classes, error handling |
| `machine_learning.md` | ML Concepts | Supervised/unsupervised, regression, classification, neural networks |
| `startup_playbook.md` | Startup Advice | Product-market fit, MVP, fundraising, team building, metrics |
| Layer | Technology |
|---|---|
| UI | Streamlit (4 tabs) |
| Embeddings | scikit-learn (TF-IDF) |
| Retrieval | BM25 (Okapi) + Dense (cosine) + RRF |
| Document Parsing | PyPDF2, python-docx |
| Testing | pytest, pytest-asyncio (550+ tests) |
| CI | GitHub Actions (Python 3.11, 3.12) |
| Linting | Ruff |
```
docqa-engine/
├── app.py                    # Streamlit application (4 tabs)
├── docqa_engine/
│   ├── ingest.py             # Document loading + parsing
│   ├── chunking.py           # Pluggable chunking strategies
│   ├── embedder.py           # TF-IDF embedding
│   ├── retriever.py          # BM25 + Dense + Hybrid (RRF)
│   ├── answer.py             # LLM answer generation + citations
│   ├── prompt_lab.py         # Prompt versioning + A/B testing
│   ├── citation_scorer.py    # Citation accuracy scoring
│   ├── evaluator.py          # Retrieval metrics (MRR, NDCG, P@K)
│   ├── batch.py              # Parallel batch processing
│   ├── exporter.py           # JSON/CSV export
│   ├── cost_tracker.py       # Token + cost tracking
│   └── pipeline.py           # End-to-end pipeline
├── demo_docs/                # 3 sample documents
├── tests/                    # 26 test files, 550+ tests
├── .github/workflows/ci.yml  # CI pipeline
├── Makefile                  # demo, test, lint, setup
└── requirements.txt
```
| ADR | Title | Status |
|---|---|---|
| ADR-0001 | Hybrid Retrieval Strategy | Accepted |
| ADR-0002 | TF-IDF Local Embeddings | Accepted |
| ADR-0003 | Citation Scoring Framework | Accepted |
| ADR-0004 | REST API Wrapper Design | Accepted |
```bash
make test                               # Full suite (550+ tests)
python -m pytest tests/ -v              # Verbose output
python -m pytest tests/test_ingest.py   # Single module
```

See BENCHMARKS.md for detailed performance data.

```bash
python -m benchmarks.run_all
```

See CHANGELOG.md for release history.
- EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
- insight-engine -- Upload CSV/Excel, get instant dashboards, predictive models, and reports
- ai-orchestrator -- AgentForge: unified async LLM interface (Claude, Gemini, OpenAI, Perplexity)
- scrape-and-serve -- Web scraping, price monitoring, Excel-to-web apps, and SEO tools
- prompt-engineering-lab -- 8 prompt patterns, A/B testing, TF-IDF evaluation
- llm-integration-starter -- Production LLM patterns: completion, streaming, function calling, RAG, hardening
- Portfolio -- Project showcase and services
MIT -- see LICENSE for details.
