Upload documents, ask questions -- get cited answers with a prompt engineering lab.
Live Demo -- try it without installing anything.
- RAG pipeline from upload to answer -- Ingest documents (PDF, DOCX, TXT, MD, CSV), chunk them with pluggable strategies, embed with TF-IDF, and retrieve using BM25 + dense hybrid search with Reciprocal Rank Fusion (see the RRF sketch after this list)
- Prompt engineering lab for A/B testing -- Create prompt templates, run the same question through different strategies side-by-side, compare outputs
- Citation accuracy matters -- Faithfulness, coverage, and redundancy scoring for every generated citation
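The hybrid retrieval described in the first bullet merges a lexical (BM25) ranking and a dense (cosine) ranking with Reciprocal Rank Fusion. Below is a minimal sketch of standard RRF with the usual constant k = 60; the function and variable names are illustrative, not the engine's actual API.

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch. Names are illustrative only;
# the real retriever.py may implement this differently.
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["doc3", "doc1", "doc7"]    # lexical (BM25) results
dense_ranking = ["doc1", "doc5", "doc3"]   # dense cosine results
print(rrf_fuse([bm25_ranking, dense_ranking]))
# ['doc1', 'doc3', 'doc5', 'doc7'] -- documents ranked well by both lists rise to the top
```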
- Service 3: Custom RAG Conversational Agents
- Service 5: Prompt Engineering and System Optimization
- IBM Generative AI Engineering with PyTorch, LangChain & Hugging Face
- IBM RAG and Agentic AI Professional Certificate
- Vanderbilt ChatGPT Personal Automation
- Duke University LLMOps Specialization
```mermaid
flowchart TB
    Upload["Document Upload\n(PDF, DOCX, TXT, MD, CSV)"]
    Chunk["Chunking Engine\n(semantic, fixed, sliding window)"]
    Embed["Embedding Layer\n(TF-IDF, BM25, Dense)"]
    VStore["Vector Store\n(FAISS / in-memory)"]
    Hybrid["Hybrid Retrieval\n(BM25 + Dense + RRF fusion)"]
    Rerank["Cross-Encoder Re-Ranker"]
    QExpand["Query Expansion\n(synonym, PRF, decompose)"]
    Citation["Citation Scoring\n(faithfulness, coverage, redundancy)"]
    Answer["Answer Generation"]
    Convo["Conversation Manager\n(multi-turn context)"]
    API["REST API\n(JWT auth, rate limiting, metering)"]
    UI["Streamlit Demo UI\n(4-tab interface)"]

    Upload --> Chunk --> Embed --> VStore
    QExpand --> Hybrid
    VStore --> Hybrid --> Rerank --> Answer
    Answer --> Citation
    Answer --> Convo
    API --> Answer
    UI --> API
```
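The Chunking Engine node above lists semantic, fixed, and sliding-window strategies. As a point of reference, a fixed-size sliding-window chunker can be as small as the sketch below; this is illustrative only and is not lifted from chunking.py.

```python
# Illustrative fixed-size sliding-window chunker (not the project's chunking.py).
def sliding_window_chunks(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into `size`-character chunks, each overlapping the previous
    chunk by `overlap` characters so sentences are less likely to be cut off."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[start:start + size] for start in range(0, max(len(text) - overlap, 1), step)]


chunks = sliding_window_chunks("a" * 1200, size=500, overlap=100)
print([len(c) for c in chunks])  # [500, 500, 400]
```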
| Metric | Value |
|---|---|
| Test Suite | 550+ automated tests |
| Retrieval Accuracy | Hybrid > BM25-only by 15-25% |
| Re-Ranking Boost | +8-12% relevance improvement |
| Query Latency | <100ms for 10K document corpus |
| Citation Accuracy | Faithfulness + coverage scoring |
| API Rate Limit | Configurable per-user metering |
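The Citation Accuracy row refers to scoring each generated citation against the chunk it cites. One fully local way to approximate faithfulness and coverage is TF-IDF cosine similarity; the sketch below shows that idea only and is not the project's citation_scorer.py.

```python
# Hedged sketch: citation faithfulness/coverage via TF-IDF cosine similarity.
# This illustrates the general idea, not citation_scorer.py's actual logic.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def faithfulness(cited_sentence: str, source_chunk: str) -> float:
    """Cosine similarity in [0, 1] between a cited sentence and its source chunk."""
    vectors = TfidfVectorizer().fit_transform([cited_sentence, source_chunk])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


def coverage(answer_sentences: list[str], source_chunk: str, threshold: float = 0.2) -> float:
    """Fraction of answer sentences whose faithfulness clears the threshold."""
    if not answer_sentences:
        return 0.0
    supported = sum(faithfulness(s, source_chunk) >= threshold for s in answer_sentences)
    return supported / len(answer_sentences)
```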
| Module | File | Description |
|---|---|---|
| Ingest | `ingest.py` | Multi-format document loading (PDF, DOCX, TXT, MD, CSV) |
| Chunking | `chunking.py` | Pluggable chunking strategies: fixed-size, sentence-boundary, semantic |
| Embedder | `embedder.py` | TF-IDF embedding (5,000 features, no external API calls) |
| Retriever | `retriever.py` | BM25 + dense cosine + hybrid RRF fusion |
| Answer | `answer.py` | Context-aware answer generation with source citations |
| Prompt Lab | `prompt_lab.py` | Prompt versioning and A/B comparison framework |
| Citation Scorer | `citation_scorer.py` | Citation faithfulness, coverage, and redundancy scoring |
| Evaluator | `evaluator.py` | Retrieval metrics: MRR, NDCG@K, Precision@K, Recall@K, Hit Rate |
| Batch | `batch.py` | Parallel batch ingestion and query processing |
| Exporter | `exporter.py` | JSON/CSV export for results and metrics |
| Cost Tracker | `cost_tracker.py` | Per-query token and cost tracking |
| Pipeline | `pipeline.py` | End-to-end `DocQAPipeline` class |
| REST API | `api.py` | FastAPI wrapper with JWT auth, rate limiting, metering |
| Vector Store | `vector_store.py` | Pluggable vector store backends (FAISS, in-memory) |
| Re-Ranker | `reranker.py` | Cross-encoder TF-IDF re-ranking with Kendall tau |
| Query Expansion | `query_expansion.py` | Synonym, pseudo-relevance feedback, decomposition |
| Answer Quality | `answer_quality.py` | Multi-axis answer quality scoring |
| Summarizer | `summarizer.py` | Extractive and abstractive document summarization |
| Document Graph | `document_graph.py` | Cross-document entity and relationship graph |
| Multi-Hop | `multi_hop.py` | Multi-hop reasoning across document chains |
| Conversation Manager | `conversation_manager.py` | Multi-turn context tracking and query rewriting |
| Context Compressor | `context_compressor.py` | Token-budget context window compression |
| Benchmark Runner | `benchmark_runner.py` | Automated retrieval and performance benchmarking |
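`evaluator.py` reports standard ranking metrics. For reference, two of them reduce to a few lines each; the helpers below use textbook definitions and illustrative names, not the module's actual API.

```python
# Textbook definitions of two retrieval metrics listed for evaluator.py.
# Helper names are illustrative, not evaluator.py's API.
def reciprocal_rank(ranked_ids: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant hit (MRR averages this over queries)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(doc_id in relevant for doc_id in ranked_ids[:k]) / k


print(reciprocal_rank(["d2", "d9", "d4"], {"d9", "d4"}))      # 0.5 (first hit at rank 2)
print(precision_at_k(["d2", "d9", "d4"], {"d9", "d4"}, k=3))  # 0.666...
```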
```bash
git clone https://github.com/ChunkyTortoise/docqa-engine.git
cd docqa-engine
pip install -r requirements.txt
make test
make demo
```

The fastest way to run DocQA Engine with Docker:
```bash
# Clone and start
git clone https://github.com/ChunkyTortoise/docqa-engine.git
cd docqa-engine
docker-compose up -d

# Open http://localhost:8501
```

| Command | Description |
|---|---|
| `docker-compose up -d` | Start demo in background |
| `docker-compose down` | Stop and remove containers |
| `docker-compose logs -f` | View logs |
| `docker-compose build` | Rebuild image |
```bash
# Build the image
docker build -t docqa-engine .

# Run the container
docker run -p 8501:8501 -v ./uploads:/app/uploads docqa-engine

# Open http://localhost:8501
```

To enable LLM-powered answer generation:
```bash
# Create .env file with your API keys
echo "ANTHROPIC_API_KEY=your_key_here" > .env

# Start with environment variables
docker-compose --env-file .env up -d
```

The optimized multi-stage build produces images under 500MB:
- Base: Python 3.11 slim (~150MB)
- Dependencies: scikit-learn, Streamlit, etc. (~200MB)
- Application: ~50MB
| Document | Topic | Content |
|---|---|---|
| `python_guide.md` | Python Basics | Variables, control flow, functions, classes, error handling |
| `machine_learning.md` | ML Concepts | Supervised/unsupervised, regression, classification, neural networks |
| `startup_playbook.md` | Startup Advice | Product-market fit, MVP, fundraising, team building, metrics |
| Layer | Technology |
|---|---|
| UI | Streamlit (4 tabs) |
| Embeddings | scikit-learn (TF-IDF) |
| Retrieval | BM25 (Okapi) + Dense (cosine) + RRF |
| Document Parsing | PyPDF2, python-docx |
| Testing | pytest, pytest-asyncio (550+ tests) |
| CI | GitHub Actions (Python 3.11, 3.12) |
| Linting | Ruff |
```
docqa-engine/
├── app.py                    # Streamlit application (4 tabs)
├── docqa_engine/
│   ├── ingest.py             # Document loading + parsing
│   ├── chunking.py           # Pluggable chunking strategies
│   ├── embedder.py           # TF-IDF embedding
│   ├── retriever.py          # BM25 + Dense + Hybrid (RRF)
│   ├── answer.py             # LLM answer generation + citations
│   ├── prompt_lab.py         # Prompt versioning + A/B testing
│   ├── citation_scorer.py    # Citation accuracy scoring
│   ├── evaluator.py          # Retrieval metrics (MRR, NDCG, P@K)
│   ├── batch.py              # Parallel batch processing
│   ├── exporter.py           # JSON/CSV export
│   ├── cost_tracker.py       # Token + cost tracking
│   └── pipeline.py           # End-to-end pipeline
├── demo_docs/                # 3 sample documents
├── tests/                    # 26 test files, 550+ tests
├── .github/workflows/ci.yml  # CI pipeline
├── Makefile                  # demo, test, lint, setup
└── requirements.txt
```
| ADR | Title | Status |
|---|---|---|
| ADR-0001 | Hybrid Retrieval Strategy | Accepted |
| ADR-0002 | TF-IDF Local Embeddings | Accepted |
| ADR-0003 | Citation Scoring Framework | Accepted |
| ADR-0004 | REST API Wrapper Design | Accepted |
```bash
make test                               # Full suite (550+ tests)
python -m pytest tests/ -v              # Verbose output
python -m pytest tests/test_ingest.py   # Single module
```

See BENCHMARKS.md for detailed performance data.

```bash
python -m benchmarks.run_all
```

See CHANGELOG.md for release history.
- EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
- insight-engine -- Upload CSV/Excel, get instant dashboards, predictive models, and reports
- ai-orchestrator -- AgentForge: unified async LLM interface (Claude, Gemini, OpenAI, Perplexity)
- scrape-and-serve -- Web scraping, price monitoring, Excel-to-web apps, and SEO tools
- prompt-engineering-lab -- 8 prompt patterns, A/B testing, TF-IDF evaluation
- llm-integration-starter -- Production LLM patterns: completion, streaming, function calling, RAG, hardening
- Portfolio -- Project showcase and services
MIT -- see LICENSE for details.
