HelpmateAI is a grounded long-document QA system for PDFs and DOCX files. Upload a policy, thesis, or research paper, ask a question in plain language, and get a readable answer with visible citations and raw supporting evidence.
The current product is a Next.js + FastAPI experience on top of a benchmark-driven Python retrieval core, deployed with a VPS-ready backend path. The system is designed to stay inspectable: retrieval is hybrid, answers are citation-aware, and the supporting passages remain visible instead of being hidden behind a polished summary.
Live landing page: https://helpmateai.xyz
Workspace app: https://app.helpmateai.xyz
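"Hybrid retrieval" here means fusing a lexical ranking with a dense-vector ranking before generation. As an illustrative sketch only (the real fusion logic lives in `src/` and may use a different scheme), reciprocal rank fusion merges the two orderings like this:

```python
# Illustrative reciprocal-rank-fusion (RRF) sketch of hybrid retrieval.
# The chunk ids and the k=60 constant are example values, not Helpmate's
# actual configuration.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into one hybrid ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            # Earlier ranks contribute more; k damps the head of each list.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["c3", "c1", "c7"]  # e.g. BM25 order
dense = ["c1", "c7", "c2"]    # e.g. embedding-similarity order
hybrid = rrf_fuse([lexical, dense])
```

A chunk ranked well in both lists (`c1` above) outranks one that tops only a single list, which is the property that makes the fused ranking inspectable and robust to either retriever's failure modes.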
- grounded answers instead of generic document chat
- visible citation trail plus raw evidence panels
- structure-aware retrieval for policies, theses, and research papers
- benchmark-driven architecture decisions instead of intuition-only RAG tuning
- a product-facing Next.js shell backed by a modular Python core
| Workspace | Answer panel |
|---|---|
| ![]() | ![]() |
- Upload a PDF or DOCX file.
- Build or reuse the document index.
- Ask a natural-language question.
- Review the answer, citation trail, and raw evidence together.
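The four steps map onto a thin client loop over the FastAPI boundary. The sketch below is a shape illustration only: `HelpmateClient`, `Answer`, and the method names are assumptions made for this example, not the actual backend contract.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Answer:
    """Assumed response shape: answer text plus the visible evidence trail."""
    text: str
    citations: list[str] = field(default_factory=list)  # e.g. ["p. 12"]
    evidence: list[str] = field(default_factory=list)   # raw passages


class HelpmateClient(Protocol):
    """Hypothetical client interface over the FastAPI boundary."""
    def upload(self, path: str) -> str: ...                   # -> document id
    def build_index(self, doc_id: str) -> None: ...           # build or reuse
    def ask(self, doc_id: str, question: str) -> Answer: ...


def ask_document(client: HelpmateClient, path: str, question: str) -> Answer:
    doc_id = client.upload(path)         # 1. upload a PDF or DOCX
    client.build_index(doc_id)           # 2. build or reuse the index
    return client.ask(doc_id, question)  # 3.-4. ask, then review the answer
```

Using a `Protocol` keeps the loop testable against a stub client without a running server, which matches the repo's emphasis on fast, focused checks.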
- On the stabilized `2026-04-19` vendor rerun, Helpmate outperformed both external baselines across all four main document families we track: health policy, thesis, `pancreas7`, and `pancreas8`.
- Averaged across those four families, Helpmate now leads Vectara by `+0.1997` faithfulness, `+0.1350` answer relevancy, and `+0.1523` context precision, and leads OpenAI File Search by `+0.4532`, `+0.4021`, and `+0.3697` on the same `ragas` metrics.
- Current answer-quality snapshot versus Vectara (`ragas` faithfulness / answer relevancy / context precision): health policy `0.8846 / 0.6378 / 0.8825` vs `0.7692 / 0.4504 / 0.8235`, thesis `1.0000 / 0.6031 / 0.8588` vs `0.8750 / 0.5579 / 0.8035`, `pancreas7` `0.9444 / 0.6499 / 1.0000` vs `0.6111 / 0.5009 / 0.7350`, and `pancreas8` `0.9250 / 0.5527 / 0.9000` vs `0.7000 / 0.3941 / 0.6700`.
- Internal ablations still justify the current stack: the reranker improved the answer-layer supported rate from `0.8026` to `0.8816` and the citation page-hit rate from `0.6974` to `0.8684`, and planner plus reranker lifted evidence-fragment recall to `0.7364`.
- The evidence selector is now benchmark-validated in reorder-only mode rather than prune mode. In production, the spread-triggered selector keeps strong answer quality (`0.8816` supported-answer rate, `0.9534` focused-`ragas` faithfulness, `0.6501` answer relevancy, `0.9404` context precision) without paying the always-on cost on every query.
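The averaged Vectara deltas follow directly from the per-family snapshot; a quick recomputation (all numbers copied from the snapshot above, nothing new):

```python
# Recompute the averaged Helpmate-vs-Vectara deltas from the per-family
# ragas snapshot: (faithfulness, answer relevancy, context precision).
helpmate = {
    "health_policy": (0.8846, 0.6378, 0.8825),
    "thesis":        (1.0000, 0.6031, 0.8588),
    "pancreas7":     (0.9444, 0.6499, 1.0000),
    "pancreas8":     (0.9250, 0.5527, 0.9000),
}
vectara = {
    "health_policy": (0.7692, 0.4504, 0.8235),
    "thesis":        (0.8750, 0.5579, 0.8035),
    "pancreas7":     (0.6111, 0.5009, 0.7350),
    "pancreas8":     (0.7000, 0.3941, 0.6700),
}


def column_mean(scores: dict, i: int) -> float:
    """Average metric i across the four document families."""
    return sum(v[i] for v in scores.values()) / len(scores)


deltas = tuple(
    round(column_mean(helpmate, i) - column_mean(vectara, i), 4)
    for i in range(3)
)
# deltas comes out to approximately (0.1997, 0.1350, 0.1523)
```

The same recomputation against an OpenAI File Search snapshot would reproduce the `+0.4532 / +0.4021 / +0.3697` deltas; those per-family numbers are not repeated here.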
The repo is no longer a notebook demo. It is a real app-shaped project with:
- `frontend/` as the evolving Next.js product UI
- `backend/` as the FastAPI boundary over the Python core
- `Dockerfile` as the backend deployment image
- `deploy/vps/` as the primary Docker Compose plus Caddy VPS deployment bundle
- `src/` for reusable ingestion, retrieval, generation, cache, and shared service logic
- `src/structure/`, `src/query_analysis/`, `src/sections/`, and `src/query_router.py` for the document-intelligence and routing layer
- `tests/` for focused fast checks around the core logic
- `docs/` for architecture, evaluation policy, roadmap, and history
- Next.js
- FastAPI
- ChromaDB
- optional hosted Chroma-compatible HTTP backend
- optional Supabase-backed state persistence
- OpenAI
- scikit-learn
- sentence-transformers
- `uv` for project and dependency management