A simple web app that allows you to chat with your PDF documents using Retrieval Augmented Generation (RAG) technology. Upload any PDF and ask questions about its content to receive accurate, context-aware answers.
- PDF Processing: Upload and process PDF documents of any size
- Natural Language Queries: Ask questions in plain English about your document
- Contextual Answers: Get answers based on the content of your PDF
- Interactive UI: Clean, responsive user interface for direct interaction
- Session Management: Multiple document sessions with persistent storage
- MongoDB Integration: Store sessions for persistence across server restarts
- Anaconda or Miniconda
- Python 3.11
- OpenAI API key
- MongoDB (local installation or cloud)
- Clone the repository
git clone https://github.com/andreas-pattichis/PDF-RAG-Chatbot-System.git
cd PDF-RAG-Chatbot-System
- Create and activate a conda environment with Python 3.11
conda create -n pdf-rag-env python=3.11
conda activate pdf-rag-env
- Install the required packages
pip install -r requirements.txt
- Set up environment variables
Create a .env file in the root directory:
OPENAI_API_KEY=your_openai_api_key
MONGO_CONNECTION_STR=mongodb://localhost:27017
- Run the application
uvicorn main:app --reload
- Access the application: Open your browser and navigate to http://localhost:8000
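The two variables from the .env step are presumably read once at startup (the project tree lists app/config.py for this). A minimal sketch of such a loader, using only the standard library, could look like the following. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project, and the sketch assumes the values have already been exported into the process environment, for example by a dotenv loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two values defined in the .env file."""
    openai_api_key: str
    mongo_connection_str: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the required settings and fail early if any is missing or empty."""
    env = os.environ if env is None else env
    required = ("OPENAI_API_KEY", "MONGO_CONNECTION_STR")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        mongo_connection_str=env["MONGO_CONNECTION_STR"],
    )
```

Failing at startup with the name of the missing variable tends to be easier to debug than an authentication error surfacing later, on the first upload or question.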
- Open the application in your browser
- Drag and drop a PDF file into the upload area or click to browse
- Click the "Upload PDF" button
- Wait for processing to complete (larger PDFs take longer)
- Type your question in the text box at the bottom
- Press Enter or click the send button
- Continue asking questions as needed
Retrieval Augmented Generation (RAG) combines information retrieval with text generation for more accurate, contextual responses.
- Document Processing Phase:
  - PDF Document → Text Extraction → Chunk & Process Text → Create Embeddings → Store in Vector DB
- Query Processing Phase:
  - User Question → Query Embedding → Retrieve Similar Chunks → Context Augmentation → Answer Generation
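The two phases above can be sketched end to end in a few lines of plain Python. This is a toy illustration, not the project's implementation: word-count vectors stand in for OpenAI embeddings, a Python list stands in for the FAISS index, and the final LLM call is replaced by building the prompt that would be sent. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into windows of chunk_size words that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(text, chunk_size=40, overlap=10):
    """Document phase: chunk the text and store (chunk, vector) pairs."""
    return [(c, embed(c)) for c in chunk_text(text, chunk_size, overlap)]


def retrieve(index, question, k=2):
    """Query phase: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Context augmentation: combine retrieved chunks with the question."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Calling `build_index` once per uploaded document and `retrieve` followed by `build_prompt` once per question mirrors the split between the two phases: the expensive embedding of the document happens a single time, while each question only needs one query embedding and a similarity search.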
- Frontend: HTML/CSS/JS interface for user interaction
- FastAPI Backend: Handles HTTP requests and coordinates the RAG workflow
- RAG Engine: Processes PDFs, creates embeddings, and generates answers
- Vector Database: Stores document embeddings for efficient semantic retrieval
- LLM Integration: Connects with OpenAI API for natural language responses
- MongoDB: Stores PDF sessions persistently
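To make the interplay between the backend and the session storage more concrete, here is a rough sketch of how one session per uploaded document might be tracked. Everything in it is hypothetical: the `Session` and `SessionStore` names and their methods are illustrative, and a plain dictionary stands in for the MongoDB collection, so nothing here survives a restart the way the real storage does.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    """One uploaded PDF and the questions asked about it."""
    filename: str
    history: list = field(default_factory=list)  # (question, answer) pairs


class SessionStore:
    """In-memory stand-in for the persistent session storage."""

    def __init__(self):
        self._sessions = {}

    def create(self, filename):
        """Register a newly uploaded PDF and return its session id."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = Session(filename)
        return session_id

    def record(self, session_id, question, answer):
        """Append one question/answer exchange to an existing session."""
        self._sessions[session_id].history.append((question, answer))

    def get(self, session_id):
        """Return the session, or None if the id is unknown."""
        return self._sessions.get(session_id)
```

In this sketch an upload handler would call `create`, and a chat handler would look up the session with `get`, run retrieval and generation, then call `record`.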
- Install MongoDB Community Edition from MongoDB Download Center
- Use the connection string mongodb://localhost:27017 in your .env file
- Install MongoDB Compass (optional) for visual database inspection
- Create an account at MongoDB Atlas
- Create a cluster, set up network access, and create a database user
- Add your connection string to the .env file
PDF-RAG-Chatbot-System/
├── app/
│ ├── __init__.py
│ ├── config.py # Configuration and environment variables
│ ├── db.py # MongoDB integration
│ ├── rag.py # RAG implementation
│ └── routes.py # API endpoints
├── frontend/
│ ├── index.html # Main HTML file
│ ├── script.js # Frontend JavaScript
│ └── style.css # CSS styles
├── .env # Environment variables (create this)
├── main.py # Application entry point
└── requirements.txt # Python dependencies
- PDFRAG Class: Handles text extraction, chunking, embeddings creation, and answer generation
- MongoDB Integration: Manages persistent storage of PDF sessions
- API Routes: Handles HTTP requests for uploads, chat, and session management
- Clean, responsive UI with drag-and-drop PDF upload and interactive chat interface
- JavaScript for AJAX requests, PDF handling, and UI state management
- Backend: FastAPI, LangChain, OpenAI API, PyPDF2, FAISS, MongoDB
- Frontend: HTML5/CSS3, JavaScript, Fetch API
This project is licensed under the MIT License.
Created with ❤️ by Andreas Pattichis