PDF RAG Chatbot System

A simple web app that allows you to chat with your PDF documents using Retrieval Augmented Generation (RAG) technology. Upload any PDF and ask questions about its content to receive accurate, context-aware answers.

🚀 Features

PDF Processing: Upload and process PDF documents of any size
Natural Language Queries: Ask questions in plain English about your document
Contextual Answers: Get answers based on the content of your PDF
Interactive UI: Clean, responsive user interface for direct interaction
Session Management: Multiple document sessions with persistent storage
MongoDB Integration: Store sessions for persistence across server restarts

🛠 Setup Instructions

Prerequisites

Anaconda or Miniconda
Python 3.11
OpenAI API key
MongoDB (local installation or cloud)

Installation Steps

Clone the repository

git clone https://github.com/andreas-pattichis/PDF-RAG-Chatbot-System.git
cd PDF-RAG-Chatbot-System

Create and activate a conda environment with Python 3.11

conda create -n pdf-rag-env python=3.11
conda activate pdf-rag-env

Install the required packages

pip install -r requirements.txt

Set up environment variables: create a .env file in the root directory with the following contents

OPENAI_API_KEY=your_openai_api_key
MONGO_CONNECTION_STR=mongodb://localhost:27017

Run the application

uvicorn main:app --reload

Access the application: Open your browser and navigate to http://localhost:8000
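
For the environment-variables step above, here is a minimal sketch of how a backend could read the two settings at startup. It is not taken from app/config.py: the `Settings` dataclass and the `load_settings` helper are hypothetical names, and the fallback to a local MongoDB instance is an assumption. A .env file is not loaded into the process environment on its own, so a loader such as python-dotenv is typically called first. Whether this project does so is not stated here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two values defined in .env."""
    openai_api_key: str
    mongo_connection_str: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail early with a clear message instead of on the first API call.
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Assumption: fall back to a local MongoDB instance when unset.
    mongo = env.get("MONGO_CONNECTION_STR", "mongodb://localhost:27017")
    return Settings(openai_api_key=api_key, mongo_connection_str=mongo)
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching real credentials.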

📖 Usage Guide

Open the application in your browser
Drag and drop a PDF file into the upload area or click to browse
Click the "Upload PDF" button
Wait for processing to complete (larger PDFs take longer)
Type your question in the text box at the bottom
Press Enter or click the send button
Continue asking questions as needed

🧠 How RAG Works

Retrieval Augmented Generation (RAG) combines information retrieval with text generation for more accurate, contextual responses.

RAG Process Flow:

Document Processing Phase:
- PDF Document → Text Extraction → Chunk & Process Text → Create Embeddings → Store in Vector DB
Query Processing Phase:
- User Question → Query Embedding → Retrieve Similar Chunks → Context Augmentation → Answer Generation
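
The query processing phase can be illustrated with a small, self-contained sketch. It is a toy stand-in, not the project's code: word counts replace real OpenAI embeddings, a plain list replaces the FAISS vector store, and the function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are made up for illustration. The final prompt would normally be sent to the language model, which is left out here.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Context augmentation: place the retrieved chunks ahead of the question."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


chunks = [
    "The warranty covers parts and labor for two years.",
    "Shipping takes five business days within the EU.",
    "Returns are accepted within thirty days of delivery.",
]
question = "How long is the warranty?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

With these three sample chunks, the warranty sentence ranks first because it shares the most words with the question. A real embedding model would also match paraphrases that share no words at all.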

🏗 System Architecture

Frontend: HTML/CSS/JS interface for user interaction
FastAPI Backend: Handles HTTP requests and coordinates the RAG workflow
RAG Engine: Processes PDFs, creates embeddings, and generates answers
Vector Database: Stores document embeddings for efficient semantic retrieval
LLM Integration: Connects with OpenAI API for natural language responses
MongoDB: Stores PDF sessions persistently

🗄️ MongoDB Setup

Local MongoDB Setup

Install MongoDB Community Edition from the MongoDB Download Center
Set MONGO_CONNECTION_STR=mongodb://localhost:27017 in your .env file
Install MongoDB Compass (optional) for visual database inspection

MongoDB Atlas (Cloud) Alternative

Create an account at MongoDB Atlas
Create a cluster, set up network access, and create a database user
Add your connection string to the .env file
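
Before starting the server, it can help to sanity-check the connection string placed in .env. The sketch below uses only the Python standard library, and the `check_mongo_uri` helper is a hypothetical name, not part of this project. It is a rough format check only: it does not contact the database, and it reports only the first host of a multi-host string.

```python
from urllib.parse import urlparse


def check_mongo_uri(uri: str) -> str:
    """Return the host of a MongoDB connection string, or raise ValueError."""
    parsed = urlparse(uri)
    # Local installs use "mongodb://", Atlas clusters usually "mongodb+srv://".
    if parsed.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("connection string has no host")
    return parsed.hostname
```

For example, `check_mongo_uri("mongodb://localhost:27017")` returns `"localhost"`, while a string starting with `http://` raises `ValueError`.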

📁 Project Structure

PDF-RAG-Chatbot-System/
├── app/
│   ├── __init__.py
│   ├── config.py          # Configuration and environment variables
│   ├── db.py              # MongoDB integration
│   ├── rag.py             # RAG implementation
│   └── routes.py          # API endpoints
├── frontend/
│   ├── index.html         # Main HTML file
│   ├── script.js          # Frontend JavaScript
│   └── style.css          # CSS styles
├── .env                   # Environment variables (create this)
├── main.py                # Application entry point
└── requirements.txt       # Python dependencies

🔍 Key Components

Backend Components

PDFRAG Class: Handles text extraction, chunking, embeddings creation, and answer generation
MongoDB Integration: Manages persistent storage of PDF sessions
API Routes: Handles HTTP requests for uploads, chat, and session management
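
The chunking step handled by the PDFRAG class can be sketched as a fixed-size splitter with overlap. This is a plain-Python illustration under assumptions: the function name `chunk_text` and the default sizes are hypothetical, and the project may instead rely on a LangChain text splitter. Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Smaller chunks give more precise retrieval but less context per chunk, so the two parameters are usually tuned together.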

Frontend Components

Clean, responsive UI with drag-and-drop PDF upload and interactive chat interface
JavaScript for AJAX requests, PDF handling, and UI state management

🛡 Technologies Used

Backend: FastAPI, LangChain, OpenAI API, PyPDF2, FAISS, MongoDB
Frontend: HTML5/CSS3, JavaScript, Fetch API

📄 License

This project is licensed under the MIT License.

Created with ❤️ by Andreas Pattichis
