RAG pipeline over:

- chicago_Annual_Appropriation_Ordinance_2026.pdf
- chicago_Grant_Details_Ordinance_2026.pdf
The system does:

- PDF text extraction (`pdftotext -layout`)
- Improved chunking (smaller chunks + section-aware boundaries)
- TOC suppression (TOC-like chunks are penalized and optionally filtered)
- Hybrid retrieval (BM25 + optional embeddings)
- Optional cross-encoder reranking path
- Answers with page-level citations
- Source links that open exact PDF pages (new tab or embedded viewer panel)
- Clickable sample queries in the UI for quick testing
- One-click export of query results to Markdown, JSON, or CSV
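The TOC suppression above can be sketched with a simple heuristic. This is illustrative only: the function names, threshold, and dot-leader regex are assumptions, not the project's actual code.

```python
import re

# TOC pages in budget PDFs tend to be dominated by dot-leader lines that end
# in a page number, e.g. "Office of the Mayor .......... 112".
TOC_LINE = re.compile(r"\.{3,}\s*\d+\s*$")

def looks_like_toc(chunk: str, threshold: float = 0.4) -> bool:
    """Flag a chunk as TOC-like if enough of its lines match the pattern."""
    lines = [ln for ln in chunk.splitlines() if ln.strip()]
    if not lines:
        return False
    toc_lines = sum(1 for ln in lines if TOC_LINE.search(ln))
    return toc_lines / len(lines) >= threshold

def penalized_score(score: float, chunk: str, toc_penalty: float = 0.35) -> float:
    """Scale a retrieval score down when the chunk reads like a TOC page."""
    return score * toc_penalty if looks_like_toc(chunk) else score
```

Penalizing rather than hard-filtering keeps TOC pages retrievable when nothing better matches.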
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Optional cross-encoder reranker dependencies:

```bash
pip install -r requirements-reranker.txt
```

Optional provider setup examples:
OpenAI:

```bash
export LLM_PROVIDER=openai
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=your_key
export OPENAI_CHAT_MODEL=gpt-4.1-mini
export OPENAI_EMBED_MODEL=text-embedding-3-small
```

Ollama:

```bash
export LLM_PROVIDER=ollama
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_CHAT_MODEL=llama3.2:latest
export OLLAMA_EMBED_MODEL=qwen3-embedding:4b
```

AWS Bedrock:

```bash
export LLM_PROVIDER=bedrock
export EMBEDDING_PROVIDER=bedrock
export AWS_REGION=us-east-1
export BEDROCK_CHAT_MODEL=anthropic.claude-3-5-sonnet-20241022-v2:0
export BEDROCK_EMBED_MODEL=amazon.titan-embed-text-v2:0
```

Default chunking now uses `max_tokens=450` and `overlap_tokens=70`:
```bash
python3 build_index.py --pdf-dir . --index-dir data/index
```

Override if needed:

```bash
python3 build_index.py --max-tokens 400 --overlap-tokens 60
```

Query:

```bash
python3 query_rag.py "What is budgeted for the Office of the Mayor?"
```

Override the retrieval blend from the CLI:

```bash
python3 query_rag.py "What grants mention ARPA?" --bm25-weight 0.9 --vector-weight 0.1
```

JSON output:

```bash
python3 query_rag.py "What grants mention ARPA?" --json
```

A starter evaluation set is included at `eval/questions.sample.json`.
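The chunking defaults (`max_tokens=450`, `overlap_tokens=70`) amount to overlapping token windows. A minimal sketch, using whitespace tokenization as a stand-in for the project's real tokenizer and omitting the section-aware boundaries:

```python
def chunk_tokens(text: str, max_tokens: int = 450, overlap_tokens: int = 70):
    """Split text into windows of max_tokens, each restating the previous
    window's last overlap_tokens tokens so context straddles boundaries."""
    tokens = text.split()
    step = max_tokens - overlap_tokens  # advance 380 tokens per window
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        if window:
            chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break  # last window already reached the end of the text
    return chunks
```

The overlap keeps line items that sit at a chunk boundary retrievable from both neighboring chunks.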
Run a single evaluation:

```bash
python3 eval_rag.py --questions-file eval/questions.sample.json --top-k 8 --show-queries
```

Run a tuning grid report (best BM25/vector blend):

```bash
python3 eval_rag.py --questions-file eval/questions.sample.json --top-k 8 --tune --bm25-grid "0.7,0.8,0.85,0.9,0.95"
```

JSON output is supported with `--json`.
```bash
uvicorn app:app --reload --port 8000
```

Then open http://localhost:8000.
After running a query in the UI, use the export buttons to download:

- Markdown (`.md`)
- JSON (`.json`)
- CSV (`.csv`)
You can also call export directly:

```bash
curl -L "http://localhost:8000/export?query=What%20grants%20mention%20ARPA%3F&fmt=markdown" -o export.md
```

`fmt` supports: `markdown`, `json`, `csv`.
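If you prefer Python over curl, the export URL can be built with the standard library. The endpoint and parameters are taken from the curl example above; fetching the URL requires the app to be running on localhost:8000.

```python
from urllib.parse import urlencode

def export_url(query: str, fmt: str = "markdown",
               base: str = "http://localhost:8000") -> str:
    """Build an /export URL for the given query and format."""
    assert fmt in {"markdown", "json", "csv"}
    return f"{base}/export?" + urlencode({"query": query, "fmt": fmt})
```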
```bash
docker compose up --build
```

Then open http://localhost:8000.
Notes:

- On first start, the container auto-builds `data/index/index.json`.
- The index is stored in a persistent Docker volume (`rag_index`).
- Force a rebuild when the embedding provider/model changes:

```bash
FORCE_REINDEX=1 docker compose up --build
```

Provider examples in Docker:
OpenAI:

```bash
LLM_PROVIDER=openai EMBEDDING_PROVIDER=openai OPENAI_API_KEY=your_key docker compose up --build
```

Ollama:

```bash
LLM_PROVIDER=ollama EMBEDDING_PROVIDER=ollama OLLAMA_CHAT_MODEL=llama3.2:latest OLLAMA_EMBED_MODEL=qwen3-embedding:4b docker compose up --build
```

Install Ollama:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Pull the two models used in this project:

```bash
ollama pull llama3.2:latest
ollama pull qwen3-embedding:4b
```

Run Docker with those two models:

```bash
LLM_PROVIDER=ollama EMBEDDING_PROVIDER=ollama OLLAMA_BASE_URL=http://host.docker.internal:11434 OLLAMA_CHAT_MODEL=llama3.2:latest OLLAMA_EMBED_MODEL=qwen3-embedding:4b docker compose up --build
```

Bedrock:

```bash
LLM_PROVIDER=bedrock EMBEDDING_PROVIDER=bedrock AWS_REGION=us-east-1 docker compose up --build
```

Set these as env vars (local or Docker):
- `RAG_BM25_WEIGHT` (default `0.85`)
- `RAG_VECTOR_WEIGHT` (default `0.15`)
- `RAG_TOC_PENALTY` (default `0.35`)
- `RAG_SUPPRESS_TOC` (default `true`)
- `RAG_CANDIDATE_MULTIPLIER` (default `8`)
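A sketch of how these variables might be read. The env var names and defaults come from this README; the parsing helpers themselves are illustrative, not the project's exact code.

```python
import os

def env_float(name: str, default: float) -> float:
    """Read a float env var, falling back to the documented default."""
    return float(os.getenv(name, str(default)))

def env_bool(name: str, default: bool) -> bool:
    """Read a boolean env var; accepts 1/true/yes (case-insensitive)."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes"}

BM25_WEIGHT = env_float("RAG_BM25_WEIGHT", 0.85)
VECTOR_WEIGHT = env_float("RAG_VECTOR_WEIGHT", 0.15)
TOC_PENALTY = env_float("RAG_TOC_PENALTY", 0.35)
SUPPRESS_TOC = env_bool("RAG_SUPPRESS_TOC", True)
CANDIDATE_MULTIPLIER = int(env_float("RAG_CANDIDATE_MULTIPLIER", 8))
```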
Reranking controls:

- `RAG_RERANKER=auto|cross-encoder|none` (default `auto`)
- `RAG_RERANK_CANDIDATES` (default `30`)
- `RAG_RERANKER_MODEL` (default `cross-encoder/ms-marco-MiniLM-L-6-v2`)

If cross-encoder dependencies are not installed, reranking falls back to heuristic ranking.
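The heuristic fallback can be as simple as query-term overlap; a minimal sketch (illustrative, not necessarily the project's exact heuristic). The cross-encoder path would instead score each (query, chunk) pair with `cross-encoder/ms-marco-MiniLM-L-6-v2`.

```python
def heuristic_rerank(query: str, candidates: list, top_k: int = 8) -> list:
    """Rank candidate chunks by how many query terms they contain."""
    q_terms = set(query.lower().split())

    def overlap(chunk: str) -> int:
        return len(q_terms & set(chunk.lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:top_k]
```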
The web app now includes an in-memory per-IP rate limiter for `POST /` by default.

Defaults:

- `RATE_LIMIT_ENABLED=true`
- `RATE_LIMIT_MAX_REQUESTS=20`
- `RATE_LIMIT_WINDOW_SECONDS=60`
- `RATE_LIMIT_METHOD=POST`
- `RATE_LIMIT_PATH=/`
- `RATE_LIMIT_TRUST_PROXY=true`
Example stricter public setting:

```bash
RATE_LIMIT_MAX_REQUESTS=10 RATE_LIMIT_WINDOW_SECONDS=60 docker compose up --build
```

Disable temporarily:

```bash
RATE_LIMIT_ENABLED=false docker compose up --build
```

Notes:

- This limiter uses process-local memory. If you scale to multiple app instances, use a shared limiter (e.g. Redis-based) so limits are enforced consistently.
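The process-local limiter described above can be sketched as a sliding window of request timestamps per IP. This is illustrative; the app's actual implementation may differ.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class RateLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> deque of request timestamps

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] > self.window:
            q.popleft()  # drop hits that have aged out of the window
        if len(q) >= self.max_requests:
            return False  # over the limit; caller would return HTTP 429
        q.append(now)
        return True
```

Because state lives in a per-process dict, two app replicas would each grant the full quota independently, which is exactly why the note above recommends a shared store for multi-instance deployments.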
You can disable the site with an env flag and show a temporary disabled page that links to your repo.

- `SITE_ENABLED=true|false` (default `true`)
- `SITE_DISABLED_REPO_URL` (default `https://github.com`)

Example:

```bash
SITE_ENABLED=false SITE_DISABLED_REPO_URL=https://github.com/your-org/your-repo docker compose up --build
```

When disabled, the app returns a temporary disabled page (the health endpoint remains available).
If `sudo apt install docker-compose-plugin` fails with "Unable to locate package", install Docker from Docker's official apt repo:

```bash
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin git
```

Verify:

```bash
docker --version
docker compose version
```