♻️ LLM-Based Sustainability Assistant for EPD Analysis

This project is an AI-powered system for analyzing and comparing Environmental Product Declarations (EPDs). It integrates structured data, semantic embeddings, and large language models (LLMs) to extract, enrich, and query sustainability-related information.

📌 Project Goals

Ingest and store EPD data from XML and PDF sources
Enrich incomplete product information using LLMs
Enable semantic search and comparison across indicators, materials, and categories
Provide a fast, interactive frontend for analysis and Q&A

✅ MVP (Minimum Viable Product)

📥 Data Collection & Storage

Retrieved EPD data using Selenium, parsed XML.
Stored structured product and indicator data in PostgreSQL.

🧠 Local LLM Experiments

Tested Mistral-7B-Instruct (quantized) locally.
Built a basic React + FastAPI full-stack prototype.
Used all-MiniLM-L6-v2 for initial embeddings.
Switched to Ollama with nous-hermes2 due to poor results and slow inference (60s per request).

🧪 Development Phase

⚡ Inference Optimization

Deployed a prebuilt Ollama container on RunPod using:
- RTX 2000 Ada GPU, 16 GB VRAM, 31 GB RAM, 100 GB disk
Benchmarked multiple LLMs:
- Mistral (7B), LLaMA3 (8B), Gemma, Qwen, Phi-3
Selected Mistral for consistent performance.

🧱 Text Processing & Embedding

Extracted text manually from PDFs and saved as .txt.
Programmatically chunked into 3–4 sentence segments.
Checked chunks for semantic relevance using LLM.
Embedded with bge-small-en-v1.5 (384 dimensions).
Stored in PostgreSQL using pgvector.

🧽 Data Enrichment & Cleanup

Filled missing English descriptions using LLM-based German translation.
Inferred material and use-case from unstructured text.
Generated statistics (mean, min, max, range) for indicators by category for fast retrieval.

🚀 Final System Architecture

💻 Frontend – React

Filters products by:
- Indicator, Category, Material, Use Case
Retrieves:
- Structured product/indicator data from PostgreSQL
- Embedded chunks from EPD XML and theory PDFs (via pgvector)
Sends questions to FastAPI for:
- LLM-based classification
- Routed LLM answers

🧠 Backend – FastAPI

Interfaces with both the database and LLM.
Handles:
- PostgreSQL queries for structured + embedded data
- pgvector semantic retrieval
- LLM prompt generation and routing

🗄️ Database – PostgreSQL + pgvector

Stores:
- Parsed XML product data
- Chunked and embedded EPD text and theory documents
Enables:
- Standard SQL queries
- Vector similarity search for semantic context

🤖 LLM Backend – RunPod/Ollama

Hosts Mistral (7B quantized)
Handles:
- Question classification
- Answer generation using retrieved context
Communicates with FastAPI via prompt/response API

📎 Tech Stack

Component	Tech
Frontend	React
Backend	FastAPI
Database	PostgreSQL + pgvector
Embeddings	`bge-small-en-v1.5`
LLMs	Mistral, LLaMA3, etc. via Ollama on RunPod

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
backend		backend
frontend		frontend
llm_inference		llm_inference
notes		notes
scripts		scripts
.gitignore		.gitignore
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

♻️ LLM-Based Sustainability Assistant for EPD Analysis

📌 Project Goals

✅ MVP (Minimum Viable Product)

📥 Data Collection & Storage

🧠 Local LLM Experiments

🧪 Development Phase

⚡ Inference Optimization

🧱 Text Processing & Embedding

🧽 Data Enrichment & Cleanup

🚀 Final System Architecture

💻 Frontend – React

🧠 Backend – FastAPI

🗄️ Database – PostgreSQL + pgvector

🤖 LLM Backend – RunPod/Ollama

📎 Tech Stack

📂 Folder Structure (Suggested)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

♻️ LLM-Based Sustainability Assistant for EPD Analysis

📌 Project Goals

✅ MVP (Minimum Viable Product)

📥 Data Collection & Storage

🧠 Local LLM Experiments

🧪 Development Phase

⚡ Inference Optimization

🧱 Text Processing & Embedding

🧽 Data Enrichment & Cleanup

🚀 Final System Architecture

💻 Frontend – React

🧠 Backend – FastAPI

🗄️ Database – PostgreSQL + pgvector

🤖 LLM Backend – RunPod/Ollama

📎 Tech Stack

📂 Folder Structure (Suggested)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages