🚀 Knowledge Graph Extraction - Quick Start Guide

📁 Files Created

The following files are included for running the knowledge graph extraction system locally:

Core Files

.env - Configuration file where you add your Gemini API key
requirements.txt - All Python dependencies
example_usage.py - Complete example showing how to use the system
run.py - Quick start script that runs everything
test_setup.py - Test script to verify your setup

Setup Files

setup.sh - Automated setup script (Unix/Mac)
.gitignore - Git ignore file for common artifacts
README.md - Updated with installation and usage instructions

🛠️ Installation & Setup

Option 1: Automated Setup (Recommended)

```bash
# Make setup script executable and run it
chmod +x setup.sh
./setup.sh
```

Option 2: Manual Setup

```bash
# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

🔑 API Key Configuration

Get a Gemini API key from: https://ai.google.dev/
Edit the .env file and replace your_gemini_api_key_here with your actual key:
```
GEMINI_API_KEY=your_actual_api_key_here
```
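To confirm the key is actually visible to Python before running anything else, a quick check like the one below can help. This is a minimal sketch, not the contents of test_setup.py: it assumes the variables from .env have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()` or by exporting them in your shell; the project may load .env differently), and `check_gemini_key` is a hypothetical helper name.

```python
import os

# Values that mean "the key was never filled in" (the placeholders used in this guide).
PLACEHOLDERS = {"", "your_gemini_api_key_here", "your_actual_api_key_here"}

def check_gemini_key(env=os.environ) -> str:
    """Return GEMINI_API_KEY, or raise RuntimeError if it is missing or still a placeholder."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if key in PLACEHOLDERS:
        raise RuntimeError("GEMINI_API_KEY is not set; edit .env and add your key")
    return key

if __name__ == "__main__":
    try:
        check_gemini_key()
        print("GEMINI_API_KEY found")
    except RuntimeError as err:
        print(err)
```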

🧪 Testing Your Setup

Run the test script to verify everything is working:

```bash
python test_setup.py
```

🚀 Running the System

Quick Start

```bash
python run.py
```

This will automatically:

Extract a knowledge graph from sample Wikipedia articles
Save visualization as knowledge_graph.png
Save data as extracted_knowledge_graph.json

Custom Usage

```python
from example_usage import run_custom_example

# Your own text and entity types
text = "Your custom text here..."
entity_types = ["PERSON", "COMPANY", "LOCATION"]
result = run_custom_example(text, entity_types)
```

📊 What the System Does

Phase 1: Extract

Splits text into chunks (4096 chars by default)
Uses LLM to extract relation triplets: (subject:type, relation, object:type)
Example: (Mark Zuckerberg:PERSON, founded, Facebook:COMPANY)
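The triplet notation above maps naturally onto a small record type. The sketch below parses strings written exactly in the `(subject:type, relation, object:type)` form from the example, assuming entity types are upper-case labels; the LLM output format the project actually parses may differ, and `Triplet` and `parse_triplet` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches "(subject:TYPE, relation, object:TYPE)"; types are assumed to be upper-case labels.
TRIPLET_RE = re.compile(
    r"^\(\s*(.+?)\s*:\s*([A-Z_]+)\s*,\s*(.+?)\s*,\s*(.+?)\s*:\s*([A-Z_]+)\s*\)$"
)

@dataclass(frozen=True)
class Triplet:
    subject: str
    subject_type: str
    relation: str
    obj: str
    obj_type: str

def parse_triplet(line: str) -> Optional[Triplet]:
    """Parse one triplet string; return None if it does not have the expected shape."""
    match = TRIPLET_RE.match(line.strip())
    if match is None:
        return None
    return Triplet(*match.groups())

if __name__ == "__main__":
    print(parse_triplet("(Mark Zuckerberg:PERSON, founded, Facebook:COMPANY)"))
```

Returning `None` instead of raising lets a caller simply skip malformed lines in a batch of LLM output.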

Phase 2: Build

Evaluates each extracted relation for consistency
Merges similar entities (e.g., "Mark Zuckerberg" and "Zuckerberg")
Builds a coherent knowledge graph without duplicate entities or relations
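In the system itself, merging is decided by the LLM during the Build phase. Purely to illustrate what "merging similar entities" means, the sketch below swaps in a naive string heuristic instead: a name that matches the trailing word(s) of a longer name of the same type (e.g. "Zuckerberg" and "Mark Zuckerberg") is mapped to the longer name, and exact duplicate triplets are then dropped. Triplets are plain `(subject, subject_type, relation, object, object_type)` tuples, and `canonical_names` and `merge_entities` are hypothetical helpers. A heuristic like this breaks on ambiguous surnames, which is one reason to hand the decision to an LLM.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]                  # (name, type)
Triple = Tuple[str, str, str, str, str]   # (subject, subject_type, relation, object, object_type)

def canonical_names(entities: List[Entity]) -> Dict[Entity, str]:
    """Map each entity to the longest same-type name that ends with it as whole words."""
    mapping: Dict[Entity, str] = {}
    for name, etype in entities:
        best = name
        for other, other_type in entities:
            if other_type == etype and len(other) > len(best) and other.endswith(" " + name):
                best = other
        mapping[(name, etype)] = best
    return mapping

def merge_entities(triplets: List[Triple]) -> List[Triple]:
    """Rewrite entity names to their canonical form and drop exact duplicates, keeping order."""
    entities = sorted({(t[0], t[1]) for t in triplets} | {(t[3], t[4]) for t in triplets})
    mapping = canonical_names(entities)
    seen = set()
    merged: List[Triple] = []
    for subj, stype, rel, obj, otype in triplets:
        new = (mapping[(subj, stype)], stype, rel, mapping[(obj, otype)], otype)
        if new not in seen:
            seen.add(new)
            merged.append(new)
    return merged

if __name__ == "__main__":
    triplets = [
        ("Mark Zuckerberg", "PERSON", "founded", "Facebook", "COMPANY"),
        ("Zuckerberg", "PERSON", "founded", "Facebook", "COMPANY"),
        ("Zuckerberg", "PERSON", "CEO of", "Facebook", "COMPANY"),
    ]
    for t in merge_entities(triplets):
        print(t)
```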

🎯 Key Features

Entity Types: Customizable (PERSON, COMPANY, LOCATION, etc.)
Source Linking: Relations linked back to original text passages
Visualization: Automatic graph visualization with NetworkX
Export: JSON format for further processing
Scalable: Works with multiple documents/sources

🔧 Configuration Options

Edit the .env file to customize:

```
# Model settings
DEFAULT_MODEL=gemini-2.0-flash-exp
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Processing settings
CHUNK_SIZE=4096
CHUNK_OVERLAP=128
MAX_NEW_TOKENS=4096
TEMPERATURE=0.0
```
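To make CHUNK_SIZE and CHUNK_OVERLAP concrete: with a character-based sliding window, each chunk starts CHUNK_SIZE - CHUNK_OVERLAP characters after the previous one, so consecutive chunks share CHUNK_OVERLAP characters and text near a boundary appears in both neighbouring chunks. The sketch below assumes this character-based scheme and reads both values from the environment with the defaults shown above; the project's own splitter may work differently (for example on token or sentence boundaries), and `chunk_text` is a hypothetical name.

```python
import os
from typing import List

def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

# Read the settings from the environment (e.g. after loading .env), falling back to the defaults above.
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "4096"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "128"))

if __name__ == "__main__":
    sample = "x" * 10_000
    print([len(c) for c in chunk_text(sample, CHUNK_SIZE, CHUNK_OVERLAP)])
```

With the defaults, a 10,000-character text yields three chunks of 4096, 4096 and 2064 characters, because each new chunk starts 3968 characters after the previous one.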

🎨 Output Files

After running, you'll get:

knowledge_graph.png - Visual representation of the graph
extracted_knowledge_graph.json - Structured data with all relations
Console output showing extraction progress and a summary
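The exact schema of extracted_knowledge_graph.json is defined by the project and is not documented here. As an illustration only of "structured data with all relations" plus source linking, the sketch below serializes each relation together with the text passage it came from; the field names (`entities`, `relations`, `source`, and so on) and the `graph_to_json` helper are assumptions, not the project's actual format.

```python
import json
from typing import Dict, List

def graph_to_json(relations: List[Dict[str, str]]) -> str:
    """Serialize relations, each carrying its source passage, plus a sorted entity list."""
    entities = sorted({r["subject"] for r in relations} | {r["object"] for r in relations})
    return json.dumps({"entities": entities, "relations": relations}, ensure_ascii=False, indent=2)

relations = [
    {
        "subject": "Mark Zuckerberg", "subject_type": "PERSON",
        "relation": "founded",
        "object": "Facebook", "object_type": "COMPANY",
        "source": "Mark Zuckerberg founded Facebook.",  # illustrative source passage
    },
]

if __name__ == "__main__":
    print(graph_to_json(relations))
```

To save the result, write the returned string to a file opened with `encoding="utf-8"`, so that non-ASCII entity names survive.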

🔍 Troubleshooting

Common Issues:

Import Errors: Run pip install -r requirements.txt
API Key Issues: Check .env file and verify key at https://ai.google.dev/
Memory Issues: Try a smaller CHUNK_SIZE in .env
Network Issues: Check internet connection for API calls

Getting Help:

Run python test_setup.py to diagnose issues
Check the Colab notebook: https://colab.research.google.com/drive/1st_E7SBEz5GpwCnzGSvKaVUiQuKv3QGZ
Review the blog post for detailed explanations

📚 Next Steps

Start with the basic example: python run.py
Try your own text: Edit example_usage.py
Customize entity types: Modify the entity_types list
Experiment with different models: Try local Hugging Face models
Build GraphRAG applications: Use the extracted graphs for retrieval
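As a first step toward GraphRAG-style retrieval, one simple approach is to find the entities a question mentions and collect every relation that touches them, with their source passages, as context for the LLM. The sketch below does only that lookup, using naive case-insensitive substring matching over relation dicts with `subject`, `relation`, `object` and `source` keys (an assumed shape, not the project's export format); `relations_for_query` and `build_context` are hypothetical helpers.

```python
from typing import Dict, List

def relations_for_query(query: str, relations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return relations whose subject or object name appears (case-insensitively) in the query."""
    q = query.lower()
    return [r for r in relations if r["subject"].lower() in q or r["object"].lower() in q]

def build_context(hits: List[Dict[str, str]]) -> str:
    """Render matched relations as plain-text lines to prepend to an LLM prompt."""
    return "\n".join(f"{r['subject']} {r['relation']} {r['object']} (source: {r['source']})" for r in hits)

relations = [
    {"subject": "Mark Zuckerberg", "relation": "founded", "object": "Facebook", "source": "chunk 0"},
    {"subject": "Paris", "relation": "capital of", "object": "France", "source": "chunk 3"},
]

if __name__ == "__main__":
    print(build_context(relations_for_query("Who founded Facebook?", relations)))
```

A fuller GraphRAG setup would typically match entities with embeddings rather than substrings and expand to multi-hop neighbours.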

⚡ Performance Notes

First run: May take longer due to model downloads
Gemini API: Recommended for best results (according to the blog post)
Local models: Possible but may give lower quality results
Processing time: Depends on text length and API response times

🌟 System Architecture

The workflow uses a two-phase approach:

Extract: LLM extracts raw triplets from text chunks
Build: LLM validates and merges triplets into consistent graph

This helps disambiguate entities and avoid duplicate information while keeping each relation linked to its source material.
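The control flow of the two phases described above can be sketched without committing to a particular model. In the outline below the two LLM steps are passed in as plain functions (`extract_fn` and `build_fn`, both hypothetical), each triplet carries the index of the chunk it came from so relations stay linked to their source passages, and simple stub functions stand in for Gemini so the skeleton runs offline. It is an outline of the approach, not the project's code.

```python
from typing import Callable, List, Optional, Tuple

Triplet = Tuple[str, ...]       # (subject, relation, object); entity types omitted for brevity
Sourced = Tuple[Triplet, int]   # triplet plus the index of the chunk it came from

def run_pipeline(
    chunks: List[str],
    extract_fn: Callable[[str], List[Triplet]],
    build_fn: Callable[[List[Sourced], Sourced], Optional[Sourced]],
) -> List[Sourced]:
    """Phase 1 extracts raw triplets per chunk; Phase 2 judges each one against the current graph."""
    raw: List[Sourced] = []
    for i, chunk in enumerate(chunks):
        raw.extend((t, i) for t in extract_fn(chunk))
    graph: List[Sourced] = []
    for candidate in raw:
        accepted = build_fn(graph, candidate)  # may return a rewritten (merged) triplet, or None to skip
        if accepted is not None:
            graph.append(accepted)
    return graph

def stub_extract(chunk: str) -> List[Triplet]:
    # Stand-in for the LLM: treat each "A | relation | B" line as a triplet.
    return [tuple(p.strip() for p in line.split("|")) for line in chunk.splitlines() if line.count("|") == 2]

def stub_build(graph: List[Sourced], candidate: Sourced) -> Optional[Sourced]:
    # Stand-in for the LLM: reject a triplet already in the graph, so the first source wins.
    triplet, _ = candidate
    return None if any(t == triplet for t, _ in graph) else candidate

if __name__ == "__main__":
    chunks = [
        "Mark Zuckerberg | founded | Facebook",
        "Mark Zuckerberg | founded | Facebook\nFacebook | acquired | Instagram",
    ]
    for triplet, source in run_pipeline(chunks, stub_extract, stub_build):
        print(triplet, "from chunk", source)
```

Letting `build_fn` see the graph built so far mirrors the idea that each new relation is validated against what has already been accepted.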

Happy knowledge graph building! 🎉
