Ontology2Graph: A Framework for Synthetic Knowledge Graph Generation Using Large Language Models.

Abstract

Ontology2Graph is a comprehensive Python framework designed to generate synthetic Knowledge Graphs thanks to Large Language Model (LLM). The system provides a complete computational pipeline that transforms ontological schemas into semantically coherent knowledge graphs with integrated quality assurance mechanisms.

Overview

Knowledge Graphs have emerged as fundamental structures for representing complex relationships in semantic data. This framework addresses the challenge of generating synthetic knowledge graphs at scale by leveraging the reasoning capabilities of Large Language Models while ensuring adherence to ontological constraints and semantic consistency.

Ontology2Graph implements a modular architecture that supports:

Automated knowledge graph generation from ontological specifications
Comprehensive graph analysis and key performance indicator computation
Interactive visualization capabilities for structural analysis
Graph merging and consolidation algorithms
Quality control and validation mechanisms

System Architecture

The framework is structured into four primary modules:

Knowledge Graph Generation Module (`generate_ttl_files/`)

The generation module serves as the core component for synthetic knowledge graph creation. It interfaces with various Large Language Models through standardized APIs to transform ontological schemas into RDF-compliant Turtle format graphs.

Key Components:

generate_ttl.py: Primary generation engine implementing LLM-based graph synthesis
utils_gen/utils.py: Supporting utilities for model configuration, prompt management and graph validation
Multi-model support architecture with configurable prompt strategies
Integrated TTL syntax validation & turtle formating pipeline

Visualization and Analysis Module (`display_graphs/`)

This module provides comprehensive analytical capabilities for knowledge graph assessment and interactive visualization generation.

Key Components:

display_graphs.py: Main visualization engine supporting multiple rendering modes
utils/utils_display.py: Analytical utilities for graph metrics computation
Interactive HTML generation using web-based visualization libraries
Key Performance Indicator (KPI) calculation for structural analysis
utils/visu_graph.py: visualization module for basic or advanced graphs representation

Graph Consolidation Module (`merge_ttl_files/`)

The consolidation module implements algorithms for intelligent merging of multiple knowledge graphs while maintaining semantic consistency and managing duplicate entities.

Key Components:

merge_ttl.py: Primary merging algorithm with homonyme nodes detection
utils_merge/utils.py: Specialized utilities for graph unification operations
RDF prefix normalization and namespace management
Post-merge validation and consistency checking

Common Utilities Module (`utils_common/`)

Shared infrastructure components providing fundamental operations across all modules.

Key Components:

utils_common/utils.py: Specialized utilities used by others modules
Argument parsing and configuration management
Logging and monitoring infrastructure
File I/O operations and path management
TTL validation and error handling mechanisms

Installation and Configuration

System Requirements

Python 3.8 or higher
External TTL validation tool
External TTL formating tool
Network connectivity for LLM API access

Environment Setup

Virtual Environment Creation:

python3 -m venv venv
source venv/bin/activate

Dependency Installation: Python 3.7+ Required packages: requests, rdflib, openai, networkx, owlready2; pyvis. Install them manually:
```
python3 -m pip install requests rdflib openai networkx owlready2 pyvis
```
or
```
python3 -m pip install -r requirements.txt
```
External Tool Installation:
- Install the Turtle Validator for syntax validation.
- Install the Ontology engineering tool for Turtle format rearrangement.
API Configuration: Configure LLM access credentials:
```
export LLM_API_KEY="your_api_key"
```

Usage Documentation

Knowledge Graph Generation

The generation process requires proper configuration of model parameters and prompt strategies:

cd generate_ttl_files
python generate_ttl.py --nbrttl <number_of_graphs> --reasoner <reasoner>

Parameters:

number_of_graph: The number of graphs you want to generate.
reasoner : The reasonner to use for consistency checking (Pellet or HermiT)

Configuration Parameters in generate_ttl_files:

model_nbr: Specifies the target LLM (reference: utils_gen/models/models.json)
prompt_type: Defines the generation strategy (reference: utils_gen/prompts/prompts.json)

Each TTL file generated is automatically checking by Ontology engineering tool for Turtle format rearrangement and Turtle Validator for syntax validation. Validated files are stored in the results/synthetic_graphs/ directory in TTL format.

Graph Consolidation

Multiple knowledge graphs can be merged using the consolidation module:

cd merge_ttl_files
python merge_ttl.py --path_file <source_directory> --ontology <ontology_file>

The consolidation process generates several merged graphs that differ from each other in terms of their number of nodes and therefore their density. This process is based on the recognition of homonymous nodes, in the set of graphes generated by the LLM, and their sequential renaming from a merged graph where no homonymous are renamed to the renaming of all homonymous corresponding to the juxtaposition of graphs generated by the LLM.

Merged files are automatically checking by Turtle Validator for syntax validation. Each validated file is stored in a "merged" folder beside the LLM generated graphs.

Graph Visualization and Analysis

The visualization module supports both basic structural rendering and advanced analytical visualization:

cd display_graphs
python display_graphs.py --path <input_path> --ontology <ontology> --mode <visualization_mode>

Parameters:

input_path: Path to individual TTL files or directories containing multiple graphs
ontology: Reference ontology file path
visualization_mode: Either basic for structural visualization or advanced for analytical rendering

Basic view, click on it to expand:

Advanced view, click on it to expand

Monitoring and Logging

The system generates comprehensive logs containing:

Generation timestamps and model identifiers
Token utilization metrics
Validation results and error reports
Performance indicators

Core Python Dependencies

rdflib: RDF graph processing and SPARQL query execution
networkx: Graph algorithmic analysis and structural computation
pyvis: Interactive web-based graph visualization
openai: Large Language Model API integration
owlready2: Ontology-oriented programming packages
requests: Core HTTp library for python, used for making API calls

Supported Formats

Input: TTL (Turtle) ontology files, JSON configuration files
Output: TTL knowledge graphs, HTML visualizations
Logging: Structured text logs with configurable verbosity levels

Quality Assurance

The framework implements multiple validation layers:

Syntactic TTL validation using external tools
Semantic consistency checking against source ontologies
Structural analysis for graph coherence assessment
Automated error detection and quarantine mechanisms

Performance Considerations

Scalability Factors

Graph size limitations dependent on available system memory
LLM API rate limiting considerations
Parallel processing support for visualization rendering

Research Applications

This framework supports various research applications in:

Semantic data augmentation for machine learning datasets
Ontology validation and consistency testing
Knowledge graph structural analysis and comparison
Synthetic data generation for privacy-preserving research

References and Documentation

See the official documentation: for details
Turtle Validator: External validation tools
Ontology engineering tool for Turtle format rearrangement.
RDFLib Documentation: RDF processing library reference

For additional documentation and contribution guidelines, refer to CONTRIBUTING.md.

License and Attribution

This software is distributed under the BSD 4-Clause License; refer to the LICENSE file for complete terms and conditions. Additionally, see the CONTRIBUTORS file for the list of maintainers and contributors.

Name		Name	Last commit message	Last commit date
Latest commit History 376 Commits
.github/workflows		.github/workflows
display_graphs		display_graphs
docs		docs
generate_ttl_files		generate_ttl_files
merge_ttl_files		merge_ttl_files
results/synthetics_graphs		results/synthetics_graphs
utils_common		utils_common
.gitignore		.gitignore
CITATION.cff		CITATION.cff
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
CONTRIBUTORS.txt		CONTRIBUTORS.txt
LICENSE.txt		LICENSE.txt
README.md		README.md
SECURITY.md		SECURITY.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ontology2Graph: A Framework for Synthetic Knowledge Graph Generation Using Large Language Models.

Abstract

Overview

System Architecture

Knowledge Graph Generation Module (`generate_ttl_files/`)

Visualization and Analysis Module (`display_graphs/`)

Graph Consolidation Module (`merge_ttl_files/`)

Common Utilities Module (`utils_common/`)

Installation and Configuration

System Requirements

Environment Setup

Usage Documentation

Knowledge Graph Generation

Graph Consolidation

Graph Visualization and Analysis

Monitoring and Logging

Core Python Dependencies

Supported Formats

Quality Assurance

Performance Considerations

Scalability Factors

Research Applications

References and Documentation

License and Attribution

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Ontology2Graph: A Framework for Synthetic Knowledge Graph Generation Using Large Language Models.

Abstract

Overview

System Architecture

Knowledge Graph Generation Module (generate_ttl_files/)

Visualization and Analysis Module (display_graphs/)

Graph Consolidation Module (merge_ttl_files/)

Common Utilities Module (utils_common/)

Installation and Configuration

System Requirements

Environment Setup

Usage Documentation

Knowledge Graph Generation

Graph Consolidation

Graph Visualization and Analysis

Monitoring and Logging

Core Python Dependencies

Supported Formats

Quality Assurance

Performance Considerations

Scalability Factors

Research Applications

References and Documentation

License and Attribution

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Knowledge Graph Generation Module (`generate_ttl_files/`)

Visualization and Analysis Module (`display_graphs/`)

Graph Consolidation Module (`merge_ttl_files/`)

Common Utilities Module (`utils_common/`)

Packages