HyDRA is a framework for generating domain-specific ontologies and knowledge graphs. It employs an AI persona committee-based approach to comprehensively scope domains, generate ontological structures, and create knowledge graphs for various applications including question answering and domain exploration. HyDRA is built entirely on the symbolicai framework. Please support the project by starring the repository.
git clone git@github.com:ExtensityAI/ontology-hydra.git
cd ontology-hydra
To set up the environment, install the Python package manager uv.
Then, create a virtual environment and install the dependencies by running:
uv sync
Next, configure symbolicai. First, run:
uv run symconfig
When run for the first time, this command caches the initial packages and initializes the symbolicai configuration files in uv's .venv directory, then displays the following warning:
UserWarning: No configuration file found for the environment. A new configuration file has been created at <full-path>/ontology-hydra/.venv/.symai/symai.config.json. Please configure your environment.
You must then edit the symai.config.json file. The symbolicai framework requires a neurosymbolic engine to be configured before it can be used. More on configuration management is available in the symbolicai documentation.
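Before moving on, it can help to confirm that the neurosymbolic engine settings were actually filled in. The following is a minimal sketch, not part of HyDRA or symbolicai: the helper names are hypothetical, and the two key names are assumptions about what the generated symai.config.json contains, so check them against your own file.

```python
import json
from pathlib import Path

# Assumed key names; verify them against your generated symai.config.json.
REQUIRED_KEYS = ("NEUROSYMBOLIC_ENGINE_API_KEY", "NEUROSYMBOLIC_ENGINE_MODEL")


def missing_settings(config: dict, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in the config."""
    return [key for key in required if not config.get(key)]


def check_config_file(path: Path) -> list[str]:
    """Load a JSON config file and report which required keys still need values."""
    with path.open(encoding="utf-8") as fh:
        return missing_settings(json.load(fh))
```

Calling `check_config_file` with the path printed in the warning above returns an empty list once every assumed key has a non-empty value.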
Once you've set up the symbolicai config, you must also install an additional plugin for the ontopipe package:
uv run sympkg i ExtensityAI/chonkie-symai
You are now set up.
You can use the ontopipe API to generate ontologies and knowledge graphs for a specific domain:
from pathlib import Path
from symai import Import, Symbol
from ontopipe import generate_kg, ontopipe
from ontopipe.models import KG, Ontology
# Define the domain and output directory
domain = "fiction"
cache_path = Path("cache")
# Generate ontology
ontology = ontopipe(domain, cache_path=cache_path) # saves to cache_path / 'ontology.json'
# or load from cache
# ontology = Ontology.from_json_file(cache_path / 'ontology.json')
texts = ['...'] # provide your list of text chunks here
# the chunk length has an impact on the quality of the generated KG
# shorter chunks, denser KG
# longer chunks, sparser KG
# We also provide functionality to easily chunk text appropriately to your needs.
# We built on top of the chonkie library.
# E.g.:
# ex_str = Symbol('this is a test string to generate a knowledge graph')
# ChonkieChunker = Import.load_expression(
#     'ExtensityAI/chonkie-symai',
#     'ChonkieChunker'
# )
# chonkie = ChonkieChunker(tokenizer_name='gpt2')
# texts = chonkie(ex_str, chunk_size=...)
kg = generate_kg(
    texts=texts,
    ontology=ontology,
    cache_path=cache_path,
    kg_name='test_kg',
    epochs=1  # iterates multiple times over the texts to improve the KG
)
# or load from cache
# kg = KG.from_json_file(cache_path / 'kg.json')
from ontopipe.vis import visualize_kg, visualize_ontology
visualize_ontology(ontology, output_html_path=cache_path / 'ontology_vis.html')
visualize_kg(kg, output_html_path=cache_path / 'kg_vis.html')
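The chunk-length trade-off mentioned in the comments above can be explored without any plugin. The following is a minimal sketch, assuming whitespace-separated words as a rough stand-in for model tokens. `chunk_words` is a hypothetical helper for illustration only; it is not part of ontopipe or chonkie-symai and does not reproduce the behavior of ChonkieChunker.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size whitespace-separated words.

    Consecutive chunks share `overlap` words so that statements spanning a
    chunk boundary are less likely to be cut apart.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The resulting list can be passed as `texts` to `generate_kg`. Smaller `chunk_size` values yield more, shorter chunks, which per the comments above tends toward a denser KG.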
