Overview

HeXtractor is a tool designed to automatically convert selected data in tabular format into a PyTorch Geometric heterogeneous graph. As research into graph neural networks (GNNs) expands, the importance of heterogeneous graphs grows. However, data often comes in tabular form, and manually transforming this data into graph format can be tedious and error-prone. HeXtractor aims to streamline this process, providing researchers and practitioners with a more efficient workflow.
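To make "tabular data into a heterogeneous graph" concrete, here is a minimal plain-Python sketch of the kind of mapping HeXtractor automates. It is not HeXtractor's API. The table, its column names, and the `bought` relation are hypothetical. Two ID columns of one table become two node types, and each row becomes one typed edge. Edges are stored as a pair of index lists in the `[sources, targets]` layout that PyTorch Geometric uses for `edge_index`.

```python
from collections import defaultdict


def table_to_hetero(rows, src_col, dst_col, relation):
    """Hypothetical helper: map two ID columns of one table to a typed graph.

    Returns (node_maps, edges):
      node_maps[node_type] maps raw IDs to contiguous integer indices;
      edges[(src_type, relation, dst_type)] is a ([sources], [targets]) pair.
    """
    node_maps = defaultdict(dict)
    edge_key = (src_col, relation, dst_col)
    edges = {edge_key: ([], [])}
    for row in rows:
        src_map = node_maps[src_col]
        dst_map = node_maps[dst_col]
        # Give each unseen ID the next free index within its own node type.
        s = src_map.setdefault(row[src_col], len(src_map))
        d = dst_map.setdefault(row[dst_col], len(dst_map))
        sources, targets = edges[edge_key]
        sources.append(s)
        targets.append(d)
    return dict(node_maps), edges


# Hypothetical purchase table: one row per (customer, product) interaction.
rows = [
    {"customer": "c1", "product": "p1"},
    {"customer": "c1", "product": "p2"},
    {"customer": "c2", "product": "p1"},
]
nodes, edges = table_to_hetero(rows, "customer", "product", "bought")
print(nodes)  # {'customer': {'c1': 0, 'c2': 1}, 'product': {'p1': 0, 'p2': 1}}
print(edges)  # {('customer', 'bought', 'product'): ([0, 0, 1], [0, 1, 0])}
```

In PyTorch Geometric, each index pair would typically become a `torch.tensor([sources, targets])` assigned to `data["customer", "bought", "product"].edge_index` on a `HeteroData` object. Node features, which this sketch omits, would be attached per node type in the same way.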

This package has been reviewed and published in the Journal of Open Source Software (JOSS). The paper is available at the DOI in the citation below:

Wójcik et al., (2025). HeXtractor: Extracting Heterogeneous Graphs from Structured and Textual Data for Graph Neural Networks. Journal of Open Source Software, 10(110), 8057, https://doi.org/10.21105/joss.08057

Features

Automatic Conversion: Converts tabular data into heterogeneous graphs suitable for GNNs.
Support for Multiple Formats: Handles various tabular data formats with ease.
Integration with PyTorch Geometric: Directly creates graphs that can be used with PyTorch Geometric.
Visualization: Utilizes NetworkX and PyVis for graph visualization.

Why HeXtractor?

Heterogeneous graphs are crucial in many applications of graph neural networks, yet creating them from tabular data manually is often cumbersome. HeXtractor automates this process, allowing researchers to focus on developing and training their models instead of data preprocessing.

Key Applications:

Transform single tabular datasets into heterogeneous graph structures.
Transform multiple tables into a heterogeneous graph.
Leverage Large Language Models (LLMs) to identify and extract semantic relationships from text, converting them into heterogeneous graph representations.
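The second application, several tables linked through key columns, can be sketched in the same plain-Python style. Again, this is not HeXtractor's API. The `author` and `paper` tables, the `author_id` foreign key, and the `written_by` relation are hypothetical. Each table's ID column is assumed to be a unique primary key. Every table becomes one node type, and every foreign-key link becomes one edge type.

```python
def tables_to_hetero(node_tables, fk_links):
    """Hypothetical helper for several related tables.

    node_tables: {node_type: (id_column, rows)}; id_column assumed unique.
    fk_links: [(child_type, fk_column, relation, parent_type), ...]; each
              child row points to a parent row via row[fk_column].
    """
    # One node type per table, with contiguous indices in row order.
    node_maps = {
        node_type: {row[id_col]: i for i, row in enumerate(rows)}
        for node_type, (id_col, rows) in node_tables.items()
    }
    edges = {}
    for child_type, fk_col, relation, parent_type in fk_links:
        id_col, rows = node_tables[child_type]
        child_map = node_maps[child_type]
        parent_map = node_maps[parent_type]
        sources, targets = [], []
        for row in rows:
            parent_id = row.get(fk_col)
            # Skip missing or dangling foreign keys instead of inventing parents.
            if parent_id not in parent_map:
                continue
            sources.append(child_map[row[id_col]])
            targets.append(parent_map[parent_id])
        edges[(child_type, relation, parent_type)] = (sources, targets)
    return node_maps, edges


authors = [{"author_id": "a1"}, {"author_id": "a2"}]
papers = [
    {"paper_id": "x", "author_id": "a2"},
    {"paper_id": "y", "author_id": "a1"},
    {"paper_id": "z", "author_id": "a9"},  # dangling key: no such author
]
nodes, edges = tables_to_hetero(
    {"author": ("author_id", authors), "paper": ("paper_id", papers)},
    [("paper", "author_id", "written_by", "author")],
)
print(nodes)  # {'author': {'a1': 0, 'a2': 1}, 'paper': {'x': 0, 'y': 1, 'z': 2}}
print(edges)  # {('paper', 'written_by', 'author'): ([0, 1], [1, 0])}
```

Paper `z` stays in the graph as a node without edges. Whether dangling keys should be dropped, kept, or reported is a design decision that a real pipeline would need to make explicit.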

Technologies

Python: The primary programming language used for HeXtractor.
pandas: Utilized for data manipulation and handling tabular data.
PyTorch Geometric: Framework for creating and working with graph neural networks.
NetworkX: Used for creating and managing complex graph structures.
PyVis: Enables interactive visualization of graphs.

Installation

HeXtractor can be installed either from PyPI (recommended for most users) or from source code (recommended for developers or if you need the latest features).

From PyPI

To install the latest version from PyPI, run:

pip install hextractor
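To confirm which version pip installed into the active environment, you can use the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch that assumes the distribution name is `hextractor`, as in the pip command above. It reads package metadata only and does not import HeXtractor itself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="hextractor"):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed in the interpreter/environment running this code.
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"hextractor {v}" if v else "hextractor is not installed in this environment")
```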

From Source Code

To install HeXtractor from source, you'll first need to clone the repository:

git clone https://github.com/maddataanalyst/hextractor.git
cd hextractor

You can then install it in either a Conda environment or any standard Python virtual environment. We use Poetry as our primary dependency manager because it provides robust dependency resolution, reproducible builds from a lock file, and simpler package management.

Option 1: Using Conda

If you prefer Conda for environment management:

# Create a new conda environment from the provided file
conda env create -f environment.yml

# Activate the environment
conda activate hextractor

# Install poetry inside the conda environment
pip install poetry

# Install the package with all dependencies
poetry install --with dev --with research

Option 2: Using Standard Python Virtual Environment

Create and activate a virtual environment using your preferred method:

# Using venv (Python 3.3+)
python -m venv hextractor-env
source hextractor-env/bin/activate  # On Windows: hextractor-env\Scripts\activate

# Or using virtualenv
virtualenv hextractor-env
source hextractor-env/bin/activate  # On Windows: hextractor-env\Scripts\activate

Install Poetry and the package:

# Install poetry
pip install poetry

# Install the package with all dependencies
poetry install --with dev --with research

Remember to activate your environment (conda or virtual environment) whenever you want to use HeXtractor.
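If you are unsure whether the right environment is active, a small standard-library check can help. This is a best-effort heuristic and is not part of HeXtractor. It treats `sys.prefix != sys.base_prefix` as a venv/virtualenv environment (virtualenv 20+ behaves like venv here). Otherwise, it falls back to the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```python
import os
import sys


def active_environment():
    """Best-effort guess of the environment this interpreter runs in.

    Returns a (kind, location) tuple where kind is "venv", "conda" or "system".
    """
    # venv/virtualenv: sys.prefix points at the env, sys.base_prefix at the base Python.
    if sys.prefix != sys.base_prefix:
        return ("venv", sys.prefix)
    # `conda activate <name>` exports the active environment's name.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return ("conda", conda_env)
    return ("system", sys.prefix)


if __name__ == "__main__":
    kind, where = active_environment()
    print(f"{kind}: {where}")
```

After `conda activate hextractor`, this should report `conda: hextractor`. Inside the activated `hextractor-env` virtual environment, it should report `venv` followed by that environment's path.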

Documentation

You can find the official, detailed documentation here.

Contributing and help

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Reporting bugs;
Fixing bugs;
Implementing features;
Writing documentation;
Submitting feedback.

Detailed contribution and community guidelines can be found in the CONTRIBUTING.rst file.
