Exploring Graph Tasks with LLMs: A Comprehensive Benchmark and Investigation

1. Environment Setup and Hugging Face/Qwen/DeepSeek/OpenAI API Keys

Before you begin, please make sure that you have Anaconda or Miniconda installed on your system. This guide assumes that you have a CUDA-enabled GPU.

# Create and activate a new Conda environment named 'LLM-benchmarking'
conda create -n LLM-benchmarking
conda activate LLM-benchmarking

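# If you use a different CUDA version, please refer to the PyTorch and DGL websites for the appropriate versions.
conda install numpy
# PyTorch 2.0.1 conda builds are published for CUDA 11.7 and 11.8 (there is no pytorch-cuda=12.2 build)
conda install pytorch==2.0.1 pytorch-cuda=11.8 -c pytorch -c nvidia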

# Install additional dependencies
pip install torch_geometric pandas torchmetrics Deprecated

Other required packages include openai, the Hugging Face libraries, and ogb; the full configuration is listed in requirements.txt and can be installed with pip install -r requirements.txt.
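After installation, a quick way to confirm that the core packages are importable in the active environment is a standard-library check like the one below. The module list is an assumption taken from the install commands above (import names can differ from pip names; for example, the Deprecated package is imported as deprecated), so treat requirements.txt as the authoritative list.

```python
import importlib.util

# Import names assumed from the install commands above; extend as needed.
REQUIRED_MODULES = [
    "numpy", "torch", "torch_geometric", "pandas",
    "torchmetrics", "deprecated", "openai", "ogb",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```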

You also need to provide your Hugging Face/Qwen/DeepSeek/OpenAI API keys; comments in the code indicate where each key goes.
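One option for keeping keys out of source files is to export them as environment variables and read them wherever the code asks for a key. This is a sketch, not the repository's mechanism: OPENAI_API_KEY and HF_TOKEN are the variables recognized by the official OpenAI and Hugging Face clients, while the Qwen (DASHSCOPE_API_KEY) and DeepSeek (DEEPSEEK_API_KEY) names and the get_api_key helper are assumptions made for illustration.

```python
import os

# Provider -> environment variable. The Qwen and DeepSeek names are assumptions.
API_KEY_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "huggingface": "HF_TOKEN",
    "qwen": "DASHSCOPE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def get_api_key(provider):
    """Read the API key for `provider` from its environment variable."""
    env_var = API_KEY_ENV_VARS[provider.lower()]
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable for {provider}.")
    return key
```

For example, after export HF_TOKEN=..., the Llama inference scripts could be called with --huggingface_token "$HF_TOKEN" instead of pasting the token into the command line.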

2. Unified Datasets and Raw Text

Datasets

Dataset	Domain	Task	#Graph	#Node	#Edge	#Classes	Metrics	Default feature
Cora	citation	Node, Link	1	2,708	5,429	7	Accuracy	Bag-of-Words
Pubmed	citation	Node, Link	1	19,717	44,338	3	Accuracy	TF-IDF
OGBN-Arxiv	citation	Node, Link	1	169,343	1,166,243	40	Accuracy	Skip-gram
OGBN-Products	e-commerce	Node, Link	1	2,449,029	61,859,140	47	Accuracy	Bag-of-Words

Download Datasets and Raw Text

Cora and PubMed: You can download these datasets from here. After downloading, place them in the following directories (relative to the repository root):
- dataset/cora/ for the Cora dataset
- dataset/pubmed/ for the PubMed dataset
OGBN-Arxiv and OGBN-Products: The graph data are downloaded automatically when you run the dataloader, but the raw text files must be downloaded manually:
- For OGBN-Arxiv, download the file from this link and place it at dataset/ogbn_arxiv/titleabs.tsv.
- For OGBN-Products, download the folder from this Google Drive link and place it at dataset/ogbn-products/Amazon-3M.raw.
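A small pre-flight check can confirm that the manual downloads are in place before the dataloader runs. The sketch below lists which of the locations named above are absent; missing_dataset_paths is a hypothetical helper, and it assumes the paths are relative to the repository root.

```python
from pathlib import Path

# Locations from the download instructions above, relative to the repository root.
EXPECTED_PATHS = [
    "dataset/cora",
    "dataset/pubmed",
    "dataset/ogbn_arxiv/titleabs.tsv",
    "dataset/ogbn-products/Amazon-3M.raw",
]


def missing_dataset_paths(root="."):
    """Return the expected dataset paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    for path in missing_dataset_paths():
        print("Not found:", path)
```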

Preprocess Datasets

The function get_data_and_text loads and preprocesses a dataset. On the first call it saves the graph as dataset/{dataset_name}_data.pt and the raw text as dataset/{dataset_name}_text.pkl; later calls load these cached files instead.

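import os
import pickle

import torch

# load_data is the repository's own loader; it is not shown in this README.


def get_data_and_text(dataset_name, train_perc=0.6, val_perc=0.2, test_perc=0.2, use_text=True, seed=42):
    """Load a cached graph and its raw text, or build and cache them on first use.

    Note: the cache file names ignore the split percentages and the seed, so delete
    the cached files to regenerate the data with different settings.
    """
    # cache file paths (relative to the repository root)
    data_path = f"dataset/{dataset_name}_data.pt"
    text_path = f"dataset/{dataset_name}_text.pkl"

    if os.path.exists(data_path) and os.path.exists(text_path):
        print("Loading cached data and text files...")
        # weights_only=False lets newer PyTorch releases unpickle full graph objects
        data = torch.load(data_path, weights_only=False)
        with open(text_path, "rb") as f:
            text = pickle.load(f)
    else:
        print("Generating data and text files...")
        data, text = load_data(dataset_name, train_perc, val_perc, test_perc, use_text, seed)

        # create the dataset folder if needed, then cache both files
        os.makedirs(os.path.dirname(data_path), exist_ok=True)
        torch.save(data, data_path)
        with open(text_path, "wb") as f:
            pickle.dump(text, f)
        print("Data and text files saved!")

    return data, text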
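load_data is not shown in this README, so how it actually splits nodes is not specified here. As a sketch under that caveat, a seeded random node split matching the default train_perc, val_perc, test_perc, and seed arguments could look like this; random_node_split is a hypothetical helper, not the repository's code.

```python
import math
import random


def random_node_split(num_nodes, train_perc=0.6, val_perc=0.2, test_perc=0.2, seed=42):
    """Shuffle node indices with a fixed seed and cut them into train/val/test lists."""
    if not math.isclose(train_perc + val_perc + test_perc, 1.0):
        raise ValueError("train_perc + val_perc + test_perc must equal 1")
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    train_end = round(train_perc * num_nodes)
    val_end = train_end + round(val_perc * num_nodes)
    # the test set takes whatever remains, so the three parts always cover every node
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]
```

For Cora's 2,708 nodes, the default percentages give 1,625 training, 542 validation, and 541 test nodes.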

3. Baseline models

In this paper, we evaluate multiple baseline models; the source of each implementation is listed below. All baselines are run on the same preprocessed version of the datasets to ensure a fair comparison, and they produce the baseline results reported in this study.

GNNs: For GCN, GraphSAGE, GAT, and MixHop, we follow the models on the OGB leaderboards. Specifically, GCN, GraphSAGE, and GAT are all from Luo et al. (2024), and the code can be obtained from here.
Graph SSL Models: We choose GraphCL and GraphMAE in this category. For GraphCL, we follow the implementation from this repository. For GraphMAE, we follow the implementation from this repository.
Graph Transformers: We use Graphormer in this category. The implementation is from this repository.
Foundational Graph Prompt Models: We use Prodigy, OFA, All in One, GPF-plus, GraphPrompt, and ZeroG in this category. The implementations of Prodigy and OFA can be obtained from here and here, respectively. For All in One, GPF-plus, and GraphPrompt, we use the implementations from ProG (Zi et al., 2024). For ZeroG, we follow the implementation from this repository.
LM-Augmented Graph Learning Models: For GIANT and TAPE, we follow the implementation from this repository.
LLM with Graph Projectors: LLaGA is chosen for this category. The implementation is from this repository.

Implementations of several baseline models are provided in the baseline folder, and baseline/output_model contains some pretrained models.

4. Quick Start

The overall architecture of the LLM benchmark is shown in the figure below.

Graph encoding

We combine the original graph datasets with their raw text attributes to encode each graph into a format that LLMs can understand, i.e., natural-language prompts. Node classification and link prediction use different prompt formats.
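The exact prompt templates are produced by the scripts in generate and are not reproduced here. As a rough illustration of the idea, the hypothetical build_node_prompt below serializes a target node's text, the text of a few 1-hop neighbors, and the candidate labels into a single prompt; the template wording and the toy texts are assumptions.

```python
def build_node_prompt(node_id, texts, neighbors, label_names, max_neighbors=3):
    """Hypothetical node-classification prompt: target text, neighbor texts, label options."""
    lines = [f"Target node: {texts[node_id]}"]
    # Add the raw text of up to `max_neighbors` 1-hop neighbors as graph context.
    for i, nb in enumerate(neighbors.get(node_id, [])[:max_neighbors], start=1):
        lines.append(f"Neighbor {i}: {texts[nb]}")
    lines.append("Which category does the target node belong to? Options: " + ", ".join(label_names))
    lines.append("Answer:")
    return "\n".join(lines)


# Toy graph: node 0 cites nodes 1 and 2.
texts = {0: "Paper on graph convolutions", 1: "Paper on spectral methods", 2: "Paper on Bayesian inference"}
neighbors = {0: [1, 2]}
print(build_node_prompt(0, texts, neighbors, ["Neural_Networks", "Probabilistic_Methods"]))
```

Capping the number of neighbors keeps prompts within the model's context window on high-degree nodes.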

cd generate

# generate node classification prompts
python run_generate_node.py

# generate link prediction prompts
python run_generate_link.py

Note: The generated prompts are stored in the output folder; example prompts for Cora are provided in output/cora.
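How run_generate_link.py chooses node pairs is not described here. A common setup for link prediction, assumed for this sketch, pairs true edges (label 1) with uniformly sampled non-edges (label 0); each pair would then be turned into a prompt asking whether the two nodes are linked. sample_link_pairs is a hypothetical helper.

```python
import random


def sample_link_pairs(edges, num_nodes, num_pairs, seed=42):
    """Return num_pairs linked (label 1) and num_pairs unlinked (label 0) node pairs."""
    # Treat the graph as undirected and store each edge once as (smaller, larger).
    edge_set = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    max_negatives = num_nodes * (num_nodes - 1) // 2 - len(edge_set)
    if num_pairs > len(edge_set) or num_pairs > max_negatives:
        raise ValueError("graph too small for the requested number of pairs")

    rng = random.Random(seed)
    positives = rng.sample(sorted(edge_set), num_pairs)
    negatives = set()
    while len(negatives) < num_pairs:  # uniform negative sampling by rejection
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        if u != v and pair not in edge_set:
            negatives.add(pair)
    return [(u, v, 1) for u, v in positives] + [(u, v, 0) for u, v in sorted(negatives)]
```

Balancing positives and negatives keeps accuracy meaningful, since a model that always answers "yes" scores only 50%.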

LLMs instruction tuning

We use the generated prompts to instruction-tune the LLMs for node classification and link prediction.

We provide .sh training scripts in the lora_sft folder; adjust the training configuration as needed. Make sure DeepSpeed is installed. For example, to instruction-tune the Llama 3B Instruct model for node classification on OGBN-Arxiv, run:

bash lora_sft/train_llama_3B_Instruct_arxiv_node.sh

The tuned LoRA weights will be stored in the output_model folder.
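The LoRA weights saved to output_model are small compared with the base model because LoRA (low-rank adaptation) keeps each adapted weight matrix W of shape d_out x d_in frozen and trains only a low-rank update BA, with B of shape d_out x r and A of shape r x d_in. The arithmetic below illustrates the saving for one matrix; the 4096 x 4096 shape and rank 8 are illustrative values, not settings read from the lora_sft scripts.

```python
def lora_param_counts(d_out, d_in, rank):
    """Compare trainable parameters of a full d_out x d_in weight vs. its LoRA adapter."""
    full = d_out * d_in
    # LoRA learns B (d_out x rank) and A (rank x d_in); the weight update is B @ A.
    lora = rank * (d_out + d_in)
    return full, lora, lora / full


full, lora, ratio = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {ratio:.2%}")
```

Here the adapter holds 65,536 parameters versus 16,777,216 for the full matrix, about 0.39%.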

LLMs inference

Inference code for the LLMs is provided in the LLM folder.

For Qwen/GPT-4o/DeepSeek, you need to provide your API key in the corresponding files.

For Llama, you can run commands such as the following (adjust the arguments as needed):

# Node classification
python Llama_Instruct_node.py --dataset_name cora --model_size 3B --scenario "full fine-tune" --mode neighbors --hop "1 hop" --huggingface_token your_huggingface_token

# Link prediction
python Llama_Instruct_link.py --dataset_name arxiv --model_size 3B --scenario "2 formats" --case "1 hop" --huggingface_token your_huggingface_token

The results will be stored in the results folder.
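Since the dataset table lists accuracy as the metric, free-text LLM outputs must first be mapped to a label before scoring. The format of the files written to results is not described here, so this sketch assumes the responses and gold labels are already available as Python lists; extract_label and accuracy are hypothetical helpers, and picking the earliest-mentioned label name is only one simple parsing rule.

```python
def extract_label(response, label_names):
    """Return the label name mentioned earliest in the response (None if no match)."""
    text = response.lower()
    best, best_pos = None, None
    for name in label_names:
        pos = text.find(name.lower())
        if pos == -1:
            continue
        # Prefer the earliest match; on a tie, prefer the longer label name.
        if best is None or pos < best_pos or (pos == best_pos and len(name) > len(best)):
            best, best_pos = name, pos
    return best


def accuracy(responses, gold_labels, label_names):
    """Fraction of responses whose extracted label equals the gold label."""
    if not gold_labels or len(responses) != len(gold_labels):
        raise ValueError("responses and gold_labels must be non-empty and equal in length")
    correct = sum(extract_label(r, label_names) == g for r, g in zip(responses, gold_labels))
    return correct / len(gold_labels)
```

Responses that mention no label count as wrong, which penalizes off-format answers rather than silently skipping them.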

Repository layout:
- LLM/
- baseline/
- dataset/
- few_shot_examples/
- generate/
- lora_sft/
- output/cora/
- output_model/
- results/
- utils/
- LICENSE
- MIT-LICENSE.txt
- README.md
- big picture.jpg
- ds_zero2_no_offload.json
- ds_zero2_offload.json
- requirements.txt
