Graphormer-IR is an extension of the Graphormer framework, designed specifically to predict infrared spectra using only chemical structure.

HopkinsLaboratory/Graphormer-IR

New Updates (April 2025) Graphormer-IR(IS)

  • Updated to include Graphormer-IRIS code from recent publication (Merged Graphormer-IR and IRIS branches);
  • Added command line tools for freezing encoder layers, MLP layers, and the graph feature encoder;
  • Fixed issues with relative paths and with pickle files that had not been uploaded;
  • Added a Docker installation route and guide;
  • Made it easier to save data by adding a '--save-path' flag to the evaluation script.

General

Graphormer-IR(IS) is an extension of the Graphormer package. The Graphormer documentation and the original code on GitHub provide additional usage examples. If you use this code, please cite our papers and the original Graphormer work:

@article{Stienstra2025,
  author = {Cailum M. K. Stienstra and Teun van Wieringen and Liam Hebert and Patrick Thomas and Kas J. Houthuijs and Giel Berden and Jos Oomens and Jonathan Martens and W. Scott Hopkins},
  doi = {10.1021/ACS.JCIM.4C02329},
  issn = {1549-9596},
  journal = {Journal of Chemical Information and Modeling},
  month = {2},
  publisher = {American Chemical Society},
  title = {A Machine-Learned “Chemical Intuition” to Overcome Spectroscopic Data Scarcity},
  url = {https://pubs.acs.org/doi/full/10.1021/acs.jcim.4c02329},
  year = {2025}
}

@article{Stienstra2024,
  author = {Cailum M. K. Stienstra and Liam Hebert and Patrick Thomas and Alexander Haack and Jason Guo and W. Scott Hopkins},
  doi = {10.1021/ACS.JCIM.4C00378},
  issn = {1549-9596},
  journal = {Journal of Chemical Information and Modeling},
  month = {6},
  publisher = {American Chemical Society},
  title = {Graphormer-IR: Graph Transformers Predict Experimental IR Spectra Using Highly Specialized Attention},
  url = {https://pubs.acs.org/doi/abs/10.1021/acs.jcim.4c00378},
  year = {2024}
}

@inproceedings{ying2021,
  title = {Do Transformers Really Perform Badly for Graph Representation?},
  author = {Chengxuan Ying and Tianle Cai and Shengjie Luo and Shuxin Zheng and Guolin Ke and Di He and Yanming Shen and Tie-Yan Liu},
  booktitle = {Thirty-Fifth Conference on Neural Information Processing Systems},
  year = {2021},
  url = {https://openreview.net/forum?id=OeWooOxFwDa}
}

Installation

Docker [April 2025]

We have developed a Docker image to make installing Graphormer-IR and managing its environment easier. Installation instructions are as follows.

📦 How to Install and Run Graphormer-IR Using Docker Image

  1. Install the following software (if not already installed):

You can verify installation via the following commands:

docker --version
nvidia-smi
nvidia-container-cli --version
  2. Save the Dockerfile (the name should be “Dockerfile”).
  3. Open a terminal in the same folder as Dockerfile.
  4. Build the Docker image by running:
docker build --no-cache -t graphormer-ir .
  5. Run the Docker container with GPU support:
docker run -it --gpus all graphormer-ir bash
  6. Inside the container, navigate to the example directory, make the example script executable, and run the example script:
cd /workspace/Graphormer-IR/examples/property_prediction
chmod +x IRspec.sh
./IRspec.sh
  7. If it runs for an epoch and saves .pt files, you know you’ve succeeded.
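
The three verification commands from the first step can also be checked from a script before starting a long image build. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and it only tests whether each executable is on PATH, not whether the driver or container toolkit is configured correctly.

```python
import shutil

# Command-line tools used by the verification commands above.
REQUIRED_TOOLS = ("docker", "nvidia-smi", "nvidia-container-cli")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisite tools were found on PATH.")
```

An empty result only shows that the executables exist; running the commands themselves remains the real test.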

A beginner's guide to Docker usage can be found HERE

  • To upload files (e.g., new data) to the Docker container, use:
docker cp ./local_file.txt <container_id>:/app/local_file.txt
  • To download files (checkpoints, results) from the container, use:
docker cp <container_id>:<path_inside_container> <path_on_host>

Old Instructions [Before April 2025]

We highly recommend following the installation guide. The notes below should make things easier:

  • Install fairseq directly from the GitHub repository with "pip install -e /path/to/folder". Make sure that you're using a version old enough to be compatible with Graphormer.
  • Make sure that you're using old enough versions of the PyTorch Geometric and DGL libraries (there's a lookup table for compatibility on their websites). These are the packages that we found broke most frequently, and the errors you get don't always point to them. If there are problems inheriting abstract data classes, add the missing methods (e.g., "__len__") to the affected classes in your install and it should work.
  • Refer to "requirement.txt" if you have any problems with version compatibility.
  • Ensure that your CUDA and PyTorch Geometric versions are compatible.
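
When chasing a version mismatch, it helps to list what is actually installed before comparing against "requirement.txt". Below is a minimal sketch using only the standard library. The distribution names in `PACKAGES` are assumptions based on the usual PyPI names and may differ in your environment; the helper `installed_versions` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Assumed distribution names; adjust if your install uses different ones.
PACKAGES = ("torch", "torch-geometric", "dgl", "fairseq")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This reads package metadata without importing the packages, so it still works when an import itself would crash.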

Data

Large collections of infrared spectra are owned by private organizations across a variety of domains, and no unified “machine learning ready” data set is available. As such, it was necessary to obtain, clean, and process a library of IR spectra from several different domains. IR spectra were obtained from three online sources: the National Institute of Advanced Industrial Science and Technology (AIST), the National Institute of Standards and Technology (NIST), and the Coblentz Society. Complete data access statements can be found in the Supporting Information of the original Graphormer-IR manuscript.

Since we are unable to provide this data, we instead provide sample data in /scripts/sample_data/, along with indices that interface with our code, as an approximate template for evaluation.

Usage

This repository contains the code you need to reproduce the work in our recent publications. Most of our usage is identical to that found in the original Graphormer paper.

  • We have included dataloaders for IR, IRIS, and DFT spectra in examples/property_prediction, with bash scripts to run training. Here you can tune model hyperparameters, fine-tune pre-trained models (while freezing layers), and change your data source.

  • Our learned graph node feature encoder is found in /graphormer/modules/graphormer_layers.py. If you change the number/shape of input node features, you will have to edit this code as well.

  • The model itself is in /graphormer/models/graphormer.py. Most hyperparameters can be tuned from the bash scripts.

  • Once you have a trained model, run evaluation with /graphormer/evaluate/evaluate.sh. Make sure your model hyperparameters match those used in training.

  • Additional data manipulation scripts (baseline correction, etc.) can be found in /scripts/.

  • Sample data for training IR and DFT models are found in /sample_data/. IRIS spectra were not released in this study because of conflicts with other publications. The model weights trained on this data are available at Zenodo (see below).

Model Evaluation [May 2025]

Model evaluation can be completed using the evaluate.sh script (found in ../../graphormer/evaluate/), which calls the evaluate.py script. Open evaluate.sh and identify which dataloader it calls from these flags:

      --user-data-dir testing_dataset \
      --dataset-name IR_test \

These refer to the testing_dataset subdirectory, which contains a dataloader registered as 'testing_dataset' (see line 293 of /graphormer/evaluate/testing_dataset/IR_test.py). You can change these flags if you want to modify your dataloader, reorganize files, etc.

  1. Make sure your model weights (the pre-trained .pt model file) are in the correct directory. This is set by the flag:
        --save-dir '../../checkpoints' \

You can put multiple models into this folder, and the script will make predictions with each pre-trained instance. Doing so can make predictions more robust to training/testing biases. You can download these model weights from our Zenodo (see below). IR and IRIS predictions require different weights.

  2. Modify the dataloader to load the SMILES/phase combinations you want. This can be done on line 182 of /graphormer/evaluate/testing_dataset/IR_test.py:
 x = import_data(r'../../sample_data/sample_IR_train_data.csv')[1:]

Make sure your file matches the .csv structure and column order of the sample data provided.

  3. Set your file save location with the evaluate.sh flag:
  --save-path '../../predictions/ir_preds.csv' \

If this is not a .csv file path, no data will be saved. The output contains the measured and predicted spectra for the entries in your dataloader.

  4. Run the bash script in your virtual environment. The model should save predictions in a .csv file named according to the .pt file used. Averaging the predictions from multiple pre-trained model weights can help account for variations in training biases.
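
Averaging predictions across checkpoints amounts to an element-wise mean over spectra sampled on the same grid. The sketch below illustrates only that arithmetic on in-memory lists. It makes no assumption about the layout of the saved .csv files; the function name `average_spectra` and the numbers are hypothetical.

```python
def average_spectra(spectra):
    """Element-wise mean of several equally long predicted spectra."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    length = len(spectra[0])
    if any(len(s) != length for s in spectra):
        raise ValueError("all spectra must have the same number of points")
    return [sum(points) / len(spectra) for points in zip(*spectra)]


# Hypothetical intensities predicted by three checkpoints for one molecule.
preds = [
    [0.0, 0.5, 1.0],
    [0.0, 0.25, 1.0],
    [0.0, 0.75, 1.0],
]
ensemble = average_spectra(preds)
print(ensemble)
```

The length check matters in practice: spectra predicted on different wavenumber grids should be resampled before they are averaged.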

You should be able to make predictions for any combination of SMILES and spectral phase/charge state. Keep in mind that predictions will only be as good as the chemical coverage of the pre-training libraries (see our papers).
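
Before pointing the dataloader at a new input file, one quick sanity check is to compare its first row with that of the sample file. This sketch assumes the first row of each .csv is a header, which the `[1:]` slice in the dataloader line suggests but does not confirm. The helper names are hypothetical.

```python
import csv


def read_header(path):
    """Return the first row of a CSV file as a list of column names."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])


def same_layout(sample_path, new_path):
    """True if both files have identical first rows, in the same order."""
    return read_header(sample_path) == read_header(new_path)
```

For example, `same_layout('../../sample_data/sample_IR_train_data.csv', 'my_molecules.csv')` compares the sample file with a hypothetical input file of your own. It checks column names and order only, not the values in each row.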

Models

The five best-performing Graphormer-IR models (full set of node features, learned node features, combinatoric edges, etc.), which are discussed in detail in the manuscript, are freely available online at Zenodo. These can be used for model evaluation using the evaluate.sh script and accompanying dataloader.

[APRIL 2025 UPDATE]
Graphormer-IRIS models are available at Zenodo

Common Errors

"Segmentation Fault... Core Dumped" may indicate that you have installed an incompatible version of PyTorch Geometric (https://data.pyg.org/whl/). You can test this further by checking the package import (e.g., from torch_geometric.data import Data).
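
Because a segmentation fault kills the interpreter that triggers it, it can be convenient to attempt the import in a child process and inspect the exit code. This is a minimal sketch. The helper `try_import` is hypothetical, and the module name `torch_geometric` is the import name PyTorch Geometric normally uses.

```python
import subprocess
import sys


def try_import(module):
    """Import `module` in a child interpreter and return its exit code.

    0 means the import worked, a positive code means a Python error such as
    ImportError, and on POSIX a negative code means the child was killed by
    a signal (for example -11 for SIGSEGV).
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
    )
    return result.returncode


if __name__ == "__main__":
    code = try_import("torch_geometric")
    print("torch_geometric import exit code:", code)
```

A negative exit code here points at a binary incompatibility (for example, wheels built for a different PyTorch or CUDA version) rather than a missing package.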

Contact

If you require further assistance with developing your own model or have any questions about its implementation, the authors can be contacted at
