Code for training Δ-ML models for the prediction of transition metal complex properties.

uiocompcat/TMC-Delta-ML

Δ-ML for transition metal complexes

This repository contains the machine learning code for employing Δ-ML strategies for the prediction of transition metal complex (TMC) properties utilizing u-NatQG representations. The code is associated with the publication "∆-Machine Learning for the Prediction of Metal Complex Properties" which explains the technical details of the approach and discusses all conducted experiments.
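
As a rough illustration of the general Δ-ML idea (not of this repository's graph neural network), the sketch below learns the correction between a low-fidelity and a high-fidelity value and adds it back to the low-fidelity value at prediction time. The linear least-squares model, the synthetic data, and all function and variable names are assumptions made purely for illustration.

```python
import numpy as np


def _with_bias(features):
    """Append a column of ones so the model can learn a constant offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_delta_model(features, low_fidelity, high_fidelity):
    """Fit a linear stand-in model to the correction (high - low)."""
    delta = high_fidelity - low_fidelity
    weights, *_ = np.linalg.lstsq(_with_bias(features), delta, rcond=None)
    return weights


def predict_high_fidelity(weights, features, low_fidelity):
    """Prediction = low-fidelity value + learned correction."""
    return low_fidelity + _with_bias(features) @ weights


# Synthetic toy data (hypothetical, not taken from tmQMg).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
low = rng.normal(size=50)
true_weights = np.array([0.5, -1.0, 2.0])
high = low + features @ true_weights + 0.25

weights = fit_delta_model(features, low, high)
pred = predict_high_fidelity(weights, features, low)
```

In the repository itself the correction is learned by a graph neural network operating on u-NatQ graphs; the publication describes the actual setup.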

Data

The dataset used to train these models is derived from the tmQMg dataset: for each TMC it contains, corresponding low-fidelity u-NatQ graphs were generated. The graph-building procedure is identical to that of the original publication, the only difference being that the underlying data were calculated at a lower level of theory. Two low-fidelity graph representations were generated, at the following levels of theory:

  • GFN2-xTB // LSDA/LANL2DZ
  • GFN2-xTB // PBE0-D3BJ/def2-TZVP

The graphs can be downloaded from Zenodo.

Code

The graph neural network architecture is a minor modification of the model proposed by Gilmer and coworkers. The code is designed to log its results with the Python package Weights & Biases (wandb).

Requirements

You need a Python3 installation with the following packages:

Because logging is done using the wandb package, you will need a wandb account to run these scripts.

Usage

Navigate into the code directory and open the file ml.py. In this file, replace the entries <wandb_project_name> and <wandb_entity> with your wandb credentials. This directs all logging output to a project in your wandb account. Then replace the entry <root_dir> with the path of the directory on your machine in which the raw and processed data should be stored. After that, run python3 ml.py to perform all experiments reported in the publication.

In total, five quantum properties are used. For the electronic and dispersion energies, an additional linear fitting procedure based on atomic contributions is applied (details can be found in the SI of the original publication).

The following properties do not use this atomic fitting procedure:

  • HOMO-LUMO gap
  • Polarizability
  • Dipole moment

The following properties use this atomic fitting procedure:

  • Electronic energy
  • Dispersion energy
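
One common way to realize a linear fit based on atomic contributions is to regress each energy on the element counts of the complex and let the model learn only the residual. The sketch below follows that reading; it is an assumption about the procedure, not a transcription of the code in ml.py or of the SI. The function names, compositions and per-element values are hypothetical.

```python
import numpy as np


def composition_matrix(compositions, elements):
    """Rows are complexes, columns are atom counts per element."""
    return np.array(
        [[comp.get(el, 0) for el in elements] for comp in compositions],
        dtype=float,
    )


def fit_atomic_contributions(compositions, energies, elements):
    """Least-squares fit of one energy contribution per element."""
    counts = composition_matrix(compositions, elements)
    coeffs, *_ = np.linalg.lstsq(counts, np.asarray(energies, float), rcond=None)
    return dict(zip(elements, coeffs))


def residual_targets(compositions, energies, contributions):
    """Subtract the fitted atomic baseline from the raw energies."""
    elements = list(contributions)
    counts = composition_matrix(compositions, elements)
    baseline = counts @ np.array([contributions[el] for el in elements])
    return np.asarray(energies, float) - baseline


# Toy example with made-up compositions and per-element values.
elements = ["H", "C", "Fe"]
compositions = [
    {"Fe": 1, "C": 2, "H": 4},
    {"Fe": 1, "C": 5, "H": 5},
    {"Fe": 1, "C": 3, "H": 9},
    {"Fe": 2, "C": 4, "H": 6},
]
true_values = np.array([-0.5, -38.0, -123.0])  # arbitrary numbers
energies = composition_matrix(compositions, elements) @ true_values

contributions = fit_atomic_contributions(compositions, energies, elements)
residuals = residual_targets(compositions, energies, contributions)
```

Because the toy energies are exactly additive, the residuals vanish here; with real data the residual is the part left for the model to learn.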

To run the models for only a subset of properties, change the corresponding function calls at the very bottom of the ml.py script. To control whether the Δ-ML approach is used, and if so at which fidelity, set the delta_mode keyword in the function call:

  • 'bench' will run the benchmark approach.
  • 'lsda' will run the Δ-ML approach utilizing graphs obtained at the GFN2-xTB//LSDA/LANL2DZ level of theory.
  • 'pbe0' will run the Δ-ML approach utilizing graphs obtained at the GFN2-xTB//PBE0-D3BJ/def2-TZVP level of theory.

To enforce a certain train-validation-test split, use the train_val_test_split_dir keyword, which should point to a directory holding three files, train_ids.txt, val_ids.txt and test_ids.txt, containing the CSD identifiers of the graphs to be used for each set. In the train_val_test_base/ and train_val_first_test_second_third directories we provide the corresponding files for the splits used in the publication.
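
To check a custom split directory before passing it to the script, a small helper along the following lines may be useful. It is a sketch that assumes one identifier per line in each file, which is not stated above. The helper names and the placeholder identifiers are hypothetical, and this is not the loader used in ml.py.

```python
import tempfile
from pathlib import Path


def read_ids(path):
    """Return the non-empty, stripped lines of a text file."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_split(split_dir):
    """Read train/val/test identifier files and reject duplicate ids."""
    split_dir = Path(split_dir)
    split = {
        name: read_ids(split_dir / f"{name}_ids.txt")
        for name in ("train", "val", "test")
    }
    all_ids = [identifier for ids in split.values() for identifier in ids]
    if len(all_ids) != len(set(all_ids)):
        raise ValueError("an identifier occurs more than once across the split")
    return split


# Demonstration with placeholder identifiers in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "train_ids.txt").write_text("AAAAAA\nBBBBBB\n")
    (tmp_dir / "val_ids.txt").write_text("CCCCCC\n")
    (tmp_dir / "test_ids.txt").write_text("DDDDDD\n\n")
    split = load_split(tmp_dir)
```

The duplicate check catches identifiers that appear in more than one set, which would otherwise leak test data into training.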

Note that the script tmQMg.py will automatically download all necessary data files from the tmQMg repository.
