This repository contains the machine learning code for predicting transition metal complex (TMC) properties with Δ-ML strategies based on u-NatQG representations. The code accompanies the publication "∆-Machine Learning for the Prediction of Metal Complex Properties", which explains the technical details of the approach and discusses all experiments conducted.
The dataset used to train these models was derived from the tmQMg dataset, and for each TMC it contains, corresponding low-fidelity u-NatQG graphs were generated. The graph building procedure is identical to that of the original tmQMg publication, with the only difference that the underlying data used for the construction were calculated at a lower level of theory. In total, two different low-fidelity graph representations were generated at the following levels of theory:
- GFN2-xTB // LSDA/LANL2DZ
- GFN2-xTB // PBE0-D3BJ/def2-TZVP
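This README does not spell out the learning target of the Δ-ML models. Purely as an illustration, the sketch below assumes the textbook Δ-ML formulation, in which a model learns the correction between a cheap low-fidelity value and the high-fidelity reference, and that correction is added back at prediction time. The helper names and numbers are hypothetical; the publication describes the actual setup.

```python
import numpy as np


def delta_targets(y_high: np.ndarray, y_low: np.ndarray) -> np.ndarray:
    """Correction a Δ-ML model would be trained on: high- minus low-fidelity value."""
    return y_high - y_low


def reconstruct(y_low: np.ndarray, delta_pred: np.ndarray) -> np.ndarray:
    """Final prediction: low-fidelity value plus the predicted correction."""
    return y_low + delta_pred


# Hypothetical property values (e.g. gaps in eV) for three complexes.
y_low = np.array([2.10, 3.45, 1.80])
y_high = np.array([2.55, 3.90, 2.05])

delta = delta_targets(y_high, y_low)
# A perfect model would predict exactly `delta`, so reconstruction recovers y_high.
y_pred = reconstruct(y_low, delta)
print(np.round(delta, 2))
```

The appeal of this formulation is that the correction is often smoother and smaller in magnitude than the property itself, which can make it easier to learn from limited data.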
The graphs can be downloaded from Zenodo.
The graph neural network architecture is a minor modification of the message passing neural network (MPNN) proposed by Gilmer and coworkers. The code is designed to log its results with Weights & Biases (the Python package wandb).
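For readers unfamiliar with the Gilmer et al. model, the NumPy sketch below shows its central idea: the edge-network message function m_v = Σ_w A(e_vw) h_w, in which a matrix A computed from the edge features transforms each neighbour's hidden state. This is not the repository's architecture. To keep it short, the edge network is a single linear map, a tanh update stands in for the GRU, and a sum readout stands in for the set2set readout of the original MPNN. All sizes and values are arbitrary.

```python
import numpy as np


def edge_network(e_vw: np.ndarray, W: np.ndarray, d: int) -> np.ndarray:
    """Map an edge feature vector to a d x d matrix (one linear layer here)."""
    return (W @ e_vw).reshape(d, d)


def message_passing_step(h: np.ndarray, edges, edge_feats: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Aggregate messages m_v = sum over edges (v, w) of A(e_vw) @ h_w."""
    d = h.shape[1]
    m = np.zeros_like(h)
    for (v, w), e in zip(edges, edge_feats):
        m[v] += edge_network(e, W, d) @ h[w]
    return m


rng = np.random.default_rng(0)
n_nodes, d, d_edge = 3, 4, 2
h = rng.normal(size=(n_nodes, d))          # initial node (atom) features
bond_feats = rng.normal(size=(2, d_edge))  # features of bonds 0-1 and 1-2
# Each undirected bond is stored as two directed edges sharing its features.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
edge_feats = np.repeat(bond_feats, 2, axis=0)
W = 0.1 * rng.normal(size=(d * d, d_edge))  # edge-network weights

for _ in range(3):  # three message passing rounds
    m = message_passing_step(h, edges, edge_feats, W)
    h = np.tanh(h + m)  # simplified update (a GRU in the original MPNN)

graph_embedding = h.sum(axis=0)  # sum readout (set2set in the original MPNN)
print(graph_embedding.shape)
```

In practice, libraries such as PyTorch Geometric ship this edge-conditioned convolution as a ready-made layer, so a hand-written loop like this one is only useful for understanding the mechanics.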
You need a Python3 installation with the following packages:
Because logging is done with the wandb package, you will need a wandb account to run these scripts.
Navigate into the code directory and open the file ml.py. In this file, replace the entries <wandb_project_name> and <wandb_entity> with your wandb project name and entity. This directs all logging output to a project in your wandb account. Then replace the entry <root_dir> with the path of the directory on your machine in which you want to store the raw and processed data. Finally, run python3 ml.py to perform all experiments reported in the publication.
In total, five different quantum properties are used. For the electronic and dispersion energies, an additional linear fitting procedure based on atomic contributions is applied (details can be found in the SI of the original publication).
The following properties do not use this atomic fitting procedure:
- HOMO-LUMO gap
- Polarizability
- Dipole moment
The following properties use this atomic fitting procedure:
- Electronic energy
- Dispersion energy
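The exact fitting procedure is given in the SI. As a rough illustration only, a common way to remove atomic contributions from an energy is to fit one coefficient per element by linear least squares on the element counts and to train on the residuals. The sketch below assumes that formulation (no intercept, one coefficient per element); the element set, counts and energies are hypothetical.

```python
import numpy as np


def fit_atomic_contributions(counts: np.ndarray, energies: np.ndarray) -> np.ndarray:
    """Least-squares per-element energies c such that counts @ c approximates energies."""
    coeffs, *_ = np.linalg.lstsq(counts, energies, rcond=None)
    return coeffs


def residual_targets(counts: np.ndarray, energies: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Part of each energy not explained by the summed atomic contributions."""
    return energies - counts @ coeffs


# Hypothetical element counts per structure (columns: H, C, N, Fe).
counts = np.array(
    [
        [12, 6, 0, 1],
        [8, 4, 2, 1],
        [3, 1, 1, 0],
        [0, 2, 0, 1],
        [4, 0, 0, 1],
    ],
    dtype=float,
)
# Made-up per-element energies, used only to generate toy data.
true_c = np.array([-0.5, -38.0, -54.6, -123.8])
rng = np.random.default_rng(42)
energies = counts @ true_c + rng.normal(scale=0.01, size=len(counts))

coeffs = fit_atomic_contributions(counts, energies)
targets = residual_targets(counts, energies, coeffs)  # values a model would learn
print(np.round(targets, 4))
```

The point of the fit is scale: total energies are dominated by the sum over atoms, so subtracting that sum leaves a much smaller, more learnable quantity.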
If you want to run the models for only a subset of properties, change the corresponding function calls at the very bottom of the ml.py script. To control whether to use the Δ-ML approach, and if so which fidelity to use, set the delta_mode keyword in the function call:
- 'bench' will run the benchmark approach.
- 'lsda' will run the Δ-ML approach utilizing graphs obtained at the GFN2-xTB//LSDA/LANL2DZ level of theory.
- 'pbe0' will run the Δ-ML approach utilizing graphs obtained at the GFN2-xTB//PBE0-D3BJ/def2-TZVP level of theory.
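The three accepted delta_mode values and the levels of theory they correspond to fit in a small lookup table. The function run_property below is a hypothetical stand-in for the function calls at the bottom of ml.py, whose actual names and signatures may differ. It only validates the keyword and describes what would be run.

```python
from typing import Optional

# delta_mode values and the low-fidelity level of theory each one uses.
DELTA_MODES = {
    "bench": None,  # benchmark approach, no Δ-ML
    "lsda": "GFN2-xTB//LSDA/LANL2DZ",
    "pbe0": "GFN2-xTB//PBE0-D3BJ/def2-TZVP",
}


def run_property(property_name: str, delta_mode: str = "bench") -> str:
    """Hypothetical stand-in: validate delta_mode and describe the run."""
    if delta_mode not in DELTA_MODES:
        raise ValueError(
            f"delta_mode must be one of {sorted(DELTA_MODES)}, got {delta_mode!r}"
        )
    level: Optional[str] = DELTA_MODES[delta_mode]
    if level is None:
        return f"{property_name}: benchmark"
    return f"{property_name}: delta-ML with {level} graphs"


# Example: one property, benchmark versus the PBE0-based low-fidelity graphs.
for mode in ("bench", "pbe0"):
    print(run_property("HOMO-LUMO gap", delta_mode=mode))
```

Failing early on an unknown delta_mode avoids launching a long training run, and a wandb log, with a misspelled setting.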
To enforce a specific train-validation-test split, use the train_val_test_split_dir keyword, which should point to a directory holding three files, train_ids.txt, val_ids.txt and test_ids.txt, containing the CSD identifiers of the graphs to be used for each set. The train_val_test_base/ and train_val_first_test_second_third directories provide the corresponding files for the splits used in the publication.
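To define your own split, you only need a directory containing the three files named above. The sketch below writes and reads such a directory using a seeded random shuffle. It assumes one CSD identifier per line, which this README does not state explicitly, so compare against the files in train_val_test_base/ before relying on it. The 80/10/10 fractions and the identifiers are hypothetical.

```python
import random
import tempfile
from pathlib import Path

SPLIT_FILES = ("train_ids.txt", "val_ids.txt", "test_ids.txt")


def write_split(csd_ids, out_dir, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle identifiers reproducibly and write one file per subset."""
    ids = sorted(csd_ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_val = int(fractions[1] * len(ids))
    subsets = (ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:])
    splits = dict(zip(SPLIT_FILES, subsets))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, subset in splits.items():
        (out / name).write_text("\n".join(subset) + "\n")
    return splits


def read_split(split_dir):
    """Read the three files back, skipping blank lines."""
    d = Path(split_dir)
    return {
        name: [ln.strip() for ln in (d / name).read_text().splitlines() if ln.strip()]
        for name in SPLIT_FILES
    }


ids = [f"HYPO{i:02d}" for i in range(20)]  # hypothetical identifiers
with tempfile.TemporaryDirectory() as tmp:
    written = write_split(ids, tmp)
    loaded = read_split(tmp)
print({name: len(subset) for name, subset in loaded.items()})
```

Fixing the seed makes the split reproducible across runs, which matters when comparing the benchmark and Δ-ML models on the same test set.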
Note that the script tmQMg.py will automatically download all necessary data files from the tmQMg repository.