johannesa edited this page Jul 24, 2020 · 13 revisions

Welcome to the gp-net wiki!

gp-net is a regression tool for predicting Density Functional Theory (DFT) calculated properties of materials and estimating the uncertainties on these predictions. The predictions and uncertainties are obtained with a Gaussian Process (GP) using a Matérn 1/2 kernel, and active learning can be performed using these uncertainties.

gp-net uses MEGNet, a graph network, for training on optical properties of materials. gp-net has only been tested on data from the Materials Project database, so there is no guarantee that data from other materials science databases will work with gp-net.

gp-net replaces the 16-unit and 1-unit dense layers of the graph network with the GP. The activations of the 32-unit dense layer are extracted and either used as they are or pre-processed (transformed). ndims can be set as below:

  • 0 => no pre-processing of the extracted activations
  • 1 => scale features to the [0, 1] range
  • 2 or 3 => apply t-SNE with that many components
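A minimal sketch of these three modes, assuming scikit-learn's MinMaxScaler and TSNE stand in for gp-net's actual pre-processing (the function name and defaults here are illustrative, not gp-net's API):

```python
# Illustrative sketch only: gp-net's real pre-processing may differ.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import MinMaxScaler

def preprocess_activations(activations, ndims):
    """Transform 32-unit layer activations according to the ndims setting."""
    if ndims == 0:
        return activations                                  # use as-is
    if ndims == 1:
        return MinMaxScaler().fit_transform(activations)    # scale to [0, 1]
    if ndims in (2, 3):
        # embed the activations into ndims components with t-SNE
        return TSNE(n_components=ndims, init="random",
                    perplexity=5, random_state=0).fit_transform(activations)
    raise ValueError("ndims must be 0, 1, 2 or 3")

activations = np.random.default_rng(0).normal(size=(50, 32))
latent = preprocess_activations(activations, 1)   # shape (50, 32), in [0, 1]
```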

These transformed activations are used as latent index points by the GP for prediction and uncertainty quantification. By default, the GP kernel has as many dimensions as the transformed activations. The optimised hyperparameters of the GP kernel are obtained by minimising the negative log likelihood with the Adam optimiser. gp-net uses the mean of the GP as the predicted property, and the standard deviation on the predictions as the uncertainty.
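The GP step can be sketched in plain NumPy; gp-net's own implementation is different, so take this only as an illustration of Matérn 1/2 GP regression and how the mean and standard deviation arise:

```python
# Illustration only: gp-net builds its GP differently, but the maths is the same.
import numpy as np

def matern12(X1, X2, amplitude=1.0, length_scale=1.0):
    """Matern 1/2 kernel: k(x, x') = a^2 * exp(-||x - x'|| / l)."""
    d = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    return amplitude**2 * np.exp(-d / length_scale)

def gp_predict(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean (prediction) and standard deviation (uncertainty)."""
    K = matern12(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = matern12(X_train, X_test)
    Kss = matern12(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha                      # predicted property
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(Kss - v.T @ v), 0.0, None)
    return mean, np.sqrt(var)                # std used as the uncertainty

X = np.linspace(0.0, 1.0, 10)[:, None]       # latent index points
y = np.sin(2 * np.pi * X[:, 0])              # stand-in for a DFT property
mean, std = gp_predict(X, y, X)
```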

In the active learning process, the variance on the predictions is used as the uncertainty, as recommended by KDnuggets.
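The query step of such a variance-based active learning loop might look like this (the function and variable names are ours, not gp-net's):

```python
# Sketch of variance-based query selection in active learning (names ours).
import numpy as np

def select_queries(pred_variance, n_queries):
    """Indices of the n_queries candidates the GP is most uncertain about."""
    return np.argsort(pred_variance)[::-1][:n_queries]

variance = np.array([0.02, 0.31, 0.07, 0.55, 0.11])
queries = select_queries(variance, 2)        # [3, 1]: the two largest variances
```

The selected samples would then be labelled (here, by a DFT calculation) and added to the training set before the next round.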

Downloading data

A twenty-digit alphanumeric key and the optical property of interest are required. Data for multiple optical properties can be downloaded by passing the properties separated by whitespace. By default, gp-net downloads the data and runs the requested feature on the fly. An example syntax to download the band gap and formation energy per atom datasets is shown below:

-key xxYYzzlo987x6mnt7Y10 band_gap formation_energy_per_atom

NB: The key used in the example above is fictitious, but a valid key can be obtained from the Materials Project by first signing up to use their database.

Passing already existing data

Only pickle files are readable by gp-net. The filename(s) must be of the form property_data.pkl. Multiple pickle files can be passed by separating the filenames with whitespace. An example syntax to pass the band gap and formation energy per atom datasets is shown below:

-data band_gap_data.pkl formation_energy_per_atom_data.pkl
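Assuming a simple dict payload (the exact object gp-net expects inside the pickle is not documented here, so treat the contents as hypothetical), a file following the property_data.pkl naming scheme could be written as:

```python
# Hypothetical payload: what gp-net stores inside the pickle is an assumption.
import pickle

band_gap_data = {
    "material_ids": ["mp-149", "mp-2534"],   # illustrative Materials Project ids
    "band_gap": [1.12, 0.18],                # eV
}

# the filename must follow the property_data.pkl pattern
with open("band_gap_data.pkl", "wb") as f:
    pickle.dump(band_gap_data, f)
```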

Including/excluding zero optical property values

By default, gp-net excludes zero optical property values from the dataset before any analysis is carried out. To include these zero values, pass -include

Checking samples in data file

It is recommended that the number of samples in the data file be inspected before running any feature of gp-net: the sample count helps in deciding how to split the dataset for the analysis. To check the number of samples in the band gap data, including zero values,

python gp-net.py -checkdata -data band_gap_data.pkl -include

or exclude the zero values with

python gp-net.py -checkdata -data band_gap_data.pkl
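Conceptually, the check amounts to loading the pickle and counting the samples with and without zeros; a rough sketch (the file layout is assumed here, this is not gp-net's code):

```python
# Layout of the pickle is assumed; this only mirrors the reported counts.
import pickle

with open("band_gap_data.pkl", "wb") as f:   # toy file for the example
    pickle.dump([0.0, 1.12, 0.0, 3.45, 0.18], f)

with open("band_gap_data.pkl", "rb") as f:
    band_gaps = pickle.load(f)

n_with_zeros = len(band_gaps)                       # -include: 5 samples
n_without_zeros = sum(v != 0.0 for v in band_gaps)  # default:  3 samples
```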

Checking layers in a MEGNet model

The layers of a pre-trained MEGNet model in .hdf5 format can be inspected before proceeding with any analysis. The layer from which the activations are extracted can be passed to any feature of gp-net; for example, the syntax for passing the readout_0 layer is -layer readout_0. By default, the activations are extracted from the readout_0 layer, i.e. the 32-unit dense layer. An example syntax for inspecting the layers of a fitted formation energy model is

python gp-net.py -ltype fitted_formation_energy_per_atom_model.hdf5

Using a pre-existing best MEGNet model

During training, gp-net allows an already existing best model to be used by passing the -prev argument. By default, an existing best model is not used.

GP training: hyperparameter optimisation

To obtain the best hyperparameters of the kernel, maxiters must be greater than 0. For prediction and uncertainty quantification, gp-net selects the hyperparameters that produce the smallest mean absolute error (MAE) on the dataset to be predicted. An exponential bijector is applied to the hyperparameters to avoid negative values during training. If the best hyperparameters are already known from a previous GP training run, then maxiters must be set to 0, and no bijector is applied.
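A toy illustration of the exponential bijector idea: the optimiser updates an unconstrained variable, and the hyperparameter is its exponential, so it can never go negative during training. The quadratic loss below is a stand-in for the GP negative log likelihood, and plain gradient descent stands in for Adam:

```python
# Toy gradient descent; the real optimiser is Adam and the real loss is the
# GP negative log likelihood.
import math

phi = 0.0                                    # unconstrained variable
lr = 0.1
for _ in range(200):
    length_scale = math.exp(phi)             # bijector: always positive
    grad_ls = 2.0 * (length_scale - 2.0)     # toy loss (ls - 2)^2, minimum at 2
    phi -= lr * grad_ls * length_scale       # chain rule through exp

length_scale = math.exp(phi)                 # converges to ~2.0, never negative
```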

Limitations

  • Prediction by the GP does not consider the uncertainties in the observations. This is primarily because gp-net has only been tested on DFT-calculated properties without uncertainties.
  • gp-net does not perform classification. There are plans to include classification in the next release.