Create a new Python virtual environment and install the requirements listed in requirements.txt.
This repository hosts scripts used for generating grain boundary descriptors in our research, "Describe, Transform, Machine Learning: Feature Engineering for Grain Boundaries and Other Variable-Sized Atom Clusters".
The scripts streamline the process of translating grain boundary representations from Cartesian coordinates to atomic environment descriptors, and finally to grain boundary descriptors, allowing the generated descriptors to be used as features in Machine Learning models.
The dataset used in the paper can be found here
- LAMMPS dump files are in sample_data
- Energies for the corresponding grain boundary configurations are in sample_energy.txt
The .out files produced from LAMMPS simulations store the Cartesian coordinates of individual atoms within a grain boundary. The sample_data directory includes a fraction of the 7000 .out files contained in our original dataset.
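As a minimal sketch of reading the Cartesian coordinates, the snippet below parses the first snapshot of a text dump. It assumes the .out files follow the standard LAMMPS text dump layout with an `ITEM: ATOMS` header that names `x`, `y`, and `z` columns; the function name `read_dump_positions` and the inline sample are illustrative and not part of this repository.

```python
import io
import numpy as np

def read_dump_positions(handle):
    """Return an (N, 3) array of x, y, z from the first snapshot of a LAMMPS text dump."""
    lines = iter(handle)
    for line in lines:
        if line.startswith("ITEM: ATOMS"):
            columns = line.split()[2:]  # column names follow "ITEM: ATOMS"
            idx = [columns.index(name) for name in ("x", "y", "z")]
            break
    else:
        raise ValueError("no 'ITEM: ATOMS' section found")

    rows = []
    for line in lines:
        if line.startswith("ITEM:"):  # a second snapshot begins here
            break
        fields = line.split()
        if fields:
            rows.append([float(fields[i]) for i in idx])
    return np.array(rows)

# Hypothetical three-atom dump used only to exercise the parser.
sample = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 0.0 0.0 0.0
2 1 1.5 0.0 0.0
3 1 0.0 2.0 0.5
"""
positions = read_dump_positions(io.StringIO(sample))
print(positions.shape)
```

For a file on disk, the same function can be called as `read_dump_positions(open(path))`. If the dump uses scaled or unwrapped coordinates (`xs`, `xu`, and so on), the column names in the lookup would need to change accordingly.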
To contextualize each atom in its environment, we transform the original Nx3 Cartesian coordinate representation into an atomic environment descriptor space (NxM). The following methods were employed in the transformation process:
- Smooth Overlap of Atomic Positions (SOAP)
- Atom-Centered Symmetry Functions (ACSF)
- Strain Functionals
- Atomic Cluster Expansion (ACE)
- Graph
- Centrosymmetry
We did not generate the strain functionals ourselves. The creators of the strain functional method were given our data, generated the strain functional descriptors, and returned them to us.
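To illustrate the Nx3 to NxM mapping in the simplest possible terms, here is a toy descriptor that counts each atom's neighbors in M radial shells. It is not one of the methods listed above and is far cruder than SOAP, ACSF, or ACE; the function name, the cutoff `r_cut`, and the bin count are all assumptions made for this sketch, and periodic boundaries are ignored.

```python
import numpy as np

def radial_histogram_descriptor(positions, r_cut=3.0, n_bins=3):
    """Map (N, 3) positions to an (N, n_bins) matrix of neighbor counts per radial shell."""
    positions = np.asarray(positions, dtype=float)
    # All pairwise distances; no periodic boundary handling in this sketch.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    edges = np.linspace(0.0, r_cut, n_bins + 1)
    descriptor = np.zeros((len(positions), n_bins))
    for i, row in enumerate(dist):
        neighbors = np.delete(row, i)  # exclude the atom itself
        # Neighbors farther than r_cut fall outside the bins and are not counted.
        descriptor[i], _ = np.histogram(neighbors, bins=edges)
    return descriptor

# Hypothetical three-atom cluster.
coords = np.array([[0.0, 0.0, 0.0],
                   [1.5, 0.0, 0.0],
                   [0.0, 2.5, 0.0]])
env = radial_histogram_descriptor(coords)
print(env.shape)
```

Each row describes one atom by its surroundings rather than by its absolute position, so the result does not change when the whole cluster is translated or rotated. The descriptors used in the paper share that property but capture much richer angular and radial information.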
Considering the variability in the number of atoms (N) within each .out file, a standardization step is needed before feeding these descriptors into a Machine Learning model. Consequently, each atomic environment descriptor (NxM) is further mapped into a grain boundary descriptor space (PxM) with a fixed length (P).
The implemented methods for this transformation include:
- Average
- K-means
- Skeleton Decomposition/CUR
- Largest Simplex
- GaussianKDE
- graph2vec
For an in-depth explanation of these methods, please refer to our paper.
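The two simplest transformations in the list can be sketched with NumPy alone. Averaging yields P = 1, and K-means yields P equal to the number of clusters, using the centroids as rows. This is a minimal illustration under our own assumptions (plain Lloyd iterations, random initialization, rows sorted for a stable order) and not necessarily how the scripts in this repository implement these methods; the function names are hypothetical.

```python
import numpy as np

def average_descriptor(env):
    """Collapse an (N, M) matrix to (1, M) by averaging over atoms."""
    return np.asarray(env, dtype=float).mean(axis=0, keepdims=True)

def kmeans_descriptor(env, p, n_iter=50, seed=0):
    """Collapse an (N, M) matrix to (p, M) using centroids from plain Lloyd's k-means.

    Requires p <= N because the initial centroids are distinct rows of env.
    """
    env = np.asarray(env, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = env[rng.choice(len(env), size=p, replace=False)]
    for _ in range(n_iter):
        # Assign each atom to its nearest centroid.
        dist = np.linalg.norm(env[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        updated = np.array([
            env[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(p)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Sort rows (first column is the primary key) so the output order is stable.
    return centroids[np.lexsort(centroids.T[::-1])]

# Two hypothetical boundaries with different atom counts but the same M.
rng = np.random.default_rng(1)
gb_small = rng.random((40, 5))
gb_large = rng.random((65, 5))
shapes = [(average_descriptor(e).shape, kmeans_descriptor(e, p=3).shape)
          for e in (gb_small, gb_large)]
print(shapes)
```

Both boundaries map to matrices of the same shape regardless of N, which is the property a downstream model needs. The fixed-size matrix can then be flattened into a feature vector of length P*M.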