A machine learning project to predict material synthesis conditions, leveraging models like CrabNet, MTEncoder, and XGBoost to accelerate materials discovery. 🚀
Traditional materials discovery relies on a combination of domain expertise, chemical intuition, and extensive trial-and-error experimentation. This process is often slow, expensive, and limited in scope. Several tools have been developed to accelerate the materials discovery workflow. However, one of the most important steps remains the search for suitable synthesis conditions:
SyntMTE: What precursors are you planning to use?
User: Fe3O4 and BaCO3.
SyntMTE: Understood. I suggest annealing for 10 hours at 815 °C, then sintering at 950 °C for 12 hours.
This project provides a data-driven approach to synthesis parameter planning. By learning from successful experiments reported in the literature, our models can guide researchers toward more promising synthesis routes, saving time and resources. For more information, read the paper cited below.
- 🎯 Multiple Models: Implements CrabNet, MTEncoder, and XGBoost for robust prediction.
- 🔌 Plug-and-Play: Easily train and evaluate different models using a unified interface.
- ⚡ Hyperparameter Optimization: Integrated with Optuna for efficient hyperparameter tuning.
- 📈 Extensible: Designed with a clear structure to facilitate the addition of new models and datasets.
- 📊 Analysis Ready: Includes notebooks for visualizing results and gaining insights from predictions.
- A virtual environment manager, such as `uv` or `conda`

1. Clone the repository:

   ```bash
   git clone https://github.com/Thorben010/SyntMTE.git
   cd SyntMTE
   ```

2. Create and activate an environment (using `uv`):

   ```bash
   uv venv --python 3.11.13
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   uv pip install -r requirements.txt
   ```

4. Download the pre-trained MTEncoder checkpoint. The MTEncoder model uses pre-trained weights from a dedicated Hugging Face repository. Clone it into the `model_weights` directory:

   ```bash
   git clone https://huggingface.co/thor1/MTEncoder_alexandria model_weights
   ```

5. Download the SyntMTE model weights. To run inference without training your own model, clone the repository into the `SyntMTE_001` directory:

   ```bash
   git clone https://huggingface.co/thor1/SyntMTE_001
   ```
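After cloning, it can help to verify that the large weight files actually downloaded: a failed Git LFS fetch leaves ~130-byte pointer stubs in place of the real `.pth` files (see the note further below). A minimal sketch, assuming the two repository directories above; the 1 MB threshold is an illustrative heuristic, not part of the project:

```python
from pathlib import Path

def check_weights(repo):
    """Return .pth files under `repo` that look fully downloaded.

    Git LFS pointer stubs are only ~130 bytes, so anything under the
    (illustrative) 1 MB threshold is treated as not downloaded.
    """
    repo = Path(repo)
    if not repo.exists():
        return []
    return [p for p in repo.rglob("*.pth") if p.stat().st_size > 1_000_000]

for repo in ("model_weights", "SyntMTE_001"):
    found = check_weights(repo)
    print(f"{repo}: {len(found)} usable .pth file(s)")
```

If a repository shows zero usable files, install Git LFS and re-run `git lfs pull` inside it.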
The primary way to run this project is via the `src/main.py` script.

To run inference on your own data, simply replace `example_inference.csv` with the path to your custom CSV file:
```bash
python3 src/main.py \
    --mode "predict" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --predict_dataset "data/conditions/inference/example_inference.csv" \
    --checkpoint_path 'SyntMTE_001/best_model.pth'
```

You can customize runs by calling `src/main.py` directly, specifying the model, dataset, and other hyperparameters. Keep in mind which default parameter values are set (see the flag table below).
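If you prefer to drive predictions from Python (for example, inside a notebook), the same CLI call can be assembled programmatically. A minimal sketch using only the flags documented in the table below; the helper name `predict_cmd` is not part of the project:

```python
import subprocess

def predict_cmd(csv_path, checkpoint="SyntMTE_001/best_model.pth"):
    """Build the prediction command for src/main.py using its documented flags."""
    return [
        "python3", "src/main.py",
        "--mode", "predict",
        "--embedder_type", "MTEncoder",
        "--aggregation_mode", "mean",
        "--predict_dataset", csv_path,
        "--checkpoint_path", checkpoint,
    ]

cmd = predict_cmd("data/conditions/inference/example_inference.csv")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run from the repo root
```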
```bash
python3 src/main.py \
    --mode "train" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --dataset data/conditions/random_split
```

```bash
python3 src/main.py \
    --mode "tune" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --dataset data/conditions/random_split
```

You can find the prediction results in the `logs/` directory.
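Run outputs land in timestamped subdirectories of `logs/` (e.g. `logs/20250501-004811/`, as seen in the default checkpoint path below). A small helper to locate the most recent run; the timestamped-directory layout is inferred from that default and may differ in your setup:

```python
from pathlib import Path

def latest_run(log_root=Path("logs")):
    """Return the newest run directory under log_root, or None if there is none."""
    log_root = Path(log_root)
    if not log_root.exists():
        return None
    runs = [p for p in log_root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)

run = latest_run()
print("Latest run:", run if run else "none yet -- start a predict/train job first")
```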
| Flag | Type | Choices | Default | Description |
|---|---|---|---|---|
| `mode` | string | `train`, `tune`, `predict` | `train` | Mode to run: standard training, hyperparameter optimization, or prediction. |
| `embedder_type` | string | `CrabNet`, `composition`, `MTEncoder`, `clr` | `MTEncoder` | Type of embedder to use in the model. |
| `aggregation_mode` | string | `attention`, `mean`, `max`, `sum`, `mean_max`, `conv`, `lstm`, `precursor_target_concat`, `concat` | `mean` | Type of aggregation to use for target and precursors. |
| `dataset` | string | - | `data/conditions/random_split` | Path to the dataset. |
| `checkpoint_path` | string | - | `.../logs/20250501-004811/best_model.pth` | Path to load a model checkpoint from. |
| `learning_rate` | float | - | `0.0000439204` | Learning rate for the optimizer (only used in training). |
| `use_target_only` | bool | - | `False` | If set, uses only the target material's representation for regression. |
Note:
If you encounter an error such as "error 118" or issues related to `weights_only=True` when loading model weights from Hugging Face, it may be due to missing `.pth` files. This often happens when Git LFS is not installed, which is required to download large files from Hugging Face repositories.
To resolve this, ensure you have Git LFS installed and run `git lfs pull` in each repository directory to download all necessary model weight files.
If you use this work in your research, please cite our paper:
@article{Prein2025SyntMTE,
author = {Prein, Thorben and Pan, Elton and Jehkul, Janik and Weinmann, Steffen and Olivetti, Elsa A. and Rupp, Jennifer L. M.},
title = {Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials},
journal = {ACS Applied Materials \& Interfaces},
year = {2025},
note = {Under Review}
}
