A machine learning project to predict material synthesis conditions, leveraging models like CrabNet, MTEncoder, and XGBoost to accelerate materials discovery. 🚀
Traditional materials discovery relies on a combination of domain expertise, chemical intuition, and extensive trial-and-error experimentation. This process is often slow, expensive, and limited in scope. Several tools have been developed to accelerate the materials discovery workflow. However, one of the most important steps remains the search for suitable synthesis conditions:
SyntMTE: What precursors are you planning to use?
User: Fe3O4 and BaCO3.
SyntMTE: Understood. I suggest annealing for 10 hours at 815 °C, then sintering at 950 °C for 12 hours.
This project provides a data-driven approach to synthesis parameter planning. By learning from successful experiments reported in the literature, our models can guide researchers toward more promising synthesis routes, saving time and resources. For more information, read the paper cited below.
- 🎯 Multiple Models: Implements CrabNet, MTEncoder, and XGBoost for robust prediction.
- 🔌 Plug-and-Play: Easily train and evaluate different models using a unified interface.
- ⚡ Hyperparameter Optimization: Integrated with Optuna for efficient hyperparameter tuning.
- 📈 Extensible: Designed with a clear structure to facilitate the addition of new models and datasets.
- 📊 Analysis Ready: Includes notebooks for visualizing results and gaining insights from predictions.
- A virtual environment manager, such as `uv` or `conda`

1. Clone the repository:

   ```bash
   git clone https://github.com/Thorben010/SyntMTE.git
   cd SyntMTE
   ```

2. Create and activate an environment (using `uv`):

   ```bash
   uv venv --python 3.11.13
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   uv pip install -r requirements.txt
   ```

4. Download the pre-trained MTEncoder checkpoint. The MTEncoder model uses pre-trained weights from a dedicated Hugging Face repository. Clone it into the `model_weights` directory:

   ```bash
   git clone https://huggingface.co/thor1/MTEncoder_alexandria model_weights
   ```

5. Download the SyntMTE model weights. To run inference without training your own model, clone the repository into the `SyntMTE_001` directory:

   ```bash
   git clone https://huggingface.co/thor1/SyntMTE_001
   ```
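After cloning, it can help to verify that the large weight files actually downloaded: a failed Git LFS fetch leaves ~130-byte pointer stubs in place of the real `.pth` files (see the note further below). A minimal sketch, assuming the two repository directories above; the 1 MB threshold is an illustrative heuristic, not part of the project:

```python
from pathlib import Path

def check_weights(repo):
    """Return .pth files under `repo` that look fully downloaded.

    Git LFS pointer stubs are only ~130 bytes, so anything under the
    (illustrative) 1 MB threshold is treated as not downloaded.
    """
    repo = Path(repo)
    if not repo.exists():
        return []
    return [p for p in repo.rglob("*.pth") if p.stat().st_size > 1_000_000]

for repo in ("model_weights", "SyntMTE_001"):
    found = check_weights(repo)
    print(f"{repo}: {len(found)} usable .pth file(s)")
```

If a repository shows zero usable files, install Git LFS and re-run `git lfs pull` inside it.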
The primary way to run this project is via the `src/main.py` script.

To run inference on your own data, simply replace `example_inference.csv` with the path to your custom CSV file:
```bash
python3 src/main.py \
    --mode "predict" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --predict_dataset "data/conditions/inference/example_inference.csv" \
    --checkpoint_path 'SyntMTE_001/best_model.pth'
```

You can customize runs by calling `src/main.py` directly, specifying the model, dataset, and other hyperparameters. Keep in mind which default parameter values are set (see the flag table below).
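If you prefer to drive predictions from Python (for example, inside a notebook), the same CLI call can be assembled programmatically. A minimal sketch using only the flags documented in the table below; the helper name `predict_cmd` is not part of the project:

```python
import subprocess

def predict_cmd(csv_path, checkpoint="SyntMTE_001/best_model.pth"):
    """Build the prediction command for src/main.py using its documented flags."""
    return [
        "python3", "src/main.py",
        "--mode", "predict",
        "--embedder_type", "MTEncoder",
        "--aggregation_mode", "mean",
        "--predict_dataset", csv_path,
        "--checkpoint_path", checkpoint,
    ]

cmd = predict_cmd("data/conditions/inference/example_inference.csv")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run from the repo root
```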
```bash
python3 src/main.py \
    --mode "train" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --dataset data/conditions/random_split
```

```bash
python3 src/main.py \
    --mode "tune" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --dataset data/conditions/random_split
```

You can find the prediction results in the `logs/` directory.
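Run outputs land in timestamped subdirectories of `logs/` (e.g. `logs/20250501-004811/`, as seen in the default checkpoint path below). A small helper to locate the most recent run; the timestamped-directory layout is inferred from that default and may differ in your setup:

```python
from pathlib import Path

def latest_run(log_root=Path("logs")):
    """Return the newest run directory under log_root, or None if there is none."""
    log_root = Path(log_root)
    if not log_root.exists():
        return None
    runs = [p for p in log_root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)

run = latest_run()
print("Latest run:", run if run else "none yet -- start a predict/train job first")
```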
| Flag | Type | Choices | Default | Description |
|---|---|---|---|---|
| `mode` | string | `train`, `tune`, `predict` | `train` | Mode to run: standard training, hyperparameter optimization, or prediction. |
| `embedder_type` | string | `CrabNet`, `composition`, `MTEncoder`, `clr` | `MTEncoder` | Type of embedder to use in the model. |
| `aggregation_mode` | string | `attention`, `mean`, `max`, `sum`, `mean_max`, `conv`, `lstm`, `precursor_target_concat`, `concat` | `mean` | Type of aggregation to use for target and precursors. |
| `dataset` | string | - | `data/conditions/random_split` | Path to the dataset. |
| `checkpoint_path` | string | - | `.../logs/20250501-004811/best_model.pth` | Path to load a model checkpoint from. |
| `learning_rate` | float | - | `0.0000439204` | Learning rate for the optimizer (only used in training). |
| `use_target_only` | bool | - | `False` | If set, uses only the target material's representation for regression. |
Note:
If you encounter an error such as "error 118" or issues related to `weights_only=True` when loading model weights from Hugging Face, it may be due to missing `.pth` files. This often happens when Git LFS is not installed, which is required to download large files from Hugging Face repositories.
To resolve this, ensure you have Git LFS installed and run `git lfs pull` in each repository directory to download all necessary model weight files.
If you use this work in your research, please cite our paper:
@article{Prein2025SyntMTE,
author = {Prein, Thorben and Pan, Elton and Jehkul, Janik and Weinmann, Steffen and Olivetti, Elsa A. and Rupp, Jennifer L. M.},
title = {Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials},
journal = {ACS Applied Materials \& Interfaces},
year = {2025},
note = {Under Review}
}
