Prein, T., Pan, E., Jehkul, J., Weinmann, S., Olivetti, E. A., & Rupp, J. L. M. (2025). Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials. ACS Applied Materials & Interfaces. Under review.


SyntMTE: Synthesis Condition Prediction for Inorganic Solid-State Reactions


A machine learning project to predict material synthesis conditions, leveraging models like CrabNet, MTEncoder, and XGBoost to accelerate materials discovery. 🚀


🎯 Why Predict Synthesis Conditions?

Traditional materials discovery relies on a combination of domain expertise, chemical intuition, and extensive trial-and-error experimentation. This process is often slow, expensive, and limited in scope. Several tools have been developed to accelerate the materials discovery workflow; however, one of the most important steps, the search for suitable synthesis conditions, remains largely manual:

User: I want to synthesize Ba3FeO4 🔥
SyntMTE: What precursors are you planning to use?
User: Fe3O4 and BaCO3.
SyntMTE: Understood. I suggest annealing for 10 hours at 815 °C, then sintering at 950 °C for 12 hours.

This project provides a data-driven approach to synthesis parameter planning. By learning from successful experiments reported in the literature, our models can guide researchers toward more promising synthesis routes, saving time and resources. For more information, read the paper cited below.

*(Figure: case study plot)*

🔑 Key Features

  • 🎯 Multiple Models: Implements CrabNet, MTEncoder, and XGBoost for robust prediction.
  • 🔌 Plug-and-Play: Easily train and evaluate different models using a unified interface.
  • ⚙️ Hyperparameter Optimization: Integrated with Optuna for efficient hyperparameter tuning.
  • 📈 Extensible: Designed with a clear structure to facilitate the addition of new models and datasets.
  • 📊 Analysis Ready: Includes notebooks for visualizing results and gaining insights from predictions.

🛠️ Installation

Prerequisites

  • A virtual environment manager like uv or conda

Setup Instructions

  1. Clone the repository

    git clone https://github.com/Thorben010/SyntMTE.git
    cd SyntMTE
  2. Create and activate environment (using uv)

    uv venv --python 3.11.13
    source .venv/bin/activate
  3. Install dependencies

    uv pip install -r requirements.txt
  4. Download the pre-trained MTEncoder checkpoint. The MTEncoder model uses pre-trained weights from a dedicated Hugging Face repository. Clone it into the model_weights directory:

    git clone https://huggingface.co/thor1/MTEncoder_alexandria model_weights
  5. Download the SyntMTE model weights. To run inference without training your own model, clone the repository into the SyntMTE_001 directory:

    git clone https://huggingface.co/thor1/SyntMTE_001
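After completing the steps above, a quick sanity check confirms that the weight files landed where the Quick Start commands expect them. This is a minimal sketch; the two paths come from the clone commands and the inference example in this README:

```python
from pathlib import Path

def missing_weight_paths(paths):
    """Return the subset of expected weight paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

# Paths used in the setup steps and the Quick Start inference command
expected = ["model_weights", "SyntMTE_001/best_model.pth"]
print(missing_weight_paths(expected) or "All weight files found.")
```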

🏃‍♂️ Quick Start

The primary way to run this project is via the src/main.py script.

Inference

To run inference on your own data, replace example_inference.csv with the path to your custom CSV file.

python3 src/main.py \
    --mode "predict" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --predict_dataset "data/conditions/inference/example_inference.csv" \
    --checkpoint_path 'SyntMTE_001/best_model.pth'

Custom Training

You can customize runs by calling src/main.py directly, specifying the model, dataset, and other hyperparameters. Note that any flag you leave unset falls back to its default value (see the flag descriptions below).

python3 src/main.py \
    --mode "train" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --dataset data/conditions/random_split 

Hyperparameter Tuning

python3 src/main.py \
    --mode "tune" \
    --embedder_type MTEncoder \
    --aggregation_mode "mean" \
    --dataset data/conditions/random_split

You can find the prediction results in the logs/ directory.
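Run directories under logs/ appear to be timestamp-named (e.g. 20250501-004811, as in the default checkpoint path below), so the most recent run can be found by sorting directory names. A small helper sketch, assuming that layout:

```python
from pathlib import Path

def latest_run_dir(logs_root="logs"):
    """Return the newest run directory under logs_root, relying on
    timestamp-style names (YYYYMMDD-HHMMSS) sorting chronologically."""
    runs = sorted(p.name for p in Path(logs_root).iterdir() if p.is_dir())
    return Path(logs_root) / runs[-1] if runs else None
```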

📂 Flag Descriptions

| Flag | Type | Choices | Default | Description |
|------|------|---------|---------|-------------|
| `mode` | string | `train`, `tune`, `predict` | `train` | Mode to run: standard training, hyperparameter optimization, or prediction. |
| `embedder_type` | string | `CrabNet`, `composition`, `MTEncoder`, `clr` | `MTEncoder` | Type of embedder to use in the model. |
| `aggregation_mode` | string | `attention`, `mean`, `max`, `sum`, `mean_max`, `conv`, `lstm`, `precursor_target_concat`, `concat` | `mean` | Type of aggregation to use for target and precursors. |
| `dataset` | string | - | `data/conditions/random_split` | Path to the dataset. |
| `checkpoint_path` | string | - | `.../logs/20250501-004811/best_model.pth` | Path to load a model checkpoint from. |
| `learning_rate` | float | - | `0.0000439204` | Learning rate for the optimizer (training only). |
| `use_target_only` | bool | - | `False` | If set, uses only the target material's representation for regression. |

Debugging

Note:
If you encounter an error such as "error 118", or issues related to weights_only=True when loading model weights from Hugging Face, the cause is usually missing .pth files. This happens when Git LFS is not installed, since it is required to download large files from Hugging Face repositories.
To resolve this, install Git LFS and run git lfs pull in each cloned repository directory to download the model weight files.
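A quick way to tell whether a downloaded .pth file contains real weights or is an un-fetched Git LFS pointer stub: pointer files are tiny text files that begin with a fixed header line. A minimal check:

```python
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """True if the file is an un-fetched Git LFS pointer stub
    rather than the actual binary weights."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER
```

If `is_lfs_pointer("SyntMTE_001/best_model.pth")` returns True, run `git lfs install` followed by `git lfs pull` inside that repository.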

📝 Citation

If you use this work in your research, please cite our paper:

@article{Prein2025SyntMTE,
  author    = {Prein, Thorben and Pan, Elton and Jehkul, Janik and Weinmann, Steffen and Olivetti, Elsa A. and Rupp, Jennifer L. M.},
  title     = {Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials},
  journal   = {ACS Applied Materials \& Interfaces},
  year      = {2025},
  note      = {Under Review}
}
