  • Munich

Thorben010/README.md

👋 Hi, I'm Thorben, a third-year PhD student at TUM exploring the intersection of machine learning and materials discovery! 🚀

I’m passionate about advancing materials science by integrating state-of-the-art AI techniques. My work ranges from materials representation learning to NLP. 🌟


🧪 Materials Science

LLMForge: Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials

  • Benchmarked seven state-of-the-art LMs on precursor recommendation and synthesis-condition regression, achieving up to 53.8% Top-1 precursor accuracy and MAEs below 126 °C. Generated 28,548 synthetic solid-state recipes with OpenAI GPT-4.1, expanding the dataset by 616% relative to literature-mined protocols.
  • Methods: Large Language Models, Data Augmentation, Transformer Pretraining, Ensemble Modeling
  • Repository: Project GitHub & Dataset
Condition Regression Performance
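
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how Top-1 accuracy and mean absolute error (MAE) are commonly computed. It is an illustration only, not the project's evaluation code: the function names, the precursor sets, and the temperatures are all hypothetical toy values.

```python
from statistics import mean


def top1_accuracy(ranked_predictions, targets):
    """Fraction of examples whose highest-ranked prediction equals the target."""
    hits = [preds[0] == target for preds, target in zip(ranked_predictions, targets)]
    return sum(hits) / len(hits)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and reference values."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Hypothetical toy data (not taken from the paper or its dataset).
# Each entry is a ranked list of candidate precursor sets; frozenset makes
# the comparison independent of the order in which precursors are listed.
ranked_precursors = [
    [frozenset({"BaCO3", "TiO2"}), frozenset({"BaO", "TiO2"})],
    [frozenset({"Li2CO3", "Co3O4"}), frozenset({"LiOH", "Co3O4"})],
    [frozenset({"SrO", "TiO2"}), frozenset({"SrCO3", "TiO2"})],
    [frozenset({"La2O3", "MnO2"}), frozenset({"La2O3", "Mn2O3"})],
]
true_precursors = [
    frozenset({"TiO2", "BaCO3"}),
    frozenset({"Co3O4", "Li2CO3"}),
    frozenset({"SrCO3", "TiO2"}),
    frozenset({"La2O3", "MnO2"}),
]

# Hypothetical sintering temperatures in °C.
predicted_temps = [900.0, 1150.0, 700.0, 1000.0]
actual_temps = [1000.0, 1100.0, 800.0, 950.0]

accuracy = top1_accuracy(ranked_precursors, true_precursors)  # 3 of 4 correct
mae = mean_absolute_error(predicted_temps, actual_temps)  # in °C
```

Top-1 accuracy only credits the first-ranked suggestion, so the third example counts as a miss even though the correct precursor set appears second in its list.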

MTENCODER: A Multi-task Pretrained Transformer Encoder for Materials Representation Learning

Reaction Graph Networks for Inorganic Synthesis Condition Prediction

Reaction Graph Network

A Chemically-Guided Generative Diffusion Model for Materials Synthesis Planning

Denoising Diffusion (static image and animation)

📖 Natural Language Processing

Augmenting Scientific Creativity with Retrieval across Knowledge Domains

Scientific Creativity

Extracting a Database of Challenges and Mitigation Strategies for Sodium-ion Battery Development

Regress, Don't Guess – A Regression-like Loss on Number Tokens for Language Models


Pinned

  1. tum-ai/number-token-loss Public

    A regression-like loss to improve numerical reasoning in language models - ICML 2025

    Jupyter Notebook 27 5

  2. olivettigroup/cross-domain-exploration Public

    This project presents an exploratory search system that enables scientists to discover research outside their familiar domains by selecting text from paper abstracts to retrieve diverse yet relevan…

    8

  3. SyntMTE Public

    Prein, T., Pan, E., Jehkul, J., Weinmann, S., Olivetti, E. A., & Rupp, J. L. M. (2025). Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials. ACS Applied Materials & Int…

    Python 6 2

  4. llm_synthesis Public

    Prein, T., Pan, E., Jehkul, J., Weinmann, S., Olivetti, E. A., & Rupp, J. L. M. (2025). Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials. ACS Applied Materials & Int…

    4

  5. olivettigroup/NLP4SIB Public

    Datasets and pre-trained models for Munjal, Mrigi, et al. "Scaling Sodium-ion Battery Development with NLP." AI for Accelerated Materials Design-NeurIPS 2023 Workshop. 2023.

    Python 6