GitHub - Materials-Informatics-Group/MonteCat: The MonteCat Code

This script implements the proposed MonteCat algorithm, which constructs a Regression Model from a large pool of engineered Descriptors (Features) through an adaptation of the Metropolis-Hastings algorithm. The number of iterations and the Temperature that modulates the Acceptance Probability are set by the user in the General Script Parameters section of the code, under the variable names ‘iterations’ and ‘temperature’, respectively. The code has been tested with Temperature values ranging from 5 to 2000 and with up to 1,000,000 iterations. Other adjustable variables include the regression model (this sample code contains only SVR and Linear regression models, but users can add others), the SVR hyperparameters (C and γ), and the random seed values, in case reproducibility is necessary for the intended application.
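To illustrate how a Temperature can modulate an Acceptance Probability, here is a minimal sketch of the textbook Metropolis acceptance rule. It assumes the model score is an error to be minimized; the function names `acceptance_probability` and `accept_proposal` are hypothetical, and the exact rule used in MonteCat_Sample_Code.py may differ.

```python
import math
import random


def acceptance_probability(current_error, proposed_error, temperature):
    """Standard Metropolis probability (an assumption, not taken from the MonteCat code)."""
    delta = proposed_error - current_error
    # Improvements (delta <= 0) are always accepted.
    if delta <= 0:
        return 1.0
    # Worse proposals are accepted with probability exp(-delta / T).
    return math.exp(-delta / temperature)


def accept_proposal(current_error, proposed_error, temperature, rng=random):
    """Draw a uniform random number and compare it with the acceptance probability."""
    return rng.random() < acceptance_probability(
        current_error, proposed_error, temperature
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    for temperature in (5, 2000):
        p = acceptance_probability(1.0, 2.0, temperature)
        print(f"T={temperature}: P(accept a proposal that is worse by 1.0) = {p:.4f}")
    print(accept_proposal(1.0, 0.5, 5, rng))
```

Under this rule, a higher Temperature makes the search more willing to accept proposals that worsen the model, which helps it leave local minima at the cost of slower convergence.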

The 'Functions' subsection contains the functions that generate the randomized proposals changing the Descriptors used by the Regression model, decide whether or not to accept a specific outcome, evaluate the Regression models through Cross Validation, and prepare the results for real-time output. The Main Script outlines the methodology: it first determines the starting point through a greedy Descriptor Addition, and then loops for the specified number of iterations, generating random Descriptor addition and removal proposals and deciding whether to adopt them depending on how they improve the performance of the Regression model.
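The outline above can be sketched roughly as follows. This is an illustrative reconstruction, not the code in MonteCat_Sample_Code.py: the function names (`cv_error`, `greedy_start`, `search`), the synthetic data, the 5-fold cross-validated RMSE score, the 50/50 choice between addition and removal, and the Metropolis-style acceptance rule are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def cv_error(X, y, columns):
    """Mean 5-fold cross-validated RMSE of a linear model on the chosen columns."""
    scores = cross_val_score(
        LinearRegression(), X[:, sorted(columns)], y,
        cv=5, scoring="neg_root_mean_squared_error",
    )
    return -scores.mean()


def greedy_start(X, y, n_start):
    """Add, one at a time, the descriptor that lowers the CV error the most."""
    selected = set()
    for _ in range(n_start):
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        best = min(candidates, key=lambda j: cv_error(X, y, selected | {j}))
        selected.add(best)
    return selected


def search(X, y, iterations=30, temperature=5.0, n_start=2, seed=0):
    """Random add/remove proposals, accepted with a Metropolis-style rule."""
    rng = np.random.default_rng(seed)
    current = greedy_start(X, y, n_start)
    current_err = cv_error(X, y, current)
    best, best_err = set(current), current_err
    for _ in range(iterations):
        unused = [j for j in range(X.shape[1]) if j not in current]
        # Add a descriptor, or remove one if more than one is currently in use.
        if unused and (len(current) == 1 or rng.random() < 0.5):
            proposal = current | {int(rng.choice(unused))}
        else:
            proposal = current - {int(rng.choice(sorted(current)))}
        err = cv_error(X, y, proposal)
        delta = err - current_err
        # Always accept improvements; accept worse proposals with exp(-delta / T).
        if delta <= 0 or rng.random() < np.exp(-delta / temperature):
            current, current_err = proposal, err
            if err < best_err:
                best, best_err = set(proposal), err
    return best, best_err


if __name__ == "__main__":
    # Synthetic stand-in for a descriptor pool; not the repository's SampleData.csv.
    X, y = make_regression(n_samples=60, n_features=8, n_informative=3,
                           noise=5.0, random_state=0)
    selected, error = search(X, y)
    print("Selected descriptor columns:", sorted(selected))
    print("Cross-validated RMSE:", round(error, 3))
```

The sketch tracks the best descriptor set seen so far separately from the current one, since a Metropolis-style walk can move away from a good solution after finding it.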

The MonteCat code is written in Python 3 and uses the NumPy, pandas and scikit-learn libraries for its calculations. The script needs only an input dataset (the training data). It outputs a report file, continuously overwritten in real time, that details the outcome of each iteration of the algorithm. At the end of execution, it also outputs a filtered training dataset containing only the selected descriptor variables and the target variable.
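As a rough illustration of the two outputs described above, the sketch below overwrites a report file after each iteration and writes a filtered dataset at the end. The file names (`report.csv`, `filtered_data.csv`), the column names, the report fields and the helper functions are all hypothetical; the actual layout produced by MonteCat_Sample_Code.py is not specified here.

```python
import os
import tempfile

import pandas as pd


def write_report(path, history):
    """Overwrite the report file with every iteration recorded so far."""
    pd.DataFrame(history).to_csv(path, index=False)


def write_filtered_data(path, data, selected, target):
    """Save only the selected descriptor columns plus the target column."""
    data[list(selected) + [target]].to_csv(path, index=False)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a training dataset (hypothetical column names).
    data = pd.DataFrame({
        "desc_a": [0.1, 0.4, 0.7],
        "desc_b": [1.0, 2.0, 3.0],
        "desc_c": [5.0, 3.0, 1.0],
        "target": [10.0, 20.0, 30.0],
    })
    out_dir = tempfile.mkdtemp()
    report_path = os.path.join(out_dir, "report.csv")

    history = []
    for iteration, (error, accepted) in enumerate([(2.5, True), (2.9, False)], start=1):
        history.append({"iteration": iteration, "cv_error": error, "accepted": accepted})
        write_report(report_path, history)  # the file is rewritten on every iteration

    write_filtered_data(os.path.join(out_dir, "filtered_data.csv"),
                        data, ["desc_a", "desc_c"], "target")
    print(pd.read_csv(report_path))
```

Rewriting the whole report on each iteration keeps the file readable while the search is still running, at the cost of extra I/O for very long runs.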

Cite: Fernando Garcia-Escobar, Toshiaki Taniike, and Keisuke Takahashi. MonteCat: A Basin-Hopping-Inspired Catalyst Descriptor Search Algorithm for Machine Learning Models. J. Chem. Inf. Model. 2024. https://doi.org/10.1021/acs.jcim.3c01952

Repository files:
LICENSE
MonteCat_Sample_Code.py
README.md
SampleData.csv