
ChemCluster is an interactive web app built with Streamlit that helps chemists analyze and visualize molecular structures. It supports both single-molecule and dataset exploration modes, offering 2D/3D visualization, conformer clustering, property calculation, and PCA-based clustering.



- ChemCluster -

ChemCluster is an interactive web application for cheminformatics and molecular analysis, built with Streamlit, RDKit, and scikit-learn, that focuses on forming and visualizing molecular clusters.

Final project for the course Practical Programming in Chemistry — EPFL CH-200

📦 Package overview

ChemCluster is an interactive cheminformatics platform developed at EPFL in 2025 as part of the Practical Programming in Chemistry course. It is a user-friendly web application for exploring and analyzing chemical structures, either one molecule at a time (through conformer generation) or as whole data sets.

This tool enables users to compute key molecular properties, visualize 2D and 3D structures, and perform clustering based on molecular similarity or conformer geometry. It also offers filtering options to help select clusters matching specific physicochemical criteria.
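As a rough illustration of the property calculation mentioned above, the following sketch computes a few common descriptors with RDKit. The function name `compute_properties` and the choice of descriptors are assumptions made for this example; the app's own code may compute a different set.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski


def compute_properties(smiles):
    """Return basic descriptors for a SMILES string, or None if it cannot be parsed.

    Hypothetical helper for illustration; not taken from the ChemCluster source.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return {
        "MW": Descriptors.MolWt(mol),        # average molecular weight
        "logP": Descriptors.MolLogP(mol),    # Crippen logP estimate
        "HBD": Lipinski.NumHDonors(mol),     # hydrogen-bond donors
        "HBA": Lipinski.NumHAcceptors(mol),  # hydrogen-bond acceptors
    }


print(compute_properties("CCO"))  # ethanol
```

Returning `None` for unparsable input lets a caller skip bad rows in an uploaded file instead of stopping on the first error.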

🌟 Features

  • Upload .sdf, .mol, or .csv files containing SMILES
  • Input or draw a single molecule and generate 3D conformers
  • Compute key molecular properties (MW, logP, H-bonding, etc.)
  • Visualize molecules in 2D (RDKit) and interactively in 3D (Py3Dmol)
  • Cluster molecules using PCA + KMeans with silhouette score optimization
  • Click points on the PCA plot to inspect molecules and properties
  • Overlay and compare 3D cluster centroids for conformers
  • Filter clusters based on desired property profiles
  • Export results and clusters as .csv files
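The "PCA + KMeans with silhouette score optimization" feature can be sketched with scikit-learn as follows. This is a minimal sketch, not the app's implementation: the function name `cluster_with_best_k`, the range of candidate cluster counts, and the synthetic demo data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_with_best_k(features, k_min=2, k_max=6, seed=0):
    """Project features to 2D with PCA, then pick the KMeans k with the best silhouette.

    Hypothetical helper; k_max must be smaller than the number of samples.
    """
    scaled = StandardScaler().fit_transform(features)
    coords = PCA(n_components=2, random_state=seed).fit_transform(scaled)
    best = None
    for k in range(k_min, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(coords)
        score = silhouette_score(coords, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return coords, best[0], best[2]


# Synthetic stand-in for a descriptor matrix: three well-separated groups of 20 rows.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 8, 8, 8]], dtype=float)
demo = np.vstack([c + rng.normal(scale=0.3, size=(20, 4)) for c in centers])

coords, best_k, labels = cluster_with_best_k(demo)
print(best_k, coords.shape)
```

Scaling before PCA keeps a descriptor with a large numeric range, such as molecular weight, from dominating the projection.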

🛠️ Installation

  1. Install from PyPI:
pip install chemcluster
  2. Run the app:
chemcluster

This will open the ChemCluster interface in your browser.

To run locally from source:

git clone https://github.com/erubbia/ChemCluster.git
cd ChemCluster
conda env create -f environment.yml
conda activate chemcluster-env
(chemcluster-env) $ pip install -e .

Run the tests with pytest or tox:

(chemcluster-env) $ pytest
# or 
(chemcluster-env) $ tox

📖 Usage

Launching the app brings you to the main page, where you can select one of two modes:


Single molecule mode:

  • Draw a molecule or paste its SMILES to visualize and cluster conformers
  • View and overlay optimized 3D centroid structures
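One way to generate conformers and group them is sketched below using RDKit. Note the assumptions: this README does not say which clustering method the app uses for conformers, so the sketch uses Butina clustering on pairwise RMSD as a stand-in, and the function name, conformer count, and RMSD threshold are illustrative choices.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina


def cluster_conformers(smiles, n_confs=20, rmsd_threshold=0.5, seed=42):
    """Embed conformers, optimize them with MMFF, and cluster them by RMSD.

    Hypothetical helper; Butina clustering is a swapped-in technique for illustration.
    """
    parsed = Chem.MolFromSmiles(smiles)
    if parsed is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(parsed)  # explicit hydrogens are needed for sensible 3D geometry
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=seed))
    AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=200)
    # Flat lower-triangle RMSD matrix; this call also aligns conformers to the first one.
    rmsd = AllChem.GetConformerRMSMatrix(mol, prealigned=False)
    clusters = Butina.ClusterData(rmsd, len(conf_ids), rmsd_threshold, isDistData=True)
    return mol, clusters


mol, clusters = cluster_conformers("CCCC", n_confs=10)
centroids = [cluster[0] for cluster in clusters]  # Butina lists each centroid first
print(len(clusters), centroids)
```

The centroid indices refer to conformers on `mol`, so they can be passed to a 3D viewer for overlay.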

[Figure: centroid superposition]

Data set mode:

  • Upload a SMILES data set to analyze chemical space
  • Perform PCA + KMeans clustering with property-based filters
  • Click to view molecules and export clusters
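Property-based filtering of clusters could work along these lines. This is only a sketch under stated assumptions: it keeps a cluster when the mean of each property across its members falls inside a requested range, which is one possible criterion and not necessarily the one the app applies. The record layout, function name, and numeric values are made up for the example.

```python
from statistics import mean


def clusters_matching_profile(records, profile):
    """Return sorted cluster ids whose mean property values fall inside every range.

    records: list of dicts with a 'cluster' key plus numeric property keys.
    profile: dict mapping a property name to a (low, high) tuple.
    Hypothetical helper for illustration.
    """
    by_cluster = {}
    for rec in records:
        by_cluster.setdefault(rec["cluster"], []).append(rec)

    matches = []
    for cluster_id, members in by_cluster.items():
        in_range = all(
            low <= mean(m[prop] for m in members) <= high
            for prop, (low, high) in profile.items()
        )
        if in_range:
            matches.append(cluster_id)
    return sorted(matches)


# Illustrative values only, not measured data.
records = [
    {"cluster": 0, "MW": 50.0, "logP": 0.1},
    {"cluster": 0, "MW": 70.0, "logP": 0.5},
    {"cluster": 1, "MW": 300.0, "logP": 4.0},
    {"cluster": 1, "MW": 320.0, "logP": 5.0},
]
print(clusters_matching_profile(records, {"MW": (0, 100), "logP": (-1, 1)}))
```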

[Figures: PCA plots]

To export, scroll to the bottom of the page, select the cluster(s) you want, and click "Download Cluster Molecules" to save them as a .csv file.
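For readers who want to reproduce the export step outside the app, the sketch below writes the molecules of the selected clusters to CSV text using only the Python standard library. The column names `smiles` and `cluster` and the function name are assumptions; the file produced by the app may use a different layout.

```python
import csv
import io


def clusters_to_csv(records, selected_clusters):
    """Serialize the records belonging to the selected clusters as CSV text.

    records: list of dicts with 'smiles' and 'cluster' keys (assumed layout).
    selected_clusters: set of cluster ids to keep.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["smiles", "cluster"], lineterminator="\n")
    writer.writeheader()
    for rec in records:
        if rec["cluster"] in selected_clusters:
            writer.writerow({"smiles": rec["smiles"], "cluster": rec["cluster"]})
    return buffer.getvalue()


molecules = [
    {"smiles": "CCO", "cluster": 0},
    {"smiles": "c1ccccc1", "cluster": 1},
    {"smiles": "CCN", "cluster": 0},
]
print(clusters_to_csv(molecules, {0}))
```

In a Streamlit app, a string like this could plausibly be handed to `st.download_button` as the download payload, though whether ChemCluster does so is not stated here.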

📂 License

MIT License


👨‍🔬 Developers

  • Elisa Rubbia, Master's student in Molecular and Biological Chemistry at EPFL (GitHub: erubbia)

  • Romain Guichonnet, Master's student in Molecular and Biological Chemistry at EPFL (GitHub: Romainguich)

  • Flavia Zabala Perez, Master's student in Molecular and Biological Chemistry at EPFL (GitHub: Flaviazab)
