imaginethinking/mol-prop-pred-api

Molecular Property Prediction API (Backend)

Overview

A FastAPI backend that predicts molecular properties (currently aqueous solubility, reported as logS) from SMILES strings using RDKit and a trained scikit-learn model.

This project demonstrates a complete cheminformatics workflow: validation → featurisation → model inference → API response.


Tech Stack

  • Python, FastAPI, Uvicorn
  • RDKit, NumPy, pandas
  • scikit-learn, joblib
  • pytest

Dataset

This project uses the AqSolDB aqueous solubility dataset.

Download from Kaggle: https://www.kaggle.com/datasets/sorkun/aqsoldb-a-curated-aqueous-solubility-dataset

After downloading, place the CSV file at:

backend/data/raw/aqsoldb.csv
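
Before training, it can help to confirm that the downloaded file is in place and has the columns the training step is likely to need. The following is a minimal sketch using only the standard library. The column names `SMILES` and `Solubility` are an assumption based on the published AqSolDB CSV, and `missing_columns` is a hypothetical helper, not part of this repository.

```python
import csv
from pathlib import Path

# Assumed column names (from the published AqSolDB CSV); adjust if your copy differs.
REQUIRED_COLUMNS = ("SMILES", "Solubility")


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    # utf-8-sig also reads plain UTF-8 and strips a byte-order mark if present.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in required if name not in header]


if __name__ == "__main__":
    path = Path("backend/data/raw/aqsoldb.csv")
    if not path.exists():
        print(f"Dataset not found at {path}")
    else:
        missing = missing_columns(path)
        print("Header looks fine" if not missing else f"Missing columns: {missing}")
```

An empty list means every assumed column is present in the header. The check reads only the first row, so it stays fast on the full dataset.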

Setup

# create and activate a virtual environment
python -m venv .venv
.venv\Scripts\activate        # Windows; on macOS/Linux use: source .venv/bin/activate

# install deps
pip install -r backend/requirements.txt

# train model (required once; expects the dataset at backend/data/raw/aqsoldb.csv)
python -m backend.training.train

Run API

uvicorn backend.app.main:app --reload

Base URL:

http://127.0.0.1:8000

Endpoints

Health

GET /api/v1/health

Model Info

GET /api/v1/model-info

Predict

POST /api/v1/predict

Request body:

{
  "smiles": "CCO"
}
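
The request above can also be sent from Python. This is a minimal client sketch using only the standard library. It assumes the API is running at the base URL listed under "Run API", and the helper names `build_predict_request` and `predict` are hypothetical, not part of this repository.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # base URL from the "Run API" section


def build_predict_request(smiles, base_url=BASE_URL):
    """Build a POST request for /api/v1/predict with a JSON body."""
    body = json.dumps({"smiles": smiles}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(smiles, base_url=BASE_URL, timeout=10.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(smiles, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict("CCO")` should return a dictionary. Building the request is kept separate from sending it so the payload can be inspected without network access.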

Example Response

{
  "input_smiles": "CCO",
  "canonical_smiles": "CCO",
  "valid": true,
  "property_name": "aqueous_solubility",
  "prediction": -0.77,
  "units": "logS"
}
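
To read the prediction as a concentration, the logS value can be converted back to a linear scale. The sketch below assumes that "logS" means the base-10 logarithm of solubility in mol/L, which is the convention used by AqSolDB. The function name is hypothetical.

```python
def logs_to_mol_per_litre(log_s):
    """Convert logS (log10 of solubility in mol/L) to mol/L."""
    return 10.0 ** log_s


# Values taken from the example response above.
example = {"prediction": -0.77, "units": "logS"}
molar = logs_to_mol_per_litre(example["prediction"])
print(f"{molar:.3f} mol/L")  # prints "0.170 mol/L"
```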

Tests

pytest

Docker

docker build -t molprop-api ./backend
docker run -p 8000:8000 molprop-api

Notes

  • Model: RandomForestRegressor
  • Features: Morgan fingerprints (1024 bits)
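  • Training and inference are fully separated: the model is trained offline with python -m backend.training.train, and the API only performs inference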
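
The validation and featurisation steps can be sketched with RDKit as follows. This is an illustration under stated assumptions, not the repository's actual code: the fingerprint radius of 2 is a common default that this README does not specify, and `featurise` is a hypothetical helper name. Only the 1024-bit length comes from the notes above.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

N_BITS = 1024  # fingerprint length stated in the notes
RADIUS = 2     # assumption: the README does not state the radius

_generator = rdFingerprintGenerator.GetMorganGenerator(radius=RADIUS, fpSize=N_BITS)


def featurise(smiles):
    """Return (canonical_smiles, fingerprint), or None if the SMILES is invalid."""
    mol = Chem.MolFromSmiles(smiles)  # returns None when parsing fails
    if mol is None:
        return None
    fingerprint = _generator.GetFingerprintAsNumPy(mol)
    return Chem.MolToSmiles(mol), np.asarray(fingerprint, dtype=np.uint8)
```

A trained regressor could then score the result with something like `model.predict(fingerprint.reshape(1, -1))`, where `model` stands for whatever estimator the training step saved.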
