
asiomchen/molalchemy


MolAlchemy

molalchemy - Making chemical databases as easy as regular databases! 🧪✨


Extensions for SQLAlchemy to work with chemical cartridges

molalchemy provides seamless integration between Python and chemical databases, enabling chemical structure storage, indexing, and querying. The library supports two popular chemical cartridges (Bingo PostgreSQL and RDKit PostgreSQL) and provides a unified API for chemical database operations.

This project was originally supposed to be part of the RDKit UGM 2025 hackathon, but COVID had other plans for me. It is currently in the alpha stage, as a proof of concept. Contributions are welcome!

To give it a hackathon vibe, I built this PoC in a couple of hours, so expect some rough edges and missing features.

🚀 Features

  • Chemical Data Types: Custom SQLAlchemy types for molecules, reactions and fingerprints
  • Chemical Cartridge Integration: Support for Bingo and RDKit PostgreSQL cartridges
  • Substructure Search: Efficient substructure and similarity searching
  • Chemical Indexing: High-performance chemical structure indexing
  • Alembic Integration: Automatic handling of extensions and imports in database migrations
  • Typing: As many type hints as possible, so there is no need to remember yet another abstract function name
  • Easy Integration: Drop-in replacement for standard SQLAlchemy types

📦 Installation

Using pip

pip install molalchemy

From source

pip install git+https://github.com/asiomchen/molalchemy.git

# or clone the repo and install
git clone https://github.com/asiomchen/molalchemy.git
cd molalchemy
pip install .

Prerequisites

  • Python 3.10+
  • SQLAlchemy 2.0+
  • rdkit 2024.3.1+
  • A running PostgreSQL instance with a chemical cartridge (Bingo or RDKit); see docker-compose.yaml for a ready-to-use setup
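If you are unsure whether your environment meets the Python-side requirements, a quick check with the standard library can help. The script below is a sketch and not part of molalchemy; it assumes RDKit was installed from PyPI under the distribution name `rdkit` (a conda install may not expose package metadata the same way), and it ignores pre-release suffixes such as `rc1` when comparing versions.

```python
"""Check the Python-side prerequisites listed above (a sketch, not part of molalchemy)."""
import sys
from importlib import metadata

# Minimum versions copied from the Prerequisites list
MINIMUMS = {
    "sqlalchemy": (2, 0),
    "rdkit": (2024, 3, 1),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2024.03.1' or '2.0.36' into a tuple of ints, dropping suffixes like 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: tuple[int, ...]) -> bool:
    """Tuples compare element by element, so (2, 0, 36) >= (2, 0) is True."""
    return parse_version(installed) >= minimum


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    for dist, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            wanted = ".".join(map(str, minimum))
            problems.append(f"{dist} {wanted}+ required, found {installed}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("-", problem)
```

The PostgreSQL cartridge itself cannot be checked this way; it has to be verified against the running database.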

For development or testing, you can use the provided Docker setup:

# For RDKit cartridge
docker-compose up rdkit

# For Bingo cartridge  
docker-compose up bingo
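When the containers are started in the background (for example with `docker-compose up -d rdkit`) or in CI, scripts and test runs usually need to wait until the database accepts connections. The helper below is a minimal standard-library sketch of that wait; the port is an assumption (PostgreSQL's default 5432), since the actual mapping lives in docker-compose.yaml.

```python
"""Wait until a TCP port accepts connections, e.g. a PostgreSQL container."""
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open;
            # PostgreSQL may still be finishing start-up internally.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage (assumes the container publishes PostgreSQL on localhost:5432):
# wait_for_port("localhost", 5432, timeout=60)
```

An open port is a cheap readiness signal; a stricter check would open a real database connection and run a trivial query.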

πŸ“ Project Structure

molalchemy/
β”œβ”€β”€ src/molalchemy/
β”‚   β”œβ”€β”€ types.py              # Base type definitions
β”‚   β”œβ”€β”€ helpers.py            # Common utilities
β”‚   β”œβ”€β”€ alembic_helpers.py    # Alembic integration utilities
β”‚   β”œβ”€β”€ bingo/               # Bingo PostgreSQL cartridge support
β”‚   β”‚   β”œβ”€β”€ types.py         # Bingo-specific types
β”‚   β”‚   β”œβ”€β”€ index.py         # Bingo indexing
β”‚   β”‚   β”œβ”€β”€ comparators.py   # SQLAlchemy comparators
β”‚   β”‚   └── functions/       # Bingo database functions
β”‚   └── rdkit/               # RDKit PostgreSQL cartridge support
β”‚       β”œβ”€β”€ types.py         # RDKit-specific types
β”‚       β”œβ”€β”€ index.py         # RDKit indexing  
β”‚       β”œβ”€β”€ comparators.py   # SQLAlchemy comparators
β”‚       └── functions/       # RDKit database functions
β”œβ”€β”€ tests/                   # Test suite
β”œβ”€β”€ docs/                    # Documentation
└── dev_scripts/             # Development utilities

🔧 Quick Start

To learn how to use molalchemy, check out the Quick Start - RDKit and Quick Start - Bingo tutorials in the documentation.

πŸ—οΈ Supported Cartridges

Bingo Cartridge

from molalchemy.bingo.types import (
    BingoMol,              # Text-based molecule storage (SMILES/Molfile)
    BingoBinaryMol,        # Binary molecule storage with format conversion
    BingoReaction,         # Reaction storage (reaction SMILES/Rxnfile)
    BingoBinaryReaction    # Binary reaction storage
)
from molalchemy.bingo.index import (
    BingoMolIndex,         # Molecule indexing
    BingoBinaryMolIndex,   # Binary molecule indexing
    BingoRxnIndex,         # Reaction indexing
    BingoBinaryRxnIndex    # Binary reaction indexing
)
# Functions are imported individually by name; see the documentation
# for the complete list of chemical analysis functions
from molalchemy.bingo.functions import smiles, getweight, gross, inchikey, checkmolecule

RDKit Cartridge

from molalchemy.rdkit.types import (
    RdkitMol,              # RDKit molecule type with configurable return formats
    RdkitBitFingerprint,   # Binary fingerprints (bfp)
    RdkitSparseFingerprint,# Sparse fingerprints (sfp)
    RdkitReaction,         # Chemical reactions
    RdkitQMol,             # Query molecules
    RdkitXQMol,            # Extended query molecules
)
from molalchemy.rdkit.index import (
    RdkitIndex,            # RDKit molecule indexing (GIST index)
)
# Functions are imported individually by name; see the documentation
# for the complete list of 150+ RDKit functions
from molalchemy.rdkit.functions import mol_amw, mol_formula, mol_inchikey

🎯 Advanced Features

Chemical Indexing

from sqlalchemy import Integer, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

from molalchemy.bingo.index import BingoMolIndex
from molalchemy.bingo.types import BingoMol


class Base(DeclarativeBase):
    pass


class Molecule(Base):
    __tablename__ = 'molecules'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    structure: Mapped[str] = mapped_column(BingoMol)
    name: Mapped[str] = mapped_column(String(100))

    # Add a chemical index for faster structure searches
    __table_args__ = (
        BingoMolIndex('mol_idx', 'structure'),
    )
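For orientation, Bingo's PostgreSQL documentation creates a chemical index with the `bingo_idx` access method and an operator class per data type, e.g. `CREATE INDEX mol_idx ON molecules USING bingo_idx (structure bingo.molecule)`. The helper below only formats such statements as strings, which can be handy when reviewing a migration or creating an index by hand. It is a hypothetical sketch: the mapping from molalchemy's index classes to Bingo operator classes is an assumption, not something read from molalchemy's source.

```python
"""Format the CREATE INDEX statement a Bingo index roughly corresponds to (hypothetical helper)."""

# Assumed mapping from molalchemy index class names to Bingo operator classes
BINGO_OPCLASSES = {
    "BingoMolIndex": "bingo.molecule",
    "BingoBinaryMolIndex": "bingo.bmolecule",
    "BingoRxnIndex": "bingo.reaction",
    "BingoBinaryRxnIndex": "bingo.breaction",
}


def bingo_index_ddl(index_kind: str, name: str, table: str, column: str) -> str:
    """Return the SQL for a Bingo index; raises KeyError for unknown index kinds.

    Identifiers are interpolated unquoted, so only pass trusted names.
    """
    opclass = BINGO_OPCLASSES[index_kind]
    return f"CREATE INDEX {name} ON {table} USING bingo_idx ({column} {opclass})"


print(bingo_index_ddl("BingoMolIndex", "mol_idx", "molecules", "structure"))
```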

Configurable Return Types

from rdkit import Chem
from sqlalchemy import Integer
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

from molalchemy.rdkit.types import RdkitMol


class Base(DeclarativeBase):
    pass


class MoleculeWithFormats(Base):
    __tablename__ = 'molecules_formatted'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    # Return as SMILES string (default)
    structure_smiles: Mapped[str] = mapped_column(RdkitMol())
    # Return as RDKit Mol object
    structure_mol: Mapped[Chem.Mol] = mapped_column(RdkitMol(return_type="mol"))
    # Return as raw bytes
    structure_bytes: Mapped[bytes] = mapped_column(RdkitMol(return_type="bytes"))
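A common way to implement a configurable return type like this is a small dispatch table: the converter is chosen once, when the column type is created, and then applied to every value fetched from the database. The sketch below shows that pattern with placeholder converters so it runs without RDKit or a database. It is not molalchemy's implementation; `make_result_processor` and the converter signatures are hypothetical, and with RDKit installed `Chem.MolFromSmiles` would be one plausible converter for the "mol" case.

```python
"""Dispatch-table sketch for a configurable return type (hypothetical, not molalchemy's code)."""
from typing import Any, Callable, Literal, Optional

ReturnType = Literal["smiles", "mol", "bytes"]


def make_result_processor(
    return_type: ReturnType,
    converters: dict[str, Callable[[Any], Any]],
) -> Callable[[Optional[Any]], Any]:
    """Pick the converter once and return a function applied to each fetched value."""
    try:
        convert = converters[return_type]
    except KeyError:
        raise ValueError(
            f"unsupported return_type {return_type!r}; expected one of {sorted(converters)}"
        ) from None

    def process(value: Optional[Any]) -> Any:
        # SQL NULL arrives as None and is passed through unchanged
        return None if value is None else convert(value)

    return process


# Placeholder converters; real ones would build RDKit Mol objects or binary pickles.
converters = {
    "smiles": str,                       # keep the text as-is
    "bytes": lambda v: str(v).encode(),  # stand-in for a binary representation
    "mol": lambda v: ("Mol", v),         # stand-in for e.g. Chem.MolFromSmiles(v)
}

to_bytes = make_result_processor("bytes", converters)
print(to_bytes("CCO"))  # b'CCO'
print(to_bytes(None))   # None
```

Resolving the converter up front means an invalid `return_type` fails when the model is defined rather than on the first query.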

Using Chemical Functions

The chemical functions are available as individual imports from the functions modules. Under the hood they use SQLAlchemy's func to call the corresponding database functions, while adding type hints and autocompletion in IDEs.

from molalchemy.bingo.functions import smiles, getweight, gross, inchikey

# Calculate molecular properties using Bingo functions
results = session.query(
    Molecule.name,
    getweight(Molecule.structure).label('molecular_weight'),
    gross(Molecule.structure).label('formula'),
    smiles(Molecule.structure).label('canonical_smiles')
).all()

# Validate molecular structures
from molalchemy.bingo.functions import checkmolecule

invalid_molecules = session.query(Molecule).filter(
    checkmolecule(Molecule.structure).isnot(None)
).all()

# Format conversions
inchi_keys = session.query(
    Molecule.id,
    inchikey(Molecule.structure).label('inchikey')
).all()

For RDKit functions (assuming Molecule.structure is an RdkitMol column rather than the Bingo column used above):

from molalchemy.rdkit.functions import mol_amw, mol_formula, mol_inchikey

# Calculate molecular properties using RDKit functions
results = session.query(
    Molecule.name,
    mol_amw(Molecule.structure).label('molecular_weight'),
    mol_formula(Molecule.structure).label('formula'),
    mol_inchikey(Molecule.structure).label('inchikey')
).all()
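The wrappers above sit on top of SQLAlchemy's `func`, which turns any attribute name into a SQL function call, so `func.mol_amw(Molecule.structure)` renders as a `mol_amw(...)` call in SQL. The toy model below reproduces only that idea with plain strings so it runs without SQLAlchemy or a database. It is an illustration, not how SQLAlchemy builds its expression objects, and `ToyFunc` is a hypothetical name.

```python
"""Toy, string-only model of a func-style namespace and a typed wrapper around it."""
from typing import Callable


class ToyFunc:
    """Attribute access returns a builder for a SQL function call, e.g. toy_func.mol_amw(x)."""

    def __getattr__(self, name: str) -> Callable[..., str]:
        def call(*args: str) -> str:
            return f"{name}({', '.join(args)})"

        return call


toy_func = ToyFunc()


def mol_amw(mol: str) -> str:
    """Typed wrapper: a named, documented signature is what an IDE can check and complete."""
    return toy_func.mol_amw(mol)


print(toy_func.getweight("molecules.structure"))  # getweight(molecules.structure)
print(mol_amw("molecules.structure"))             # mol_amw(molecules.structure)
```

The generic namespace accepts any name, including typos; a dedicated wrapper per database function is what turns those typos into errors your editor can catch.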

Alembic Database Migrations

molalchemy provides utilities for Alembic integration. For automatic import handling in migrations, it includes type rendering utilities that ensure the proper import statements are generated for molalchemy types. Register alembic_helpers.render_item in both configure calls of your Alembic env.py:

# ...
from molalchemy import alembic_helpers
# ...

def run_migrations_offline():
    # ...
    context.configure(
        # ...
        render_item=alembic_helpers.render_item,
    )
    # ...


def run_migrations_online():
    # ...
    context.configure(
        # ...
        render_item=alembic_helpers.render_item,
    )
    # ...
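Alembic calls `render_item(type_, obj, autogen_context)` for each item it writes into an autogenerated migration: returning a string overrides the rendering, returning `False` falls back to the default, and `autogen_context.imports` collects extra import lines for the migration file. The sketch below shows that contract for types defined under the `molalchemy` package. It illustrates the hook only and is not the code behind `alembic_helpers.render_item`; the `SimpleNamespace` context and the stand-in `RdkitMol` class exist only so it runs without Alembic or molalchemy installed.

```python
"""Sketch of an Alembic render_item hook that adds imports for molalchemy types."""
from types import SimpleNamespace
from typing import Any, Union


def render_item(type_: str, obj: Any, autogen_context: Any) -> Union[str, bool]:
    """Render molalchemy column types with a fully qualified name and record the import."""
    module = type(obj).__module__
    if type_ == "type" and module.split(".")[0] == "molalchemy":
        autogen_context.imports.add(f"import {module}")
        return f"{module}.{obj!r}"
    # False tells Alembic to use its default rendering
    return False


# Stand-in type and context so the sketch runs on its own
class RdkitMol:
    def __repr__(self) -> str:
        return "RdkitMol()"


RdkitMol.__module__ = "molalchemy.rdkit.types"  # pretend the class lives in molalchemy
ctx = SimpleNamespace(imports=set())

print(render_item("type", RdkitMol(), ctx))    # molalchemy.rdkit.types.RdkitMol()
print(ctx.imports)                             # {'import molalchemy.rdkit.types'}
print(render_item("column", RdkitMol(), ctx))  # False
```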

🧪 Development

Setting Up Development Environment

  1. Clone the repository:
git clone https://github.com/asiomchen/molalchemy.git
cd molalchemy
  2. Install dependencies:
uv sync
  3. Activate the virtual environment:
source .venv/bin/activate

Running Tests

# Run all tests with coverage
make test

# Or use uv directly
uv run pytest

# Run specific test module
uv run pytest tests/bingo/

# Run with coverage
uv run pytest --cov=src/molalchemy

Code Quality

This project uses modern Python development tools:

  • uv: For virtual environment and dependency management
  • Ruff: For linting and formatting
  • pytest: For testing

Building Function Bindings

The chemical function bindings are automatically generated from cartridge documentation:

# Update RDKit function bindings
make update-rdkit-func

# Update Bingo function bindings  
make update-bingo-func

# Update all function bindings
make update-func
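The general idea behind generated bindings is to turn each documented function signature into a small typed wrapper that forwards to SQLAlchemy's `func`. The sketch below illustrates that with a string template. The `FunctionSpec` record, its fields, and `render_binding`/`render_module` are hypothetical and do not reflect the format or scripts the project actually uses; the template also assumes docstrings contain no triple quotes.

```python
"""Hypothetical code generator: documented SQL function -> Python wrapper source."""
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionSpec:
    name: str              # SQL function name, e.g. "mol_amw"
    args: tuple[str, ...]  # Python parameter names
    doc: str               # one-line description copied from the cartridge docs


TEMPLATE = '''def {name}({params}):
    """{doc}"""
    return func.{name}({params})
'''


def render_binding(spec: FunctionSpec) -> str:
    """Return Python source for one wrapper that forwards its arguments to func.<name>."""
    return TEMPLATE.format(name=spec.name, params=", ".join(spec.args), doc=spec.doc)


def render_module(specs: list[FunctionSpec]) -> str:
    """Concatenate the wrappers under the single import they all need."""
    header = "from sqlalchemy import func\n\n\n"
    return header + "\n\n".join(render_binding(s) for s in specs)


specs = [FunctionSpec("mol_amw", ("mol",), "Return the average molecular weight of a molecule.")]
print(render_module(specs))
```

Generating source text, rather than building functions dynamically at import time, keeps the wrappers visible to type checkers and IDEs.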

📚 Documentation

🤝 Contributing

We welcome contributions! molalchemy offers many opportunities for developers interested in chemical informatics:

  • 🔰 New to the project? Check out good first issues
  • 🔬 Chemical expertise? Help complete RDKit integration or add ChemAxon support
  • 🐳 DevOps skills? Optimize our Docker containers and CI/CD pipeline
  • 📚 Love documentation? Create tutorials and improve API docs

Read our Contributing Guide for detailed instructions on getting started.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

Core Technologies

  • RDKit - Open-source cheminformatics toolkit
  • Bingo - Chemical database cartridge by EPAM
  • SQLAlchemy - Python SQL toolkit and ORM

Inspiration and Similar Projects

  • GeoAlchemy2 - Spatial extension for SQLAlchemy, which served as architectural inspiration for the cartridge integration patterns
  • ord-schema - Open Reaction Database schema, one of the few projects using custom chemical types with SQLAlchemy
  • Riccardo Vianello - His work on django-rdkit and razi provided valuable insights for chemical database integration (discovered after starting this project)

📧 Contact



About

Cheminformatic extension for SQLAlchemy
