molalchemy - Making chemical databases as easy as regular databases! 🧪✨
Extensions for SQLAlchemy to work with chemical cartridges
molalchemy provides seamless integration between Python and chemical databases, enabling chemical structure storage, indexing, and querying. The library supports two popular chemical cartridges (Bingo PostgreSQL and RDKit PostgreSQL) and provides a unified API for chemical database operations.
This project was originally meant to be part of the RDKit UGM 2025 hackathon, but COVID had other plans for me. It is currently in the alpha stage as a proof of concept. Contributions are welcome!
To give it a hackathon vibe, I built this PoC in a couple of hours, so expect some rough edges and missing features.
- Chemical Data Types: Custom SQLAlchemy types for molecules, reactions and fingerprints
- Chemical Cartridge Integration: Support for Bingo and RDKit PostgreSQL cartridges
- Substructure Search: Efficient substructure and similarity searching
- Chemical Indexing: High-performance chemical structure indexing
- Alembic Integration: Automatic handling of extensions and imports in database migrations
- Typing: As many type hints as possible - no need to remember yet another abstract function name
- Easy Integration: Drop-in replacement for standard SQLAlchemy types
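Similarity searching in chemical cartridges is commonly expressed as a Tanimoto coefficient between fingerprints. The sketch below is plain Python, not molalchemy code: it only illustrates the measure that such a search ranks by. The fingerprints, molecule names, and threshold are made up for illustration.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen in this sketch for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: indices of the bits that are set.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}

threshold = 0.3  # illustrative cut-off, not a cartridge default

# Score every library entry, keep those above the threshold, best first.
scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
hits = sorted(
    (item for item in scored if item[1] >= threshold),
    key=lambda item: item[1],
    reverse=True,
)
print(hits)
```

In a cartridge the same comparison runs inside PostgreSQL, where a chemical index lets it avoid scoring every row.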
# install from PyPI
pip install molalchemy
# or install the latest version from GitHub
pip install git+https://github.com/asiomchen/molalchemy.git
# or clone the repo and install
git clone https://github.com/asiomchen/molalchemy.git
cd molalchemy
pip install .

- Python 3.10+
- SQLAlchemy 2.0+
- rdkit 2024.3.1+
- A running PostgreSQL instance with a chemical cartridge (Bingo or RDKit); see docker-compose.yaml for a ready-to-use setup
For development or testing, you can use the provided Docker setup:
# For RDKit cartridge
docker-compose up rdkit
# For Bingo cartridge
docker-compose up bingo

molalchemy/
├── src/molalchemy/
│   ├── types.py              # Base type definitions
│   ├── helpers.py            # Common utilities
│   ├── alembic_helpers.py    # Alembic integration utilities
│   ├── bingo/                # Bingo PostgreSQL cartridge support
│   │   ├── types.py          # Bingo-specific types
│   │   ├── index.py          # Bingo indexing
│   │   ├── comparators.py    # SQLAlchemy comparators
│   │   └── functions/        # Bingo database functions
│   └── rdkit/                # RDKit PostgreSQL cartridge support
│       ├── types.py          # RDKit-specific types
│       ├── index.py          # RDKit indexing
│       ├── comparators.py    # SQLAlchemy comparators
│       └── functions/        # RDKit database functions
├── tests/                    # Test suite
├── docs/                     # Documentation
└── dev_scripts/              # Development utilities
To learn how to use molalchemy, check out the Quick Start - RDKit and Quick Start - Bingo tutorials in the documentation.
from molalchemy.bingo.types import (
    BingoMol,             # Text-based molecule storage (SMILES/Molfile)
    BingoBinaryMol,       # Binary molecule storage with format conversion
    BingoReaction,        # Reaction storage (reaction SMILES/Rxnfile)
    BingoBinaryReaction,  # Binary reaction storage
)
from molalchemy.bingo.index import (
    BingoMolIndex,        # Molecule indexing
    BingoBinaryMolIndex,  # Binary molecule indexing
    BingoRxnIndex,        # Reaction indexing
    BingoBinaryRxnIndex,  # Binary reaction indexing
)
from molalchemy.bingo.functions import (
    # Individual function imports available, see documentation
    # for complete list of chemical analysis functions
    getweight,            # for example, the molecular weight function used below
)

from molalchemy.rdkit.types import (
    RdkitMol,                # RDKit molecule type with configurable return formats
    RdkitBitFingerprint,     # Binary fingerprints (bfp)
    RdkitSparseFingerprint,  # Sparse fingerprints (sfp)
    RdkitReaction,           # Chemical reactions
    RdkitQMol,               # Query molecules
    RdkitXQMol,              # Extended query molecules
)
from molalchemy.rdkit.index import (
    RdkitIndex,              # RDKit molecule indexing (GIST index)
)
from molalchemy.rdkit.functions import (
    # Individual function imports available, see documentation
    # for complete list of 150+ RDKit functions
    mol_amw,                 # for example, the molecular weight function used below
)

from molalchemy.bingo.index import BingoMolIndex
from molalchemy.bingo.types import BingoMol
from sqlalchemy import Integer, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Molecule(Base):
    __tablename__ = 'molecules'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    structure: Mapped[str] = mapped_column(BingoMol)
    name: Mapped[str] = mapped_column(String(100))

    # Add chemical index for faster searching
    __table_args__ = (
        BingoMolIndex('mol_idx', 'structure'),
    )

from molalchemy.rdkit.types import RdkitMol
from rdkit import Chem


class MoleculeWithFormats(Base):
    __tablename__ = 'molecules_formatted'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    # Return as SMILES string (default)
    structure_smiles: Mapped[str] = mapped_column(RdkitMol())
    # Return as RDKit Mol object
    structure_mol: Mapped[Chem.Mol] = mapped_column(RdkitMol(return_type="mol"))
    # Return as raw bytes
    structure_bytes: Mapped[bytes] = mapped_column(RdkitMol(return_type="bytes"))

The chemical functions are available as individual imports from the functions modules. Under the hood, they use SQLAlchemy's func to call the corresponding database functions, while providing type hints and syntax highlighting in IDEs.
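To make the "under the hood" part concrete, here is a minimal sketch of a typed wrapper around SQLAlchemy's func. It is an illustration of the general pattern, not molalchemy's actual source: the wrapper name mol_amw_sketch and the Float return type are assumptions made for this example.

```python
from sqlalchemy import Float, column, func
from sqlalchemy.sql.expression import ColumnElement
from sqlalchemy.sql.functions import Function


def mol_amw_sketch(mol: ColumnElement) -> Function:
    """Hypothetical wrapper: emit a call to the database function mol_amw."""
    # func.<name>(...) builds a SQL function call; type_ sets the result type.
    return func.mol_amw(mol, type_=Float)


# No database is needed to build or print the SQL expression.
expr = mol_amw_sketch(column("structure"))
print(expr)  # mol_amw(structure)
```

A named, annotated Python function gives editors something to complete and type-check, which a bare func.mol_amw attribute lookup cannot. The library's own bindings are imported by name from the functions modules, as in the Bingo example that follows.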
from molalchemy.bingo.functions import smiles, getweight, gross, inchikey

# Calculate molecular properties using Bingo functions
results = session.query(
    Molecule.name,
    getweight(Molecule.structure).label('molecular_weight'),
    gross(Molecule.structure).label('formula'),
    smiles(Molecule.structure).label('canonical_smiles')
).all()

# Validate molecular structures
from molalchemy.bingo.functions import checkmolecule

invalid_molecules = session.query(Molecule).filter(
    checkmolecule(Molecule.structure).isnot(None)
).all()

# Format conversions
inchi_keys = session.query(
    Molecule.id,
    inchikey(Molecule.structure).label('inchikey')
).all()

For RDKit functions:
from molalchemy.rdkit.functions import mol_amw, mol_formula, mol_inchikey

# Calculate molecular properties using RDKit functions
results = session.query(
    Molecule.name,
    mol_amw(Molecule.structure).label('molecular_weight'),
    mol_formula(Molecule.structure).label('formula'),
    mol_inchikey(Molecule.structure).label('inchikey')
).all()

molalchemy provides utilities for Alembic integration. For automatic import handling in migrations, the library provides type rendering utilities that ensure the proper import statements are generated for molalchemy types.
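For readers unfamiliar with Alembic's render_item hook, the sketch below shows the general contract such a hook follows: it receives the kind of item, the object, and the autogenerate context, returns a string to render (optionally registering an import), or returns False to fall back to Alembic's default. This is not molalchemy's implementation; ChemType, the mychem.types module path, and the SimpleNamespace stand-in for the context are hypothetical.

```python
from types import SimpleNamespace


class ChemType:
    """Hypothetical stand-in for a custom column type."""

    def __repr__(self) -> str:
        return "ChemType()"


def render_item(type_, obj, autogen_context):
    """Render custom types with a module prefix and register the import."""
    if type_ == "type" and isinstance(obj, ChemType):
        # Alembic writes every entry of this set into the migration's imports.
        autogen_context.imports.add("import mychem.types")
        return f"mychem.types.{obj!r}"
    return False  # let Alembic render everything else the default way


# Stand-in for Alembic's autogenerate context, which exposes an imports set.
ctx = SimpleNamespace(imports=set())
rendered = render_item("type", ChemType(), ctx)
fallback = render_item("type", object(), ctx)
print(rendered, sorted(ctx.imports), fallback)
```

In a real project, a hook of this kind is passed to context.configure in Alembic's env.py, which is what the following snippet does with the helper shipped by molalchemy.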
# ...
from molalchemy import alembic_helpers
# ...
def run_migrations_offline():
    # ...
    context.configure(
        # ...
        render_item=alembic_helpers.render_item,
    )
    # ...
def run_migrations_online():
    # ...
    context.configure(
        # ...
        render_item=alembic_helpers.render_item,
    )
    # ...

- Clone the repository:
git clone https://github.com/asiomchen/molalchemy.git
cd molalchemy
- Install dependencies:
uv sync
- Activate the virtual environment:
source .venv/bin/activate

# Run all tests with coverage
make test
# Or use uv directly
uv run pytest
# Run specific test module
uv run pytest tests/bingo/
# Run with coverage
uv run pytest --cov=src/molalchemy

This project uses modern Python development tools:
- uv: For virtual environment and dependency management
- Ruff: For linting and formatting
- pytest: For testing
The chemical function bindings are automatically generated from cartridge documentation:
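As a rough illustration of what generating bindings can look like, the sketch below renders Python wrapper source from a small function description. It is not the project's generator: FunctionSpec, render_binding, and the sample spec are hypothetical, and the real scripts may parse and emit something quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionSpec:
    """Hypothetical description of one database function."""

    name: str
    args: tuple[str, ...]
    doc: str


def render_binding(spec: FunctionSpec) -> str:
    """Render the source of a thin wrapper that forwards to sqlalchemy.func."""
    params = ", ".join(spec.args)
    return (
        f"def {spec.name}({params}):\n"
        f'    """{spec.doc}"""\n'
        f"    return func.{spec.name}({params})\n"
    )


spec = FunctionSpec("mol_amw", ("mol",), "Average molecular weight of a molecule.")
source = render_binding(spec)
print(source)

# The rendered text is valid Python; compiling it checks the syntax only.
compile(source, "<generated>", "exec")
```

The project's own generators are run through the make targets listed next.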
# Update RDKit function bindings
make update-rdkit-func
# Update Bingo function bindings
make update-bingo-func
# Update all function bindings
make update-func

- Project Roadmap - Development phases, timeline, and contribution opportunities
- Contributing Guide - How to contribute to the project
- API Reference - Complete API documentation
- Bingo Manual - Bingo PostgreSQL cartridge guide
- RDKit Manual - RDKit PostgreSQL cartridge guide
We welcome contributions! molalchemy offers many opportunities for developers interested in chemical informatics:
- New to the project? Check out good first issues
- Chemical expertise? Help complete RDKit integration or add ChemAxon support
- DevOps skills? Optimize our Docker containers and CI/CD pipeline
- Love documentation? Create tutorials and improve API docs
Read our Contributing Guide for detailed instructions on getting started.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- RDKit - Open-source cheminformatics toolkit
- Bingo - Chemical database cartridge by EPAM
- SQLAlchemy - Python SQL toolkit and ORM
- GeoAlchemy2 - Spatial extension for SQLAlchemy, which served as architectural inspiration for cartridge integration patterns
- ord-schema - Open Reaction Database schema, one of the few projects using custom chemical types with SQLAlchemy
- Riccardo Vianello - His work on django-rdkit and razi provided valuable insights for chemical database integration (discovered after starting this project)
- Author: Anton Siomchen
- Email: anton.siomchen+molalchemy@gmail.com
- GitHub: @asiomchen
- LinkedIn: Anton Siomchen