Dev for pr branch #6

Open
OmerKfir19695 wants to merge 7 commits into main from dev-for-pr

OmerKfir19695 (Collaborator) commented Feb 12, 2026

Add Reaction Network Generation with CatPred Kinetics Estimation

This PR implements the core BEES reaction network generation engine with integrated CatPred kinetics estimation following the methodology from the CatPred paper.

Here are the main changes relative to the main branch (there are quite a few):

Database: 37,119 reactions, filtered from the BKMS databases (BRENDA, KEGG, MetaCyc, SABIO-RK).
Compound SMILES: over 17K distinct compound SMILES added.
Chemical ontology: 96 aliases for flexible compound name matching.
Engine: automated network discovery from enzyme-substrate pairs with EC class-based product inference.
Kinetics estimation: built-in adapter to CatPred, an ML-based kinetics estimator meant for cases where database values are unavailable (for now, that is always the case).
Examples & Tests: Glycolysis and Fatty Acid Synthesis pathways and a comprehensive test suite. A markdown file with all the necessary explanations is attached in this folder.

More key features/modules:

  • Main modules for reaction engine - bees/model_generator.py, reaction_template.py:
    Iterative network discovery: products from one reaction become substrates for the next
    EC class-based templates for 7 enzyme classes (oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases, translocases)
    Intelligent cofactor handling (ATP, NADH, CoA) with domain-specific substitutions
    Stoichiometry validation and Michaelis-Menten rate law assignment

  • CatPred kinetics estimation module - bees/kinetics_estimator.py:
    Following the CatPred methodology:
    kcat: Concatenated SMILES of all reactants → one kcat value per reaction
    Km: Individual substrate SMILES → multiple Km values per reaction (one per substrate)
    Optional prediction uncertainty (standard deviation) estimation
    Configurable via environment variables (CATPRED_DIR, CATPRED_CHECKPOINT_BASE, CATPRED_CONDA_ENV)

  • Settings that let the user enter SMILES in the middle of a run if the SMILES of a compound is not found in the DB.
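The iterative discovery idea listed above (products of one reaction become substrates for the next) can be sketched as a breadth-first expansion. This is only an illustration under stated assumptions: `TOY_REACTIONS` and `discover_network` are hypothetical names and are not the actual API of bees/model_generator.py.

```python
from collections import deque

# Hypothetical toy lookup table: (enzyme, substrate) -> products.
# Illustration only; this is not the BEES reaction database.
TOY_REACTIONS = {
    ("hexokinase", "glucose"): ["glucose-6-phosphate"],
    ("phosphoglucose isomerase", "glucose-6-phosphate"): ["fructose-6-phosphate"],
}


def discover_network(enzymes, seed_substrates, reactions, max_iterations=10):
    """Expand the network round by round until no new species appear."""
    known = set(seed_substrates)
    frontier = deque(seed_substrates)
    found = []
    for _ in range(max_iterations):
        if not frontier:
            break
        next_frontier = deque()
        while frontier:
            substrate = frontier.popleft()
            for enzyme in enzymes:
                products = reactions.get((enzyme, substrate))
                if products is None:
                    continue
                found.append((enzyme, substrate, tuple(products)))
                for product in products:
                    # New products become substrates in the next round.
                    if product not in known:
                        known.add(product)
                        next_frontier.append(product)
        frontier = next_frontier
    return found, known


found, known = discover_network(
    ["hexokinase", "phosphoglucose isomerase"], ["glucose"], TOY_REACTIONS
)
```

The `max_iterations` cap is an assumed safeguard against cyclic pathways; the `known` set ensures each species is expanded only once.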

Example output (the SD is not included; it is optional for now and can be enabled in the input):
Reaction: ATP + citrate + CoA → ADP + phosphate + acetyl-CoA + oxaloacetate
Kinetics: Km(ATP)=0.043 mM, Km(citrate)=0.097 mM, Km(CoA)=0.008 mM, kcat(ATP_Citrate_Lyase)=88.8 1/s (source=catpred)
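To make the kcat/Km input split concrete, here is a minimal sketch of how the two kinds of model inputs could be assembled for one reaction. The function name `build_catpred_inputs` is hypothetical, and joining SMILES with "." (the standard SMILES notation for disconnected structures) is an assumption about the concatenation format, not something confirmed by this PR. The example molecules are simple stand-ins, not the substrates above.

```python
def build_catpred_inputs(substrate_smiles):
    """Return (kcat_input, km_inputs) for a single reaction.

    substrate_smiles: dict mapping substrate name -> SMILES string.
    """
    # kcat: one input per reaction, built from all reactant SMILES.
    # Assumption: "." is the separator between molecules.
    kcat_input = ".".join(substrate_smiles.values())
    # Km: one input per substrate, so one prediction per substrate.
    km_inputs = dict(substrate_smiles)
    return kcat_input, km_inputs


# Stand-in molecules with well-known SMILES (ethanol, acetic acid).
kcat_input, km_inputs = build_catpred_inputs(
    {"ethanol": "CCO", "acetic acid": "CC(=O)O"}
)
```

This mirrors the shape of the example output: a single kcat per reaction and one Km per substrate.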

Example Projects (projects/)
Glycolysis: needs to complete the 10-step pathway (Glucose -> Pyruvate)
Fatty Acid Synthesis: needs to complete the 3-step pathway (Citrate -> Palmitic acid)

Please try these projects (don't forget to install CatPred as described before; please contact me if there is a problem with that).

THX!

This is the first part of enlarging the software. Here are my main steps:

- Import 37,119 reactions from the BKMS dataset (BRENDA, KEGG, MetaCyc, SABIO-RK). I filtered reactions according to whether they are complete in the database.
- Add compound SMILES mappings from PubChem. So far this covers only around 55% of the species (excluding enzymes, which are identified by EC number)
- Implement ReactionDatabase class for querying by enzyme/substrate
- Add chemical ontology (96 aliases) for compound name matching
from dataclasses import dataclass
import os
from types import SimpleNamespace
import json
Comment on lines +27 to +32
from bees.common import (
    BEES_PATH,
    GENERAL_COFACTORS,
    get_ontology_equivalents,
    get_coenzyme_like_flags,
)
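For the ontology-based name matching mentioned in this commit, a minimal sketch of alias lookup might look like the following. The alias groups and the functions `normalize`, `equivalents`, and `same_compound` are hypothetical; this is not the implementation of `get_ontology_equivalents`, whose behavior is not shown in this PR.

```python
# Hypothetical alias groups for illustration; the actual ontology
# (96 aliases) is not reproduced here.
ALIAS_GROUPS = [
    {"atp", "adenosine triphosphate"},
    {"nadh", "reduced nicotinamide adenine dinucleotide"},
]


def normalize(name):
    """Lowercase and collapse whitespace so lookups are forgiving."""
    return " ".join(name.strip().lower().split())


def equivalents(name, groups=ALIAS_GROUPS):
    """Return every known alias of a name, or just the name itself."""
    key = normalize(name)
    for group in groups:
        if key in group:
            return set(group)
    return {key}


def same_compound(a, b, groups=ALIAS_GROUPS):
    """True if two names refer to the same compound under the aliases."""
    return normalize(b) in equivalents(a, groups)
```

With these toy groups, `same_compound("ATP", "Adenosine triphosphate")` matches regardless of case or spacing, while unknown names only match themselves.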
- Implement the iterative reaction network discovery. The next PR will add filtration as well (trying to mimic simulation)
- Add ReactionTemplate with EC class-based product inference
- Support for cofactor handling and domain-specific substitutions
- Validate reactant availability across reaction pathways
- Generate stoichiometry and rate laws from database matches
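As a reference for the rate-law step, the single-substrate Michaelis-Menten form is v = kcat · [E] · [S] / (Km + [S]). The sketch below evaluates it with the kcat and Km(ATP) values from the example output in the PR description; the function name and the enzyme concentration are assumptions, and the multi-substrate form actually assigned by the engine is not specified in this PR.

```python
import math


def michaelis_menten_rate(kcat, enzyme_conc, substrate_conc, km):
    """Single-substrate rate: v = kcat * [E] * [S] / (Km + [S])."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


# kcat = 88.8 1/s and Km(ATP) = 0.043 mM come from the example output;
# the enzyme concentration of 1.0 mM is an arbitrary assumption.
vmax = 88.8 * 1.0
half_vmax = michaelis_menten_rate(88.8, 1.0, 0.043, 0.043)

# When [S] equals Km, the rate is half of Vmax.
assert math.isclose(half_vmax, vmax / 2)
```

Concentrations are in mM and kcat in 1/s, so the rate comes out in mM/s.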
The CatPred tool serves as the brain of my kinetics estimator; it can predict Km, kcat and Ki.
Here are my main changes for that:

- Integrate CatPred for kcat and Km prediction when DB values are missing. Right now that is all the time!
- kcat: concatenated SMILES of all reactants.
- Km: individual substrate SMILES (multiple Km per reaction)
- Support prediction uncertainty (SD) estimation
- Environment-based configuration for CatPred paths
- Update the test suite for common utilities, logger, main, schema, and reaction templates

- Add tests for chemical ontology and path globalization
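The environment-based configuration for CatPred paths listed above could be read along these lines. The variable names CATPRED_DIR, CATPRED_CHECKPOINT_BASE, and CATPRED_CONDA_ENV come from the PR description; the `CatPredConfig` class, `load_catpred_config`, and the example path are hypothetical and only illustrate the pattern.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatPredConfig:
    catpred_dir: Optional[str]
    checkpoint_base: Optional[str]
    conda_env: Optional[str]

    @property
    def is_complete(self):
        """True only if all three settings are present and non-empty."""
        return all((self.catpred_dir, self.checkpoint_base, self.conda_env))


def load_catpred_config(environ=None):
    """Read CatPred settings from a mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    return CatPredConfig(
        catpred_dir=env.get("CATPRED_DIR"),
        checkpoint_base=env.get("CATPRED_CHECKPOINT_BASE"),
        conda_env=env.get("CATPRED_CONDA_ENV"),
    )


# Hypothetical path, passed explicitly so the example does not
# depend on the real environment.
partial = load_catpred_config({"CATPRED_DIR": "/opt/catpred"})
```

Accepting an explicit mapping keeps the loader easy to test without modifying the process environment.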
OmerKfir19695 and others added 2 commits February 16, 2026 13:45
- Add Glycolysis, FattyAcidSynthesis, and Minimal example projects
- Move examples/ to projects/ for clearer organization
- Add comprehensive demo and commented template
- Include example output showing CatPred estimation results
- Add project folder README with usage guidelines

Co-authored-by: Cursor <cursoragent@cursor.com>
Did some cleanup changes for the PR

- Improve INFO-level logging

- Update README with CatPred integration docs and env var setup
- Remove obsolete BEES.bat and kinetic_database.py
- Fix user-specific paths in example inputs (use relative paths)
- Update main.py to handle project directory resolution properly
Fixed bug from PR recommendation