This is the first part of extending the software. Here are my main steps:
- Import 37,119 reactions from the BKMS dataset (BRENDA, KEGG, MetaCyc, SABIO-RK). I filtered the reactions based on whether they are complete in the database.
- Add compound SMILES mappings from PubChem. SMILES are currently available for only about 55% of the species (excluding enzymes, which are identified by EC number).
- Implement a ReactionDatabase class for querying by enzyme/substrate
- Add a chemical ontology (96 aliases) for compound name matching
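Here is a minimal sketch of enzyme/substrate querying using two in-memory inverted indexes. The `Reaction` record and `ReactionIndex` class are hypothetical stand-ins, not the actual ReactionDatabase API in this PR. The sketch assumes each reaction carries one EC number and a list of substrate names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical reaction record: EC number plus substrate/product names."""
    rxn_id: str
    ec: str
    substrates: tuple
    products: tuple


class ReactionIndex:
    """Inverted indexes so lookups by enzyme or substrate avoid a full scan."""

    def __init__(self, reactions):
        self._by_ec = defaultdict(list)
        self._by_substrate = defaultdict(list)
        for rxn in reactions:
            self._by_ec[rxn.ec].append(rxn)
            for name in rxn.substrates:
                # Substrate keys are stored lowercased for case-insensitive lookup
                self._by_substrate[name.lower()].append(rxn)

    def by_enzyme(self, ec):
        return list(self._by_ec.get(ec, []))

    def by_substrate(self, name):
        return list(self._by_substrate.get(name.lower(), []))

    def query(self, ec, substrate):
        """Reactions catalysed by `ec` that consume `substrate`."""
        wanted = substrate.lower()
        return [r for r in self.by_enzyme(ec) if wanted in (s.lower() for s in r.substrates)]


# Usage: hexokinase (EC 2.7.1.1) and glucose-6-phosphate isomerase (EC 5.3.1.9)
db = ReactionIndex([
    Reaction("R1", "2.7.1.1", ("glucose", "ATP"), ("glucose-6-phosphate", "ADP")),
    Reaction("R2", "5.3.1.9", ("glucose-6-phosphate",), ("fructose-6-phosphate",)),
])
print([r.rxn_id for r in db.query("2.7.1.1", "Glucose")])  # ['R1']
```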
bees/model_generator.py
from dataclasses import dataclass
import os
from types import SimpleNamespace
import json
Force-pushed from d243d5e to 717f11f.
Force-pushed from 6d8dcc8 to e0a23c6.
- Implement iterative reaction network discovery. The next PR will add filtering as well (trying to mimic simulation).
- Add ReactionTemplate with EC class-based product inference
- Support cofactor handling and domain-specific substitutions
- Validate reactant availability across reaction pathways
- Generate stoichiometry and rate laws from database matches
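Here is a minimal sketch of the iterative discovery loop. It assumes reactions are already matched to EC numbers and that a reaction is accepted only once all of its reactants are available. `Rxn` and `discover_network` are hypothetical names, and templates, cofactor substitution and filtering are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rxn:
    """Hypothetical minimal reaction record."""
    ec: str
    substrates: frozenset
    products: frozenset


def discover_network(reactions, enzymes, seed_compounds, max_rounds=10):
    """Expand the network round by round: products of accepted reactions
    become available substrates for the next round, until nothing new fires."""
    available = set(seed_compounds)
    accepted, seen = [], set()
    for _ in range(max_rounds):
        # Reactant availability check: every substrate must already be present
        fired = [r for r in reactions
                 if r.ec in enzymes and r not in seen and r.substrates <= available]
        if not fired:
            break  # fixed point: no new reaction can be added
        for r in fired:
            seen.add(r)
            accepted.append(r)
            available |= r.products
    return accepted, available


# Usage: first three glycolysis steps, with ATP supplied as a seed cofactor
steps = [
    Rxn("2.7.1.1", frozenset({"glucose", "ATP"}), frozenset({"glucose-6-phosphate", "ADP"})),
    Rxn("5.3.1.9", frozenset({"glucose-6-phosphate"}), frozenset({"fructose-6-phosphate"})),
    Rxn("2.7.1.11", frozenset({"fructose-6-phosphate", "ATP"}),
        frozenset({"fructose-1,6-bisphosphate", "ADP"})),
]
accepted, available = discover_network(steps, {"2.7.1.1", "5.3.1.9", "2.7.1.11"}, {"glucose", "ATP"})
print([r.ec for r in accepted])  # ['2.7.1.1', '5.3.1.9', '2.7.1.11']
```

Each round only fires reactions whose reactants existed at the start of that round. The round count therefore mirrors the pathway depth from the seed compounds.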
CatPred is used as the brain of my kinetic estimator; it can predict Km, kcat and Ki. Here are my main changes:
- Integrate CatPred for kcat and Km prediction when database values are missing (right now it is invoked every time)
- kcat: concatenated SMILES of all reactants
- Km: individual substrate SMILES (multiple Km values per reaction)
- Support prediction uncertainty (SD) estimation
- Environment-based configuration for CatPred paths
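The kcat/Km input split can be sketched as follows. The helper name and dict fields are hypothetical and do not reflect CatPred's input file format. Joining reactant SMILES with `.` (the SMILES separator for disconnected components) is an assumption about how the concatenation is done. The enzyme sequence is passed along because CatPred predicts from the enzyme sequence together with substrate SMILES.

```python
def build_catpred_queries(reaction_name, sequence, reactant_smiles):
    """Hypothetical helper: one kcat query per reaction, one Km query per substrate.

    reactant_smiles maps substrate name -> SMILES (insertion order is kept).
    """
    # kcat: all reactants together, joined with '.' (assumed concatenation rule)
    kcat_query = {
        "reaction": reaction_name,
        "sequence": sequence,
        "smiles": ".".join(reactant_smiles.values()),
    }
    # Km: each substrate on its own, so a reaction gets one Km per substrate
    km_queries = [
        {"reaction": reaction_name, "substrate": name, "sequence": sequence, "smiles": smi}
        for name, smi in reactant_smiles.items()
    ]
    return kcat_query, km_queries


# Usage with placeholder inputs (sequence, ATP and CoA SMILES are placeholders;
# the citrate SMILES is real)
kcat_q, km_qs = build_catpred_queries(
    "ATP_Citrate_Lyase",
    "MPLACEHOLDERSEQ",
    {"ATP": "<ATP SMILES>", "citrate": "OC(=O)CC(O)(CC(O)=O)C(O)=O", "CoA": "<CoA SMILES>"},
)
print(len(km_qs))  # 3
```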
- Update the test suite for common utilities, logger, main, schema, and reaction templates
- Add tests for the chemical ontology and path globalization
Force-pushed from e0a23c6 to 4a316a3.
- Add Glycolysis, FattyAcidSynthesis, and Minimal example projects
- Move examples/ to projects/ for clearer organization
- Add comprehensive demo and commented template
- Include example output showing CatPred estimation results
- Add project folder README with usage guidelines

Co-authored-by: Cursor <cursoragent@cursor.com>
Did some cleanup for the PR:
- Improve INFO-level logging
- Update README with CatPred integration docs and env var setup
- Remove obsolete BEES.bat and kinetic_database.py
- Fix user-specific paths in example inputs (use relative paths)
- Update main.py to handle project directory resolution properly
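One way to handle relative paths in example inputs is to anchor them at the project directory instead of the current working directory. This is a sketch under that assumption. `resolve_project_path` is a hypothetical helper, not the function used in main.py.

```python
from pathlib import Path


def resolve_project_path(project_dir, path_in_input):
    """Resolve a path from an input file against the project directory.

    Absolute paths are kept as-is; relative ones are anchored at the project
    folder, so the same input works from any working directory.
    """
    p = Path(path_in_input).expanduser()
    if not p.is_absolute():
        p = Path(project_dir) / p
    return p.resolve()


# Usage (hypothetical file names)
print(resolve_project_path("projects/Glycolysis", "data/enzymes.csv"))
```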
Force-pushed from 4a316a3 to 68fe941.
Fixed a bug flagged in the PR review recommendations.
Force-pushed from 68fe941 to d5ede3a.
Add Reaction Network Generation with CatPred Kinetics Estimation
This PR implements the core BEES reaction network generation engine with integrated CatPred kinetics estimation following the methodology from the CatPred paper.
Here are the main changes from the main branch (there are quite a few):
Database: 37,119 reactions, filtered from the BKMS databases (BRENDA, KEGG, MetaCyc, SABIO-RK).
Added SMILES for over 17K distinct compounds.
Chemical ontology (96 aliases) for flexible compound name matching
Engine: Automated network discovery from enzyme-substrate pairs with EC class-based product inference
Kinetics estimation: built-in adapter to CatPred, an ML-based kinetics estimator used when database values are unavailable (currently it is used every time).
Examples & Tests: Glycolysis and Fatty Acid Synthesis pathways plus a comprehensive test suite. The folder includes a Markdown file with all the necessary explanations.
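Here is a sketch of alias-based name matching. It assumes the ontology maps each alias to one canonical compound name. Names are normalized (case, whitespace, `-` and `_`) before lookup, so spelling variants hit the same entry. The alias entries below are illustrative only, not taken from the 96-alias ontology in this PR.

```python
import re

# Hypothetical excerpt of an alias table: alias -> canonical compound name
ALIASES = {
    "acetyl coa": "acetyl-CoA",
    "accoa": "acetyl-CoA",
    "d-glucose": "glucose",
    "atp": "ATP",
    "adenosine triphosphate": "ATP",
}


def normalize(name):
    """Lowercase, trim, and collapse runs of whitespace, '-' and '_' to one space."""
    return re.sub(r"[\s_\-]+", " ", name.strip().lower())


# Index aliases by their normalized form so lookups ignore spelling variants
_INDEX = {normalize(alias): canonical for alias, canonical in ALIASES.items()}


def canonical_name(name):
    """Return the canonical compound name, or the input unchanged if unknown."""
    return _INDEX.get(normalize(name), name)


print(canonical_name("Acetyl_CoA"))  # acetyl-CoA
```

Unknown names pass through unchanged, so a missing alias degrades to an exact-name match rather than an error.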
More key features/modules:
Iterative network discovery: products from one reaction become substrates for the next
EC class-based templates for 7 enzyme classes (oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases, translocases)
Intelligent cofactor handling (ATP, NADH, CoA) with domain-specific substitutions
Stoichiometry validation and Michaelis-Menten rate law assignment
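Here is a sketch of the two checks, under stated assumptions. Stoichiometry is validated as an element balance, with compound compositions supplied as element-count dicts (a hypothetical input format). The rate law is the multi-substrate Michaelis-Menten form with independent substrate binding, v = kcat · E · Π S_i/(Km_i + S_i). That specific multi-substrate form is an assumption; the engine may assign a different one.

```python
from collections import Counter
from math import prod


def is_balanced(reactants, products, formulas):
    """Element-balance check. reactants/products map compound -> coefficient;
    formulas maps compound -> {element: count} (hypothetical input format)."""
    def side_total(side):
        total = Counter()
        for compound, coeff in side.items():
            for element, n in formulas[compound].items():
                total[element] += coeff * n
        return total

    return side_total(reactants) == side_total(products)


def mm_rate(kcat, enzyme_conc, substrate_conc, km):
    """Multi-substrate Michaelis-Menten rate with independent binding (assumed form):
    v = kcat * E * prod_i S_i / (Km_i + S_i).
    kcat in 1/s; E, S_i and Km_i in mM, so v is in mM/s."""
    return kcat * enzyme_conc * prod(substrate_conc[s] / (km[s] + substrate_conc[s]) for s in km)


# Usage: hexokinase step, neutral molecular formulas
formulas = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "ATP": {"C": 10, "H": 16, "N": 5, "O": 13, "P": 3},
    "glucose-6-phosphate": {"C": 6, "H": 13, "O": 9, "P": 1},
    "ADP": {"C": 10, "H": 15, "N": 5, "O": 10, "P": 2},
}
print(is_balanced({"glucose": 1, "ATP": 1}, {"glucose-6-phosphate": 1, "ADP": 1}, formulas))  # True
```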
CatPred kinetics estimation module (bees/kinetics_estimator.py):
Following the CatPred methodology:
kcat: Concatenated SMILES of all reactants → one kcat value per reaction
Km: Individual substrate SMILES → multiple Km values per reaction (one per substrate)
Optional prediction uncertainty (standard deviation) estimation
Configurable via environment variables (CATPRED_DIR, CATPRED_CHECKPOINT_BASE, CATPRED_CONDA_ENV)
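Here is a minimal sketch of reading the three environment variables into one config object, failing early if any is unset. The meaning given to each variable in the comments is an assumption. This sketch also treats all three as required, which may differ from bees/kinetics_estimator.py.

```python
import os
from dataclasses import dataclass
from pathlib import Path

REQUIRED = ("CATPRED_DIR", "CATPRED_CHECKPOINT_BASE", "CATPRED_CONDA_ENV")


@dataclass(frozen=True)
class CatPredConfig:
    catpred_dir: Path      # assumed: local CatPred checkout
    checkpoint_base: Path  # assumed: folder holding the kcat/Km/Ki checkpoints
    conda_env: str         # assumed: name of the conda env CatPred runs in


def load_catpred_config(env=os.environ):
    """Read CatPred settings from environment variables, failing early with a
    clear message instead of deep inside a prediction call."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError("Missing CatPred environment variables: " + ", ".join(missing))
    return CatPredConfig(
        catpred_dir=Path(env["CATPRED_DIR"]).expanduser(),
        checkpoint_base=Path(env["CATPRED_CHECKPOINT_BASE"]).expanduser(),
        conda_env=env["CATPRED_CONDA_ENV"],
    )


# Usage with an explicit mapping (hypothetical paths) instead of the real environment
cfg = load_catpred_config({
    "CATPRED_DIR": "~/CatPred",
    "CATPRED_CHECKPOINT_BASE": "~/CatPred/checkpoints",
    "CATPRED_CONDA_ENV": "catpred",
})
print(cfg.conda_env)  # catpred
```

Accepting the mapping as a parameter keeps the loader testable without touching the process environment.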
Example output (SD not included; it is optional for now and can be enabled in the input):
Reaction: ATP + citrate + CoA → ADP + phosphate + acetyl-CoA + oxaloacetate
Kinetics:
Km(ATP)=0.043 mM, Km(citrate)=0.097 mM, Km(CoA)=0.008 mM, kcat(ATP_Citrate_Lyase)=88.8 1/s (source=catpred)
Example Projects (projects/)
Glycolysis: needs to complete the 10-step pathway (Glucose -> Pyruvate)
Fatty Acid Synthesis: needs to complete the 3-step pathway (Citrate -> Palmitic acid)
Please try these projects (don't forget to install CatPred as described earlier, and please contact me if you have any problems with it).
Thanks!