non relevant output smiles #2
Description
Hi,
I trained a model on the ChEMBL database, and training completed without any issues. But when I try to optimize with the trained model,
the generated SMILES seem unrelated to the input. For example, suppose valid.txt contains these SMILES:
O=S(=O)(c1cccc2cnccc12)N1CCCNCC1
Cc1ccc(NC(=O)c2ccc(CN3CCN(C)CC3)cc2)cc1Nc1nccc(-c2cccnc2)n1
CO[C@H]1C[C@@h]2CCC@@HC@@(O2)C(=O)C(=O)N2CCCC[C@H]2C(=O)OC@HCC(=O)C@H/C=C(\C)C@@HC@@HC(=O)C@HCC@H/C=C/C=C/C=C/1C
The output generated by optimize.py is:
CCCCCCOc1ccc(-c2ccnnc2O)c2c1C1C(=O)CC(O)C(=O)C1N2
CC(CCNCc1ccc(O)cc1)(CCNCc1ccc(O)cc1)CPHOCC1CCCO1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CNCCCNCCCCCNCCCCCN1CCC(C)(CCCCOc2ccc(Cn3ccnn3)cc2CN)CC1
CC(CCNCc1ccc(O)cc1)(CCNCc1ccc(O)cc1)CP(=O)(OCCOc1non+c1O)OCCN1C=CC(O)=NC1
CN
CCc1ccc(Cn2nc(CNc3ccc(CC)cc3)cc2O)cc1
CN
As you can see, some of the SMILES are long runs of C's and others are as short as CN. Any idea why this happens?
What could cause such output? Could it be the size of the dataset, since ChEMBL has about 1.2M entries?
Also, have you tried this on the ChEMBL dataset?
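To put a number on how often this happens, the degenerate outputs can be counted with a rough string-level check like the sketch below. The `classify` helper and its thresholds (`min_len=5`, `max_run=20`) are my own arbitrary assumptions, not part of this repository, and the long run of C's is written as `"C" * 100` for illustration rather than copied at its exact length. It only looks at the text of each string, so it says nothing about chemical validity (that would need a proper SMILES parser).

```python
from collections import Counter


def classify(smiles, min_len=5, max_run=20):
    """Label a generated SMILES string using simple string heuristics.

    Hypothetical helper: the thresholds are arbitrary assumptions.
    """
    s = smiles.strip()
    if len(s) < min_len:
        return "too_short"
    # Find the longest run of one repeated character, e.g. "CCCC...".
    longest = run = 1
    for prev, cur in zip(s, s[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest > max_run:
        return "long_repeat"
    return "ok"


# A few of the generated strings from the output above.
generated = [
    "CCCCCCOc1ccc(-c2ccnnc2O)c2c1C1C(=O)CC(O)C(=O)C1N2",
    "C" * 100,  # stands in for the long run of C's (length is illustrative)
    "CN",
    "CCc1ccc(Cn2nc(CNc3ccc(CC)cc3)cc2O)cc1",
    "CN",
]

counts = Counter(classify(s) for s in generated)
print(counts)
```

Running this over the full output file would show what fraction of the generated SMILES is degenerate, which might help narrow down whether the problem is in training or in the optimization step.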
Your help would be really appreciated.
Thank you