
Irrelevant output SMILES #2

@erbb2


Hi,

I was trying to train a model on the ChEMBL database, and training completed without any issues. However, when I tried the optimization step with the trained model, the generated SMILES seemed completely irrelevant. For example, suppose I give the following SMILES in valid.txt:
O=S(=O)(c1cccc2cnccc12)N1CCCNCC1
Cc1ccc(NC(=O)c2ccc(CN3CCN(C)CC3)cc2)cc1Nc1nccc(-c2cccnc2)n1
CO[C@H]1C[C@@H]2CCC@@HC@@(O2)C(=O)C(=O)N2CCCC[C@H]2C(=O)OC@HCC(=O)C@H/C=C(\C)C@@HC@@HC(=O)C@HCC@H/C=C/C=C/C=C/1C

The output generated by the optimize.py code is:
CCCCCCOc1ccc(-c2ccnnc2O)c2c1C1C(=O)CC(O)C(=O)C1N2
CC(CCNCc1ccc(O)cc1)(CCNCc1ccc(O)cc1)CPHOCC1CCCO1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CNCCCNCCCCCNCCCCCN1CCC(C)(CCCCOc2ccc(Cn3ccnn3)cc2CN)CC1
CC(CCNCc1ccc(O)cc1)(CCNCc1ccc(O)cc1)CP(=O)(OCCOc1non+c1O)OCCN1C=CC(O)=NC1
CN
CCc1ccc(Cn2nc(CNc3ccc(CC)cc3)cc2O)cc1
CN
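To see how often these two failure patterns (very long unbranched carbon chains and tiny molecules like "CN") show up across a larger batch of outputs, a quick string-level check can be run over the generated SMILES. This is a minimal sketch, not part of the project: the function names and both thresholds are hypothetical, and it only inspects the SMILES text. It does not check chemical validity, which would need a toolkit such as RDKit.

```python
import re
from collections import Counter

# Hypothetical thresholds; tune them for your own data.
MIN_LENGTH = 5          # SMILES shorter than this (e.g. "CN") are flagged
MAX_CARBON_RUN = 12     # longer unbranched "CCC..." runs are flagged

BRACKET_ATOM = re.compile(r"\[[^\]]*\]")


def longest_carbon_run(smiles: str) -> int:
    """Length of the longest run of consecutive aliphatic 'C' characters."""
    # Drop bracket atoms such as [C@@H] or [Cu] and chlorine ("Cl")
    # so that their letters are not mistaken for plain carbons.
    s = BRACKET_ATOM.sub(" ", smiles).replace("Cl", " ")
    return max((len(run) for run in re.findall(r"C+", s)), default=0)


def flag_smiles(smiles: str) -> list[str]:
    """Return the reasons a generated SMILES looks degenerate (empty if none)."""
    reasons = []
    if len(smiles) < MIN_LENGTH:
        reasons.append("too short")
    if longest_carbon_run(smiles) > MAX_CARBON_RUN:
        reasons.append("long carbon chain")
    return reasons


def summarize(outputs: list[str]) -> Counter:
    """Count how many outputs hit each failure pattern."""
    counts = Counter()
    for smi in outputs:
        for reason in flag_smiles(smi.strip()):
            counts[reason] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "CN",
        "C" * 100,
        "CCc1ccc(Cn2nc(CNc3ccc(CC)cc3)cc2O)cc1",
    ]
    print(summarize(sample))
```

Running it over all outputs, instead of a handful, shows whether the degenerate cases are occasional or dominate the results.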

As you can see, some SMILES are long chains of C's and others are very short (CN). Any idea why this happens?
What are the possible causes of such issues? Could it be the large dataset, since ChEMBL has about 1.2M compounds?
Also, have you tried this on the ChEMBL dataset?
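One way to probe the dataset hypothesis is to look at the SMILES length distribution of the training file and, if needed, drop outliers before retraining. The sketch below assumes one SMILES per line (first whitespace-separated field); the length window and the file name `chembl_train.smi` are hypothetical, and whether this model's own preprocessing already applies a length limit is not stated here.

```python
import statistics
import sys

# Hypothetical length window (in characters); adjust to your data.
MIN_LEN, MAX_LEN = 10, 100


def load_smiles(path: str) -> list[str]:
    """Read one SMILES per line (first whitespace-separated field), skipping blanks."""
    with open(path) as fh:
        return [line.split()[0] for line in fh if line.strip()]


def length_report(smiles: list[str]) -> dict:
    """Summarize the SMILES length distribution of a dataset."""
    lengths = [len(s) for s in smiles]
    return {
        "count": len(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "shorter_than_min": sum(n < MIN_LEN for n in lengths),
        "longer_than_max": sum(n > MAX_LEN for n in lengths),
    }


def filter_by_length(smiles: list[str]) -> list[str]:
    """Keep only SMILES whose length lies within [MIN_LEN, MAX_LEN]."""
    return [s for s in smiles if MIN_LEN <= len(s) <= MAX_LEN]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python length_report.py chembl_train.smi  (hypothetical file name)
    data = load_smiles(sys.argv[1])
    print(length_report(data))
    print(f"{len(filter_by_length(data))} SMILES kept after length filtering")
```

If a large share of the training set falls outside a typical drug-like length range, that points to the data rather than the dataset size as the more likely cause.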

Your help would be really appreciated.
Thank you
