tsme

This repository provides the code and resources supporting the manuscript "Deep Learning Paradigm for Precision Lung Cancer Therapy with AI-Driven Genotype-Phenotype Mining and Patient-Derived Organoid Validation" (under submission). It implements a deep learning framework that integrates patient genomic sequencing data with compound structural information, trained on phenotypic drug sensitivity profiles from lung cancer patient-derived organoids (PDOs), to enable rapid and accurate prediction of antitumor drug responses. The model achieves 81.6% prediction accuracy, validated through PDO experiments and a clinical lung cancer cohort, facilitating individualized therapy, novel compound evaluation, and drug repurposing in precision oncology.

Train

import random

import torch
from torch import nn
import torch.optim as optim

from tsme import tokenizers as tkz
from tsme import models, datasets

# Build the SMILES tokenizer from the packaged default configuration.
tokenizer = tkz.PmtedTokenizer.from_jsonfile(tkz.PmtedTokenizer.DEFAULT_CONF)
ds = datasets.TsmeBase(smi_tokenizer=tokenizer)

# One SMILES string per sample.
smiles = ["c1ccccc1", "CC[N+](C)(C)Cc1ccccc1Br"]
# Each mutation profile is a 0/1 encoding of the genome with 3008 positions.
# Random values are used here purely as placeholders.
mutations = [[random.choice([0, 1]) for _ in range(3008)] for _ in smiles]
# Regression targets, one per (mutation profile, SMILES) pair.
values = [0.85, 0.78]
mut_x, smi_src, smi_tgt, out = ds(mutations, smiles, values)

# Regression train
model = models.Tsme(models.Tsme.DEFUALT_CONF)
# Initialize from the pretrained checkpoint (src/moltx.ckpt).
model.load_pretrained(
    torch.load('src/moltx.ckpt', map_location=torch.device('cpu'))
)
mse_loss = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-04, foreach=False)

# A single optimization step on the batch built above.
optimizer.zero_grad()
pred = model(src=smi_src, tgt=smi_tgt, mutation=mut_x)
loss = mse_loss(pred, out)
loss.backward()
optimizer.step()

# Save the fine-tuned weights for later inference.
torch.save(model.state_dict(), '/path/to/tsme.ckpt')

Inference

import random

import torch

from tsme import tokenizers as tkz
from tsme import models, pipelines

# Regression
tokenizer = tkz.PmtedTokenizer.from_jsonfile(tkz.PmtedTokenizer.DEFAULT_CONF)
model = models.Tsme(models.Tsme.DEFUALT_CONF)
# Load the fine-tuned weights saved at the end of training.
model.load_state_dict(
    torch.load('/path/to/tsme.ckpt', map_location=torch.device("cpu"))
)
pipeline = pipelines.TsmeTeg(smi_tokenizer=tokenizer, model=model)
# A single 0/1 mutation profile (3008 positions); random placeholder values.
mutations = [random.choice([0, 1]) for _ in range(3008)]
smiles = "CC[N+](C)(C)Cc1ccccc1Br"
predict = pipeline(mut=mutations, smi=smiles)  # e.g. 0.85

Others

The framework's embeddings draw on two key references. Genotype embeddings are adapted from the DrugCell model (Ma et al., Cancer Cell, 2020; https://pubmed.ncbi.nlm.nih.gov/33096023/; GitHub: https://github.com/idekerlab/DrugCell). DrugCell represents a tumor genotype as a binary mutation vector over ~3,000 frequently mutated genes and processes it with a visible neural network (VNN) whose structure mirrors the gene ontology hierarchy, so that cellular subsystems are encoded explicitly. Compound embeddings come from our AdaMR model (arXiv:2401.06166; GitHub: https://github.com/js-ish/MolTx). AdaMR learns adaptive multi-resolution representations through molecular canonicalization pre-training, encoding chemical structures at both the substructure and atomic level for predictive and generative tasks.
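As an illustration of the binary mutation vector described above, the sketch below builds a 0/1 list from a fixed gene order and a set of mutated genes. It is an assumption-laden example, not code from tsme or DrugCell: `encode_mutations` is a hypothetical helper, and the five gene names are a toy panel chosen only for readability. The actual gene order behind the 3008-position input is not specified here and must match the one used in training.

```python
from typing import Iterable, List, Sequence


def encode_mutations(
    gene_order: Sequence[str], mutated_genes: Iterable[str]
) -> List[int]:
    """Return a 0/1 vector aligned with gene_order (1 = gene is mutated)."""
    mutated = set(mutated_genes)
    # Reject genes outside the panel instead of silently dropping them.
    unknown = mutated - set(gene_order)
    if unknown:
        raise ValueError(f"genes not in the reference panel: {sorted(unknown)}")
    return [1 if gene in mutated else 0 for gene in gene_order]


# Toy panel of five genes (illustrative only).
gene_order = ["EGFR", "KRAS", "TP53", "ALK", "BRAF"]
vector = encode_mutations(gene_order, {"KRAS", "TP53"})
print(vector)  # [0, 1, 1, 0, 0]
```

Raising on unknown gene names is a deliberate choice in this sketch: a misspelled or unmapped gene would otherwise produce an all-zero position and go unnoticed.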

Sample datasets for demonstrating the model are in the project's example/ directory. Access to the core datasets used for training requires approval through the NGDC platform.
