tsme

This repository provides the code and resources supporting the manuscript "Deep Learning Paradigm for Precision Lung Cancer Therapy with AI-Driven Genotype-Phenotype Mining and Patient-Derived Organoid Validation" (under submission). It implements a deep learning framework that integrates patient genomic sequencing data with compound structural information, trained on phenotypic drug sensitivity profiles from lung cancer patient-derived organoids (PDOs), to enable rapid and accurate prediction of antitumor drug responses. The model achieves 81.6% prediction accuracy, validated through PDO experiments and a clinical lung cancer cohort, facilitating individualized therapy, novel compound evaluation, and drug repurposing in precision oncology.

Train

import random

import torch
from torch import nn
import torch.optim as optim

from tsme import tokenizers as tkz
from tsme import models, datasets

tokenizer = tkz.PmtedTokenizer.from_jsonfile(tkz.PmtedTokenizer.DEFAULT_CONF)
ds = datasets.TsmeBase(smi_tokenizer=tokenizer)
smiles = ["c1ccccc1", "CC[N+](C)(C)Cc1ccccc1Br"]
# Each genotype is a 0/1 mutation vector of length 3008 (random placeholder data here)
mutations = [[random.choice([0, 1]) for _ in range(3008)] for _ in range(2)]
# One regression target per (genotype, compound) pair
values = [0.85, 0.78]
mut_x, smi_src, smi_tgt, out = ds(mutations, smiles, values)

# Regression training: a single optimization step
model = models.Tsme(models.Tsme.DEFUALT_CONF)
# Load the pretrained weights shipped in src/moltx.ckpt
model.load_pretrained(
    torch.load('src/moltx.ckpt', map_location=torch.device('cpu'))
)
mse_loss = nn.MSELoss()
optimizer = optim.Adam(
    model.parameters(),
    lr=1e-04,
    foreach=False
)
# Forward pass, MSE loss, backward pass, parameter update
optimizer.zero_grad()
pred = model(src=smi_src, tgt=smi_tgt, mutation=mut_x)
loss = mse_loss(pred, out)
loss.backward()
optimizer.step()

torch.save(model.state_dict(), '/path/to/tsme.ckpt')
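
The snippet above runs a single optimization step on two samples. For a larger dataset, one common approach is to iterate over shuffled mini-batches for several epochs. The sketch below covers only the batching part, in plain Python. `iter_minibatches` is a hypothetical helper that is not part of tsme, and each yielded batch is assumed to be passed to `ds(...)` and the model exactly as in the example above.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def iter_minibatches(
    mutations: Sequence[Sequence[int]],
    smiles: Sequence[str],
    values: Sequence[float],
    batch_size: int,
    seed: int = 0,
) -> Iterator[Tuple[List[Sequence[int]], List[str], List[float]]]:
    """Yield shuffled mini-batches of aligned (mutations, smiles, values)."""
    if not (len(mutations) == len(smiles) == len(values)):
        raise ValueError("mutations, smiles and values must have equal length")
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    order = list(range(len(smiles)))
    random.Random(seed).shuffle(order)  # reproducible shuffle of sample indices
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # Index all three lists with the same indices so samples stay aligned
        yield (
            [mutations[i] for i in idx],
            [smiles[i] for i in idx],
            [values[i] for i in idx],
        )
```

Inside a training loop, each batch would take the place of the fixed two-sample lists: `for muts, smis, vals in iter_minibatches(mutations, smiles, values, batch_size=2)`, followed by the `ds(muts, smis, vals)` call and the zero_grad / forward / backward / step sequence shown above.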

Inference

import random

import torch

from tsme import tokenizers as tkz
from tsme import models, pipelines

# Regression
tokenizer = tkz.PmtedTokenizer.from_jsonfile(tkz.PmtedTokenizer.DEFAULT_CONF)
model = models.Tsme(models.Tsme.DEFUALT_CONF)
model.load_state_dict(
    torch.load('/path/to/tsme.ckpt', map_location=torch.device("cpu"))
)
pipeline = pipelines.TsmeTeg(
    smi_tokenizer=tokenizer, model=model
)
# A single genotype (0/1 vector of length 3008, random placeholder here) and a single compound
mutations = [random.choice([0, 1]) for _ in range(3008)]
smiles = "CC[N+](C)(C)Cc1ccccc1Br"
predict = pipeline(mut=mutations, smi=smiles) # e.g. 0.85
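
Because the pipeline scores one genotype against one compound per call, screening several candidate compounds for the same patient (for example when evaluating repurposing candidates) amounts to calling it in a loop and sorting the results. The sketch below assumes only the `pipeline(mut=..., smi=...)` call shown above and that its result can be converted with `float()`. `rank_compounds` is a hypothetical helper, not part of tsme. Whether a higher or a lower predicted value indicates greater sensitivity depends on how the training targets were defined, so the sort direction is left as a parameter.

```python
from typing import Callable, List, Sequence, Tuple


def rank_compounds(
    predict: Callable[..., float],
    mutations: Sequence[int],
    smiles_list: Sequence[str],
    descending: bool = True,
) -> List[Tuple[str, float]]:
    """Score each SMILES against one genotype; return (smiles, score) pairs sorted by score."""
    scored = [
        (smi, float(predict(mut=mutations, smi=smi)))  # one call per compound
        for smi in smiles_list
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=descending)
```

With the objects defined above, `rank_compounds(pipeline, mutations, [smiles])` would return a one-element list; passing more SMILES strings yields a ranked list for that genotype.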

Others

The embeddings in our framework draw on two key references. Genotype embeddings are adapted from the DrugCell model (Ma et al., Cancer Cell, 2020; https://pubmed.ncbi.nlm.nih.gov/33096023/; GitHub: https://github.com/idekerlab/DrugCell), which represents tumor genotypes as binary mutation vectors over ~3,000 frequently mutated genes and processes them with a visible neural network (VNN) whose structure mirrors the gene ontology hierarchy, so that it encodes cellular subsystems. Compound embeddings use our AdaMR model (arXiv:2401.06166; GitHub: https://github.com/js-ish/MolTx), which applies adaptive multi-resolution representation learning, pre-trained on molecular canonicalization, to encode chemical structures at both the substructure and atomic levels for predictive and generative tasks.
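
To make the genotype input concrete, the sketch below builds a binary mutation vector from a list of mutated genes. The four-gene panel is purely illustrative: the actual gene panel and its ordering (3008 positions in the examples above) are fixed by the model's training data and are not reproduced here. `encode_mutations` is a hypothetical helper, not part of tsme.

```python
from typing import Iterable, List, Sequence


def encode_mutations(gene_panel: Sequence[str], mutated_genes: Iterable[str]) -> List[int]:
    """Return a 0/1 vector aligned with gene_panel: 1 if the gene is mutated, else 0."""
    mutated = set(mutated_genes)
    unknown = mutated - set(gene_panel)
    if unknown:
        # Fail loudly rather than silently dropping genes outside the panel
        raise ValueError(f"genes not in panel: {sorted(unknown)}")
    return [1 if gene in mutated else 0 for gene in gene_panel]


# Illustrative panel only; a real panel would list every gene the model expects, in order.
panel = ["EGFR", "KRAS", "TP53", "ALK"]
vector = encode_mutations(panel, ["TP53", "EGFR"])
print(vector)  # [1, 0, 1, 0]
```

The position of each gene in the panel must match the ordering used during training, since the model sees only the vector and not the gene names.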

Sample datasets for demonstrating the model are located in the project's example/ directory. Access to the core datasets used for training requires approval through the NGDC platform.

