
Diagram of the generative model (WGAN) and the optimization procedure
UTRGAN is a deep learning-based model for novel 5' UTR sequence generation and optimization. The model integrates:
- WGAN-GP architecture for the generative model
- Xpresso model for optimizing TPM expression
- FramePool model for optimizing Mean Ribosome Load (MRL)
- MTtrans model for optimizing Translation Efficiency (TE)
UTRGAN enables researchers to design and optimize 5' UTR sequences for improved gene expression and translation efficiency, with applications in biotechnology and synthetic biology.
- Sina Barazandeh
- Furkan Ozden
- Ahmet Hincer
- Urartu Ozgur Safak Seker
- A. Ercument Cicek
UTRGAN's dependencies can be installed from the provided conda environment file:
# Create and activate conda environment
conda env create --name utrgan -f environment.yml
conda activate utrgan
The environment includes:
- tensorflow-gpu 2.14
- pytorch 2.0
- cudatoolkit 11.8
- biopython
- pandas
- scikit-learn
- seaborn
- and other dependencies
Note: The provided environment file is configured for Linux systems. MacOS users may need to adjust package versions accordingly.
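After activating the environment, a quick check along these lines can confirm that the main packages are visible to Python. This is a minimal sketch and not part of UTRGAN: the helper name `missing_modules` is hypothetical, and the list uses the usual import names for the packages above (`Bio` for biopython, `sklearn` for scikit-learn, `torch` for pytorch).

```python
import importlib.util

# Import names for the packages listed above (module names, not conda package names).
REQUIRED_MODULES = ["tensorflow", "torch", "Bio", "pandas", "sklearn", "seaborn"]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Note that this only checks that each module can be located; it does not verify versions or GPU support.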
Update (April 2025): The latest version of UTRGAN retrieves the current gene information (5' UTR, TSS, and gene sequence) by querying the Ensembl BioMart API. Results may vary if the information returned by the API changes. The API occasionally fails; if it does, wait a few seconds and rerun the gene expression optimization code.
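The wait-and-retry advice can also be automated. The following is a minimal sketch of a generic retry helper, assuming the API call is wrapped in a Python function; `retry` and its parameters are hypothetical names and do not exist in the UTRGAN code base.

```python
import time


def retry(func, attempts=3, delay_seconds=5.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts are exhausted.

    Waits delay_seconds between attempts and re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds)
```

For example, `retry(lambda: fetch_gene_info("VEGFA"))`, where `fetch_gene_info` stands in for whatever function performs the BioMart query.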
Note: We encourage running the optimization with different initializations to obtain more diverse sequences, then selecting the best results. Because larger batch sizes often do not fit in GPU memory, you can instead run the code multiple times and keep the best sequences overall.
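Combining several runs can be sketched as follows. This assumes each run's output has been loaded as a list of (sequence, score) pairs where a higher score is better; that layout and the function name `best_sequences` are assumptions for illustration, not the format UTRGAN writes.

```python
import heapq


def best_sequences(runs, top_k=10):
    """Merge (sequence, score) pairs from several runs and keep the top_k by score.

    If a sequence appears in more than one run, its highest score is kept.
    """
    best = {}
    for run in runs:
        for sequence, score in run:
            if sequence not in best or score > best[sequence]:
                best[sequence] = score
    return heapq.nlargest(top_k, best.items(), key=lambda item: item[1])


# Toy data standing in for the results of two separate runs.
run_a = [("ACGT", 1.2), ("GGCC", 0.4)]
run_b = [("ACGT", 1.5), ("TTAA", 0.9)]
top = best_sequences([run_a, run_b], top_k=2)
```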
Important: You can run the scripts either from the root directory or from their respective directories, as indicated below.
To train the WGAN model:
python train.py [-gpu GPU_IDS] [-bs BATCH_SIZE] [-d DATASET_PATH] [-lr LEARNING_RATE]
Arguments:
- -gpu: GPUs to use (sets CUDA_VISIBLE_DEVICES); uses CPU by default
- -bs, --batch_size: Batch size (default: 64)
- -d, --dataset: Path to CSV file with UTR samples (default: './../../data/utrdb2.csv')
- -lr, --learning_rate: Learning rate exponent (default: 5 for 1e-5)
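To make the exponent-style learning rate and the GPU flag concrete, here is a minimal sketch of a command-line parser shaped like the flags above. It is not the actual `train.py`: the function names are hypothetical, and the `-gpu` default of `"-1"` is an assumption (a common way to hide all GPUs so that the CPU is used).

```python
import argparse
import os


def parse_train_args(argv=None):
    """Parse flags shaped like the ones documented above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a train.py-style CLI")
    # Assumed default: "-1" hides all GPUs, so training falls back to CPU.
    parser.add_argument("-gpu", default="-1", help="value for CUDA_VISIBLE_DEVICES")
    parser.add_argument("-bs", "--batch_size", type=int, default=64)
    parser.add_argument("-d", "--dataset", default="./../../data/utrdb2.csv")
    parser.add_argument("-lr", "--learning_rate", type=int, default=5,
                        help="exponent: 5 means a learning rate of 1e-5")
    return parser.parse_args(argv)


def learning_rate_from_exponent(exponent):
    """Convert an exponent such as 5 into the learning rate 1e-5."""
    return 10.0 ** (-exponent)


args = parse_train_args(["-lr", "4", "-gpu", "0"])
# Must be set before TensorFlow or PyTorch initializes CUDA to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
lr = learning_rate_from_exponent(args.learning_rate)  # exponent 4, i.e. about 1e-4
```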
Run the optimize_te_mrl.ipynb file in the root folder:
You can change the following parameters to get different results. Their meanings are described further below.
BATCH_SIZE = 64
TASK = "mrl" # use "mrl" for MRL optimization or "te" for TE optimization
GPU = '-1'
STEPS = 10
Run the exp_optimization_single.ipynb file in the root folder:
You can change the following parameters to get different results. Their meanings are described further below.
BATCH_SIZE = 500
GENE = 'VEGFA'
GC_LIMIT = -1.00
LR = 0.005
GPU = '0'
STEPS = 2
Run the exp_optimization_multiple.ipynb file in the root folder:
You can change the following parameters to get different results. Their meanings are described further below.
BATCH_SIZE = 100
N_GENES = 8
LR = 0.001
GPU = '0'
STEPS = 10
gene_names = ["MYOC", "TIGD4", "ATP6V1B2", "TAGLN", "COX7A2L", "IFNGR2", "TNFRSF21", "SETD6"]
Optimize 5' UTR sequences for a single gene:
Important: A gene name is required here.
python ./src/exp_optimization/single-gene.py [-gpu GPU_IDS] [-g GENE_NAME] [-lr LEARNING_RATE] [-s STEPS] [-gc GC_CONTENT] [-bs BATCH_SIZE]
Arguments:
- -gpu: GPUs to use (-1: no GPU; otherwise: any GPU)
- -lr: Learning rate (default: 3e-5)
- -g: Gene symbol/name
- -s: Number of optimization iterations (default: 3,000)
- -gc: Upper limit for GC content percentage (default: no limit)
- -bs: Number of 5' UTR sequences to generate (default: 128)
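For reference, GC content percentage is the share of G and C bases in a sequence. The sketch below computes it and applies an upper limit as a filter over finished sequences. How the script itself enforces the `-gc` limit is not described here, so treat this as an illustration of the quantity rather than of the script's internals; both function names are hypothetical.

```python
def gc_percent(sequence):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def within_gc_limit(sequences, gc_limit=None):
    """Keep sequences whose GC percentage is at most gc_limit (None means no limit)."""
    if gc_limit is None:
        return list(sequences)
    return [s for s in sequences if gc_percent(s) <= gc_limit]


# Toy sequences: 100%, 50%, and 0% GC respectively.
kept = within_gc_limit(["GGCC", "ATGC", "ATAT"], gc_limit=60)
```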
Optimize 5' UTR sequences for multiple genes:
python ./src/exp_optimization/multiple-genes.py [-gpu GPU] [-g GENE_NAMES] [-lr LEARNING_RATE] [-s STEPS] [-bs BATCH_SIZE]
Arguments:
- -gpu: GPUs to use
- -g: Gene names separated by comma (e.g., "TLR6,IFNG,TP53,TNF")
- -lr: Learning rate (default: 3e-5)
- -s: Number of optimization iterations (default: 3,000)
- -bs: Number of 5' UTRs to optimize per DNA (default: 100)
Jointly optimize translation efficiency and gene expression:
python ./src/exp_optimization/joint_opt.py [-gpu GPU] [-g GENE_NAME] [-s STEPS] [-lr LEARNING_RATE] [-bs BATCH_SIZE]
Arguments:
- -gpu: GPUs to use
- -g: Gene names separated by comma (e.g., TLR6,IFNG,TP53,TNF)
- -s: Number of iterations for each optimization step (default: 1,000)
- -lr: Learning rate (default: 3e-5)
- -bs: Number of 5' UTRs to optimize per DNA (default: 100)
Optimize 5' UTRs for high Mean Ribosome Load or Translation Efficiency:
python ./src/mrl_te_optimization/optimize_te_mrl.py [-lr LEARNING_RATE] [-task TASK] [-s ITERATIONS] [-bs BATCH_SIZE]
Arguments:
- -lr: Learning rate (default: 3e-5)
- -s: Number of iterations (default: 10,000)
- -task: Optimization target, either "te" or "mrl"
- -bs: Number of 5' UTRs to optimize (default: 128)
Note: For statistical tests, larger batch sizes (up to 8192) can be used with different seeds.
Run the optimize_te_mrl.ipynb file in the root folder:
You can change the following parameters to get different results:
BATCH_SIZE = 64
TASK = "mrl"
GPU = '-1'
STEPS = 10
If you use UTRGAN in your research, please cite our paper:
[Citation information will be added upon publication]
- CC BY-NC-SA 2.0
- Copyright 2025 © UTRGAN
- Free for academic use
- For commercial licensing inquiries, please contact the authors
- For questions and comments: sina.barazandeh@bilkent.edu.tr
- For licensing inquiries: cicek@cs.bilkent.edu.tr
Related Links: