SELEX-based RNA aptamer discovery is time-consuming and costly. While deep generative models have emerged for in silico design, existing methods have not fully leveraged the round-by-round sequence enrichment process unique to SELEX, potentially limiting generation quality and fidelity. To address this, we propose RaptEvo, a diffusion model-based framework that learns from data across all SELEX rounds to model how sequences evolve as rounds progress. We compared RaptEvo with a model trained only on the final round (Single-Round) on three different SELEX datasets. Analyses using t-SNE and Hamming distance showed that RaptEvo more faithfully reproduced the real sequence distribution, demonstrating significantly higher fidelity. Furthermore, a model trained only on the preceding rounds (excluding the final round) still generated higher-quality sequences than the Single-Round model, suggesting that final enrichment results may be predictable from the evolutionary trajectory. These results highlight the importance of using time-series SELEX data. RaptEvo offers a new computational foundation for accelerating aptamer development by generating high-quality sequences and by showing that in silico simulation could make the SELEX process more efficient.
- Python 3.8+
- CUDA-compatible GPU (recommended)
- Conda (recommended for environment management)
- Clone the repository:
  git clone https://github.com/your-username/RaptEvo.git
  cd RaptEvo
- Create and activate environment:
  conda create -n raptevo python=3.8
  conda activate raptevo
  pip install -r requirements.txt

# Train an All-Round model
python src/train.py --config configs/paper_experiments/SELEX_A/all_round.yaml
# Generate sequences from trained model
python src/sample.py --config configs/paper_experiments/SELEX_A/all_round.yaml \
    --model_path results/SELEX_A/training_samples/all_round/best_model.pt

python src/train.py --config <path/to/config.yaml>

# Training suite (All-Round + Single-Round models)
./scripts/run_training.sh
# Single training run (All-Round or Single-Round)
run_training.sh

Generate sequences from a trained model:
python src/sample.py \
--config <path/to/config.yaml> \
--model_path <path/to/model.pt> \
    --samples <num_samples_per_round>

# Basic sampling
run_sampling.sh
# Batch generation (All-Round + Single-Round)
./scripts/run_generation_suite.sh
# Single-Round only generation
./scripts/run_generation_single.sh
# Pred-Round generation
./scripts/run_prediction.sh

RaptEvo/
├── configs/
│ ├── paper_experiments/ # Paper reproduction configs (SELEX_A, B, C)
│ │ ├── SELEX_A/ # 35nt random region
│ │ ├── SELEX_B/ # 30nt random region
│ │ └── SELEX_C/ # 20nt random region
│ └── templates/ # Templates for new projects
├── src/ # Source code
│ ├── train.py # Training script
│ ├── sample.py # Sampling script
│ ├── dataloader.py # Data loading utilities
│ └── models/ # Model definitions
│ ├── diffusion.py # Diffusion model
│ └── unet.py # U-Net architecture
├── scripts/ # Shell scripts for batch execution
├── data/ # Data directory (user-provided)
└── results/ # Output directory
To reproduce the experiments from the paper, use the configurations in configs/paper_experiments/:
| SELEX Dataset | Random Region | Config Directory |
|---|---|---|
| SELEX_A | 35nt | configs/paper_experiments/SELEX_A/ |
| SELEX_B | 30nt | configs/paper_experiments/SELEX_B/ |
| SELEX_C | 20nt | configs/paper_experiments/SELEX_C/ |
Each directory contains:
- all_round.yaml: All-Round model training
- pred_round.yaml: Pred-Round model training
- single_round.yaml: Single-Round model training
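
As a rough illustration of how the nine paper configs map onto training runs, the sketch below builds one `python src/train.py --config ...` command per dataset and experiment type. The helper name `training_commands` is hypothetical and not part of the repository; it only assembles argument lists from the directory and file names listed above and does not launch anything.

```python
from itertools import product

# Dataset directories and config file stems as listed in this README.
DATASETS = ["SELEX_A", "SELEX_B", "SELEX_C"]
EXPERIMENTS = ["all_round", "pred_round", "single_round"]


def training_commands(config_root="configs/paper_experiments"):
    """Return one argv list per paper config (hypothetical helper)."""
    commands = []
    for dataset, experiment in product(DATASETS, EXPERIMENTS):
        config = f"{config_root}/{dataset}/{experiment}.yaml"
        commands.append(["python", "src/train.py", "--config", config])
    return commands
```

Each returned list could be passed to `subprocess.run(argv, check=True)` from the repository root, assuming the environment described earlier is active.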
To set up a new SELEX dataset:
- Copy the template directory:
  cp -r configs/templates configs/MY_SELEX
- Modify main.yaml:
  - Set selex_name to your dataset name
  - Adjust sequence_length and image_size to match your data
- Update experiment configs in the experiments/ subdirectory as needed.

| Parameter | Description |
|---|---|
| selex_name | Dataset identifier |
| sequence_length | Length of random region (nt) |
| image_size | Must match sequence_length |
| epochs | Number of training epochs |
| batch_size | Training batch size |
| learning_rate | Initial learning rate |
| timesteps | Diffusion timesteps |
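
The constraint that image_size must match sequence_length is easy to get wrong when adapting a template, so a small pre-flight check can help. The sketch below is not part of RaptEvo: `check_config` is a hypothetical helper, and it assumes the parameters have already been loaded into a flat Python dict (the actual YAML layout may nest or split them differently). The numeric values in the example are placeholders, not recommended settings.

```python
# Parameter names taken from the table above.
REQUIRED_KEYS = (
    "selex_name", "sequence_length", "image_size",
    "epochs", "batch_size", "learning_rate", "timesteps",
)


def check_config(cfg):
    """Return a list of problems found in a config dict (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if "sequence_length" in cfg and "image_size" in cfg:
        if cfg["sequence_length"] != cfg["image_size"]:
            problems.append(
                f"image_size ({cfg['image_size']}) must match "
                f"sequence_length ({cfg['sequence_length']})"
            )
    return problems


# Placeholder values for illustration only.
example = {
    "selex_name": "MY_SELEX", "sequence_length": 30, "image_size": 35,
    "epochs": 10, "batch_size": 64, "learning_rate": 1e-4, "timesteps": 100,
}
for problem in check_config(example):
    print(problem)
```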
results/{selex_name}/training_samples/{experiment_type}/
├── training_info.txt # Training configuration log
├── epoch_{N}/
│ ├── {model_name}.pt # Model checkpoint
│ └── sampled_sequences_epoch_{N}.csv
└── best_model.pt # Best model (lowest validation loss)
results/{selex_name}/generated_samples/{experiment_type}/
└── sampled_sequences_R{round}.csv
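
For downstream analysis scripts, the output layout above can be turned into paths with `pathlib`. This is a minimal sketch under the assumption that the directory templates shown are followed literally; the function names are hypothetical and do not exist in the repository.

```python
from pathlib import Path


def best_model_path(selex_name, experiment_type, root="results"):
    """Path of the best checkpoint for one training experiment."""
    return Path(root) / selex_name / "training_samples" / experiment_type / "best_model.pt"


def generated_csv_path(selex_name, experiment_type, round_number, root="results"):
    """Path of the generated-sequence CSV for one SELEX round."""
    filename = f"sampled_sequences_R{round_number}.csv"
    return Path(root) / selex_name / "generated_samples" / experiment_type / filename
```

For example, `best_model_path("SELEX_A", "all_round")` yields the same location passed to `--model_path` in the sampling command earlier in this README.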
The SELEX datasets used in this study are available from the DRA (DNA Data Bank of Japan Sequence Read Archive):
| SELEX Dataset | DRA Accession |
|---|---|
| SELEX_A | [PLACEHOLDER] |
| SELEX_B | [PLACEHOLDER] |
| SELEX_C | [PLACEHOLDER] |
If you use RaptEvo in your research, please cite:
@article{raptevo2025,
title={RaptEvo: Modeling Selective Sequence Enrichment Processes with Diffusion Models},
author={Matsumoto, Hidenori and Nakano, Ryota and Nakamura, Yoshikazu and Adachi, Tatsuo and Sato, Kengo and Hamada, Michiaki},
journal={},
year={2025},
doi={}
}

This project is licensed under the MIT License - see the LICENSE file for details.
[PLACEHOLDER]