RaptEvo: Modeling Selective Sequence Enrichment Processes with Diffusion Models

Abstract

The SELEX method for RNA aptamer discovery is time-consuming and costly. While deep generative models have emerged for in silico design, existing methods have not fully leveraged the step-wise sequence enrichment process unique to SELEX, potentially limiting generation quality and fidelity. To address this, we propose RaptEvo, a novel diffusion model-based framework that learns from the data of all SELEX rounds to model how sequences evolve as the rounds progress. We compared RaptEvo (All-Round) with a model trained only on the final round (Single-Round) using three different SELEX datasets. Analyses using t-SNE and Hamming distance showed that RaptEvo more faithfully reproduced the real sequence distribution, demonstrating significantly higher fidelity. Furthermore, a model trained only on the preceding rounds, excluding the final one (Pred-Round), still generated higher-quality sequences than the Single-Round model, suggesting the potential to predict final enrichment results from the evolutionary trajectory. These results highlight the importance of utilizing time-series SELEX data. RaptEvo offers a new computational foundation for accelerating aptamer development by generating high-quality sequences and by showing that in silico simulation could make the selection process more efficient.

Installation

Prerequisites

Python 3.8+
CUDA-compatible GPU (recommended)
Conda (recommended for environment management)

Setup

Clone the repository:

git clone https://github.com/hmdlab/RaptEvo.git
cd RaptEvo

Create and activate environment:

conda create -n raptevo python=3.8
conda activate raptevo
pip install -r requirements.txt

Quick Start

# Train an All-Round model
python src/train.py --config configs/paper_experiments/SELEX_A/all_round.yaml

# Generate sequences from trained model
python src/sample.py --config configs/paper_experiments/SELEX_A/all_round.yaml \
    --model_path results/SELEX_A/training_samples/all_round/best_model.pt

Usage

Training

python src/train.py --config <path/to/config.yaml>

Using Shell Scripts

# Training suite (All-Round + Single-Round models)
./scripts/run_training.sh

# Single training run (All-Round or Single-Round)
./run_training.sh

Sampling

Generate sequences from a trained model:

python src/sample.py \
    --config <path/to/config.yaml> \
    --model_path <path/to/model.pt> \
    --samples <num_samples_per_round>
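When sampling is driven from Python rather than from the shell, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_sample_command` is a hypothetical helper, it uses only the three flags documented above, and it treats `--samples` as optional because the Quick Start example omits it. The value 1000 is an arbitrary illustration.

```python
import sys
from typing import List, Optional


def build_sample_command(config: str, model_path: str,
                         samples: Optional[int] = None) -> List[str]:
    """Assemble the argument list for src/sample.py (hypothetical helper)."""
    cmd = [sys.executable, "src/sample.py",
           "--config", config,
           "--model_path", model_path]
    if samples is not None:
        if samples <= 0:
            raise ValueError("samples must be a positive integer")
        # Number of sequences to generate per round.
        cmd += ["--samples", str(samples)]
    return cmd


if __name__ == "__main__":
    cmd = build_sample_command(
        "configs/paper_experiments/SELEX_A/all_round.yaml",
        "results/SELEX_A/training_samples/all_round/best_model.pt",
        samples=1000,  # arbitrary example value
    )
    print(" ".join(cmd))
    # To run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.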

Using Shell Scripts

# Basic sampling
./run_sampling.sh

# Batch generation (All-Round + Single-Round)
./scripts/run_generation_suite.sh

# Single-Round only generation
./scripts/run_generation_single.sh

# Pred-Round generation
./scripts/run_prediction.sh

Project Structure

RaptEvo/
├── configs/
│   ├── paper_experiments/    # Paper reproduction configs (SELEX_A, B, C)
│   │   ├── SELEX_A/          # 35nt random region
│   │   ├── SELEX_B/          # 30nt random region
│   │   └── SELEX_C/          # 20nt random region
│   └── templates/            # Templates for new projects
├── src/                      # Source code
│   ├── train.py              # Training script
│   ├── sample.py             # Sampling script
│   ├── dataloader.py         # Data loading utilities
│   └── models/               # Model definitions
│       ├── diffusion.py      # Diffusion model
│       └── unet.py           # U-Net architecture
├── scripts/                  # Shell scripts for batch execution
├── data/                     # Data directory (user-provided)
└── results/                  # Output directory

Configuration

Paper Experiments

To reproduce the experiments from the paper, use the configurations in configs/paper_experiments/:

| SELEX Dataset | Random Region | Config Directory |
|---|---|---|
| SELEX_A | 35 nt | `configs/paper_experiments/SELEX_A/` |
| SELEX_B | 30 nt | `configs/paper_experiments/SELEX_B/` |
| SELEX_C | 20 nt | `configs/paper_experiments/SELEX_C/` |

Each directory contains:

- all_round.yaml: All-Round model training
- pred_round.yaml: Pred-Round model training
- single_round.yaml: Single-Round model training
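The three datasets and three experiment types listed above give nine configuration files. As a small sketch (the helper name `paper_config_paths` is hypothetical and not part of the repository), the paths can be enumerated like this, for example to loop over them in a reproduction script:

```python
from itertools import product
from pathlib import PurePosixPath

# Directory and file names taken from the listing above.
DATASETS = ("SELEX_A", "SELEX_B", "SELEX_C")
EXPERIMENTS = ("all_round", "pred_round", "single_round")
CONFIG_ROOT = PurePosixPath("configs/paper_experiments")


def paper_config_paths():
    """Return every dataset/experiment config path, relative to the repo root."""
    return [str(CONFIG_ROOT / dataset / f"{experiment}.yaml")
            for dataset, experiment in product(DATASETS, EXPERIMENTS)]


if __name__ == "__main__":
    for path in paper_config_paths():
        print(f"python src/train.py --config {path}")
```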

Custom Experiments

To set up a new SELEX dataset:

Copy the template directory:

cp -r configs/templates configs/MY_SELEX

Modify main.yaml:
- Set selex_name to your dataset name
- Adjust sequence_length and image_size to match your data

Update experiment configs in the experiments/ subdirectory as needed.

Key Configuration Parameters

| Parameter | Description |
|---|---|
| `selex_name` | Dataset identifier |
| `sequence_length` | Length of random region (nt) |
| `image_size` | Must match `sequence_length` |
| `epochs` | Number of training epochs |
| `batch_size` | Training batch size |
| `learning_rate` | Initial learning rate |
| `timesteps` | Diffusion timesteps |
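Because `image_size` must match `sequence_length`, it can help to check a configuration before starting a long training run. The sketch below is an illustration only: `check_config` is a hypothetical helper, and it assumes the parameters above are available as a flat dictionary (for instance after loading the YAML file), which may not reflect how the repository's configs are actually nested. Apart from the 35 nt length of SELEX_A, the example values are made up.

```python
REQUIRED = ("selex_name", "sequence_length", "image_size",
            "epochs", "batch_size", "learning_rate", "timesteps")


def check_config(cfg: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing parameter: {key}" for key in REQUIRED if key not in cfg]

    # The table above requires image_size to equal sequence_length.
    if "sequence_length" in cfg and "image_size" in cfg:
        if cfg["image_size"] != cfg["sequence_length"]:
            problems.append("image_size must match sequence_length")

    for key in ("epochs", "batch_size", "timesteps"):
        value = cfg.get(key)
        if value is not None and (not isinstance(value, int) or value <= 0):
            problems.append(f"{key} must be a positive integer")

    if "learning_rate" in cfg:
        try:
            ok = float(cfg["learning_rate"]) > 0
        except (TypeError, ValueError):
            ok = False
        if not ok:
            problems.append("learning_rate must be a positive number")
    return problems


if __name__ == "__main__":
    example = {"selex_name": "SELEX_A", "sequence_length": 35, "image_size": 35,
               "epochs": 10, "batch_size": 64, "learning_rate": 1e-4,
               "timesteps": 1000}  # hypothetical values except the 35 nt length
    print(check_config(example) or "no problems found")
```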

Outputs

Training

results/{selex_name}/training_samples/{experiment_type}/
├── training_info.txt           # Training configuration log
├── epoch_{N}/
│   ├── {model_name}.pt         # Model checkpoint
│   └── sampled_sequences_epoch_{N}.csv
└── best_model.pt               # Best model (lowest validation loss)

Sampling

results/{selex_name}/generated_samples/{experiment_type}/
└── sampled_sequences_R{round}.csv
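For downstream analysis it is often convenient to collect the generated files per round. The following sketch relies only on the directory layout and file name pattern shown above, and assumes that `{round}` is an integer; `generated_dir` and `files_by_round` are hypothetical helpers, and the column layout of the CSV files is not assumed.

```python
import re
from pathlib import Path

# Matches the documented file name pattern sampled_sequences_R{round}.csv.
ROUND_FILE = re.compile(r"^sampled_sequences_R(\d+)\.csv$")


def generated_dir(selex_name: str, experiment_type: str, root: str = "results") -> Path:
    """Build results/{selex_name}/generated_samples/{experiment_type}/."""
    return Path(root) / selex_name / "generated_samples" / experiment_type


def files_by_round(directory) -> dict:
    """Map round number to CSV path, ordered by ascending round."""
    found = {}
    for path in Path(directory).iterdir():
        match = ROUND_FILE.match(path.name)
        if match:
            found[int(match.group(1))] = path
    return dict(sorted(found.items()))


if __name__ == "__main__":
    directory = generated_dir("SELEX_A", "all_round")
    if directory.is_dir():
        for round_number, path in files_by_round(directory).items():
            print(round_number, path)
```

Sorting on the parsed integer keeps round 10 after round 2, which a plain alphabetical sort of file names would not.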

Data Availability

The SELEX datasets used in this study are available from the DDBJ Sequence Read Archive (DRA):

| SELEX Dataset | DRA Accession |
|---|---|
| SELEX_A | `[PLACEHOLDER]` |
| SELEX_B | `[PLACEHOLDER]` |
| SELEX_C | `[PLACEHOLDER]` |

Citation

If you use RaptEvo in your research, please cite:

@article{raptevo2025,
  title={RaptEvo: Modeling Selective Sequence Enrichment Processes with Diffusion Models},
  author={Matsumoto, Hidenori and Nakano, Ryota and Nakamura, Yoshikazu and Adachi, Tatsuo and Sato, Kengo and Hamada, Michiaki},
  journal={},
  year={2025},
  doi={}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
