
hmdlab/RaptEvo


RaptEvo: Modeling Selective Sequence Enrichment Processes with Diffusion Models

License: MIT

Abstract

The SELEX method for RNA aptamer discovery faces challenges in time and cost. While deep generative models have emerged for in silico design, existing methods have not fully leveraged the step-wise sequence enrichment process unique to SELEX, potentially limiting generation quality and fidelity. To address this, we propose RaptEvo, a novel diffusion model-based framework that learns from all-round SELEX data to model the sequence evolution accompanying round progression. We compared RaptEvo with a model trained only on the final round (Single-Round) using three different SELEX datasets. Analyses using t-SNE and Hamming distance showed that RaptEvo more faithfully reproduced the real sequence distribution, demonstrating significantly higher fidelity. Furthermore, a model trained only on preceding rounds (excluding the final) still generated higher-quality sequences than the Single-Round model, suggesting the potential to predict final enrichment results from the evolutionary trajectory. These results highlight the importance of utilizing time-series SELEX data. RaptEvo offers a new computational foundation for accelerating aptamer development by generating high-quality sequences and demonstrating the potential for process efficiency through in silico simulation.
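The Hamming-distance comparison mentioned above can be illustrated with a small sketch. This is not the evaluation code used in the paper; the function names and the toy sequences are assumptions made for illustration. It reports, for each generated sequence, the distance to its closest reference sequence.

```python
from typing import Iterable, List


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def nearest_hamming(generated: Iterable[str], reference: List[str]) -> List[int]:
    """For each generated sequence, return the distance to its closest reference."""
    return [min(hamming_distance(g, r) for r in reference) for g in generated]


# Toy sequences for illustration only; not taken from any SELEX dataset.
reference = ["ACGUACGU", "GGGGCCCC"]
generated = ["ACGUACGA", "GGGGCCCC", "UUUUUUUU"]
distances = nearest_hamming(generated, reference)
print(distances)
```

Smaller nearest-neighbor distances indicate generated sequences that sit closer to the real pool. The brute-force loop is fine for small sets but scales with the product of the two set sizes.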


Installation

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • Conda (recommended for environment management)

Setup

  1. Clone the repository:

    git clone https://github.com/hmdlab/RaptEvo.git
    cd RaptEvo
  2. Create and activate the environment, then install the dependencies:

    conda create -n raptevo python=3.8
    conda activate raptevo
    pip install -r requirements.txt

Quick Start

# Train an All-Round model
python src/train.py --config configs/paper_experiments/SELEX_A/all_round.yaml

# Generate sequences from trained model
python src/sample.py --config configs/paper_experiments/SELEX_A/all_round.yaml \
    --model_path results/SELEX_A/training_samples/all_round/best_model.pt

Usage

Training

python src/train.py --config <path/to/config.yaml>

Using Shell Scripts

# Training suite (All-Round + Single-Round models)
./scripts/run_training.sh

# Single training run (All-Round or Single-Round)
./scripts/run_training.sh

Sampling

Generate sequences from a trained model:

python src/sample.py \
    --config <path/to/config.yaml> \
    --model_path <path/to/model.pt> \
    --samples <num_samples_per_round>
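
When sampling needs to be driven from another Python program, the same command can be assembled as an argument list. The sketch below uses only the flags shown above; the helper name build_sample_command and the sample count of 1000 are assumptions for illustration, and --samples is treated as optional because the Quick Start omits it.

```python
import sys
from typing import List, Optional


def build_sample_command(config: str, model_path: str,
                         samples: Optional[int] = None) -> List[str]:
    """Assemble the argv list for src/sample.py from the documented flags."""
    cmd = [sys.executable, "src/sample.py",
           "--config", config,
           "--model_path", model_path]
    if samples is not None:
        cmd += ["--samples", str(samples)]
    return cmd


cmd = build_sample_command(
    "configs/paper_experiments/SELEX_A/all_round.yaml",
    "results/SELEX_A/training_samples/all_round/best_model.pt",
    samples=1000,  # arbitrary illustrative value
)
print(" ".join(cmd))
# To launch it from the repository root: subprocess.run(cmd, check=True)
```

Passing a list instead of a shell string avoids quoting problems when paths contain spaces.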

Using Shell Scripts

# Basic sampling
./scripts/run_sampling.sh

# Batch generation (All-Round + Single-Round)
./scripts/run_generation_suite.sh

# Single-Round only generation
./scripts/run_generation_single.sh

# Pred-Round generation
./scripts/run_prediction.sh

Project Structure

RaptEvo/
├── configs/
│   ├── paper_experiments/    # Paper reproduction configs (SELEX_A, B, C)
│   │   ├── SELEX_A/          # 35nt random region
│   │   ├── SELEX_B/          # 30nt random region
│   │   └── SELEX_C/          # 20nt random region
│   └── templates/            # Templates for new projects
├── src/                      # Source code
│   ├── train.py              # Training script
│   ├── sample.py             # Sampling script
│   ├── dataloader.py         # Data loading utilities
│   └── models/               # Model definitions
│       ├── diffusion.py      # Diffusion model
│       └── unet.py           # U-Net architecture
├── scripts/                  # Shell scripts for batch execution
├── data/                     # Data directory (user-provided)
└── results/                  # Output directory

Configuration

Paper Experiments

To reproduce the experiments from the paper, use the configurations in configs/paper_experiments/:

SELEX Dataset   Random Region   Config Directory
SELEX_A         35nt            configs/paper_experiments/SELEX_A/
SELEX_B         30nt            configs/paper_experiments/SELEX_B/
SELEX_C         20nt            configs/paper_experiments/SELEX_C/

Each directory contains:

  • all_round.yaml: All-Round model training (all SELEX rounds)
  • pred_round.yaml: Pred-Round model training (all rounds except the final one)
  • single_round.yaml: Single-Round model training (final round only)

Custom Experiments

To set up a new SELEX dataset:

  1. Copy the template directory:

    cp -r configs/templates configs/MY_SELEX
  2. Modify main.yaml:

    • Set selex_name to your dataset name
    • Adjust sequence_length and image_size to match your data
  3. Update experiment configs in the experiments/ subdirectory as needed.

Key Configuration Parameters

Parameter         Description
selex_name        Dataset identifier
sequence_length   Length of random region (nt)
image_size        Must match sequence_length
epochs            Number of training epochs
batch_size        Training batch size
learning_rate     Initial learning rate
timesteps         Diffusion timesteps
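
A quick sanity check on these parameters can catch a mismatched image_size before a long training run. The sketch below assumes the configuration has already been loaded into a flat dictionary; the actual YAML files may nest these keys differently. The check_config helper is hypothetical and not part of the repository, and every value in the example other than the 35 nt length of SELEX_A is an arbitrary placeholder.

```python
from typing import Any, Dict, List

# Keys taken from the parameter table above.
REQUIRED_KEYS = ("selex_name", "sequence_length", "image_size", "epochs",
                 "batch_size", "learning_rate", "timesteps")


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if "sequence_length" in cfg and "image_size" in cfg:
        if cfg["image_size"] != cfg["sequence_length"]:
            problems.append("image_size must match sequence_length")
    for key in ("epochs", "batch_size", "timesteps"):
        if key in cfg and (not isinstance(cfg[key], int) or cfg[key] <= 0):
            problems.append(f"{key} must be a positive integer")
    return problems


# Illustrative values only; apart from the 35 nt length they are placeholders.
example = {
    "selex_name": "SELEX_A",
    "sequence_length": 35,
    "image_size": 35,
    "epochs": 10,
    "batch_size": 64,
    "learning_rate": 1e-4,
    "timesteps": 100,
}
print(check_config(example))
```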

Outputs

Training

results/{selex_name}/training_samples/{experiment_type}/
├── training_info.txt           # Training configuration log
├── epoch_{N}/
│   ├── {model_name}.pt         # Model checkpoint
│   └── sampled_sequences_epoch_{N}.csv
└── best_model.pt               # Best model (lowest validation loss)
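
Given this layout, a script that needs a checkpoint can prefer best_model.pt and fall back to the most recent epoch directory. The following is a minimal sketch under that assumption; pick_checkpoint is a hypothetical helper, and it assumes {N} in epoch_{N} is an integer so that epochs are compared numerically (epoch_10 after epoch_2) instead of alphabetically.

```python
import re
from pathlib import Path
from typing import Optional

EPOCH_DIR = re.compile(r"^epoch_(\d+)$")


def pick_checkpoint(run_dir: Path) -> Optional[Path]:
    """Prefer best_model.pt, else a .pt file from the highest-numbered epoch."""
    best = run_dir / "best_model.pt"
    if best.is_file():
        return best
    epochs = []
    for child in run_dir.iterdir():
        match = EPOCH_DIR.match(child.name)
        if match and child.is_dir():
            epochs.append((int(match.group(1)), child))
    # Walk epochs from newest to oldest and return the first checkpoint found.
    for _, epoch_dir in sorted(epochs, key=lambda item: item[0], reverse=True):
        checkpoints = sorted(epoch_dir.glob("*.pt"))
        if checkpoints:
            return checkpoints[0]
    return None
```

For example, pick_checkpoint(Path("results/SELEX_A/training_samples/all_round")) would return the best model when it exists and None when the directory holds no checkpoints at all.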

Sampling

results/{selex_name}/generated_samples/{experiment_type}/
└── sampled_sequences_R{round}.csv
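
For downstream analysis it is often convenient to index the generated files by round. The sketch below maps each round number to its file, based only on the file-name pattern shown above; files_by_round is a hypothetical helper, and it assumes {round} is an integer. The internal layout of the CSV files is not described here, so the sketch does not attempt to parse them.

```python
import re
from pathlib import Path
from typing import Dict

ROUND_FILE = re.compile(r"^sampled_sequences_R(\d+)\.csv$")


def files_by_round(sample_dir: Path) -> Dict[int, Path]:
    """Map round number to its generated-sequence file, in ascending round order."""
    found = {}
    for path in sample_dir.iterdir():
        match = ROUND_FILE.match(path.name)
        if match:
            found[int(match.group(1))] = path
    return dict(sorted(found.items()))
```

Sorting on the parsed integer keeps the rounds in their experimental order, which matters when the files are treated as a time series.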

Data Availability

The SELEX datasets used in this study are available from the DDBJ Sequence Read Archive (DRA):

SELEX Dataset   DRA Accession
SELEX_A         [PLACEHOLDER]
SELEX_B         [PLACEHOLDER]
SELEX_C         [PLACEHOLDER]

Citation

If you use RaptEvo in your research, please cite:

@article{raptevo2025,
  title={RaptEvo: Modeling Selective Sequence Enrichment Processes with Diffusion Models},
  author={Matsumoto, Hidenori and Nakano, Ryota and Nakamura, Yoshikazu and Adachi, Tatsuo and Sato, Kengo and Hamada, Michiaki},
  journal={},
  year={2025},
  doi={}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
