
DeepAge: Harnessing Deep Neural Network for Epigenetic Age Estimation

AAAI Symposium Series 2024

📄 Paper: https://doi.org/10.1609/aaai.v4i1.31212
💻 Code: https://github.com/Sajib-006/DeepAge

DeepAge is a deep learning framework for estimating epigenetic (biological) age from DNA methylation profiles. The pipeline combines large-scale methylation datasets, selects CpG sites with correlation-based filtering, and trains a neural network with chronological age as the regression target.

This repository provides a fully reproducible pipeline including data preprocessing, feature selection, model training, evaluation, and visualization.


Overview

Epigenetic clocks based on DNA methylation have become powerful tools for studying aging and disease. However, many existing clocks rely on small CpG sets or shallow statistical models.

DeepAge introduces a scalable deep learning pipeline that:

  • Integrates multiple public methylation datasets
  • Performs dual-correlation (Pearson and Spearman) feature selection
  • Learns complex nonlinear relationships between CpG sites and age
  • Provides interpretable evaluation metrics and visualizations

This repository contains everything needed to reproduce the modeling pipeline and experiments.


Repository Structure

DeepAge
│
├── README.md
├── LICENSE
├── requirements.txt
├── pyproject.toml
├── .gitignore
│
├── data/
│   ├── raw/                 # Raw datasets (not included in repo)
│   ├── processed/           # Processed training data
│   └── README.md
│
├── src/
│   └── deepage/
│       ├── __init__.py
│       ├── preprocessing.py
│       ├── feature_selection.py
│       ├── model.py
│       ├── train.py
│       ├── evaluate.py
│       ├── visualization.py
│       └── utils.py
│
├── scripts/
│   ├── prepare_data.py
│   ├── select_features.py
│   ├── train_model.py
│   ├── evaluate_model.py
│   └── plot_results.py
│
├── results/
│   ├── models/
│   ├── figures/
│   └── metrics/
│
├── docs/
│   └── methodology.md
│
└── notebooks_archive/

Installation

Clone the repository

git clone https://github.com/<your-username>/DeepAge.git
cd DeepAge

Create a virtual environment

Using conda:

conda create -n deepage python=3.10
conda activate deepage

Or using venv:

python -m venv deepage_env
source deepage_env/bin/activate

Install dependencies

pip install -r requirements.txt

Data

This repository does not distribute raw methylation datasets due to licensing restrictions.

Users should download the datasets from their original sources (e.g., GEO) and place them in:

data/raw/

Expected data format:

sample_id,age,cpg1,cpg2,cpg3,...
S1,45,0.78,0.43,0.91,...
S2,62,0.21,0.88,0.12,...

Where:

  • each row is one sample, identified by sample_id
  • age is the sample's chronological age
  • every remaining column is a CpG site holding a methylation beta value (between 0 and 1)
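
A quick sanity check of this layout can be sketched with pandas. The helper name `validate_methylation_table` is hypothetical and not part of the repository. It assumes that every column other than `sample_id` and `age` is a CpG column.

```python
import io

import pandas as pd


def validate_methylation_table(df: pd.DataFrame) -> list[str]:
    """Check the expected layout and return the CpG column names.

    Hypothetical helper: treats every column other than `sample_id`
    and `age` as a CpG column holding beta values in [0, 1].
    """
    missing = {"sample_id", "age"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    cpg_cols = [c for c in df.columns if c not in ("sample_id", "age")]
    if not cpg_cols:
        raise ValueError("no CpG columns found")

    betas = df[cpg_cols]
    # Missing values pass this check; only out-of-range betas are rejected.
    if ((betas < 0) | (betas > 1)).any().any():
        raise ValueError("beta values must lie in [0, 1]")
    return cpg_cols


# Tiny in-memory table mirroring the format shown above.
example_csv = io.StringIO(
    "sample_id,age,cpg1,cpg2,cpg3\n"
    "S1,45,0.78,0.43,0.91\n"
    "S2,62,0.21,0.88,0.12\n"
)
example_df = pd.read_csv(example_csv)
cpg_columns = validate_methylation_table(example_df)
```

Running a check like this on each file in data/raw/ before preprocessing can surface misnamed columns or values on the wrong scale early.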

After downloading the datasets, run the preprocessing pipeline to generate the processed dataset.


Pipeline

The full pipeline consists of five stages.


1. Data Preprocessing

Clean datasets, harmonize metadata, and prepare training matrices.

python scripts/prepare_data.py \
    --input_dir data/raw \
    --output data/processed/combined_dataset.csv

Outputs:

data/processed/combined_dataset.csv

2. Feature Selection

Identify informative CpG sites using Pearson and Spearman correlations.

python scripts/select_features.py \
    --input data/processed/combined_dataset.csv \
    --output data/processed/selected_features.csv

Outputs:

data/processed/selected_features.csv
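
The idea behind this stage can be sketched as follows. This is an illustration, not the code in `scripts/select_features.py`: the function name `select_cpgs`, the 0.4 threshold, and the rule that a CpG must pass both correlations are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def select_cpgs(df: pd.DataFrame, threshold: float = 0.4) -> list[str]:
    """Keep CpGs whose |Pearson r| and |Spearman rho| with age both reach `threshold`.

    Assumed rule and placeholder threshold; the repository may combine
    the two correlations differently.
    """
    age = df["age"]
    cpgs = df.drop(columns=["sample_id", "age"])

    pearson = cpgs.corrwith(age, method="pearson")
    # Spearman's rho equals Pearson's r computed on (average) ranks.
    spearman = cpgs.rank().corrwith(age.rank(), method="pearson")

    keep = (pearson.abs() >= threshold) & (spearman.abs() >= threshold)
    return keep[keep].index.tolist()


# Synthetic data: two CpGs that track age and one that is pure noise.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 80, size=n)
toy = pd.DataFrame(
    {
        "sample_id": [f"S{i}" for i in range(n)],
        "age": age,
        "cpg_up": 0.2 + 0.008 * age + rng.normal(0, 0.02, n),
        "cpg_down": 0.9 - 0.008 * age + rng.normal(0, 0.02, n),
        "cpg_noise": rng.uniform(0, 1, n),
    }
)
selected = select_cpgs(toy)
```

Requiring both coefficients to pass favors sites whose relationship with age is both roughly linear (Pearson) and monotonic without being driven by a few outliers (Spearman).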

3. Model Training

Train the deep learning model.

python scripts/train_model.py \
    --data data/processed/combined_dataset.csv \
    --features data/processed/selected_features.csv \
    --output results/models/deepage_model.pt

Outputs:

results/models/deepage_model.pt

4. Model Evaluation

Evaluate predictions on the test set.

python scripts/evaluate_model.py \
    --model results/models/deepage_model.pt \
    --data data/processed/combined_dataset.csv \
    --features data/processed/selected_features.csv \
    --output results/metrics/evaluation.json

Evaluation metrics include:

  • MAE
  • RMSE
  • R²
  • Median Absolute Error
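
For reference, the four metrics can be computed from true and predicted ages as in the NumPy sketch below. The function name `regression_metrics` and the dictionary keys are illustrative; the key names actually written to `evaluation.json` are not specified here.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict[str, float]:
    """Compute MAE, RMSE, R² and median absolute error (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true

    ss_res = float(np.sum(err ** 2))                         # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": 1.0 - ss_res / ss_tot,
        "median_ae": float(np.median(np.abs(err))),
    }


# Four samples with errors of +2, -1, -3 and +4 years.
metrics = regression_metrics([30, 40, 50, 60], [32, 39, 47, 64])
```

MAE and median absolute error are in the same units as age, so they are the easiest to compare across epigenetic clocks; RMSE penalizes large misses more heavily.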

5. Visualization

Generate plots of model predictions and embeddings.

python scripts/plot_results.py \
    --model results/models/deepage_model.pt \
    --data data/processed/combined_dataset.csv \
    --features data/processed/selected_features.csv \
    --output_dir results/figures

Generated figures include:

  • Predicted vs chronological age
  • Residual error distribution
  • Feature heatmaps
  • t-SNE / UMAP embeddings

Reproducing Experiments

To reproduce the full pipeline:

python scripts/prepare_data.py
python scripts/select_features.py
python scripts/train_model.py
python scripts/evaluate_model.py
python scripts/plot_results.py
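
The same sequence can be driven from a single Python entry point. The sketch below passes the arguments from the Pipeline section explicitly, since the scripts' default values are not documented here; the function names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

DATA = "data/processed/combined_dataset.csv"
FEATURES = "data/processed/selected_features.csv"
MODEL = "results/models/deepage_model.pt"


def build_commands() -> list[list[str]]:
    """Return the five stage commands, in order, using the paths from the Pipeline section."""
    py = sys.executable
    shared = ["--data", DATA, "--features", FEATURES]
    return [
        [py, "scripts/prepare_data.py", "--input_dir", "data/raw", "--output", DATA],
        [py, "scripts/select_features.py", "--input", DATA, "--output", FEATURES],
        [py, "scripts/train_model.py", *shared, "--output", MODEL],
        [py, "scripts/evaluate_model.py", "--model", MODEL, *shared,
         "--output", "results/metrics/evaluation.json"],
        [py, "scripts/plot_results.py", "--model", MODEL, *shared,
         "--output_dir", "results/figures"],
    ]


def run_pipeline() -> None:
    """Run each stage in turn; check=True stops at the first failing stage."""
    for cmd in build_commands():
        subprocess.run(cmd, check=True)


# Call run_pipeline() from the repository root to execute all five stages.
```

Stopping at the first failure keeps a later stage from silently consuming stale outputs left by an earlier run.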

Example Output

After running the full pipeline, the repository will produce:

results/
│
├── models/
│   └── deepage_model.pt
│
├── metrics/
│   └── evaluation.json
│
└── figures/
    ├── predicted_vs_age.png
    ├── residuals.png
    ├── tsne_embedding.png
    └── heatmap.png

Requirements

Main dependencies:

  • Python 3.9+
  • PyTorch
  • NumPy
  • Pandas
  • Scikit-learn
  • Matplotlib
  • Seaborn

All dependencies are listed in:

requirements.txt

Reproducibility

To ensure reproducibility:

  • All random seeds are fixed
  • Dataset splits are deterministic
  • Model hyperparameters are documented
  • All scripts support command-line arguments
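
As an illustration of the first two points, seeds and splits can be fixed along the following lines. The helper names, the seed value 42, and the 80/20 split are assumptions for this sketch, not the repository's settings.

```python
import random

import numpy as np


def set_seed(seed: int = 42) -> None:
    """Seed the Python, NumPy and (if installed) PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return
    torch.manual_seed(seed)


def deterministic_split(n_samples: int, test_fraction: float = 0.2, seed: int = 42):
    """Return (train_idx, test_idx); the same seed always yields the same split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return np.sort(order[n_test:]), np.sort(order[:n_test])


set_seed(42)
train_idx, test_idx = deterministic_split(10)
```

Using a dedicated generator for the split keeps it independent of how many random numbers other parts of the program have already drawn.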

Citation

If you use this repository in your research, please cite:

@inproceedings{dip2024deepage,
  title={DeepAge: Harnessing Deep Neural Network for Epigenetic Age Estimation From DNA Methylation Data of human blood samples},
  author={Dip, Sajib Acharjee and Ma, Da and Zhang, Liqing},
  booktitle={Proceedings of the AAAI Symposium Series},
  volume={4},
  number={1},
  pages={267--274},
  year={2024}
}

Contributing

Contributions are welcome.

To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

Please ensure all new code includes documentation and tests where applicable.


License

This project is released under the MIT License.

See LICENSE for details.


Contact

For questions or collaboration inquiries, please open an issue or contact:

Sajib Acharjee Dip
sajibacharjeedip@vt.edu

About

[AAAI FSS 2024] Deep learning framework for biological age prediction from DNA methylation data with preprocessing, CpG feature selection, model training, evaluation, and visualization pipeline.
