This repository contains the analysis and results of a single-cell multi-omics study combining scRNA-seq and ATAC-seq data to investigate hematopoietic stem cells (HSCs) across different age groups. The analysis focuses on the transcriptional and chromatin accessibility landscapes in young and aged HSCs, with a particular emphasis on age-related changes in chromatin accessibility and stress response pathways.
single_cell/
├── README.md # Main project documentation
├── scripts/ # Analysis scripts
├── data/ # Raw and processed data
├── plots/ # Generated visualizations
├── output/ # Analysis outputs
├── results/ # Processed results
├── reports/ # Generated reports
└── docs/ # Documentation
For detailed project structure, see PROJECT_STRUCTURE.md.
- Clone this repository
- Install required R packages:
install.packages('BiocManager')
BiocManager::install(c("Signac", "Seurat", "EnsDb.Mmusculus.v79"))
- Download data from GEO (GSE190424)
- Run the analysis pipeline:
rmarkdown::render("scripts/HSC_scRNA-ATAC_seq.Rmd")
The data used for this analysis is publicly available on GEO under the accession GSE190424. The study provides high-resolution single-cell datasets, allowing detailed insights into HSC biology.
This work is a reproduction of the research presented in the paper:
"Epigenetic traits inscribed in chromatin accessibility in aged hematopoietic stem cells" by Itokawa et al., published in Nature Communications (2022).
DOI: 10.1038/s41467-022-30374-9 | PMID: 35577813
- Compare gene expression profiles between young and aged HSCs
- Identify chromatin accessibility differences linked to aging
- Integrate scRNA-seq and ATAC-seq datasets to understand age-related changes in gene regulation and stress responses
- Chromatin Accessibility Changes: Aged HSCs exhibit alterations in chromatin accessibility, particularly in regions enriched for STAT, ATF, and CNC family transcription factor motifs
- Stress Response Pathways: Open differentially accessible regions (open DARs) in aged HSCs are linked to augmented transcriptional responses under stress conditions
- Enhancer States: Most open DARs comprise active, primed, and inactive enhancers, suggesting a role in regulating stress-induced gene expression
- scRNA-seq Data:
  - Quality control filtering
  - Normalization
  - Feature selection
- ATAC-seq Data:
  - Peak calling
  - Fragment file processing
  - Quality metrics calculation
  - Cell filtering based on:
    - Peak region fragments (3000-20000)
    - Percentage of reads in peaks (>15%)
    - Blacklist ratio (<0.05)
    - Nucleosome signal (<4)
    - TSS enrichment (>2)
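The ATAC-seq cell filter above can be sketched in base R on a toy per-cell QC table. The column names follow the metadata names commonly used in Signac workflows, which is an assumption about how this project stores its metrics, and the toy values are invented for illustration. Treating the 3000-20000 range as exclusive at both ends is also an assumption.

```r
# Toy per-cell QC table (illustrative values only, not from GSE190424).
qc <- data.frame(
  cell = c("c1", "c2", "c3", "c4"),
  peak_region_fragments = c(5000, 2500, 12000, 8000),
  pct_reads_in_peaks = c(40, 35, 10, 55),
  blacklist_ratio = c(0.01, 0.02, 0.01, 0.08),
  nucleosome_signal = c(1.2, 0.9, 2.0, 1.5),
  TSS.enrichment = c(4.5, 3.8, 5.1, 6.0),
  stringsAsFactors = FALSE
)

# Return TRUE for each cell that satisfies every threshold listed above.
passes_qc <- function(m) {
  m$peak_region_fragments > 3000 &
    m$peak_region_fragments < 20000 &
    m$pct_reads_in_peaks > 15 &
    m$blacklist_ratio < 0.05 &
    m$nucleosome_signal < 4 &
    m$TSS.enrichment > 2
}

kept <- qc[passes_qc(qc), ]
print(kept$cell)
```

On a Seurat object carrying the same metadata columns, the equivalent step would typically be a single subset() call using the same conditions.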
- Dimensionality Reduction
  - PCA
  - UMAP
  - t-SNE
- Clustering
  - Graph-based clustering
  - Cluster marker identification
- Differential Analysis
  - Gene expression differences
  - Chromatin accessibility changes
  - Motif enrichment analysis
- Integration
  - RNA-ATAC data integration
  - Correlation analysis
  - Regulatory network inference
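As a rough illustration of the dimensionality reduction and clustering steps on the scRNA-seq side, the following sketch chains standard Seurat functions. It is not taken from scripts/HSC_scRNA-ATAC_seq.Rmd: the helper name run_rna_workflow, the input object hsc, and the values for dims, resolution, and nfeatures are all assumptions chosen for illustration.

```r
# Sketch of a standard Seurat workflow; requires the Seurat package at call time.
# `hsc` is a hypothetical Seurat object holding QC-filtered scRNA-seq counts.
run_rna_workflow <- function(hsc, dims = 1:30, resolution = 0.5) {
  # Normalization and feature selection
  hsc <- Seurat::NormalizeData(hsc, verbose = FALSE)
  hsc <- Seurat::FindVariableFeatures(hsc, selection.method = "vst",
                                      nfeatures = 2000, verbose = FALSE)
  hsc <- Seurat::ScaleData(hsc, verbose = FALSE)

  # Dimensionality reduction: PCA first, then UMAP and t-SNE on the PCs
  hsc <- Seurat::RunPCA(hsc, npcs = max(dims), verbose = FALSE)
  hsc <- Seurat::RunUMAP(hsc, reduction = "pca", dims = dims, verbose = FALSE)
  hsc <- Seurat::RunTSNE(hsc, reduction = "pca", dims = dims)

  # Graph-based clustering on the same principal components
  hsc <- Seurat::FindNeighbors(hsc, reduction = "pca", dims = dims, verbose = FALSE)
  hsc <- Seurat::FindClusters(hsc, resolution = resolution, verbose = FALSE)
  hsc
}

# Example call (commented out because it needs a real Seurat object):
# hsc <- run_rna_workflow(hsc)
# markers <- Seurat::FindAllMarkers(hsc, only.pos = TRUE)
```

The number of principal components and the clustering resolution usually need tuning per dataset, for example by inspecting an elbow plot before fixing dims.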
UMAP visualization showing distinct clustering patterns between young and aged HSCs. The analysis reveals clear separation between different cell populations, indicating distinct chromatin states and transcriptional profiles.
Feature plot showing expression patterns of key genes across different cell clusters. This visualization helps identify genes that are differentially expressed between young and aged HSCs.
Violin plot showing the distribution of expression levels for key genes. This analysis reveals how gene expression varies between young and aged HSC populations.
Coverage plot showing chromatin accessibility profiles around differentially accessible regions. This visualization demonstrates how chromatin accessibility changes with age in specific genomic regions.
- Aged HSCs exhibited altered expression of genes associated with:
  - Stemness (e.g., Gata2, Runx1)
  - Differentiation
  - Stress response
- Aged HSCs displayed:
  - Reduced accessibility in promoter and enhancer regions
  - Open DARs enriched for specific transcription factor motifs
  - Altered regulatory landscape
- Strong correlation between chromatin accessibility and gene expression
- Enhanced transcriptional responses under stress conditions
- Regulatory network changes in aged HSCs
- R (version 4.0 or higher)
- Required R packages:
- Seurat (for scRNA-seq analysis)
- Signac (for ATAC-seq analysis)
- EnsDb.Mmusculus.v79 (for genome annotation)
- Additional packages: tidyverse, ggplot2
- ATAC-seq peak count matrix (.h5)
- Cell metadata (.csv)
- Fragment files (.tsv.gz)
- Genome annotation (mm10)
- Processed data objects
- Quality control plots
- Differential analysis results
- Integration analysis results
- Visualization plots
- Memory Issues
  - Solution: Increase the R memory limit (Windows only; memory.limit() is no longer supported as of R 4.2, where available memory is managed by the operating system)
  - Command:
    memory.limit(size = 16000)
- Package Installation
  - Solution: Install BiocManager first
  - Command:
    install.packages('BiocManager')
- Data Loading
  - Solution: Check file paths and formats
  - Ensure all required files are present
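For the data loading issue, a small base R check can report which inputs are missing before the pipeline starts. The file names below are hypothetical placeholders, not the actual names used under data/, and the helper missing_inputs is introduced here only for illustration.

```r
# Return the subset of paths that do not exist on disk (names are preserved).
missing_inputs <- function(paths) {
  paths[!file.exists(paths)]
}

# Hypothetical input paths; replace with the files downloaded from GEO.
required_files <- c(
  peaks     = "data/peaks.h5",
  metadata  = "data/metadata.csv",
  fragments = "data/fragments.tsv.gz"
)

missing <- missing_inputs(required_files)
if (length(missing) > 0) {
  message("Missing input files: ", paste(missing, collapse = ", "))
}
```

Signac also expects a tabix index (.tbi) next to each fragment file, so it can be worth adding that path to the list as well.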
- Use parallel processing for large datasets
- Implement chunked processing for memory efficiency
- Save intermediate results
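One way to save intermediate results is to cache each expensive step as an .rds file and reload it on later runs. This is a minimal sketch: the helper cache_rds and the demo path are hypothetical and not part of the repository.

```r
# Load a cached object if it exists; otherwise compute it, save it, and return it.
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}

# Demo with a cheap computation written to a temporary directory.
demo_path <- file.path(tempdir(), "hsc_cache", "toy_result.rds")
toy <- cache_rds(demo_path, function() cumsum(1:10))
```

In the pipeline, the compute argument would wrap a slow step such as building or processing a Seurat object, so rerunning the R Markdown file skips work that has already been done. Delete the cached file to force recomputation.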
If you use this repository or the reproduced analysis in your work, please cite:
Itokawa N, Oshima M, Koide S, Takayama N et al. Epigenetic traits inscribed in chromatin accessibility in aged hematopoietic stem cells. Nat Commun 2022 May 16;13(1):2691. PMID: 35577813
This project is licensed under the MIT License. See docs/LICENSE for details.
For questions or issues, please open an issue in this repository.
- Original paper authors for providing the dataset
- Seurat and Signac development teams
- R/Bioconductor community