This repository contains preprocessing and quality control (QC) workflows for DNA methylation (DNAm) data generated on the Illumina EPIC v2 array. The data were derived from the IMAGEN, STRATIFY, and ESTRA cohorts.
Preprocessing and QC are performed using the minfi package.
The overall workflow includes raw data import, normalisation, batch correction, cell type deconvolution, and quality control.
The full implementation is in the script `Preprocess and QC for DNAm data.R`.
| File | Description |
|---|---|
| `RGset.rda` | Raw intensity data (methylated and unmethylated signals) prior to normalisation or preprocessing. |
| `beta_Quantile.rda` | Beta values obtained after quantile normalisation. |
| `Quantile-norm.rda` | Quantile-normalised intensity data. |
| `fast_svd.rda` | Results of fast singular value decomposition (SVD), used to detect and adjust for batch effects or other confounding factors. |
| `cellcount.rda` | Estimated cell-type proportions per sample. |
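A beta value summarises methylation at a CpG as the fraction of total signal that is methylated, β = M / (M + U + α). Here M and U are the methylated and unmethylated intensities and α is a small offset that stabilises the ratio when intensities are low (the Illumina convention uses α = 100). The plain-Python sketch below only illustrates the formula. It is not the minfi implementation, and the helper name `beta_values` and the example intensities are hypothetical.

```python
import math


def beta_values(meth, unmeth, offset=100.0):
    """Hypothetical helper: per-probe beta = M / (M + U + offset)."""
    if len(meth) != len(unmeth):
        raise ValueError("meth and unmeth must have the same length")
    betas = []
    for m, u in zip(meth, unmeth):
        # Clip negative intensities to zero so beta stays within [0, 1).
        m, u = max(m, 0.0), max(u, 0.0)
        denom = m + u + offset
        # With offset > 0 the denominator is never zero; guard anyway for offset=0.
        betas.append(m / denom if denom > 0 else math.nan)
    return betas


# Hypothetical intensities for three probes (mostly methylated, mostly unmethylated, half).
print(beta_values([9000, 500, 4000], [1000, 9500, 4000]))
```

The offset matters mainly for low-intensity probes: with α = 0, a probe with M = 1 and U = 0 would get β = 1 from almost no signal.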
QC plots are included. For documentation, see:
Additional QC procedures are implemented in `More QC option.R`, including:
- SNP-based probe filtering
  - Remove probes with SNPs (MAF > 0.05) at the CpG site, the single-base extension (SBE) site, or in the probe body
- Detection p-value filtering
  - Treat measurements with detection p-values > 0.01 as failed
  - Exclude probes failing in >20% of samples
- Non-variable CpG filtering
  - Remove probes with invariant methylation (beta ≤ 0.2 in all samples, or beta ≥ 0.8 in all samples)
- Sex chromosome probe removal
  - Exclude probes on chrX and chrY
- Missing value handling
  - Set failed probes to `NA` in the beta matrix
  - Retain only high-confidence probes in downstream analysis
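The filters listed above can be sketched in plain Python as follows. This is not the code in `More QC option.R`. The input layouts (dicts keyed by probe ID), the `snp_maf` annotation field, and the order in which filters are applied are all assumptions made for illustration. Failed measurements are represented as `NaN` (standing in for R's `NA`), and the non-variable check uses only non-missing samples. In minfi, per-sample detection p-values are typically obtained with `detectionP()`.

```python
import math

# Thresholds as described in this README.
DETP_CUTOFF = 0.01         # detection p-value above which a measurement is treated as failed
MAX_FAIL_FRACTION = 0.20   # drop probes failing in more than 20% of samples
MAF_CUTOFF = 0.05          # SNP minor allele frequency threshold
LOW_BETA, HIGH_BETA = 0.2, 0.8
SEX_CHROMS = {"chrX", "chrY"}


def qc_filter(beta, detp, annot):
    """Hypothetical sketch of the probe-level QC filters.

    beta:  {probe_id: [beta per sample]}
    detp:  {probe_id: [detection p-value per sample]}
    annot: {probe_id: {"chr": str, "snp_maf": float}}, where snp_maf is the
           largest MAF of any SNP at the CpG, SBE site or probe body (0.0 if none).
    Returns {probe_id: [beta or NaN]} for the retained probes.
    """
    kept = {}
    for probe, values in beta.items():
        pvals = detp[probe]
        info = annot[probe]
        if not pvals:
            continue

        # Detection p-value filtering: set failed measurements to NaN (NA) ...
        masked = [b if p <= DETP_CUTOFF else math.nan for b, p in zip(values, pvals)]
        # ... and drop probes that fail in more than 20% of samples.
        n_failed = sum(p > DETP_CUTOFF for p in pvals)
        if n_failed / len(pvals) > MAX_FAIL_FRACTION:
            continue

        # Sex chromosome probe removal.
        if info["chr"] in SEX_CHROMS:
            continue

        # SNP-based probe filtering.
        if info["snp_maf"] > MAF_CUTOFF:
            continue

        # Non-variable CpG filtering on the non-missing samples.
        observed = [b for b in masked if not math.isnan(b)]
        if all(b <= LOW_BETA for b in observed) or all(b >= HIGH_BETA for b in observed):
            continue

        kept[probe] = masked
    return kept


# Hypothetical toy data: six probes, five samples.
beta = {
    "cg01": [0.10, 0.15, 0.12, 0.18, 0.11],  # invariant (all <= 0.2)
    "cg02": [0.30, 0.55, 0.70, 0.45, 0.60],  # passes every filter
    "cg03": [0.40, 0.50, 0.60, 0.50, 0.40],  # on chrX
    "cg04": [0.35, 0.65, 0.50, 0.40, 0.55],  # SNP with MAF 0.10
    "cg05": [0.30, 0.60, 0.50, 0.40, 0.70],  # fails in 2/5 = 40% of samples
    "cg06": [0.25, 0.90, 0.50, 0.60, 0.40],  # fails in 1/5 = 20%, kept with one NaN
}
ok = [0.001] * 5
detp = {p: list(ok) for p in beta}
detp["cg05"] = [0.001, 0.05, 0.02, 0.001, 0.001]
detp["cg06"] = [0.001, 0.001, 0.5, 0.001, 0.001]
annot = {p: {"chr": "chr1", "snp_maf": 0.0} for p in beta}
annot["cg03"]["chr"] = "chrX"
annot["cg04"]["snp_maf"] = 0.10

result = qc_filter(beta, detp, annot)
print(sorted(result))  # ['cg02', 'cg06']
```

Because a probe failing in exactly 20% of samples is kept, downstream steps still need to handle the remaining `NaN` entries.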
To remove cross-reactive and polymorphic probes from your analysis, refer to:
https://github.com/markgene/maxprobes