An SG10K Health Study investigating the associations of various epigenetic clocks with ageing in a multi-ethnic Singaporean cohort.
Explore the Singapore National Precision Medicine Strategy »
Report bug · Request data · Themes · Blog
Principal Investigators (SG10K Health Aging Study)
Neerja Karnani · Joanne Ngeow Yuen Yie · Brian Kennedy
Singapore's National Precision Medicine (NPM) strategy seeks to accelerate biomedical research, improve health outcomes, and enhance opportunities for economic value across sectors through a decade-long roadmap. The first phase of this strategy is a "proof of concept" delivered through the SG10K Health Study, which primarily generates a genomic reference database of 10,000 healthy Singaporeans and demonstrates the feasibility of large-scale genomic data generation. As part of the SG10K Health Study, this study investigates ageing-related clinical phenotypes alongside genetic data, epigenetic data, telomere length, and epigenetic clocks, to provide a comprehensive overview of the molecular landscape of age-related phenotypes in an Asian multi-ethnic cohort.
This git repository houses the code used for the analysis of the NPM Aging Study.
placeholder for link to manuscript.
- Study Cohorts
- DNA Methylation
- Genomics
- EpiAge Estimates
- Telomere Length
- Analysis Overview
- Status
- Authors
A birth cohort comprising one of the most carefully phenotyped parent-offspring studies, enabling examination of the potential roles of fetal, developmental, and epigenetic factors in pathways to disease.
An adult cohort which aims to identify the genetic and environmental factors that underpin development of obesity, diabetes, cardiovascular disease and other complex diseases in Singapore.
An adult cohort which aims to discover how lifestyle factors, physiological factors, genetic factors and their interactions impact the development of common health conditions, and to monitor risk factors in the population and gain insight into determinants of health-related behaviours.
An adult cohort and flagship initiative of the academic medical centre in precision medicine, a discipline where medical treatments and procedures are tailored to individual patients based on their detailed genetic, molecular and clinical profiles.
An adult cohort which aims to provide novel knowledge of population eye health, to enable dissecting, detecting and preventing eye diseases in Singapore and Asia, and to promote and improve global eye health.
An adult cohort collected between 2015 and 2016 in TTSH Health Screening Programmes to support health-related studies at TTSH.
- Illumina EPIC Array pre-processing was performed by Marie Loh's lab.
- Single-sample csv files per study were obtained following standard Type 1/Type 2 and Red/Green channel normalizations.
- Independent of whole-genome sequencing (WGS) data, Marie Loh's lab applied a PCA-based ethnicity QC to determine population structure and stratify by ethnic group. Because genomic data supersedes epigenetic information, we do not apply these epigenetic ethnicity QCs and classifications. (The two classifications are very similar but not identical, so for samples that fail genomic QC it does not make sense to superimpose the epigenetic ethnicity classification onto the subject.)
- QC parameters employed in this study include:
- Samples failing call rate were removed (total CpGs passing QC per sample < 90%)
- Sex QC
- Subject duplication
- Age (is NA or not)
- Kinship (cryptic relationships)
- Cohort resolution
- 10,019 samples passed the initial QCs in Marie Loh's lab.
- Multimodal CpGs were removed (nmode.mc (modedist=0.2) > 1)
- Non-variable CpGs were removed (IQR < 0.05)
- CpGs failing marker call rates were removed (Det P > 0.01)
- Sex chromosomes were removed.
- CpGs with ethnic-specific variants (based on SG10K MAF < 5%) within the single-base extension were removed.
- Cross hybridizing probes and probes recommended to be removed under the Illumina EPIC manifest (v1_0_b5) were removed.
- 747,212 CpGs passed this QC.
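As an illustration of two of the CpG-level filters above, here is a minimal sketch in plain Python. It is not the pipeline code from this repository. The function names, the dictionary layout, and the `min_call_rate` parameter are hypothetical; in particular, the README does not state what fraction of samples must pass detection P for a marker to be kept, so the default of 1.0 is only an assumption.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (inclusive quartile method)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_cpgs(betas, det_p, min_iqr=0.05, max_det_p=0.01, min_call_rate=1.0):
    """Return the CpG IDs that survive the variability and detection filters.

    betas: dict mapping CpG ID -> list of beta values across samples
    det_p: dict mapping CpG ID -> list of detection P values across samples
    min_call_rate: ASSUMED threshold, not taken from the README
    """
    kept = []
    for cpg, values in betas.items():
        # Non-variable CpGs: IQR of beta values below 0.05.
        if iqr(values) < min_iqr:
            continue
        # Marker call rate: fraction of samples with detection P <= 0.01.
        p_values = det_p[cpg]
        call_rate = sum(p <= max_det_p for p in p_values) / len(p_values)
        if call_rate < min_call_rate:
            continue
        kept.append(cpg)
    return kept
```

The multimodality, sex-chromosome, and cross-hybridizing probe filters are not covered by this sketch.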
- Whole Genome Sequencing of 10,259 healthy Singaporeans was performed.
- Single-sample gVCF files were obtained following the parameters and companion files (GATK resource bundle GRCh38) defined by the GATK4 "germline short variant per-sample calling" reference implementation.
- msVCF files were obtained by performing a joint-calling step.
Sample QC & annotation
- 9,770 samples passed the initial genomic coverage requirements per study.
- Variants failing VQSR filter were removed.
- Sex was imputed based on the mean depth ratio of chrX/chr20 and chrY/chr20 of each sample, and samples with abnormal ploidy were excluded.
- Samples with call rate < 95%, contamination rate > 2%, error rate > 1.5%, or extreme heterozygosity (> 3 SD) were excluded.
- Only non-monomorphic, autosomal, biallelic SNPs passing the HWE filter (threshold P < 10e-8) were included.
- Low complexity regions were excluded after LD pruning (r^2 > 0.2).
- Samples with cryptic relationships were excluded (pi-hat > 0.2).
- Samples showing evidence of admixture between ethnicities through PCA outliers were excluded.
- 8,118 samples passed this WGS QC.
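The sample-level thresholds listed above can be sketched as a simple filter. This is an illustrative example only: the record layout and the function name are hypothetical, and heterozygosity outliers are assumed to be defined relative to the mean and standard deviation of the samples passed in, which the README does not specify.

```python
from statistics import mean, stdev


def passing_samples(samples, n_sd=3.0):
    """Return IDs of samples passing the call rate, contamination,
    error rate, and heterozygosity thresholds.

    samples: list of dicts with keys 'id', 'call_rate', 'contamination',
             'error_rate', and 'het_rate' (rates expressed as fractions).
    """
    het = [s["het_rate"] for s in samples]
    mu, sd = mean(het), stdev(het)
    kept = []
    for s in samples:
        if s["call_rate"] < 0.95:
            continue  # call rate < 95%
        if s["contamination"] > 0.02:
            continue  # contamination rate > 2%
        if s["error_rate"] > 0.015:
            continue  # error rate > 1.5%
        if abs(s["het_rate"] - mu) > n_sd * sd:
            continue  # extreme heterozygosity (> 3 SD from the mean)
        kept.append(s["id"])
    return kept
```

Sex imputation, kinship, and admixture checks need genotype-level data and are outside the scope of this sketch.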
- Subjects passing WGS QCs (having WGS-derived ethnicity classifications) were combined with those passing DNA Methylation QCs. This gave us 6,240 unique subjects (5,566 adults + 674 birth cohort).
- Post-QC DNA methylation betas were used in the generation of epigenetic ages.
- Code to calculate clocks such as Systems Age is available in the Methylcipher package.
- Ethnic-specific age residuals were adjusted for age, sex, BMI, batch, cohort, and cell type proportions.
- Sex-specific age residuals were adjusted for age, ethnicity, BMI, batch, cohort, and cell type proportions.
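To make the idea of an adjusted age residual concrete, the sketch below regresses epigenetic age on chronological age plus covariates by ordinary least squares and returns the residuals. It is a dependency-free illustration, not the code used in this study. The function name is hypothetical, and categorical covariates such as batch and cohort are assumed to be dummy-coded by the caller, with no perfectly collinear columns.

```python
def ols_residuals(y, covariates):
    """Residuals of y regressed on an intercept plus the given covariates.

    y: list of epigenetic ages, one per sample
    covariates: list of rows, one per sample (e.g. age, sex, BMI, ...)
    """
    X = [[1.0] + list(row) for row in covariates]  # prepend intercept
    n, k = len(X), len(X[0])
    # Augmented normal equations: [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    # Residual = observed minus fitted epigenetic age
    return [y[r] - sum(b * x for b, x in zip(beta, X[r])) for r in range(n)]
```

A positive residual means the sample's epigenetic age is higher than the covariates predict. For real data, a statistics library would be a more robust choice than solving the normal equations by hand.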
- TelSeq was conducted on SG10K genomic data passing genomic QC, evaluating the frequency of telomeric repeats (TTAGGG) with a default parameter of k=7.
- TelSeq estimates were correlated with qPCR-measured telomere lengths in a subset of the same study samples, as well as with other WGS-based telomere length estimates (Telomerecat).
- Raw estimates were normalized through rank-based z-scores.
- 8,045 samples passed this QC.
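One common way to obtain rank-based z-scores is a rank-based inverse normal transformation, sketched below. The README does not say which variant was used, so the Blom offset of 0.375, the average-rank handling of ties, and the function name are all assumptions.

```python
from statistics import NormalDist


def rank_based_z(values, offset=0.375):
    """Map values to z-scores through their ranks (inverse normal transform).

    Ties receive the average of their ranks. offset=0.375 is the Blom
    constant, an ASSUMED choice.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]
```

The transform preserves the ordering of the raw TelSeq estimates while forcing an approximately standard normal distribution.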
- Linear regression models were employed, adjusting for cohort, ethnicity, sex, BMI, and cell type proportions.
- 5,566 unique adult samples had genetic and age acceleration data passing QC.
- Samples were stratified by cohort, ethnicity, and sex prior to analyses.
- For sex-stratified analyses, post-QC SNPs present across all three ethnic groups were used.
- Associations were adjusted for sex and genetic principal components (PCs).
- For common SNPs,
- Linear Wald test analysis was conducted using Efficient and Parallelizable Association Container Toolbox (EPACTS v3.3.0)
- Cohort-specific association summaries were meta-analyzed using a standard error approach in METAL.
- SNPs present in 3 or fewer (out of 5) of the cohorts, or having a VQSLOD < 0, were excluded from the meta-analysis.
- Pairwise comparisons of lead SNP effect estimates between phenotypes (ethnicity and sex) were done using a z-test for independent summary estimates with two-sided p-values.
- Bonferroni multiple-testing correction for p-values was done.
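The statistics behind the last few steps can be sketched in a few lines. The example below shows fixed-effect inverse-variance weighting (which is what a standard-error-based meta-analysis computes), a z-test comparing two independent effect estimates, and Bonferroni adjustment. The function names are hypothetical and this is not a substitute for METAL or EPACTS.

```python
from math import sqrt
from statistics import NormalDist


def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis: pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def compare_effects(beta1, se1, beta2, se2):
    """z-test for the difference between two independent effect estimates."""
    z = (beta1 - beta2) / sqrt(se1 ** 2 + se2 ** 2)
    return z, two_sided_p(z)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, `compare_effects` could be applied to a lead SNP's effect estimates from two ethnicity-specific or sex-specific analyses, with `bonferroni` applied across all such comparisons.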
- To mitigate the influence of DNA methylation outliers, we truncated outlier values beyond 2×IQR to the nearest value. [see PMID: 34633450]
- DNA methylation associations with chronological age were adjusted for BMI, ethnicity, sex, cohort, batch, and cell type proportion.
- This gave us 97,208 age-significant CpGs.
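The outlier truncation step can be illustrated with a small sketch. Two details here are assumptions rather than statements about the study code: the fences are taken as Q1 - 2×IQR and Q3 + 2×IQR, and "nearest value" is read as the closest observed value that lies inside those fences. The function name is hypothetical.

```python
from statistics import quantiles


def truncate_outliers(values, k=2.0):
    """Pull values beyond k*IQR from the quartiles back to the nearest
    observed value that is not an outlier (a form of winsorization)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower_fence, upper_fence = q1 - k * spread, q3 + k * spread
    inliers = [v for v in values if lower_fence <= v <= upper_fence]
    low, high = min(inliers), max(inliers)
    # Clamp every value into the range spanned by the inliers.
    return [min(max(v, low), high) for v in values]
```

Applied per CpG across samples, this leaves typical beta values unchanged and only moves the extreme ones.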
Manuscript under review.