This project investigates the relationship between mitochondrial DNA variation (specifically heteroplasmy and haplogroups) and cognitive function in midlife through cross-sectional and longitudinal analyses of genetic and cognitive data from the Coronary Artery Risk Development in Young Adults (CARDIA) study.
CARDIA began in 1985-1986 to assess risk factors for cardiovascular disease. It recruited 5,115 Black and White participants aged 18-30 and followed them up every 5 years. The most recent visit was Year 35 (2020-2022).
- Repository Name: mtDNAhtz_CARDIA
- Repository URL: https://github.com/AndrewsLabUCSF/mtDNAhtz_CARDIA
- Date of Access: 2024.07.25
This section describes the data files included in this directory. Raw data files are not stored on GitHub.
- phenotypes: This folder contains the following files:
  - data.csv: demographic data (Y15, Y20, Y25, Y30, Y35), ApoE phenotype (Y7), and cognitive test scores (Y25, Y30, Y35)
  - y20cov.csv: Year 20 lifestyle (smoking status, alcohol use, physical activity) and comorbidity (diabetes, hypertension, depression, BMI) variables
  - SDOH_vars.csv: social determinants of health (SDOH) index score
- mtDNAseq: This folder contains mtDNA sequencing and processing results.
  - Heteroplasmy_Estimates: This folder contains the GATK variant-calling output.
  - output: This folder contains the output files of the mitoverse pipeline. Each batch folder contains a MultiQC report, a QC report for each sample, and results files. See https://mitoverse.readthedocs.io/mtdna-server/mtdna-server/ for a description of the mitoverse output files.
- MLC_score: This folder contains the MLC score file from Lake et al., 2024 and allele frequency (AF) results from Bolze et al., 2019, Gupta et al., 2023, and Laricchia et al., 2022.
| Data Type | Data Structure | Column Description |
|---|---|---|
| Phenotype | tabular | Variable dictionaries are in / PDF files in folder |
| mtDNAseq | BAM | Files obtained from Hou lab |
| MLC score | tabular | position, reference allele, alternate allele, MLC score |
| AF files | tabular | position, reference allele, alternate allele, AF (for heteroplasmies and homoplasmies) |
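
Because the MLC score and AF tables share the same variant key (position, reference allele, alternate allele), they can be joined on that key. The following is a minimal sketch of such a join using only the Python standard library. The column names (`position`, `ref`, `alt`, `mlc_score`, `af`) and the toy rows are assumptions made for illustration; they are not the actual headers or values of the files in this repository.

```python
import csv
import io


def load_variant_table(handle, value_column):
    """Read a variant table into {(position, ref, alt): float value}.

    Column names are hypothetical; adjust them to the real file headers.
    """
    table = {}
    for row in csv.DictReader(handle):
        key = (int(row["position"]), row["ref"], row["alt"])
        table[key] = float(row[value_column])
    return table


def annotate(mlc, af):
    """Attach an allele frequency to every scored variant (None if absent)."""
    return {
        key: {"mlc_score": score, "af": af.get(key)}
        for key, score in mlc.items()
    }


# Toy rows with made-up values, standing in for the real files.
mlc_csv = io.StringIO("position,ref,alt,mlc_score\n100,A,G,0.25\n200,C,T,0.75\n")
af_csv = io.StringIO("position,ref,alt,af\n100,A,G,0.5\n")

mlc = load_variant_table(mlc_csv, "mlc_score")
af = load_variant_table(af_csv, "af")
annotated = annotate(mlc, af)
```

Variants that are scored but missing from an AF source keep `af` as `None`, so the absence of a frequency stays distinguishable from a frequency of zero.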
- Phenotype files were shared by the Yaffe Lab.
- mtDNAseq BAM files were shared by CARDIA and processed using mtdna-server 2.
- The MLC score file was downloaded from Lake, N.J. et al. (2024) Quantifying constraint in the human mitochondrial genome. Nature.
- Allele frequency (AF) files were downloaded from:
  - UKBmtDNA.csv: Gupta, R. et al. (2023) Nuclear genetic control of mtDNA copy number and heteroplasmy in humans. Nature.
  - gnomad.genomes.v3.1.sites.chrM.reduced_annotations.tsv: mtDNA AF from the Genome Aggregation Database (gnomAD™) v3.1, obtained from the gnomAD website. The variant calling pipeline is described in Laricchia et al. (2022) Mitochondrial DNA variation across 56,434 individuals in gnomAD. bioRxiv.
  - HelixMTdb_20200327.tsv: mtDNA AF from HelixMTdb, downloaded from the HelixMTdb website. The variant calling pipeline is described in Bolze et al. (2019) A catalog of homoplasmic and heteroplasmic mitochondrial DNA variants in humans. bioRxiv.
- Phenotype - Demographic (age, sex, education, race): Missing baseline data for education and age were imputed from the closest time point.
- Phenotype - Lifestyle / Comorbidities: Missing data were imputed with a random forest algorithm.
- Mitochondrial DNA sequencing:
  - Possible false positives were filtered from the called variants
  - Heteroplasmy levels were extracted from the filtered variants
  - Contaminated samples were flagged and filtered from the data
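
The closest-time-point imputation described for the demographic variables can be sketched as follows. This is only an illustration under stated assumptions: the function name `impute_from_closest`, the dictionary layout, and the tie-breaking rule (prefer the earlier visit) are hypothetical and not taken from the project's code. The sketch also copies the borrowed value unchanged, which suits a variable such as education; whether age was shifted by the gap between visits is not stated here.

```python
def impute_from_closest(values_by_year, target_year):
    """Return the value at target_year, or the value from the nearest visit.

    values_by_year maps exam year (e.g. 15, 20, 25) to a value or None.
    Ties between equally distant visits go to the earlier visit
    (an assumption made for this sketch).
    """
    current = values_by_year.get(target_year)
    if current is not None:
        return current

    # Keep only visits that actually have data.
    observed = {year: v for year, v in values_by_year.items() if v is not None}
    if not observed:
        return None  # nothing to borrow from

    nearest = min(observed, key=lambda year: (abs(year - target_year), year))
    return observed[nearest]


# Toy example: years of education missing at Y15, present at later visits.
education = {15: None, 20: 14, 25: 16}
imputed_y15 = impute_from_closest(education, 15)
```

Here `imputed_y15` takes the Y20 value because Y20 is the visit nearest to Y15 that has data.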
The CARDIA investigators welcome collaborative research. To obtain the data for this project, a manuscript proposal was submitted to the CARDIA Coordinating Center and approved by CARDIA P&P on Sept 11, 2024.