AndrewsLabUCSF/mtDNAhtz_CARDIA

mtDNAhtz_CARDIA

This project investigates the relationship between mitochondrial DNA variation (specifically heteroplasmy and haplogroups) and cognitive function in midlife through cross-sectional and longitudinal analyses of genetic and cognitive data from the Coronary Artery Risk Development in Young Adults (CARDIA) study.

Data

Description

CARDIA began in 1985-1986 to assess risk factors for cardiovascular disease. The study recruited 5,115 Black and White participants aged 18-30 and has followed them up approximately every 5 years. The most recent visit was Year 35 (2020-2022).

Data Source

  • Repository Name: mtDNAhtz_CARDIA

  • Repository URL: https://github.com/AndrewsLabUCSF/mtDNAhtz_CARDIA

  • Date of Access: 2024.07.25

Data Files

This section describes the data files included in this directory. Raw data files are not stored on GitHub.

  • phenotypes: This folder contains the following phenotype files:

    • data.csv: demographic data (Y15, Y20, Y25, Y30, Y35), ApoE Phenotype (Y7), and cognitive test scores (Y25, Y30, Y35)

    • y20cov.csv: Year 20 lifestyle (smoking status, alcohol use, physical activity) and comorbidity (diabetes, hypertension, depression, BMI) variables

    • SDOH_vars.csv: social determinants of health (SDOH) index score

  • mtDNAseq: This folder contains mtDNA sequencing and processing results.

    • Heteroplasmy_Estimates: This folder contains the output of GATK variant calling.

    • output: This folder contains the output files of the mitoverse pipeline. Each batch folder contains a MultiQC report, a QC report for each sample, and results files. See https://mitoverse.readthedocs.io/mtdna-server/mtdna-server/ for a description of the mitoverse output files.

  • MLC_score: This folder contains the mitochondrial local constraint (MLC) score file from Lake et al., 2024 and allele frequency (AF) results from Bolze et al., 2019, Gupta et al., 2023, and Laricchia et al., 2022.
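
As a rough illustration of how the three phenotype files might be combined, the sketch below joins them on a shared participant identifier using only the Python standard library. The `ID` column, the other column names, and all values are hypothetical placeholders, not the actual CARDIA variables. The in-memory strings stand in for `data.csv`, `y20cov.csv`, and `SDOH_vars.csv`, which are not stored in this repository.

```python
import csv
import io


def read_rows(handle, id_col="ID"):
    """Read a CSV into a dict keyed by participant ID (column name is an assumption)."""
    return {row[id_col]: row for row in csv.DictReader(handle)}


def merge_by_id(*tables):
    """Left-join every table onto the first; fields with no match are simply absent."""
    merged = {}
    for pid, row in tables[0].items():
        combined = dict(row)
        for other in tables[1:]:
            combined.update(other.get(pid, {}))
        merged[pid] = combined
    return merged


# Made-up stand-ins for the real files; replace with open("phenotypes/data.csv") etc.
data_csv = io.StringIO("ID,age_y25,cog_score_y25\n1,50,27\n2,48,25\n")
y20cov_csv = io.StringIO("ID,smoker_y20,bmi_y20\n1,0,24.5\n2,1,30.1\n")
sdoh_csv = io.StringIO("ID,sdoh_index\n1,3\n")

phenotypes = merge_by_id(
    read_rows(data_csv), read_rows(y20cov_csv), read_rows(sdoh_csv)
)
print(phenotypes["1"])
```

A left join onto `data.csv` keeps every participant with cognitive data even when a covariate file has no row for them, which makes the missing values explicit before any imputation.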

Data Structure

| Data Type | Data Structure | Column Description |
| --- | --- | --- |
| Phenotype | tabular | Variable dictionaries are provided as PDF files in the folder |
| mtDNAseq | BAM | Files obtained from Hou lab |
| MLC score | tabular | position, reference allele, alternate allele, MLC score |
| AF files | tabular | position, reference allele, alternate allele, AF (for heteroplasmies and homoplasmies) |
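
Because the MLC score and AF tables share the same key columns (position, reference allele, alternate allele), a called variant can be annotated by a simple lookup on that triple. The following is a minimal sketch under that assumption. The `Annotation` type, the `annotate` helper, and the numeric values are hypothetical and are not taken from the published tables.

```python
from typing import Dict, NamedTuple, Optional, Tuple

# (position, reference allele, alternate allele)
VariantKey = Tuple[int, str, str]


class Annotation(NamedTuple):
    mlc_score: Optional[float]
    af_het: Optional[float]  # allele frequency among heteroplasmies
    af_hom: Optional[float]  # allele frequency among homoplasmies


def annotate(
    variant: VariantKey,
    mlc: Dict[VariantKey, float],
    af: Dict[VariantKey, Tuple[float, float]],
) -> Annotation:
    """Look up a variant in both tables; missing entries become None."""
    af_het, af_hom = af.get(variant, (None, None))
    return Annotation(mlc.get(variant), af_het, af_hom)


# Placeholder rows for illustration only, not real MLC scores or frequencies.
mlc_table = {(100, "A", "G"): 0.75}
af_table = {(100, "A", "G"): (0.001, 0.02)}

print(annotate((100, "A", "G"), mlc_table, af_table))
print(annotate((200, "C", "T"), mlc_table, af_table))
```

Keying on the full triple rather than on position alone matters because a single mtDNA position can carry several alternate alleles with different scores.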

Data Download

Data Processing

  • Phenotype - Demographic (age, sex, education, race): Missing baseline values for education and age were imputed from the closest time point.

  • Phenotype - Lifestyle / Comorbidities: Missing data were imputed using a random forest algorithm.

  • mitochondrial DNA sequencing:

    1. Possible false-positives were filtered from called variants
    2. Heteroplasmy levels were extracted from filtered variants
    3. Contaminated samples were flagged and filtered from data
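
Two of the processing steps above can be sketched in a few lines of Python: closest-time-point imputation for demographic variables, and the three-step handling of mtDNA variants. This is an illustrative sketch, not the project's actual code. The record layout (`sample`, `filter`, `level`), the use of `"PASS"` as the marker for variants that survive false-positive filtering, and all values are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Set


def impute_from_closest(
    values: Dict[int, Optional[float]], target_year: int
) -> Optional[float]:
    """Return the value at target_year, else the value from the nearest observed year."""
    if values.get(target_year) is not None:
        return values[target_year]
    observed = [
        (abs(year - target_year), year)
        for year, value in values.items()
        if value is not None
    ]
    if not observed:
        return None
    _, nearest = min(observed)  # ties resolve to the earlier year
    return values[nearest]


def heteroplasmy_levels(
    variants: Iterable[dict], contaminated: Set[str]
) -> Dict[str, List[float]]:
    """Collect per-sample heteroplasmy levels from filtered, uncontaminated calls."""
    levels: Dict[str, List[float]] = {}
    for variant in variants:
        if variant["filter"] != "PASS":  # step 1: drop possible false positives
            continue
        if variant["sample"] in contaminated:  # step 3: drop flagged samples
            continue
        # step 2: keep the heteroplasmy level of the surviving variant
        levels.setdefault(variant["sample"], []).append(variant["level"])
    return levels


# Hypothetical inputs for illustration.
education = {15: None, 20: 16.0, 25: 16.0}
calls = [
    {"sample": "S1", "filter": "PASS", "level": 0.12},
    {"sample": "S1", "filter": "strand_bias", "level": 0.04},
    {"sample": "S2", "filter": "PASS", "level": 0.30},
]

print(impute_from_closest(education, 15))
print(heteroplasmy_levels(calls, contaminated={"S2"}))
```

The random forest imputation used for lifestyle and comorbidity variables is not shown here, since it depends on a third-party library and on modelling choices the text does not specify.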

Usage Notes

The CARDIA investigators welcome collaborative research. To obtain the data for this project, a manuscript proposal was submitted to the CARDIA Coordinating Center and approved by CARDIA P&P on Sept 11, 2024.
