Cross-Ancestry Polygenic Risk Scores Enhance Alzheimer’s Disease Risk Prediction in Multiethnic Cohorts
Meri Okorie, Caroline Jonson, PhD, Alexis P. Oddi, Patricia A. Castruita, Brian Fulton-Howard, PhD, Kristine Yaffe, MD, Jennifer S. Yokoyama, PhD, Chinedu Udeh-Momoh, PhD, Shea J. Andrews, PhD for the Alzheimer’s Disease Sequencing Project and the Healthy Aging Brain Study - Health Disparities
Evaluation of single-, multi-, and cross-ancestry approaches to Alzheimer's disease (AD) polygenic risk scores (PRS) in diverse cohorts, and association analysis of PRS models with AD diagnosis and endophenotypes.
Quality control of the ADSP dataset
- Pre-filtering of low-quality samples using read depth (DP) and genotype quality (GQ) (PLINK 1.9, available at https://www.cog-genomics.org/plink/)
- Default GenoTools variant- and sample-level QC and filtering (https://github.com/dvitale199/GenoTools), plus a minor allele frequency (MAF) filter
- Genetic ancestry estimation using pgsc_calc (available at https://github.com/PGScatalog/pgsc_calc)
- Phenotype harmonization following the NIAGADS phenotype harmonization protocol (https://github.com/NIAGADS/ADSPIntegratedPhenotypes); our adapted code can be found at workflow/scripts/https://github.com/makingphenofiles.qmd
- Admixture analysis using ADMIXTURE v1.3 (available at https://github.com/NovembreLab/admixture/tree/master/releases)
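The DP/GQ pre-filter and MAF filter listed above can be illustrated with a minimal Python sketch. It assumes genotypes have already been parsed into per-sample lists of (dosage, DP, GQ) tuples, with dosage coded as 0/1/2 alternate-allele counts. The thresholds (DP >= 10, GQ >= 20, sample call rate >= 0.95) and all function names are hypothetical placeholders, not the values or code used in this project, where filtering was run with PLINK 1.9 and GenoTools.

```python
def mask_low_quality(calls, min_dp=10, min_gq=20):
    """Set genotype calls with low read depth (DP) or genotype quality (GQ) to missing (None).

    Thresholds are hypothetical placeholders, not this project's settings.
    """
    return [gt if dp >= min_dp and gq >= min_gq else None for gt, dp, gq in calls]


def call_rate(genotypes):
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def prefilter_samples(calls_by_sample, min_dp=10, min_gq=20, min_call_rate=0.95):
    """Mask low-quality calls, then drop samples whose remaining call rate is too low."""
    kept = {}
    for sample, calls in calls_by_sample.items():
        masked = mask_low_quality(calls, min_dp, min_gq)
        if call_rate(masked) >= min_call_rate:
            kept[sample] = masked
    return kept


def minor_allele_frequency(genotypes):
    """MAF of one variant from 0/1/2 alternate-allele dosages, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


# Usage with toy data: S2 loses 2 of its 4 calls to the DP/GQ mask and is dropped.
calls = {
    "S1": [(0, 30, 99), (1, 25, 80), (2, 40, 99), (1, 12, 50)],
    "S2": [(0, 5, 99), (1, 25, 10), (2, 40, 99), (1, 30, 60)],
}
kept = prefilter_samples(calls)
maf = minor_allele_frequency([0, 0, 1, 0])  # 1 alt allele out of 8 -> 0.125
```

A variant-level MAF filter would then drop variants whose `minor_allele_frequency` falls below the chosen cutoff.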
Quality control of the HABS-HD dataset
- Variant-level QC: exclude SNPs with call rate < 0.95 or HWE p < 1 × 10⁻⁶.
- Sample-level QC: remove samples with call rate < 0.95, sex discordance (X-chromosome heterozygosity), outlier heterozygosity, or cryptic relatedness (IBD > 0.1875 using KING).
- Imputation performed on the TOPMed Imputation Server (vR3) using Eagle (phasing) and Minimac3 (imputation).
- Post-imputation QC: remove variants with r² < 0.3 or MAF < 0.01; merge ancestry groups; remove poorly imputed variants (call rate < 95%).
- Genetic ancestry estimation using pgsc_calc (available at https://github.com/PGScatalog/pgsc_calc)
- Exclusion of individuals with major neurological, psychiatric, or medical conditions affecting assessments.
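The variant-level call-rate and HWE filters listed above can be sketched in Python using the thresholds stated in the list (call rate >= 0.95, HWE p >= 1 × 10⁻⁶). The HWE p-value is computed with an exact test (the same kind of test PLINK applies for `--hwe`), implemented here from scratch. The genotype encoding (0/1/2 alternate-allele counts, None for missing) and the function names are assumptions for illustration, not the project's code.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Two-sided exact test for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        return 1.0
    n_rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    probs = [0.0] * (n_rare + 1)

    # Start near the most likely heterozygote count; it must share parity with n_rare.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0  # unnormalized; rescaled by the total at the end

    # Recurse toward fewer heterozygotes.
    hom_r = (n_rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Recurse toward more heterozygotes.
    hom_r = (n_rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    p_obs = probs[n_het]
    # p-value: probability of all configurations no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


def variant_passes_qc(genotypes, min_call_rate=0.95, min_hwe_p=1e-6):
    """Keep a variant only if its call rate and HWE p-value meet the listed thresholds."""
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) < min_call_rate:
        return False
    p = hwe_exact_p(called.count(1), called.count(0), called.count(2))
    return p >= min_hwe_p
```

For example, a variant with 25/50/25 homozygous-reference/heterozygous/homozygous-alternate calls passes, while 50 homozygous-reference and 50 homozygous-alternate calls with no heterozygotes fails the HWE filter.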
Pruning and thresholding (P+T) PRS models (EUR and MAMA) were constructed using PRSice (https://choishingwan.github.io/PRSice/) and PLINK 1.9 (https://www.cog-genomics.org/plink/). PRS-CSx models were constructed using PRS-CSx (https://github.com/getian107/PRScsx.git).
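The P+T idea can be sketched in a few lines of Python: greedy LD clumping (keep the most significant SNP, then discard nearby SNPs in LD with it), followed by a weighted sum of GWAS effect sizes over the retained index SNPs that pass a p-value threshold. This is a simplified illustration, not PRSice's implementation. The r² cutoff (0.1) and 250 kb window are illustrative values, the `ld_r2` callable is a hypothetical stand-in for LD estimated from a reference panel, and the `Snp` record layout is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Snp:
    rsid: str
    chrom: int
    pos: int
    beta: float  # GWAS effect size for the effect allele
    p: float     # GWAS p-value


def clump(snps: List[Snp], ld_r2: Callable[[str, str], float],
          r2_max: float = 0.1, window_bp: int = 250_000) -> List[Snp]:
    """Greedy LD clumping: walk SNPs from most to least significant, keep each
    unclaimed SNP as an index SNP, and claim nearby SNPs in LD with it."""
    by_p = sorted(snps, key=lambda s: s.p)
    handled = set()  # rsIDs already kept as index SNPs or clumped away
    index_snps = []
    for lead in by_p:
        if lead.rsid in handled:
            continue
        index_snps.append(lead)
        handled.add(lead.rsid)
        for other in by_p:
            if other.rsid in handled:
                continue
            nearby = other.chrom == lead.chrom and abs(other.pos - lead.pos) <= window_bp
            if nearby and ld_r2(lead.rsid, other.rsid) > r2_max:
                handled.add(other.rsid)
    return index_snps


def pt_score(index_snps: List[Snp], dosages: Dict[str, float], p_threshold: float) -> float:
    """PRS for one individual: sum of beta * dosage over index SNPs with p < threshold.
    SNPs absent from the individual's dosages are skipped (an assumption)."""
    return sum(s.beta * dosages[s.rsid]
               for s in index_snps
               if s.p < p_threshold and s.rsid in dosages)
```

PRSice scans a range of p-value thresholds and reports the best-fitting one; calling `pt_score` with several thresholds mirrors that scan on a small scale.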
Phase 3 1000 Genomes Project reference LD panels were downloaded from the PRS-CSx GitHub repository (https://github.com/getian107/PRScsx.git) and used for both the P+T and PRS-CSx models. HapMap 3 reference SNPs (downloaded from https://www.broadinstitute.org/medical-and-population-genetics/hapmap-3) were used to select SNP sets for PRS-CSx.
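PRS-CSx performs its own matching of summary-statistic, reference-panel, and target SNPs. The Python sketch below only illustrates the kind of SNP-set intersection and allele alignment involved in restricting GWAS summary statistics to a HapMap 3 SNP set. It assumes summary statistics as (rsID, A1, A2, beta) tuples and a hypothetical reference dictionary mapping each HapMap 3 rsID to its (A1, A2) alleles. Dropping strand-ambiguous A/T and C/G SNPs is a common convention, assumed here rather than taken from this project.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1, a2):
    """A/T and C/G SNPs cannot be strand-resolved from alleles alone."""
    return COMPLEMENT.get(a1) == a2


def align_to_reference(sumstats, reference):
    """Keep SNPs present in the reference set and express betas relative to reference alleles.

    sumstats: list of (rsid, a1, a2, beta), a1 being the effect allele.
    reference: dict rsid -> (ref_a1, ref_a2), e.g. built from a HapMap 3 SNP list.
    """
    aligned = []
    for rsid, a1, a2, beta in sumstats:
        if rsid not in reference or is_strand_ambiguous(a1, a2):
            continue
        r1, r2 = reference[rsid]
        flipped = (COMPLEMENT.get(a1), COMPLEMENT.get(a2))  # opposite-strand alleles
        if (a1, a2) == (r1, r2) or flipped == (r1, r2):
            aligned.append((rsid, r1, r2, beta))   # same effect allele
        elif (a1, a2) == (r2, r1) or flipped == (r2, r1):
            aligned.append((rsid, r1, r2, -beta))  # swapped effect allele: flip sign
        # Any other allele combination is a mismatch and the SNP is dropped.
    return aligned
```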
Project directory:
Tree diagram of the data, code, and outputs within the project directory:
project_directory # The working directory
└── workflow
    └── script
        ├── analyses_regression.qmd
        ├── data_harmonization.qmd
        ├── prscsx.py
        └── data_visualziation.qmd

C.J. is supported in part by the NIH Intramural Center for Alzheimer’s and Related Dementias (CARD), project NIH-NIA ZIAAG000534. J.S.Y. receives funding from NIH-NIA R01AG062588, R01AG057234, P30AG062422, P01AG019724, and U19AG079774; NIH-NINDS U54NS123985; the Rainwater Charitable Foundation; the Alzheimer’s Association; the Global Brain Health Institute; Genentech; the French Foundation; and the Mary Oakley Foundation. This work was conducted using the National Alzheimer’s Coordinating Center Uniform Dataset under application 10238; the Alzheimer’s Disease Neuroimaging Initiative under application SJA; and the Alzheimer’s Disease Sequencing Project under application 10050. S.J.A. is supported by the National Alzheimer’s Coordinating Center New Investigator Award.