Diseases were defined using phecodes. Quantitative traits, including anthropometric, vital-sign, and laboratory measurements, were extracted from the EMR. See 1-Phenotyping for the quality-control flow charts for quantitative traits.
See 2.1-Genotyping_QC for the genotyping quality control.
Phasing was conducted with SHAPEIT5. Genome imputation was carried out with IMPUTE5. See 2.2-Imputation for details.
PC-AiR and PC-Relate (GENESIS package) were used for PCA and relatedness estimation, and PRIMUS was used to identify the maximum unrelated set. See the pedigree reconstruction in genotyping QC for details.
SAIGE was applied for the mixed effect model GWAS (SAIGE.sh and SAIGE_qtrait.sh).
PLINK2 was applied for the generalized linear model GWAS (plink_for_ldsc.sh).
To evaluate the performance of our GWAS, PGRM was used to calculate the overall and power-adjusted replication rates and the actual-over-expected ratio.
(PGRM.R)
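As a rough illustration of the three PGRM summary metrics, the sketch below computes them from a list of tested associations. It is not taken from PGRM.R: the function name `replication_summary`, the `(replicated, power)` record layout, and the 0.8 power cut-off are assumptions made for this example.

```python
def replication_summary(records, power_threshold=0.8):
    """Summarise replication for a list of (replicated, power) pairs.

    replicated: True if the known association replicated in this cohort.
    power: estimated power (0..1) to detect that association.
    The 0.8 cut-off for "powered" associations is an assumed default.
    """
    total = len(records)
    n_replicated = sum(1 for replicated, _ in records if replicated)

    # Restrict to associations the cohort was adequately powered to detect.
    powered = [(rep, pw) for rep, pw in records if pw >= power_threshold]
    n_powered_replicated = sum(1 for rep, _ in powered if rep)

    # Expected number of replications is the sum of per-association power.
    expected = sum(pw for _, pw in records)

    return {
        "rr_all": n_replicated / total if total else None,
        "rr_power": n_powered_replicated / len(powered) if powered else None,
        "aer": n_replicated / expected if expected > 0 else None,
    }


example = [(True, 0.9), (False, 0.9), (True, 0.5), (False, 0.2)]
summary = replication_summary(example)
```

An actual-over-expected ratio close to 1 means the cohort replicated about as many associations as its power predicts.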
SuSiE was conducted for summary statistics-based fine-mapping.
(fine-mapping.sh)
LDSC and LSH were used to estimate the SNP-based heritability.
(LDSC.sh and LSH.R)
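The idea behind LD score regression can be sketched in a few lines: per-SNP chi-square statistics are regressed on LD scores, and the slope is rescaled to a heritability estimate. The sketch below is a simplification, not the content of LDSC.sh. It uses unweighted least squares, whereas the LDSC software applies regression weights and a block jackknife for standard errors. The function name and arguments are hypothetical.

```python
def ldsc_h2_sketch(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD score regression (illustrative only).

    Model: E[chi2_j] = intercept + (n_samples * h2 / n_snps) * ld_score_j
    Returns (h2_estimate, intercept).
    """
    n = len(chi2)
    mean_l = sum(ld_scores) / n
    mean_c = sum(chi2) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l

    # Rescale the slope to the SNP-based heritability.
    h2 = slope * n_snps / n_samples
    return h2, intercept


# Toy input constructed so that the slope is 2 and the intercept is 1.
h2_est, intercept_est = ldsc_h2_sketch(
    chi2=[3.0, 5.0, 7.0, 9.0],
    ld_scores=[1.0, 2.0, 3.0, 4.0],
    n_samples=1000,
    n_snps=100,
)
```

An intercept well above 1 is usually read as a sign of confounding such as population stratification rather than polygenic signal.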
h2gene analysis was conducted to partition SNP-based heritability to the gene level.
(H2Gene.sh)
To examine whether tissue-specific gene expression and the traits of interest share common causal variants, coloc was used to evaluate colocalization between gene expression and each trait, using expression quantitative trait locus (eQTL) resources from 49 tissues in GTEx v8.
(Coloc.R)
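For readers unfamiliar with colocalization, the following is a minimal sketch of the approximate-Bayes-factor calculation that underlies coloc. It is not the content of Coloc.R. It assumes a single causal variant per trait in the region, and the function names, the prior standard deviation of 0.15, and the toy inputs are assumptions for illustration. The prior probabilities are the commonly used defaults of 1e-4, 1e-4, and 1e-5.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for one region (needs >= 2 SNPs)."""
    n = len(labf1)
    same_snp = [a + b for a, b in zip(labf1, labf2)]
    # Pairs of distinct SNPs: one causal for trait 1, another for trait 2.
    diff_snp = [labf1[i] + labf2[j]
                for i in range(n) for j in range(n) if i != j]
    log_h = [
        0.0,                                                    # H0: no association
        math.log(p1) + logsumexp(labf1),                        # H1: trait 1 only
        math.log(p2) + logsumexp(labf2),                        # H2: trait 2 only
        math.log(p1) + math.log(p2) + logsumexp(diff_snp),      # H3: distinct variants
        math.log(p12) + logsumexp(same_snp),                    # H4: shared variant
    ]
    denom = logsumexp(log_h)
    return [math.exp(h - denom) for h in log_h]


# Toy region: the first SNP is strongly associated with both traits.
eqtl = [log_abf(0.4, 0.05), log_abf(0.0, 0.05), log_abf(0.0, 0.05)]
gwas = [log_abf(0.4, 0.05), log_abf(0.0, 0.05), log_abf(0.0, 0.05)]
pp = coloc_posteriors(eqtl, gwas)
```

The last element of `pp` is the posterior probability of H4, a shared causal variant, which is the quantity usually reported as evidence of colocalization.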
LDSC was used to estimate pairwise genetic correlations. Popcorn was used to estimate cross-population genetic correlations.
(genetic_correlation.sh and popcorn.sh)
Five popular PRS tools were used for single-trait PRS model building:
- LDpred2 (LDpred2_lassosum2_phecode.R and LDpred2_lassosum2_qtrait.R)
- Lassosum2
- PRS-CS (PRScs.sh)
- MegaPRS (MegaPRS.sh)
- SBayesR (SBayesR.sh)
PRSmix+ was used for multi-trait PRS model building.
(run_PRSmix_by_phecode.R)
The explained variance (r2) was used to evaluate the performance of PRS for quantitative traits. Two indices, area under the receiver operating characteristic curve (AUC) and liability-scaled r2, were used for PRS of disease. (calc_auc.py and calc_r2.R)
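The three evaluation metrics can be sketched as follows. This is an illustration under stated assumptions, not the content of calc_auc.py or calc_r2.R: the function names are hypothetical, AUC is computed by direct pair counting (quadratic in sample size, so only suitable for small inputs), and the liability-scale conversion follows the commonly used transformation of Lee et al. (2012), which requires an assumed population prevalence.

```python
from statistics import NormalDist


def auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def observed_r2(scores, outcome):
    """Squared Pearson correlation between the PRS and the outcome."""
    n = len(scores)
    ms, mo = sum(scores) / n, sum(outcome) / n
    sxy = sum((s - ms) * (o - mo) for s, o in zip(scores, outcome))
    sxx = sum((s - ms) ** 2 for s in scores)
    syy = sum((o - mo) ** 2 for o in outcome)
    return sxy * sxy / (sxx * syy)


def liability_r2(r2_obs, prevalence, case_fraction):
    """Convert observed-scale r2 (0/1 outcome) to the liability scale.

    prevalence: assumed population prevalence K of the disease.
    case_fraction: proportion of cases P in the evaluation sample.
    """
    nd = NormalDist()
    t = nd.inv_cdf(1.0 - prevalence)   # liability threshold
    z = nd.pdf(t)                      # normal density at the threshold
    m = z / prevalence                 # mean liability of cases
    c = (prevalence * (1 - prevalence)) ** 2 / (
        z ** 2 * case_fraction * (1 - case_fraction))
    shift = m * (case_fraction - prevalence) / (1 - prevalence)
    theta = shift * (shift - t)
    return r2_obs * c / (1 + r2_obs * theta * c)


prs = [1.0, 2.0, 3.0, 4.0]
status = [0, 0, 1, 1]
r2_obs = observed_r2(prs, status)
r2_liab = liability_r2(r2_obs, prevalence=0.1, case_fraction=0.5)
```

For quantitative traits, `observed_r2` applied to the measured trait values gives the explained variance directly. For disease outcomes, the liability-scale value depends on the prevalence supplied, so that assumption should be reported alongside the result.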