Ad_PX_pipe

This repository reorganizes and restructures scripts from Genetically regulated gene expression underlies lipid traits in Hispanic cohorts to be more user-friendly. All data input (PLINK .bed/.bim/.fam, covariates w/ and w/o IIDs, phenotypes w/ and w/o IIDs, and phenotype names) must be exactly in the same format as the example data, including file names, and all missing data must be replaced with "NA". This pipeline is optimized for genotypes built in hg19 and linear phenotypes, and may not work with case/control (logistic). All paths to softwares are defaulted to those on wheelerlab3, and it is expected that all scripts are run from the same directory. For exact details on the inner workings of each script, use the --help flag or see the wiki. We will be using genotypes from 1000 Genomes American superpopulation (AMR.bed/.bim/.fam) and simulating phenotypes and covariances in R. The input population does not need to be admixed, but local ancestry steps (12-15) are for use mainly in admixed (African-American, Hispanic) populations. You should be able to copy and paste all the commands from this README and everything should run... but will you understand it?

For much more detail on the process of everything in here, please see the wiki, I spent a lot of time writing it :).

Produce phenotypes and covariates (ex. medicines) in R (for test data only)
- Rscript 00_simulate_pheno_covar.R --bfile AMR

For use in all populations:

Perform quality control in PLINK using gwasqc_pipeline
- Test data has already been filtered
Calculate principal components and a relationship matrix in KING
- If your data runs into a segmentation fault, try using only a subset of SNPs (ex. restrict to chr. 22)
- Rscript 02_relate_matrix_PCs.R --bfile AMR
Impute data with the Michigan Imputation Server
- Test data is already imputed
- Follow instructions in 03_Michigan_Imputation_Server.pptx
Convert imputed data to PrediXcan dosages using this script.
- Since test data is already imputed in PLINK format, run this (for example data only)
  - mkdir dosages/; cd dosages/; wget https://raw.githubusercontent.com/hakyimlab/PrediXcan/master/Software/convert_plink_to_dosage.py; awk '{ print $1"\t"$2 }' ../AMR.fam > samples.txt; python convert_plink_to_dosage.py -b ../AMR -p /usr/local/bin/plink; cd ..; echo "Completed converting dosages."
Perform a genome wide association study in GEMMA
- a. Convert from PrediXcan dosage format to BIMBAM format (GEMMA genotype input)
  - python 05a_PrediXcan_dosages_to_GEMMA.py --dosage_suffix .txt.gz
- b. Make covariance file from known covariates and KING PCs
  - Rscript 05b_make_GEMMA_covars.R --pcs_num 5
- c. Run all chrs. in a loop
  - python 05c_GEMMA_wrapper.py --GWAS
Calculate predicted gene expressions in PrediXcan using GTEx and MESA models
- python 06_make_pred_exp.py
Perform an imputed transcriptome-based association study in GEMMA
- a. Convert predicted expression to GEMMA-style pseudo-genotypes
  - python 07a_convert_PrediXcan_to_GEMMA.py
- b. Run all pops. and tissues in a loop
  - python 05c_GEMMA_wrapper.py --pred_exp
Find significant SNPs from GWAS and significant genes from PrediXcan in R
- python 08_sig_SNP_sig_gene.py --SNP_sig 5e-6 --gene_sig 5e-3
  - NOTE: I set these thresholds arbitrarilty low so we have significant genes to with with later. Usually, use the defaults, 5e-8 and 9.654e-6, respectively.
Calculate independent significant SNPs in a joint analysis in GCTA-COJO
- a. Make GWAS output into GCTA-COJO format
  - python 09a_GEMMA_to_GCTA-COJO.py --fam AMR.fam
- b. Run GCTA (see manual)
  - gcta64 --cojo-file pheno1.ma --cojo-slct --cojo-p 5e-6 --bfile AMR --cojo-actual-geno --out pheno1; gcta64 --cojo-file pheno2.ma --cojo-slct --cojo-p 5e-6 --bfile AMR --cojo-actual-geno --out pheno2
    - Again, set P arbitrarily low for example; set to 5e-8 for real analyses
Perform colocalization between GWAS results and eQTL data using COLOC in a COLOC wrapper
- a. Put GWAS and eQTL data into COLOC input format (takes a while due to size of eQTL files)
  - Rscript 10a_make_COLOC_input.R --sample_size 320
- b. Run COLOC wrapper (takes a while due to size of eQTL files)
  - python 10b_COLOC_wrapper.py --sample_size 320
Perform backward elimination modeling of all significant genes to find independent signals in R
- Rscript 11_back_elim.R --sig_gene output/pheno1_sig_genes.txt --pheno pheno_wIID.txt --pheno_name pheno1

For use in admixed populations:

Prepare genotype data with PLINK
- a. Merge genotypes with a reference populations (options are African-American (AFA) and Hispanic (HIS)), restrict to only SNPs in the study population
  - plink --bfile AMR --bmerge /home/angela/Ad_PX_pipe_data/RFMix/RefPop/HIS --make-bed --out AMR_w_ref --extract AMR.bim; plink --bfile AMR --exclude AMR_w_ref-merge.missnp --make-bed --out AMR_for_merge; plink --bfile /home/angela/Ad_PX_pipe_data/RFMix/RefPop/HIS --exclude AMR_w_ref-merge.missnp --make-bed --out HIS_for_merge --extract AMR.bim; plink --bfile AMR_for_merge --bmerge HIS_for_merge --make-bed --out AMR_w_ref --allow-no-sex
- b. Order genotype file by population and add cM positions
  - cat HIS_for_merge.fam AMR_for_merge.fam > merged.fam; plink --bfile AMR_w_ref --indiv-sort file merged.fam --cm-map /home/angela/Ad_PX_pipe_data/cm_map/genetic_map_chr@_combined_b37.txt --make-bed --out AMR_w_ref_ordered
- c. Split input by chr
  - mkdir -p merged_w_ref/; for i in {1..22}; do plink --bfile AMR_w_ref_ordered --chr ${i} --make-bed --out merged_w_ref/chr${i}; done
Phase haplotypes with HAPI-UR. Change value of -w depending on parameters described in the manual
- mkdir -p haplotypes/; for i in {1..22}; do /home/angela/Ad_PX_pipe_data/HAPI-UR/hapi-ur -p merged_w_ref/chr${i} -w 64 -o haplotypes/chr${i}; done
Calculate local ancestry inference in RFMix
- a. Make additional files for RFMix input. When making classes, choose a reference population (options are African-American (AFA) and Hispanic (HIS)).
  - for i in {1..22}; do awk '{print $3}' haplotypes/chr${i}.phsnp > haplotypes/chr${i}.snp_locations; done; Rscript 14a_make_classes.R haplotypes/chr22.phind HIS
  - Note: the sample data is rather small, so some chromosomes may not output
- b. Run RFMix to estimate local ancestry for all individuals. This process takes a very long time, so we will only run chr. 22 for this example. Push chrs. to multiple cores when running real data.
  - mkdir -p RFMix/; cd /home/angela/Ad_PX_pipe_data/RFMix/; python RunRFMix.py -e 2 -w 0.2 --num-threads 10 --use-reference-panels-in-EM --forward-backward PopPhased /home/angela/Ad_PX_pipe/haplotypes/chr22.phgeno /home/angela/Ad_PX_pipe/RFMix.classes /home/angela/Ad_PX_pipe/haplotypes/chr22.snp_locations -o /home/angela/Ad_PX_pipe/RFMix/chr22.rfmix
Perform admixture mapping in GEMMA
- a. Convert RFMix output into an intermediate for GEMMA input downstream
  - python 15a_RFMix_to_BIMBAM.py --Viterbi RFMix/chr22.rfmix.2.Viterbi.txt --phind haplotypes/chr22.phind --fam AMR.fam --phsnp haplotypes/chr22.phsnp --chr 22
- b. Run phenotype associations with each local ancestry in GEMMA
  - python 15b_loc_anc_wrapper.py --snptable loc_anc_input/chr22.csv --refpop HIS --chr 22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ad_PX_pipe

For use in all populations:

For use in admixed populations:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 205 Commits
make_plots		make_plots
00_simulate_pheno_covar.R		00_simulate_pheno_covar.R
02_relate_matrix_PCs.R		02_relate_matrix_PCs.R
03_Michigan_Imputation_Server.pptx		03_Michigan_Imputation_Server.pptx
05a_PrediXcan_dosages_to_GEMMA.py		05a_PrediXcan_dosages_to_GEMMA.py
05b_make_GEMMA_covars.R		05b_make_GEMMA_covars.R
05c_GEMMA_wrapper.py		05c_GEMMA_wrapper.py
06_make_pred_exp.py		06_make_pred_exp.py
07a_convert_PrediXcan_to_GEMMA.py		07a_convert_PrediXcan_to_GEMMA.py
08_sig_SNP_sig_gene.py		08_sig_SNP_sig_gene.py
09a_GEMMA_to_GCTA-COJO.py		09a_GEMMA_to_GCTA-COJO.py
10a_make_COLOC_input.R		10a_make_COLOC_input.R
10b_COLOC_wrapper.py		10b_COLOC_wrapper.py
11_back_elim.R		11_back_elim.R
14a_make_classes.R		14a_make_classes.R
15a_RFMix_to_BIMBAM.py		15a_RFMix_to_BIMBAM.py
15b_loc_anc_wrapper.py		15b_loc_anc_wrapper.py
99_BIMBAM_to_PX_dosages.py		99_BIMBAM_to_PX_dosages.py
AMR.bed		AMR.bed
AMR.bim		AMR.bim
AMR.fam		AMR.fam
README.md		README.md
master.py		master.py
region.txt		region.txt

Folders and files

Latest commit

History

Repository files navigation

Ad_PX_pipe

For use in all populations:

For use in admixed populations:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages