Guo-Bo Chen edited this page Apr 25, 2018 · 3 revisions

The projected PC procedure generates principal components (PCs)/eigenvectors for target samples based on a reference population.

The HapMap reference genotype data and the eigenvector scores can be found HERE. A demo is also included.

The latest GEAR can be downloaded HERE!


This procedure resolves logistical issues such as strand mismatches automatically, which makes prediction easier.

It should be noted that GEAR will leave monomorphic loci out.

The format of the PC score (loading) file:

SNP RefAllele pc1_score pc2_score
SNPA A 1.95 -0.5
SNPB C 2.04 -0.7
SNPC C -0.98 0.34
SNPD C -0.24 3.1

By default, gear assumes that the score file contains a header line. If your PC score file does not contain a header line, switch on the --no-score-header option.
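The layout above can be checked before running gear with a short script. The following is a minimal Python sketch, not part of GEAR: the function name read_score_file and its has_header argument are hypothetical, and it assumes the columns are whitespace-delimited as in the example shown.

```python
import io

def read_score_file(handle, has_header=True):
    """Parse a whitespace-delimited score file into {snp: (ref_allele, [loadings])}.

    Hypothetical helper for inspecting a score file; it is not GEAR code.
    """
    scores = {}
    skip_next = has_header  # mirrors the default of expecting a header line
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if skip_next:
            skip_next = False  # drop the header: SNP RefAllele pc1_score ...
            continue
        snp, ref_allele = fields[0], fields[1]
        scores[snp] = (ref_allele, [float(x) for x in fields[2:]])
    return scores

with_header = io.StringIO(
    "SNP RefAllele pc1_score pc2_score\n"
    "SNPA A 1.95 -0.5\n"
    "SNPB C 2.04 -0.7\n"
)
scores = read_score_file(with_header)

# The same data without a header line, the case covered by --no-score-header.
no_header = io.StringIO("SNPA A 1.95 -0.5\nSNPB C 2.04 -0.7\n")
scores_no_header = read_score_file(no_header, has_header=False)
```

If a file without a header is read with has_header=True, the first SNP is silently dropped, which is the mistake --no-score-header is there to prevent.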

Options

--score/--score-gz

Specify the score file, or specify the score file in gz format.

--batch

It is often better to generate projected PCs for the reference samples (such as HapMap) and the target samples together. This provides more information, especially for illustration, as demonstrated below. batch.txt lists the roots of the file names, one per line. For example, for two files, batch.txt contains:

HM3_clean
PUR_chr1_com

More than two files can be listed. By default, only the consensus markers across those files are matched to the scores. If the user wants to generate projected PCs using as many markers as possible, --greedy should be switched on. However, when the --greedy option is on, the projected PCs generated for different files may not lie in the same space.
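The difference between the default and --greedy can be pictured with sets of SNP IDs. This is a minimal Python sketch under stated assumptions, not GEAR's implementation: markers_per_file is a hypothetical helper, and it assumes that --greedy simply lets each file use every one of its own markers that appears in the score file.

```python
def markers_per_file(file_markers, score_snps, greedy=False):
    """Return {file_root: set of SNP IDs used for that file}.

    file_markers: {file_root: set of SNP IDs present in that file}
    score_snps:   set of SNP IDs present in the score file
    Assumption: greedy means "each file uses all of its own scored markers".
    """
    if not file_markers:
        return {}
    if greedy:
        # Each file keeps its own overlap with the score file.
        return {root: snps & score_snps for root, snps in file_markers.items()}
    # Default: only markers shared by every file are matched to the scores.
    consensus = set.intersection(*file_markers.values()) & score_snps
    return {root: set(consensus) for root in file_markers}

file_markers = {
    "HM3_clean": {"SNPA", "SNPB", "SNPC"},
    "PUR_chr1_com": {"SNPB", "SNPC", "SNPD"},
}
score_snps = {"SNPA", "SNPB", "SNPC", "SNPD"}

default_sets = markers_per_file(file_markers, score_snps)
greedy_sets = markers_per_file(file_markers, score_snps, greedy=True)
```

In the default case both files are scored on SNPB and SNPC only. In the greedy case each file is scored on a different marker set, which is why the resulting PCs may not be directly comparable.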

--no-score-header

Use this option when the score file has no header line.

--extract-score

Only SNPs included in both --extract-score and --score/--score-gz will be used for generating profile scores.

--remove-score

SNPs included in --remove-score will not be used for generating profile scores.

--keep-atgc

Keeps palindromic (AT/GC) loci in the profile. However, the user should make sure that the genotypes in both the reference panel and the target set are coded on the same reference allele/strand for each locus. By default, this option is off.

--auto-flip-off

When this option is on, a locus that has flipped alleles in the testing set will not be matched. As genotypes may be called on complementary strands across genotyping platforms, gear by default matches them by flipping SNPs automatically. For example, the named allele is "A" in the score file, but due to flipping the reported alleles are "T/C" in the validation set. When --auto-flip-off is not switched on, gear will flip "T/C" back to "A/G" and consequently match the score to the validation set. Of course, gear presumes that the polymorphism is the same across the discovery and the validation sets.

There are four possible schemes for matching a SNP between the discovery and the validation sets

SNP match (more technical doc)
The named score SNP matches the reference allele in the validation set
The named score SNP matches the alternative allele in the validation set
The named score SNP matches the flipped reference allele in the validation set
The named score SNP matches the flipped alternative allele in the validation set
Matches neither, then this locus will be discarded

Note: AT/GC loci will be left out if --keep-atgc is not switched on. --keep-atgc should probably not be turned on unless the SNPs are coded on the same strand for each locus in both the discovery and the validation panels.
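The matching schemes and the two options can be summarised in a few lines of Python. This is only an illustrative sketch of the rules as described on this page, not GEAR's code: match_score_allele and its auto_flip and keep_atgc arguments are hypothetical names, and alleles are assumed to be single uppercase letters.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def match_score_allele(score_allele, ref, alt, auto_flip=True, keep_atgc=False):
    """Classify how a score-file allele relates to a validation-set locus.

    Returns "ref", "alt", "flipped_ref", "flipped_alt", or None (locus discarded).
    auto_flip=False corresponds to --auto-flip-off; keep_atgc=True to --keep-atgc.
    """
    is_palindromic = COMPLEMENT.get(ref) == alt  # A/T or C/G locus
    if is_palindromic and not keep_atgc:
        return None  # the strand cannot be resolved, so the locus is left out
    if score_allele == ref:
        return "ref"
    if score_allele == alt:
        return "alt"
    if auto_flip:
        if score_allele == COMPLEMENT.get(ref):
            return "flipped_ref"
        if score_allele == COMPLEMENT.get(alt):
            return "flipped_alt"
    return None  # matches neither allele on either strand

# The example from the text: "A" in the score file, "T/C" in the validation set.
flipped = match_score_allele("A", "T", "C")
not_flipped = match_score_allele("A", "T", "C", auto_flip=False)
palindromic = match_score_allele("A", "A", "T")
```

For an A/T or C/G locus the complement of one allele is the other allele, so a direct match and a flipped match cannot be told apart. This is why such loci are only kept when the user vouches for the strand with --keep-atgc.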

The examples below show how to generate projected PCs for the Puerto Rican cohort in the 1000 Genomes Project.

Example 1: generating projected PCs using the batch solution

java -Xmx15G -jar /path/gear.jar probatch --batch batch.txt --score score.txt --out pur
java -Xmx15G -jar /path/gear.jar probatch --batch batch.txt --score-gz score.txt.gz --out pur

Inside batch.txt is

HM3_clean
PUR_chr1_com
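When many cohorts are processed, batch.txt and the command line can be generated by a script. The following Python sketch only uses the flags shown on this page. The helper names write_batch_file and probatch_command are hypothetical, /path/gear.jar is the placeholder path from the example above, and the command is assembled but not executed.

```python
import shlex
import tempfile
from pathlib import Path

def write_batch_file(path, roots):
    """Write one file-name root per line, as in the batch.txt shown above."""
    Path(path).write_text("".join(f"{root}\n" for root in roots))

def probatch_command(jar, batch, score, out, gz=False, memory="15G"):
    """Assemble the argument list for the probatch call (not executed here)."""
    score_flag = "--score-gz" if gz else "--score"
    return ["java", f"-Xmx{memory}", "-jar", jar, "probatch",
            "--batch", str(batch), score_flag, score, "--out", out]

# Write the batch file to a temporary directory and read it back for inspection.
with tempfile.TemporaryDirectory() as tmp:
    batch_path = Path(tmp) / "batch.txt"
    write_batch_file(batch_path, ["HM3_clean", "PUR_chr1_com"])
    batch_text = batch_path.read_text()

cmd = probatch_command("/path/gear.jar", "batch.txt", "score.txt", "pur")
print(shlex.join(cmd))  # prints a copy-and-paste ready command line
```

Passing the resulting list to subprocess.run would start gear. That step is left out here because the jar path is a placeholder.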

The figure (pur) illustrates the projected PCs for the Puerto Ricans together with the HapMap reference.

In addition, the above procedure can also be carried out step by step if the user is interested.

java -Xmx15G -jar /path/gear.jar comsnp --bfiles PUR HapMap --out score
java -Xmx15G -jar /path/gear.jar propc --bfile PUR --extract-score score.comsnp --score-gz HM3_SNP.blup20.gz --out Target
java -Xmx15G -jar /path/gear.jar propc --bfile HapMap --extract-score score.comsnp --score-gz HM3_SNP.blup20.gz --out HapMap_Ref

