RFMix_Pipe

Local ancestry pipeline for running RFMix

This repository contains a handful of scripts that create the files needed to run RFMix from VCF input.

Clean_input_main.sh is the main script; Clean_input_intersect.R and Clean_input_make_classes.R are helper scripts. All are designed to be run from the command line, and an example invocation can be found in the file named wrap. Modify this wrapper file with your own files to create the necessary inputs:

nano wrap ###modify with your files, see inputs below
./wrap

Example Data

Reference files were sourced from the 1000 Genomes Project YRI and CEU populations.

Example query data can be taken from the 1000 Genomes Project ASW population.

Map files were sourced from: https://github.com/joepickrell/1000-genomes-genetic-maps

RFMix can be downloaded from https://sites.google.com/site/rfmixlocalancestryinference/

Inputs

There are five main inputs.

--query <query.vcf> VCF file containing the admixed experimental population on which local ancestry inference is to be conducted.
--ref <reference.vcf> VCF file containing all samples to be used as reference populations.
--pop <pop_codes.txt> A two-column, tab-delimited file with the reference sample IDs in column 1 and their respective populations in column 2. It should not contain the admixed sample IDs, as those are inferred from <query.vcf>.
--map <genetic_map.txt> A three-column text file similar to the genetic map used by SHAPEIT. Column 1 is the SNP ID, column 2 is the base-pair position, and column 3 is the centiMorgan position.
--out <outdir/output_prefix> As in plink, --out specifies both the output directory and the prefix for the output files.
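The column layout of the --pop and --map files can be checked before running the pipeline. The following is a minimal sketch in POSIX shell and awk, not part of the repository: the helper name check_columns, the toy sample IDs, and the toy map values are all illustrative, and it assumes the map file is whitespace-separated, which the description above does not state.

```shell
#!/bin/sh
# check_columns FILE N [SEP]: succeed only if every non-empty line of FILE
# has exactly N fields. SEP defaults to a tab; pass " " to split on any
# run of whitespace. (Hypothetical helper, not part of RFMix_Pipe.)
check_columns() {
    sep="${3:-}"
    [ -n "$sep" ] || sep=$(printf '\t')
    awk -F "$sep" -v n="$2" '
        NF > 0 && NF != n {
            printf "line %d: %d fields, expected %d\n", NR, NF, n
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Toy inputs (made-up IDs and values) to exercise the check.
tmp=$(mktemp -d)
printf 'ref_sample_1\tYRI\nref_sample_2\tCEU\n' > "$tmp/pop_codes.txt"
printf 'snp1 1000 0.00\nsnp2 2500 0.01\n' > "$tmp/genetic_map.txt"

check_columns "$tmp/pop_codes.txt" 2 && echo "pop file OK"
check_columns "$tmp/genetic_map.txt" 3 " " && echo "map file OK"
```

A non-zero exit status, together with the offending line numbers on standard output, indicates a file that does not match the expected layout.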

All inputs should be split by chromosome, except the pop file, which is the same across chromosomes. Make sure all inputs use the same genome build and that their SNP ID formats are concordant.
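Because every input except the pop file is per-chromosome, a wrapper typically loops over chromosomes. The sketch below only prints the commands (a dry run). It assumes Clean_input_main.sh accepts the five flags listed above; the directory layout, the file-naming scheme, and the function name build_cmd are hypothetical, so adapt them to your own data or to the wrap file shipped with the repository.

```shell
#!/bin/sh
# build_cmd CHR: print the Clean_input_main.sh command for one chromosome.
# All paths below are placeholders; only the flag names come from the
# Inputs section.
build_cmd() {
    c="$1"
    printf './Clean_input_main.sh --query %s --ref %s --pop %s --map %s --out %s\n' \
        "data/query_chr${c}.vcf" \
        "data/reference_chr${c}.vcf" \
        "data/pop_codes.txt" \
        "maps/genetic_map_chr${c}.txt" \
        "results/example_chr${c}"
}

# Dry run for two chromosomes; the pop file is shared between them.
for chr in 21 22; do
    build_cmd "$chr"
done
```

Once the printed commands look right, piping the loop's output into sh would execute them.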

Workflow

You may need to install a local copy of RFMix.

1. Phase data if not phased. See Phasing_example.md.
2. Run Clean_input_main.sh, which performs the following steps:
   - Remove SNPs with duplicate positions/IDs
   - Find the subset of SNPs present in all files and make the pos file
   - Subset the files
   - Merge the VCFs and create haplotype files
   - Make the class file for RFMix input
3. Run RFMix (see below).

The script should take a few minutes to run, depending on the number of SNPs and samples being processed. Once it has finished, take the output and run RFMix as normal. Be aware that RFMix must be run from its install directory.
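The first two cleaning steps (dropping duplicated SNPs, then keeping only SNPs shared by all files) can be illustrated on toy position lists. This is a sketch of the idea using standard Unix tools, not the repository's implementation, which relies on the R helper scripts. It assumes that a position occurring more than once is dropped entirely rather than reduced to a single copy; the function name unique_positions and all file names and positions are made up.

```shell
#!/bin/sh
# Use one collation order so that sort, uniq and comm agree on ordering.
LC_ALL=C
export LC_ALL

# unique_positions FILE: print, sorted, the positions (one per line) that
# occur exactly once in FILE; duplicated positions are removed entirely.
unique_positions() {
    sort "$1" | uniq -u
}

tmp=$(mktemp -d)
printf '100\n200\n200\n300\n400\n' > "$tmp/query.pos"   # 200 is duplicated
printf '100\n300\n400\n500\n'      > "$tmp/ref.pos"
printf '100\n300\n500\n'           > "$tmp/map.pos"

unique_positions "$tmp/query.pos" > "$tmp/query.uniq"
unique_positions "$tmp/ref.pos"   > "$tmp/ref.uniq"
unique_positions "$tmp/map.pos"   > "$tmp/map.uniq"

# comm -12 keeps lines common to two sorted inputs; chain it for three
# files, then restore numeric order for the final list.
comm -12 "$tmp/query.uniq" "$tmp/ref.uniq" \
    | comm -12 - "$tmp/map.uniq" \
    | sort -n > "$tmp/shared.pos"

cat "$tmp/shared.pos"
```

Here only positions 100 and 300 survive: 200 is duplicated in the query, 400 is missing from the map, and 500 is missing from the query.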

Example: Running RFMix

cd RFMix_v1.5.4
python RunRFMix.py -e 2 -w 0.2 --num-threads <n> --forward-backward PopPhased <example_merged.haps> <example.classes> <example.pos> -o <output.results>
