GitHub - RILAB/arg-ne: Scripts for estimating ARGs and Ne from ARGs

1 Align genomes to reference

Align each assembly to the reference using anchorwave.

2 Assemble a gvcf

Individual .maf files need to be converted to .gvcf and then combined to a single .gvcf. We recommend doing this separately by chromosome. Instructions for these steps are here.

3 Clean gvcf & prep for SINGER

3A Clean gvcf

Assuming your gvcf is formatted like the example file and is for a single chromosome. Please split any multi-chromosome gvcfs into individual chromosomes before continuing.

Run split.py using python3 split.py --depth=<depth> <filename.vcf>. Normally you will want to set depth equal to your sample size. In some files depth is recorded as 30 for each individual, so you should set depth to 30 x sample size.

This script writes three files, .inv, .filtered, and .clean. Each includes the regular header. File outputs will be large when unzipped, it is recommended to run with --gzip-output to automatically zip output files. Writes to stderr log of how many bp (expanding END segments) were written to each file. Your gvcf must have invariant sites. If there are no invariant sites, go back to step 2.

`.inv`

Contains lines from vcf where:

INFO is .
INFO contains END=

For multi-bp lines (with END=, the script copies the line once for each bp. This means the number of non-header lines (e.g. from wc -l) should represent the number of invariant bp in the file.

`.filtered`

Contains lines from vcf where any of the following occur:

the line contains * as an allele
symbolic / non-ACGT alleles
DP is less than the depth parameter given. Since we are using whole genome alignment we are assuming sites with DP < depth but no explicit indels "*" have missing data still due to structural variation of some sort and should be removed.

`.clean`

Should contain only biallelic SNPs in vcf passing all checks as well as mutliallelic SNPs with no indels that will be filtered out by SINGER snakemake pipleine later (and used to adjust the mutation rate.

3B Prep for SINGER

.clean will be the SNP data you give to SINGER. You will also need a .bed format file of bp that are masked. Usually these are everything in your .filtered file. filt_to_bed.py will take a vcf and make a bedfile.

Run using: python3 filt_to_bed.py <vcf file of filtered snps> --merge. Dropping the --merge will result in a bigger bedfile with many small, contiguous regions and is not recommended.

4 Run SINGER

Use Nate Pope's snakemake pipeline. Using the steps above, there is no need to have a filter file. Use the bedfile made in step 4 as the mask bedfile.

5 Edit and validate ARG, estimate and validate Ne models

This step requires interactive use of the jupyter notebook argcheck.ipynb. You will need to have msprime tskit tszip demes demesdraw and yaml among other packages. There are many places in the notebook where you will need to modify the code or variables to suite your files/system. Example outputs for maize are shown in the notebook preview.

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
argcheck.html		argcheck.html
argcheck.ipynb		argcheck.ipynb
filt_to_bed.py		filt_to_bed.py
split.py		split.py
test.vcf.gz		test.vcf.gz

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

1 Align genomes to reference

2 Assemble a gvcf

3 Clean gvcf & prep for SINGER

3A Clean gvcf

`.inv`

`.filtered`

`.clean`

3B Prep for SINGER

4 Run SINGER

5 Edit and validate ARG, estimate and validate Ne models

About

Uh oh!

Releases

Packages

Languages

License

RILAB/arg-ne

Folders and files

Latest commit

History

Repository files navigation

1 Align genomes to reference

2 Assemble a gvcf

3 Clean gvcf & prep for SINGER

3A Clean gvcf

.inv

.filtered

.clean

3B Prep for SINGER

4 Run SINGER

5 Edit and validate ARG, estimate and validate Ne models

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`.inv`

`.filtered`

`.clean`

Packages