Here is a quick overview of the functionalities.
For more details see the help pages, e.g.
./d_stats.py -h
./d_genomewide.R -h
Count ABBA and BABA for D(pop1, pop2, pop3, pop4) based on a
VCF file or allele frequencies as computed by allele_freqs.py.
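As a rough illustration of what counting from allele frequencies can look like, the sketch below computes per-site ABBA and BABA weights. The function name `abba_baba` and the convention that `p1`..`p4` are B-allele frequencies (with B being the allele not carried by pop4) are assumptions made for this example, not necessarily what d_stats.py does internally.

```python
def abba_baba(p1, p2, p3, p4):
    """Expected ABBA and BABA weights at a single biallelic site.

    p1..p4 are the B-allele frequencies in pop1..pop4 (assumed convention).
    """
    # ABBA: pop1 carries A, pop2 and pop3 carry B, pop4 carries A
    abba = (1 - p1) * p2 * p3 * (1 - p4)
    # BABA: pop1 carries B, pop2 carries A, pop3 carries B, pop4 carries A
    baba = p1 * (1 - p2) * p3 * (1 - p4)
    return abba, baba
```

With fixed differences (frequencies of 0 or 1) this reduces to plain pattern counting; intermediate frequencies contribute fractional weights.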
Go through all autosome subdirectories and combine ABBA and BABA counts from d_stats.py
to compute D(pop1, pop2, pop3, pop4).
Perform blockwise jackknife to compute Z-scores.
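The combination step can be sketched as follows: D is the normalized difference of the summed ABBA and BABA counts, and the Z-score divides D by a jackknife standard error. This is a minimal sketch using a simple unweighted delete-one-block jackknife; the function names and the block layout (one `(abba, baba)` tuple per block) are hypothetical, and d_genomewide.R may use a weighted variant instead.

```python
import math


def d_stat(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA); NaN if there are no informative sites."""
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


def jackknife_z(blocks):
    """Return (D, Z) from a list of per-block (abba, baba) counts.

    Uses an unweighted delete-one-block jackknife (an assumption of this sketch).
    """
    n = len(blocks)
    sum_abba = sum(a for a, _ in blocks)
    sum_baba = sum(b for _, b in blocks)
    d = d_stat(sum_abba, sum_baba)
    # Recompute D with each block left out in turn
    loo = [d_stat(sum_abba - a, sum_baba - b) for a, b in blocks]
    mean_loo = sum(loo) / n
    var = (n - 1) / n * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(var)
    z = d / se if se > 0 else float("nan")
    return d, z
```

For example, `jackknife_z([(10, 5), (6, 6), (9, 3)])` gives a positive D and a positive Z, since ABBA exceeds BABA overall.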
Plot a heatmap of D(X, Y, pop3, pop4) for every pop3-pop4 combination.
Plot D(pop1, pop2, X, pop4) for every pop1-pop2-pop4 combination as barplot.
Plot D(X, Y, pop3, pop4) for every pop3-pop4 combination as horizontal barplot.
See slides/D_stats_freqbins_patterns.pdf for some notes on the interpretation of stratified D-statistics.
Compute D(pop1, pop2, pop3, pop4) per B-allele-frequency-bin in pop3.
Introgression from pop1 or pop2 into pop3 should be most visible at the expected
frequency of the introgressed B-allele in pop3.
Requires prior counting of ABBA and BABA patterns with d_stats.py with --sites 'full' and d_genomewide.R.
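The stratification idea can be sketched as follows: sites are assigned to bins by the B-allele frequency in pop3, and D is computed separately within each bin. The function name, the input layout (tuples of B-allele frequencies), and the equal-width binning are assumptions of this example; the actual scripts work from the site-level output of d_stats.py.

```python
def d_per_freq_bin(sites, n_bins=10):
    """Compute D per pop3 B-allele-frequency bin.

    sites: iterable of (p1, p2, p3, p4) B-allele frequencies (hypothetical input).
    Returns a list with one D value per bin, or None for bins without
    informative sites. Bins are [0, 1/n), ..., [(n-1)/n, 1].
    """
    abba = [0.0] * n_bins
    baba = [0.0] * n_bins
    for p1, p2, p3, p4 in sites:
        # The last bin is closed on the right so that p3 == 1 is included
        i = min(int(p3 * n_bins), n_bins - 1)
        abba[i] += (1 - p1) * p2 * p3 * (1 - p4)
        baba[i] += p1 * (1 - p2) * p3 * (1 - p4)
    result = []
    for a, b in zip(abba, baba):
        result.append((a - b) / (a + b) if a + b > 0 else None)
    return result
```

A D value that deviates from zero only in one frequency range would then point to the frequency at which the introgressed B-allele segregates in pop3.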
Plot D(pop1, pop2, X, pop4) per B-allele-frequency-bin for every pop1-pop2-pop4 combination as lines or dots.
Compute ABBA and BABA counts for D(pop1, pop2, pop3, pop4) per ancestral/derived (A/B) allele pair.
Requires prior counting of ABBA and BABA patterns with d_stats.py with --sites 'full' and d_genomewide.R.
Plot ABBA and BABA counts for D(pop1, pop2, pop3, pop4) per ancestral/derived (A/B) allele pair.
Count AAAA, BAAA and ABAA patterns for all combinations of pop1-pop4 (transitions and transversions separately).
Useful for quickly checking for an excess of B-alleles in pop1 or pop2, which may indicate errors.
Calculate allele frequencies per population for biallelic sites.
The output is similar to a VCF-file and can be used as alternative input for d_stats.py (instead of a VCF).
d_stats.py is much faster with precomputed allele frequencies.
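The per-population frequency calculation can be sketched as below. This is a minimal sketch under the assumption that genotypes are standard VCF GT strings for a biallelic site; the function name `alt_allele_freq` is hypothetical and the output format of allele_freqs.py is not reproduced here.

```python
def alt_allele_freq(genotypes):
    """Frequency of the ALT allele among all called alleles of one population.

    genotypes: list of GT strings such as '0/1', '1|1', './.' or '0'.
    Returns None if no allele is called.
    """
    alt = 0
    total = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == "0":
                total += 1
            elif allele == "1":
                alt += 1
                total += 1
            # anything else (e.g. '.') is treated as missing
    return alt / total if total else None
```

Precomputing these values once per site avoids re-parsing every genotype for each population combination, which is consistent with the speed-up noted above.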
Merge two VCF files, one from stdin and the other given as the first command-line argument (.vcf or .vcf.gz).
Filtering is done on the fly and the result is written to stdout, so that several merging steps can be chained.
Merge a VCF file from stdin with a file containing [chr | pos | base | (base)],
as created by bam_ran_base.py.
Bases get converted to pseudo-genotypes.
Take a VCF file from stdin that has bases in the last columns instead of genotypes,
and convert the bases to pseudo-genotypes.
Use case: Kay's BamSNPAddMaf was used to add ape-bases to the VCF.
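One plausible way to turn a single base into a pseudo-genotype is sketched below. The mapping (homozygous reference if the base matches REF, homozygous alternative if it matches ALT, missing otherwise) and the function name are assumptions for illustration only; the actual scripts may handle non-matching bases differently, for example by adding a new ALT allele.

```python
def base_to_pseudo_genotype(base, ref, alt):
    """Map one sampled base to a homozygous pseudo-genotype (assumed mapping)."""
    base = base.upper()
    if base == ref.upper():
        return "0/0"
    if base == alt.upper():
        return "1/1"
    # Base matches neither allele (or is e.g. 'N'): treat as missing
    return "./."
```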
Filter one VCF file from stdin. Acts on whole lines and/or individual genotypes.
Filter VCF-like file from stdin with bed-file.
Also works with allele frequency files created by allele_freqs.py.
Randomly subsample a VCF-like file streamed from stdin, given the proportion of sites that should pass.
E.g. zcat file.vcf.gz | subsample_vcf.py 0.5 will remove around half of the positions from file.vcf.gz.
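The subsampling logic can be sketched as an independent random draw per line. This is a minimal sketch, not the script itself: the function name, the `seed` parameter, and the choice to always keep header lines starting with `#` are assumptions.

```python
import random


def subsample(lines, proportion, seed=None):
    """Yield header lines and a random subset of the remaining lines.

    Each non-header line is kept independently with probability `proportion`.
    """
    rng = random.Random(seed)
    for line in lines:
        if line.startswith("#") or rng.random() < proportion:
            yield line
```

Because every site is drawn independently, the number of retained sites is only approximately `proportion` times the input, which matches the "around half" in the example above.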
Remove lines from VCF that are only variable due to alleles in a group of samples.
Mask genotypes that are private to the group in lines that are kept due to other variation.
Use case: Remove genotypes from VCF that have mutations only observed in low-coverage genomes (potential errors).
Count sites in VCF that are triallelic only due to a group of specified samples.
Use case: How many biallelic sites were lost because another sample was added and introduced a third allele?
Take VCF from stdin and an infofile with sex per sample,
and convert diploid genotypes for males outside the PAR region to haploid genotypes.
E.g. '0/0' -> '0', '1/1' -> '1', '1/0' -> '.'
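The conversion rule from the examples above can be written as a small function. The name `to_haploid` is hypothetical, and the handling of already-haploid and missing genotypes is an assumption of this sketch; deciding which samples are male and which positions lie outside the PAR is left out.

```python
def to_haploid(gt):
    """Convert a diploid GT string to a haploid one.

    Homozygous calls keep their allele; heterozygous or missing calls become '.'.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) == 1:
        return gt  # already haploid (assumed to be left unchanged)
    if alleles[0] == alleles[1] and alleles[0] != ".":
        return alleles[0]
    return "."
```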
Take the output of samtools mpileup *.bam from stdin and sample a random read per position.
Write [chr | pos | base] to outfile (one file per chrom).
Restrict to biallelic transversions, given the indices of the REF and ALT columns (can be applied to several file formats).
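The transversion check itself is simple: a site qualifies if REF and ALT are each a single base and the pair is not A<->G or C<->T. The sketch below shows only this test; the function name is hypothetical and column handling is omitted.

```python
BASES = {"A", "C", "G", "T"}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_biallelic_transversion(ref, alt):
    """True if REF and ALT are single, different bases forming a transversion."""
    ref, alt = ref.upper(), alt.upper()
    # Multi-allelic ALT fields such as 'C,T' and indels fail the single-base check
    if ref not in BASES or alt not in BASES or ref == alt:
        return False
    return (ref, alt) not in TRANSITIONS
```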
Restrict to sites within a range of allele frequencies, given the index of the allele-frequency column (can be applied to several file formats).
Calculate pairwise divergences between populations given a VCF file.
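One common way to define divergence between two populations is the probability that an allele drawn at random from one population differs from an allele drawn from the other, averaged over sites. The sketch below uses that definition as an assumption; the script may define or normalize divergence differently, and the function name and input layout are hypothetical.

```python
def pairwise_divergence(freq_pairs):
    """Mean per-site probability that two alleles, one from each population, differ.

    freq_pairs: list of (p_a, p_b) ALT-allele frequencies per site.
    Returns None if no sites are given.
    """
    if not freq_pairs:
        return None
    total = 0.0
    for p_a, p_b in freq_pairs:
        # Alleles differ if one is ALT and the other is REF
        total += p_a * (1 - p_b) + (1 - p_a) * p_b
    return total / len(freq_pairs)
```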