Calculate Site Frequency Spectra With and Without Invariant Sites
All scripts used to generate the spectra are currently located in the repo, but more work needs to be done to make the scripts more extensible.
- All analyses are run on three strain sets: swept, divergent, and the whole population.
- Scripts to generate spectra are in the `2020_SFS_Analysis` folder:
  - Run `classify_strains.R` to generate the different strain populations.
  - For each population, do the following:
    - Paste the strain names output by `classify_strains.R` into line 5 of `make_files.sh` in the population subdirectory.
    - Run `make_files.sh` on the pruned VCF. The VCF was pruned from the 20200815 CeNDR release by keeping only sites with no missing data. `make_files.sh` generates a file called `SFS_INPUT.tsv`.
    - Run `generate_invariant.R`. This script takes a spliced CDS FASTA file and the output from `make_files.sh` and counts the 0-fold and 4-fold degenerate sites that are invariant across the tested population. This script needs to be updated to work within the repository file structure.
    - Run `generate_spectra.R` to generate spectra for DFE analysis.
  - See the `Readme` in the Scripts directory to perform the DFE analysis.
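The invariant-site counting done by `generate_invariant.R` (and `Invariant_SFS.R`) can be illustrated with a minimal Python sketch. It is not a translation of those R scripts: it assumes an in-frame, spliced CDS string, the standard genetic code, and a hypothetical set of 1-based CDS positions that are variable in the population (how the real scripts derive these from `SFS_INPUT.tsv` is not described here). The function names `synonymous_changes` and `count_invariant_sites` are illustrative only. A position is called 0-fold if every substitution at it changes the amino acid and 4-fold if every substitution is synonymous, judged against the reference codon only.

```python
from itertools import product
from typing import Dict, Set

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_changes(codon: str, pos: int) -> int:
    """Number of the 3 possible point substitutions at `pos` (0-2) that keep the amino acid."""
    aa = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        if base != codon[pos]:
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == aa:
                count += 1
    return count


def count_invariant_sites(cds: str, variable_positions: Set[int]) -> Dict[str, int]:
    """Count 0-fold and 4-fold degenerate sites in `cds` that are not variable.

    cds: spliced, in-frame coding sequence starting at the start codon.
    variable_positions: 1-based CDS coordinates that are polymorphic in the
        tested population (hypothetical input).
    """
    cds = cds.upper()
    counts = {"0fold": 0, "4fold": 0}
    # Walk complete codons only; a trailing partial codon is ignored
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        # Skip codons with ambiguous bases (e.g. N) and stop codons
        if codon not in CODON_TABLE or CODON_TABLE[codon] == "*":
            continue
        for pos in range(3):
            if start + pos + 1 in variable_positions:
                continue
            syn = synonymous_changes(codon, pos)
            if syn == 0:    # every change is nonsynonymous
                counts["0fold"] += 1
            elif syn == 3:  # every change is synonymous
                counts["4fold"] += 1
    return counts
```

For example, `count_invariant_sites("ATGGCTTAA", {6})` returns `{'0fold': 5, '4fold': 0}`, because the only 4-fold site (the third position of `GCT`) is marked as variable. Sites that are 2-fold or 3-fold degenerate are simply not counted.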
- Run `GENERATE_SFS_FILES.sh`. This takes a VCF and generates a processed data set that `Generate_Spectra.R` uses to generate spectra files. The script needs the following updates (it currently doesn't run as a script, but all the commands are there):
  - Modify paths to work in the GitHub repository file structure.
  - Modify the script to take the VCF, a sample names file, and the ancestor name as inputs.
- Run `Invariant_SFS.R`. This script takes a spliced CDS FASTA file and the output from `GENERATE_SFS_FILES.sh` and counts the 0-fold and 4-fold degenerate sites that are invariant across the tested population. This script needs to be updated to work within the repository file structure.
- Run `Generate_Spectra.R`. This generates the `.sfs` files used for DFE analysis. The parameters for this script are:
  - The output from `GENERATE_SFS_FILES.sh`: `SFS_INPUT.tsv`
  - `no_indel`, or `indel` to include indels in the spectra
  - The output from `Invariant_SFS.R`: `invariant_site_by_region.tsv`
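The spectra built by `Generate_Spectra.R` (and `generate_spectra.R`) can likewise be sketched in Python. This is an assumption-laden illustration, not the scripts' actual logic or the `.sfs` file format: it builds an unfolded spectrum per site class from hypothetical `Site` records (region label, number of strains carrying the derived allele, indel flag), treats each strain as a single haploid genome, mirrors the `indel`/`no_indel` choice with an `include_indels` flag, and adds per-region invariant-site counts (standing in for `invariant_site_by_region.tsv`, whose real columns are not described here) to the zero class to give the spectrum with invariant sites.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Site:
    """One segregating site (hypothetical record; not the real SFS_INPUT.tsv schema)."""
    region: str     # site class, e.g. "0fold" or "4fold" (hypothetical labels)
    derived: int    # number of strains carrying the derived allele
    is_indel: bool  # True if the variant is an insertion/deletion


def spectra_by_region(
    sites: Iterable[Site],
    n_strains: int,
    include_indels: bool = False,
    invariant_by_region: Optional[Dict[str, int]] = None,
) -> Dict[str, List[int]]:
    """Build one unfolded SFS per region.

    Entry i of each spectrum counts sites where i of the n_strains carry the
    derived allele. If invariant_by_region is given, its counts are added to
    entry 0, giving the spectrum "with invariant sites".
    """
    spectra: Dict[str, List[int]] = {}
    for site in sites:
        # Mirror the no_indel / indel option: drop indels unless requested
        if site.is_indel and not include_indels:
            continue
        if not 0 <= site.derived <= n_strains:
            raise ValueError(f"derived count {site.derived} outside 0..{n_strains}")
        sfs = spectra.setdefault(site.region, [0] * (n_strains + 1))
        sfs[site.derived] += 1
    # Assumes invariant sites carry the ancestral allele, so they go in class 0
    for region, n_invariant in (invariant_by_region or {}).items():
        spectra.setdefault(region, [0] * (n_strains + 1))[0] += n_invariant
    return spectra


if __name__ == "__main__":
    sites = [Site("0fold", 1, False), Site("0fold", 2, False),
             Site("4fold", 1, False), Site("4fold", 3, True)]
    print(spectra_by_region(sites, n_strains=3))
    print(spectra_by_region(sites, n_strains=3,
                            invariant_by_region={"0fold": 10, "4fold": 5}))
```

Calling it without `invariant_by_region` gives the spectra without invariant sites; passing the invariant counts changes only entry 0 of each spectrum, which is the difference between the two kinds of spectra in the title.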