Scripts and data for genome size estimation.

Run the scripts in numerical order (listed below) to produce the final output.

Place your paired FASTQ files in numbered folders (e.g. folder_1, folder_2, folder_3) for better parallelization.
Make sure to put your own account name in each batch script!
Submit the .slurm scripts to the scheduler; run the .sh scripts directly on the head node.
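
One way to set up the numbered folders is sketched below. It is only an illustration: the `distribute_pairs` helper and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming are assumptions, not part of this repository, and putting one pair per folder is a guess at the intended layout.

```shell
#!/usr/bin/env bash
# Sketch: move each R1/R2 pair found in a directory into its own numbered folder.
# Assumed (hypothetical) naming: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
distribute_pairs() {
    local src_dir="$1"
    local n=1
    local r1 r2
    for r1 in "$src_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # expected mate file
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate found" >&2
            continue
        fi
        mkdir -p "$src_dir/folder_$n"
        mv "$r1" "$r2" "$src_dir/folder_$n/"
        n=$((n + 1))
    done
}
```

Calling `distribute_pairs .` from the analysis directory would then create folder_1, folder_2, and so on. Unpaired files are left where they are.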

Script Title	Use
0_run_bbmap.slurm	Runs bbduk.sh and dedupe.sh (BBTools)
1_run_kraken.slurm	Decontamination
2_run_jellyfish.slurm	Histogram Generation
3_run_compile_hist_data.sh	Compiling Necessary Data for RESPECT (do not submit to slurm scheduler)
4_run_respect_full.slurm	Runs RESPECT
5_run_downsample.slurm	Uses seqtk to downsample data
6_run_jellyfish_downsampled.slurm	Histogram Generation on Downsampled Replicates
7_run_compile_downsampled_data.sh	Compiles Necessary Data Again (do not submit to slurm scheduler)
8_run_respect_downsampled.slurm	Runs Final RESPECT Run
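
The order in the table can be turned into a small dry-run helper that prints how each step would be launched. This is a sketch, not part of the repository: `launcher_for` is a hypothetical function, and it assumes the scripts take no arguments. It only prints the commands, since each step presumably needs the previous one's output and should not be started until that job has finished.

```shell
#!/usr/bin/env bash
# Sketch: pick the launcher from the script extension.
# .slurm scripts go to the scheduler (sbatch); .sh scripts run on the head node.
launcher_for() {
    case "$1" in
        *.slurm) echo "sbatch" ;;
        *.sh)    echo "bash" ;;
        *)       echo "unknown script type: $1" >&2; return 1 ;;
    esac
}

# Print the launch command for every step, in pipeline order.
for script in \
    0_run_bbmap.slurm \
    1_run_kraken.slurm \
    2_run_jellyfish.slurm \
    3_run_compile_hist_data.sh \
    4_run_respect_full.slurm \
    5_run_downsample.slurm \
    6_run_jellyfish_downsampled.slurm \
    7_run_compile_downsampled_data.sh \
    8_run_respect_downsampled.slurm
do
    echo "$(launcher_for "$script") $script"
done
```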

Notes:

You may have to change the SCRIPT_DIR variable in 0_run_bbmap.slurm to the path where the scripts have been placed.
Install kraken2 (add it to your PATH if installed from GitHub).
Download PFP-Plus 16 and add it to the directory where the analyses are being run.
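
For a kraken2 build from GitHub, adding it to PATH might look like the sketch below. The `KRAKEN2_DIR` variable and its value are placeholders for illustration only; point it at wherever kraken2 was actually installed.

```shell
# Hypothetical install location; replace with the real kraken2 directory.
KRAKEN2_DIR="$HOME/kraken2"
export PATH="$KRAKEN2_DIR:$PATH"

# Confirm the shell can now find the executable.
if command -v kraken2 >/dev/null 2>&1; then
    echo "kraken2 found at $(command -v kraken2)"
else
    echo "kraken2 not found on PATH" >&2
fi
```

Putting the two assignment lines in your shell startup file or at the top of the batch script keeps the setting across sessions and jobs.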
