Parallelizable SLURM skimming pipeline for downsampling and genome size estimation with RESPECT.

echarvel3/genome_size_scripts

Scripts and data for genome size estimation.

Order of scripts to get the finalized output:

  1. Place your paired fastq files in numbered folders (e.g. folder_1, folder_2, folder_3) for better parallelization.
  2. Make sure to put your own account name in each batch script!
  3. Submit the .slurm scripts to the scheduler; run the .sh scripts directly on the head node.
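
The first step can be sketched as a small helper that spreads read pairs across numbered folders in round-robin order. This is only an illustration: the function name `distribute_pairs` and the `_R1.fastq.gz` / `_R2.fastq.gz` naming pattern are assumptions, not something the pipeline scripts define, so adjust the pattern to match your files.

```shell
#!/usr/bin/env bash
# Sketch: spread paired FASTQ files across folder_1, folder_2, ... in the
# current working directory.
# Assumption: mates are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.

distribute_pairs() {
    local src_dir="$1"      # directory holding the raw FASTQ pairs
    local n_folders="$2"    # how many numbered folders to fill
    local i=0
    local r1 r2 dest

    for r1 in "$src_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                  # nothing matched the glob
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"       # derive the mate's file name
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate found" >&2
            continue
        fi
        dest="folder_$(( i % n_folders + 1 ))"    # round-robin assignment
        mkdir -p "$dest"
        mv "$r1" "$r2" "$dest"/                   # keep both mates together
        i=$(( i + 1 ))
    done
}
```

Calling `distribute_pairs raw_reads 3` would, under these assumptions, move the pairs found in a hypothetical `raw_reads` directory into `folder_1` through `folder_3`. Files without a mate are left where they are.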

| Script Title | Use |
| --- | --- |
| 0_run_bbmap.slurm | bbduk.sh and dedupe.sh |
| 1_run_kraken.slurm | Decontamination |
| 2_run_jellyfish.slurm | Histogram Generation |
| 3_run_compile_hist_data.sh | Compiling Necessary Data for RESPECT (do not submit to slurm scheduler) |
| 4_run_respect_full.slurm | Runs RESPECT |
| 5_run_downsample.slurm | Uses seqtk to downsample data |
| 6_run_jellyfish_downsampled.slurm | Histogram Generation on Downsampled Replicates |
| 7_run_compile_downsampled_data.sh | Compiles Necessary Data Again (do not submit to slurm scheduler) |
| 8_run_respect_downsampled.slurm | Runs Final RESPECT Run |
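
Because consecutive .slurm steps depend on each other's output, one way to queue them without waiting by hand is to chain them with Slurm job dependencies. The following is a minimal sketch, not part of the repository: `submit_chain` is a hypothetical helper, and it assumes each .slurm script can be submitted without arguments. It relies on the `--parsable` and `--dependency=afterok:<jobid>` options of `sbatch`.

```shell
#!/usr/bin/env bash
# Sketch: submit several .slurm scripts so each starts only after the
# previous one finishes successfully.
# Assumption: the scripts need no command-line arguments.

SBATCH="${SBATCH:-sbatch}"   # overridable, e.g. for a dry run with a stub

submit_chain() {
    local prev="" script jobid
    for script in "$@"; do
        if [ -n "$prev" ]; then
            jobid=$("$SBATCH" --parsable --dependency=afterok:"$prev" "$script") || return 1
        else
            jobid=$("$SBATCH" --parsable "$script") || return 1
        fi
        jobid="${jobid%%;*}"   # --parsable may print "jobid;cluster"
        echo "submitted $script as job $jobid" >&2
        prev="$jobid"
    done
    echo "$prev"               # ID of the last job in the chain
}
```

For example, `submit_chain 0_run_bbmap.slurm 1_run_kraken.slurm 2_run_jellyfish.slurm` would queue the first three steps. The chain has to stop there, since `3_run_compile_hist_data.sh` is run on the head node once the last job has completed; the same applies before `7_run_compile_downsampled_data.sh`.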

Notes:

  • You may have to change the SCRIPT_DIR variable in 0_run_bbmap.slurm to the path where the scripts have been placed.
  • Install kraken2 (if you installed it from GitHub, add it to your PATH).
  • Download PFP-Plus 16 and add it to the directory where the analyses are being run.
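
Before submitting anything, it can help to confirm that the required tools are actually reachable on PATH. The helper below is a hypothetical sketch (`check_tools` is not part of the repository); the tool names in the commented example are taken from the script table, and the install path is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given executables are missing from PATH.

check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing from PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"   # 0 if everything was found, 1 otherwise
}

# Example (placeholder path; point it at your own kraken2 install):
#   export PATH="$HOME/kraken2:$PATH"
#   check_tools bbduk.sh dedupe.sh kraken2 jellyfish seqtk
```

The function returns a non-zero status when any tool is missing, so it can guard a submission script with `check_tools ... || exit 1`.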
