bagPipes is a collection of next-gen sequencing processing pipelines for the Phanstiel Lab.
bagPipes includes pipelines for paired-end RNA-seq data, ChIP-seq/CUT&RUN data, and ATAC-seq data using the Snakemake framework.
It also includes a pipeline for single-end ChIP-seq/CUT&RUN data.
bagPipes is intended to be run on the UNC Longleaf HPC cluster with SLURM.
The individual pipelines of bagPipes are run in very similar ways, with differences in their config .yaml files that can be used to fine-tune runs. When using default settings, the general steps for all pipelines are as follows:
- Clone the workflow into your working directory:

git clone https://github.com/PhanstielLab/bagPipes.git .
- Edit the tab-separated XXXXsamplesheet.txt file (e.g. RNAsamplesheet.txt) with the names of Read1 and Read2 gzipped fastq files (or just Read1 for single-end data), and the paths to these files under the Sequencing_Directory column. No naming convention is needed. Any number of optional columns can be included for metadata, output file naming, and sample merging. An example is shown below:

Project  Cell_Type  Genotype  Bio_Rep  Tech_Rep  Seq_Rep  Read1             Read2             Sequencing_Directory
PROJ     CELL       WT        1        1         1        sample1_R1.fq.gz  sample1_R2.fq.gz  /path/to/fastq/directory/
PROJ     CELL       WT        2        1         1        sample2_R1.fq.gz  sample2_R2.fq.gz  /path/to/fastq/directory/
PROJ     CELL       MUT       1        1         1        sample3_R1.fq.gz  sample3_R2.fq.gz  /path/to/fastq/directory/
PROJ     CELL       MUT       2        1         1        sample4_R1.fq.gz  sample4_R2.fq.gz  /path/to/fastq/directory/
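Before launching, it can help to confirm that every fastq listed in the samplesheet actually exists. The snippet below is an optional pre-flight check, not part of the pipeline; the column positions assume the example layout above (Read1 = column 7, Read2 = column 8, Sequencing_Directory = column 9).

# Optional sanity check (not part of bagPipes): confirm each listed fastq exists
awk -F'\t' 'NR > 1 { print $9 "/" $7; print $9 "/" $8 }' RNAsamplesheet.txt |
  while read -r fq; do [ -e "$fq" ] || echo "missing: $fq"; done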
- Edit the appropriate config.yaml file for your system/experiment (RNAconfig.yaml, ChIPconfig.yaml, or ATACconfig.yaml will be used by the appropriate pipelines by default). The fileNamesFrom parameter uses a list of column names to generate sample names. The mergeBy parameter will be used to create merged alignments and downstream files (signal tracks, peaks). To suppress merging, set mergeBy equal to fileNamesFrom or to ''. See the example RNAconfig.yaml below, followed by a short sketch of how these two parameters resolve to names:
## Path to sample sheet
samplesheet: 'RNAsamplesheet.txt'
## List columns to concatenate for file names
## (or indicate a single Name column, i.e. fileNamesFrom: ['Name'])
fileNamesFrom: ['Project', 'Cell_Type', 'Genotype', 'Time', 'Bio_Rep', 'Tech_Rep']
## List columns to group (i.e. any columns left out will become part of the same group)
## set mergeBy: '' for no merging
mergeBy: ['Project', 'Cell_Type', 'Genotype']
## Indicate whether to create stranded signal tracks
stranded: True # False
## Genome-specific reference parameters (choose correct genome build!)
salmon: '/proj/seq/data/salmon_RNAseq_genomes/hg38_cdna/salmon_index/default'
hisat2: '/proj/seq/data/hg38_UCSC/Sequence/HISAT2Index/genome'
gtf: '/proj/seq/data/GRCh38_GENCODE/gencode.v32.primary_assembly.annotation.gtf'
## salmon: '/proj/seq/data/HG19_UCSC/Sequence/SalmonIndex/'
## hisat2: '/proj/seq/data/HG19_UCSC/Sequence/HISAT2Index/'
## gtf: '/proj/seq/data/HG19_UCSC/Annotation/Genes/genes.gtf'
## Software versions
fastqcVers: "0.11.9"
trimVers: "0.6.7"
salmonVers: "1.10.0"
hisatVers: "2.2.1"
samtoolsVers: "1.17"
deeptoolsVers: "3.5.1"
multiqcVers: "1.11"
rVers: "4.2.2"
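As a rough illustration of how fileNamesFrom and mergeBy interact (a sketch, not the pipeline's actual code): sample names are built by concatenating the fileNamesFrom columns, and merge groups collapse samples that share the same values in the mergeBy columns. With the example samplesheet above, something like the following previews the expected names and groups; the column positions and the underscore separator are assumptions based on the example output names elsewhere in this README.

# Preview per-sample names (example samplesheet columns 1-6: Project, Cell_Type, Genotype, Bio_Rep, Tech_Rep, Seq_Rep)
awk -F'\t' 'NR > 1 { print $1"_"$2"_"$3"_"$4"_"$5"_"$6 }' RNAsamplesheet.txt
# e.g. PROJ_CELL_WT_1_1_1, PROJ_CELL_WT_2_1_1, PROJ_CELL_MUT_1_1_1, PROJ_CELL_MUT_2_1_1

# Preview merge groups (mergeBy columns 1-3: Project, Cell_Type, Genotype)
awk -F'\t' 'NR > 1 { print $1"_"$2"_"$3 }' RNAsamplesheet.txt | sort -u
# e.g. PROJ_CELL_MUT and PROJ_CELL_WT, each merging two samples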
- Submit to SLURM with the sbatch scripts:

sbatch RNApipe.sh   # to launch RNApipe
sbatch ATACpipe.sh  # to launch ATACpipe
sbatch ChIPpipe.sh  # to launch ChIPpipe
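Once submitted, a run can be monitored with standard SLURM and Unix commands (these are not part of bagPipes); the wrapper writes its log to a file such as RNApipe-{jobid}.out in the working directory.

squeue -u $USER              # list your queued and running jobs
tail -f RNApipe-<jobid>.out  # follow the wrapper log as Snakemake submits rules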
The RNApipe SLURM wrapper will sequentially launch the RNApipeCore and mergeSignal workflows using the RNAconfig.yaml file.
RNApipeCore creates quantification files, alignments, and signal tracks. Output files are named according to the fileNamesFrom list in the config file.
mergeSignal generates merged alignments and signal tracks based on the mergeBy list in the config file.
For both workflows, stranded signal tracks will be created by default, but can be toggled off using the stranded parameter in the config file.
The RNApipe core workflow (without merging) is as follows:

RNApipe-{jobid}.out
output/
├── align
├── benchmarks
├── {runName}_RNApipeSamplesheet.txt
├── logs
├── logs_slurm
├── (mergeAlign) # merging only
├── (mergeSignal) # merging only
├── QC
├── quant
├── signal
└── trim

Example of code for RNApipeCore workflow
# rule fastqc
fastqc -o output/QC {read1}.fastq.gz {read2}.fastq.gz
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R1_fastqc.zip
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R2_fastqc.zip
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R1_fastqc.html
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R2_fastqc.html
# rule trim
trim_galore -o output/trim --paired {read1}.fastq.gz {read2}.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_val_1.fq.gz output/trim/{sampleName}_R1_trimmed.fastq.gz
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_val_2.fq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R1_trimming_report.txt
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R2_trimming_report.txt
# rule quant
salmon quant --writeUnmappedNames --threads 1 -i {config.salmon} -l A -1 output/trim/{sampleName}_R1_trimmed.fastq.gz -2 output/trim/{sampleName}_R2_trimmed.fastq.gz -o output/quant/{sampleName}
# rule align
hisat2 -q -x {config.hisat2} -1 output/trim/{sampleName}_R1_trimmed.fastq.gz -2 output/trim/{sampleName}_R2_trimmed.fastq.gz | samtools view -u | samtools sort -o output/align/{sampleName}_sorted.bam
samtools flagstat output/align/{sampleName}_sorted.bam > output/align/{sampleName}_stats.txt
samtools index output/align/{sampleName}_sorted.bam
# rule signal
bamCoverage -b output/align/{sampleName}_sorted.bam -o output/signal/unstranded/{sampleName}.bw
# rule strandedSignal
bamCoverage --filterRNAstrand forward -b output/align/{sampleName}_sorted.bam -o output/signal/stranded/{sampleName}_fwd.bw
bamCoverage --filterRNAstrand reverse -b output/align/{sampleName}_sorted.bam -o output/signal/stranded/{sampleName}_rev.bw
# rule multiqc
multiqc -f output/* -o output/QC
mv output/QC/multiqc_report.html output/QC/{runName}_multiqc_report.html
mv output/QC/multiqc_data output/QC/{runName}_multiqc_data
# rule tximport
Rscript workflows/utils/txImporter.R output/{runName}_RNApipeSamplesheet.txt {config.gtf} output/quant {runName}

Example of code for mergeSignal workflow
# rule mergeAlign
samtools merge output/mergeAlign/{mergeName}_sorted.bam output/align/{sampleName1}_sorted.bam output/align/{sampleName2}_sorted.bam
samtools flagstat output/mergeAlign/{mergeName}_sorted.bam > output/mergeAlign/{mergeName}_stats.txt
samtools index output/mergeAlign/{mergeName}_sorted.bam
# rule mergeSignal
bamCoverage -b output/mergeAlign/{mergeName}_sorted.bam -o output/mergeSignal/unstranded/{mergeName}.bw
# rule mergeStrandedSignal
bamCoverage --filterRNAstrand forward -b output/mergeAlign/{mergeName}_sorted.bam -o output/mergeSignal/stranded/{mergeName}_fwd.bw
bamCoverage --filterRNAstrand reverse -b output/mergeAlign/{mergeName}_sorted.bam -o output/mergeSignal/stranded/{mergeName}_rev.bw

The ChIPpipe and ATACpipe SLURM wrappers will launch either the baseline or merged workflows via the ChIPpipeLauncher and ATACpipeLauncher decision workflows.
For single-end ChIP/CUT&RUN data, be sure to set paired: False in the ChIPconfig.yaml file and modify the ChIPsamplesheetSE.txt file accordingly.
Both baseline and merged workflows output peak calls, a count matrix, alignments, and signal tracks. Both workflows generate a final summary peakCounts.tsv file containing the counts from each individual alignment at the merged set of peaks.
The ChIPpipeMerged and ATACpipeMerged workflows call peaks from merged alignments, rather than individual alignments, based on the ChIPconfig.yaml and ATACconfig.yaml files.
Output files are named according to the fileNamesFrom entry in the config file. Merging is dictated by the mergeBy entry in the config file. If mergeBy is empty or equivalent to fileNamesFrom, only the baseline workflow will be run.
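Once a run finishes, a couple of quick checks on these final outputs can confirm the count matrix looks sane (standard Unix tools, not part of the pipeline; substitute your actual run name for {runName}).

wc -l output/peaks/{runName}_mergedPeaks.bed                              # number of merged peaks
head -n 2 output/peaks/{runName}_peakCounts.tsv | cut -f 1-6 | column -t  # header plus the first peak row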
The ChIPpipe and ATACpipe core workflows (without merging) are as follows:

ChIPpipe-{jobid}.out # or ATACpipe-{jobid}.out
output/
├── align
├── benchmarks
├── {runName}_ChIPpipeSamplesheet.txt # {runName}_ATACpipeSamplesheet.txt
├── logs
├── logs_slurm
├── (mergeAlign) # merging only
├── (mergeSignal) # merging only
├── peaks
├── QC
├── signal
└── trim

Example of code for baseline ChIPpipe workflow
# rule fastqc
fastqc -o output/QC {read1}.fastq.gz {read2}.fastq.gz
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R1_fastqc.zip
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R2_fastqc.zip
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R1_fastqc.html
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R2_fastqc.html
# rule trim
trim_galore -o output/trim --paired {read1}.fastq.gz {read2}.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_val_1.fq.gz output/trim/{sampleName}_R1_trimmed.fastq.gz
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_val_2.fq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R1_trimming_report.txt
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R2_trimming_report.txt
# rule align
bwa mem -t 8 {config.bwamem} output/trim/{sampleName}_R1_trimmed.fastq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz | samtools view -u | samtools sort -o output/align/{sampleName}_sorted.bam
samtools flagstat output/align/{sampleName}_sorted.bam > output/align/{sampleName}_stats.txt
java -Xmx16g -jar /nas/longleaf/apps/picard/2.10.3/picard-2.10.3/picard.jar MarkDuplicates I=output/align/{sampleName}_sorted.bam O=output/align/{sampleName}_nodups_sorted.bam M=output/align/{sampleName}_dup_metrics.txt REMOVE_SEQUENCING_DUPLICATES=true
samtools index output/align/{sampleName}_nodups_sorted.bam
# rule signal
bamCoverage -b output/align/{sampleName}_nodups_sorted.bam -o output/signal/unstranded/{sampleName}.bw
# rule peaks
macs2 callpeak -t output/align/{sampleName}_nodups_sorted.bam -f BAM -q 0.01 -g hs --nomodel --shift 0 --extsize 200 --keep-dup all -B --SPMR --outdir output/peaks -n {sampleName}
# rule multiqc
multiqc -f output/* -o output/QC
mv output/QC/multiqc_report.html output/QC/{runName}_multiqc_report.html
mv output/QC/multiqc_data output/QC/{runName}_multiqc_data
# rule countMatrix
cat output/peaks/*_peaks.narrowPeak | awk '{{ OFS="\t"}};{{ print $1, $2, $3, $4 }}' | sort -k1,1 -k2,2n | bedtools merge > TEMP.bed
grep -ve "-1" TEMP.bed > output/peaks/{runName}_mergedPeaks.bed
printf "chr\tstart\tstop\t" > output/peaks/{runName}_peakCounts.tsv
for f in output/align/*_nodups_sorted.bam; do NAME=$(basename $f _nodups_sorted.bam); printf '%s\t' "$NAME" >> output/peaks/{runName}_peakCounts.tsv; done
printf "\n" >> output/peaks/{runName}_peakCounts.tsv
bedtools multicov -bams output/align/*_nodups_sorted.bam -bed output/peaks/{runName}_mergedPeaks.bed >> output/peaks/{runName}_peakCounts.tsv

Example of code for merged ChIPpipe workflow
# rule fastqc
fastqc -o output/QC {read1}.fastq.gz {read2}.fastq.gz
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R1_fastqc.zip
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R2_fastqc.zip
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R1_fastqc.html
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R2_fastqc.html
# rule trim
trim_galore -o output/trim --paired {read1}.fastq.gz {read2}.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_val_1.fq.gz output/trim/{sampleName}_R1_trimmed.fastq.gz
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_val_2.fq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R1_trimming_report.txt
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R2_trimming_report.txt
# rule align
bwa mem -t 8 {config.bwamem} output/trim/{sampleName}_R1_trimmed.fastq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz | samtools view -u | samtools sort -o output/align/{sampleName}_sorted.bam
samtools flagstat output/align/{sampleName}_sorted.bam > output/align/{sampleName}_stats.txt
java -Xmx16g -jar /nas/longleaf/apps/picard/2.10.3/picard-2.10.3/picard.jar MarkDuplicates I=output/align/{sampleName}_sorted.bam O=output/align/{sampleName}_nodups_sorted.bam M=output/align/{sampleName}_dup_metrics.txt REMOVE_SEQUENCING_DUPLICATES=true
samtools index output/align/{sampleName}_nodups_sorted.bam
# rule signal
bamCoverage -b output/align/{sampleName}_nodups_sorted.bam -o output/signal/unstranded/{sampleName}.bw
# rule mergeAlign
samtools merge output/mergeAlign/{mergeName}_nodups_sorted.bam output/align/{sampleName1}_nodups_sorted.bam output/align/{sampleName2}_nodups_sorted.bam
samtools flagstat output/mergeAlign/{mergeName}_nodups_sorted.bam > output/mergeAlign/{mergeName}_stats.txt
samtools index output/mergeAlign/{mergeName}_nodups_sorted.bam
# rule mergeSignal
bamCoverage -b output/mergeAlign/{mergeName}_nodups_sorted.bam -o output/mergeSignal/{mergeName}.bw
# rule peaks
macs2 callpeak -t output/mergeAlign/{mergeName}_nodups_sorted.bam -f BAM -q 0.01 -g hs --nomodel --shift 0 --extsize 200 --keep-dup all -B --SPMR --outdir output/peaks -n {mergeName}
# rule multiqc
multiqc -f output/* -o output/QC
mv output/QC/multiqc_report.html output/QC/{runName}_multiqc_report.html
mv output/QC/multiqc_data output/QC/{runName}_multiqc_data
# rule countMatrix
cat output/peaks/*_peaks.narrowPeak | awk '{{ OFS="\t"}};{{ print $1, $2, $3, $4 }}' | sort -k1,1 -k2,2n | bedtools merge > TEMP.bed
grep -ve "-1" TEMP.bed > output/peaks/{runName}_mergedPeaks.bed
printf "chr\tstart\tstop\t" > output/peaks/{runName}_peakCounts.tsv
for f in output/align/*_nodups_sorted.bam; do NAME=$(basename $f _nodups_sorted.bam); printf '%s\t' "$NAME" >> output/peaks/{runName}_peakCounts.tsv; done
printf "\n" >> output/peaks/{runName}_peakCounts.tsv
bedtools multicov -bams output/align/*_nodups_sorted.bam -bed output/peaks/{runName}_mergedPeaks.bed >> output/peaks/{runName}_peakCounts.tsv

Example of code for baseline ATACpipe workflow
# rule fastqc
fastqc -o output/QC {read1}.fastq.gz {read2}.fastq.gz
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R1_fastqc.zip
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R2_fastqc.zip
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R1_fastqc.html
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R2_fastqc.html
# rule trim
trim_galore -o output/trim --paired {read1}.fastq.gz {read2}.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_val_1.fq.gz output/trim/{sampleName}_R1_trimmed.fastq.gz
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_val_2.fq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R1_trimming_report.txt
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R2_trimming_report.txt
# rule align
bwa mem -t 8 {config.bwamem} output/trim/{sampleName}_R1_trimmed.fastq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz | samtools view -u | samtools sort -o output/align/{sampleName}_sorted.bam
samtools flagstat output/align/{sampleName}_sorted.bam > output/align/{sampleName}_stats.txt
java -Xmx16g -jar /nas/longleaf/apps/picard/2.10.3/picard-2.10.3/picard.jar MarkDuplicates I=output/align/{sampleName}_sorted.bam O=output/align/{sampleName}_nodups_sorted.bam M=output/align/{sampleName}_dup_metrics.txt REMOVE_SEQUENCING_DUPLICATES=true
samtools index output/align/{sampleName}_nodups_sorted.bam
samtools idxstats output/align/{sampleName}_nodups_sorted.bam | cut -f 1 | grep -v 'chrM' | xargs samtools view -b output/align/{sampleName}_nodups_sorted.bam > output/align/{sampleName}_nodups_filtered_sorted.bam
samtools index output/align/{sampleName}_nodups_filtered_sorted.bam
# rule signal
bamCoverage -b output/align/{sampleName}_nodups_sorted.bam -o output/signal/unstranded/{sampleName}.bw
# rule peaks
macs2 callpeak -t output/align/{sampleName}_nodups_sorted.bam -f BAM -q 0.01 -g hs --nomodel --shift 100 --extsize 200 --keep-dup all -B --SPMR --outdir output/peaks -n {sampleName}
# rule multiqc
multiqc -f output/* -o output/QC
mv output/QC/multiqc_report.html output/QC/{runName}_multiqc_report.html
mv output/QC/multiqc_data output/QC/{runName}_multiqc_data
# rule countMatrix
cat output/peaks/*_peaks.narrowPeak | awk '{{ OFS="\t"}};{{ print $1, $2, $3, $4 }}' | sort -k1,1 -k2,2n | bedtools merge > TEMP.bed
grep -ve "-1" TEMP.bed > output/peaks/{runName}_mergedPeaks.bed
printf "chr\tstart\tstop\t" > output/peaks/{runName}_peakCounts.tsv
for f in output/align/*_nodups_filtered_sorted.bam; do NAME=$(basename $f _nodups_filtered_sorted.bam); printf '%s\t' "$NAME" >> output/peaks/{runName}_peakCounts.tsv; done
printf "\n" >> output/peaks/{runName}_peakCounts.tsv
bedtools multicov -bams output/align/*_nodups_filtered_sorted.bam -bed output/peaks/{runName}_mergedPeaks.bed >> output/peaks/{runName}_peakCounts.tsv

Example of code for merged ATACpipe workflow
# rule fastqc
fastqc -o output/QC {read1}.fastq.gz {read2}.fastq.gz
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R1_fastqc.zip
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.zip output/QC/{sampleName}_R2_fastqc.zip
mv output/QC/$(basename {read1}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R1_fastqc.html
mv output/QC/$(basename {read2}.fastq.gz .fastq.gz)_fastqc.html output/QC/{sampleName}_R2_fastqc.html
# rule trim
trim_galore -o output/trim --paired {read1}.fastq.gz {read2}.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_val_1.fq.gz output/trim/{sampleName}_R1_trimmed.fastq.gz
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_val_2.fq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz
mv output/trim/$(basename {read1}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R1_trimming_report.txt
mv output/trim/$(basename {read2}.fastq.gz .fastq.gz)_trimming_report.txt output/trim/{sampleName}_R2_trimming_report.txt
# rule align
bwa mem -t 8 {config.bwamem} output/trim/{sampleName}_R1_trimmed.fastq.gz output/trim/{sampleName}_R2_trimmed.fastq.gz | samtools view -u | samtools sort -o output/align/{sampleName}_sorted.bam
samtools flagstat output/align/{sampleName}_sorted.bam > output/align/{sampleName}_stats.txt
java -Xmx16g -jar /nas/longleaf/apps/picard/2.10.3/picard-2.10.3/picard.jar MarkDuplicates I=output/align/{sampleName}_sorted.bam O=output/align/{sampleName}_nodups_sorted.bam M=output/align/{sampleName}_dup_metrics.txt REMOVE_SEQUENCING_DUPLICATES=true
samtools index output/align/{sampleName}_nodups_sorted.bam
samtools idxstats output/align/{sampleName}_nodups_sorted.bam | cut -f 1 | grep -v 'chrM' | xargs samtools view -b output/align/{sampleName}_nodups_sorted.bam > output/align/{sampleName}_nodups_filtered_sorted.bam
samtools index output/align/{sampleName}_nodups_filtered_sorted.bam
# rule signal
bamCoverage -b output/align/{sampleName}_nodups_sorted.bam -o output/signal/unstranded/{sampleName}.bw
# rule mergeAlign
samtools merge output/mergeAlign/{mergeName}_nodups_sorted.bam output/align/{sampleName1}_nodups_sorted.bam output/align/{sampleName2}_nodups_sorted.bam
samtools flagstat output/mergeAlign/{mergeName}_nodups_sorted.bam > output/mergeAlign/{mergeName}_stats.txt
samtools index output/mergeAlign/{mergeName}_nodups_sorted.bam
# rule mergeSignal
bamCoverage -b output/mergeAlign/{mergeName}_nodups_sorted.bam -o output/mergeSignal/{mergeName}.bw
# rule peaks
macs2 callpeak -t output/mergeAlign/{mergeName}_nodups_sorted.bam -f BAM -q 0.01 -g hs --nomodel --shift 100 --extsize 200 --keep-dup all -B --SPMR --outdir output/peaks -n {mergeName}
# rule multiqc
multiqc -f output/* -o output/QC
mv output/QC/multiqc_report.html output/QC/{runName}_multiqc_report.html
mv output/QC/multiqc_data output/QC/{runName}_multiqc_data
# rule countMatrix
cat output/peaks/*_peaks.narrowPeak | awk '{{ OFS="\t"}};{{ print $1, $2, $3, $4 }}' | sort -k1,1 -k2,2n | bedtools merge > TEMP.bed
grep -ve "-1" TEMP.bed > output/peaks/{runName}_mergedPeaks.bed
printf "chr\tstart\tstop\t" > output/peaks/{runName}_peakCounts.tsv
for f in output/align/*_nodups_filtered_sorted.bam; do NAME=$(basename $f _nodups_filtered_sorted.bam); printf '%s\t' "$NAME" >> output/peaks/{runName}_peakCounts.tsv; done
printf "\n" >> output/peaks/{runName}_peakCounts.tsv
bedtools multicov -bams output/align/*_nodups_filtered_sorted.bam -bed output/peaks/{runName}_mergedPeaks.bed >> output/peaks/{runName}_peakCounts.tsv

Note: Currently only compatible with RNApipe. Adjustments required for ATAC/ChIP.
Sometimes you want to create merged signal tracks using several different merging schemes. The mergeSignal workflow can be used following a previous run to generate new signal tracks in this way (it requires at least existing .bam and .bai files for the samples in the samplesheet). Merging is determined by the mergeBy parameter in the config file used. The general steps are:
- Complete a core run, featuring at minimum the following output structure:
output/
└── align
├── sample1.bam
├── sample1.bam.bai
├── sample2.bam
└── sample2.bam.bai
- Adjust the mergeBy parameter in the config.yaml file to merge accordingly.

- Launch the mergeSignal workflow (currently manual):

snakemake -s workflows/mergeSignal.snakefile --configfile "config/RNAconfig.yaml" --cluster-config "config/cluster.yaml" --cluster "sbatch -J {cluster.name} -p {cluster.partition} -t {cluster.time} -c {cluster.cpusPerTask} --mem-per-cpu={cluster.memPerCpu} -N {cluster.nodes} --output {cluster.output} --error {cluster.error} --parsable" --cluster-status ./workflows/utils/status.py -j 100 --max-jobs-per-second 5 --max-status-checks-per-second 5 --rerun-incomplete -p --latency-wait 500

Output directory structure:
output/
├── align
│ ├── sample_WT_1.bam
│ ├── sample_WT_1.bam.bai
│ ├── sample_WT_2.bam
│ └── sample_WT_2.bam.bai
├── mergeAlign
│   ├── sample_WT.bam
│   └── sample_WT.bam.bai
└── mergeSignal
    └── sample_WT.bw
Run the benchmarking.py script within workflows/utils in order to summarize all existing benchmark files.
module load python/3.6.6
python3 ./workflows/utils/benchmarking.py
Use the appropriate command for your pipeline when you want to delete or add to a previously generated output:

./unlock.sh rna
./unlock.sh chip
./unlock.sh atac

General
- Move to-do list to github issues!
- Make creation of the samplesheet a rule
RNApipe
- Make fastqc rule more robust to different file extensions (currently hardcoded to accept .fastq.gz only; new param in config?)
- (Optional) Make output for quant rule (Salmon) accept the directories themselves as outputs
- Consider trying to incorporate the mergeSignal into RNApipeCore
ChIPpipe + ATACpipe
- Test and facilitate compatibility with mergeSignal
- Consider editing ChIPpipe rule countMatrix to create header line from {wildcards.sampleName}
- Adjust chrM format to be a config parameter in rule align
- Move MACS2 peak calling settings into config