
cmwilson24/Domain_Screening

Domain Screening

Gilbert Lab domain screening analysis pipeline

PacBio demultiplexing with lima

You will need a barcode FASTA file listing the barcodes used for your samples.
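For orientation, a barcode FASTA for lima is an ordinary FASTA file with one named record per barcode. The sketch below writes a two-barcode example. The file name `example_barcodes.fa`, the record names, and the 16-nt sequences are all made up for illustration and are not the lab's barcodes.

```shell
# Hypothetical example only: names and sequences are placeholders,
# not the barcodes used in this pipeline.
cat > example_barcodes.fa <<'EOF'
>sample01_bc
ACGTACGTACGTACGT
>sample02_bc
TGCATGCATGCATGCA
EOF

# Quick sanity check: count the barcode records in the file.
n_barcodes=$(grep -c '^>' example_barcodes.fa)
echo "barcode records: $n_barcodes"
```

Counting the `>` header lines is a cheap way to confirm the file holds as many barcodes as you have samples before running lima.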

lima --ccs --same --split-named --dump-clips -j 20 --log-level=INFO input.ccs.bam barcodes.fa output.ccs.demux.bam

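Trim demultiplexed files. Trimming is only necessary if sequencing on a Sequel II/IIe; otherwise skip this step. seqtk works on FASTQ input, so convert the demultiplexed BAM files to FASTQ first (as in the batch workflow below).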

seqtk trimfq -b 16 -e 16 input.fq > trimmed.fq

Gzip your files

pigz -p 8 file.fq

Create a sample sheet for your experiments (.yaml file) - see example

Alternatively, if you have a lot of samples, you can use the following set of commands (for batch/high-throughput sample processing):

#Start with the conda environment needed to demux and trim files. If using a PacBio Revio, CCS is not needed because these are already HiFi reads.
conda activate PacBio

lima --same --split-named --dump-clips -j 25 --log-level=INFO m84066_231208_174057_s1.hifi_reads.bam  DCF007_barcodes.fa ccs.demux.bam 

#Run lima to demux files based on the PCR barcoded primers specified in the barcodes.fa file.

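#Use the ccs.demux.report.counts to get a sense of read numbers per demuxed sample.

If you used multiple flow cells and now need to combine samples, first list the BAM file names that appear in every flow cell directory (3 in this example, hence the `$1 == 3` filter):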

find . -name "*.bam" -type f -exec basename {} \; | sort | uniq -c | awk '$1 == 3 {print $2}' > common_files.txt
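To see what this pipeline keeps, the sketch below runs it on a throwaway directory of empty placeholder files. The directory layout and the names `flowcell1` to `flowcell3`, `sampleA.bam`, and `sampleB.bam` are hypothetical; only `sampleA.bam` exists in all three flow-cell folders, so it should be the only name written to `common_files.txt`.

```shell
# Throwaway directory with a hypothetical layout: three flow-cell
# folders, where only sampleA.bam is present in all three.
demo=$(mktemp -d)
for fc in flowcell1 flowcell2 flowcell3; do
    mkdir -p "$demo/$fc/hifi_reads"
    touch "$demo/$fc/hifi_reads/sampleA.bam"
done
touch "$demo/flowcell1/hifi_reads/sampleB.bam" \
      "$demo/flowcell2/hifi_reads/sampleB.bam"

# Same pipeline as above, run inside the demo directory.
(
    cd "$demo" || exit 1
    find . -name "*.bam" -type f -exec basename {} \; | sort | uniq -c |
        awk '$1 == 3 {print $2}' > common_files.txt
)
cat "$demo/common_files.txt"   # lists sampleA.bam only
```

Note that the filter matches on file name alone, so any other `.bam` files under the working directory (for example earlier outputs) are counted as well. Change the `3` if you have a different number of flow cells.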

Merging samples that have the same file name (in this example we have 3 directories/flow cells)

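mkdir -p combined_bams  # output directory must exist before merging
threads=8               # number of threads for samtools; adjust to your machine
while read -r filename; do
    samtools merge -@ "$threads" combined_bams/"$filename" your_directory1/hifi_reads/"$filename" your_directory2/hifi_reads/"$filename" your_directory3/hifi_reads/"$filename"
done < common_files.txt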

Convert bam file to fastq

mkdir -p fastq_output  # output directory must exist before converting
for bam_file in combined_bams/*.bam; do
    output_prefix="fastq_output/$(basename "$bam_file" .bam)"
    samtools bam2fq -@ 25 "$bam_file" > "$output_prefix.fastq"
done
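After conversion it can be useful to confirm that each FASTQ holds roughly the number of reads you expect. A minimal sketch, assuming standard four-line FASTQ records; the `count_reads` helper and the two-read demo file are hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: number of records in a 4-line-per-record FASTQ.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Tiny made-up FASTQ with two reads, used only to exercise the helper.
demo_fq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGCA\n+\nIIIII\n' > "$demo_fq"
n_reads=$(count_reads "$demo_fq")
echo "$n_reads"

# On real data, something like:
#   for fq in fastq_output/*.fastq; do
#       echo "$fq $(count_reads "$fq")"
#   done
```

The per-file counts can then be compared against the per-sample numbers reported by lima.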

Trimming fastqs and zipping

conda activate SeqTrim
mkdir -p trimmed_output_newbc  # Create a directory to store the trimmed FASTQ files

# Run from the directory that holds the FASTQ files (e.g. fastq_output)
for fq_file in *.fastq; do
    output_file="trimmed_output_newbc/tr_${fq_file}"
    seqtk trimfq -b 16 -e 16 "$fq_file" > "$output_file"
done

# gzip all trimmed files
for trimmed_file in trimmed_output_newbc/*.fastq; do
    gzip "$trimmed_file"
done
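To see what `seqtk trimfq -b 16 -e 16` does to a read, the sketch below mimics it with awk on one made-up 40-base record: 16 characters are removed from each end of both the sequence and the quality line. This is an illustration only, not a replacement for seqtk, and it assumes every read is longer than 32 bases.

```shell
# One hypothetical 40-base read: 16 A's, 8 C's, 16 G's, with 40 quality chars.
a8=AAAAAAAA; c8=CCCCCCCC; g8=GGGGGGGG; q8=IIIIIIII
trim_demo=$(mktemp)
printf '@read1\n%s\n+\n%s\n' "$a8$a8$c8$g8$g8" "$q8$q8$q8$q8$q8" > "$trim_demo"

# In 4-line FASTQ records the even-numbered lines are sequence and quality;
# drop the first and last 16 characters of each and pass other lines through.
trimmed=$(awk 'NR % 2 == 0 { $0 = substr($0, 17, length($0) - 32) } { print }' "$trim_demo")

# Pull out the trimmed sequence line (line 2 of the record).
trimmed_seq=$(printf '%s\n' "$trimmed" | sed -n '2p')
echo "$trimmed_seq"   # the 8 central bases
```

Each trimmed read should therefore be 32 bases shorter than its input, which is a quick check to run on a few reads after the seqtk step.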

Time for alignments

In a new conda environment, first install the following 3 packages from JH (jeffhussmann on GitHub). See dCas9-fusions for next steps. They will start with the following...

git clone git@github.com:jeffhussmann/hits.git
git clone git@github.com:jeffhussmann/knock-knock.git
git clone git@github.com:jeffhussmann/dCas9-fusions.git
#conda activate JH_hits_newer
count_dCas9_fusion_domains parallel --batch DCF007_updated_bc ./ 30
