Domain Screening

Gilbert Lab domain screening analysis pipeline

PacBio Demultiplexing with lima

You will need a barcode FASTA file (barcodes.fa in the command below):

lima --ccs --same --split-named --dump-clips -j 20 --log-level=INFO input.ccs.bam barcodes.fa output.ccs.demux.bam
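Since lima reads the barcodes from a FASTA file, a quick format check before demultiplexing can catch duplicate names or stray characters. The sketch below writes a small placeholder file and validates it with awk. The file name, barcode names, and 16-base sequences are invented for illustration and are not the lab's barcodes; `check_barcodes` is a hypothetical helper, not part of lima.

```shell
set -eu

# Work in a scratch directory so nothing real is overwritten.
workdir="$(mktemp -d)"
barcode_fasta="$workdir/example_barcodes.fa"   # hypothetical file name

# Placeholder barcodes: names and sequences are made up for this example.
cat > "$barcode_fasta" <<'EOF'
>example_bc01
ACGTACGTACGTACGT
>example_bc02
TGCATGCATGCATGCA
EOF

# Check that every header is unique and every sequence line is A/C/G/T only.
# Prints the number of barcodes on success; returns non-zero on the first problem.
check_barcodes() {
    awk '
        /^>/ {
            if (seen[$0]++) { print "duplicate header: " $0 > "/dev/stderr"; bad = 1; exit }
            n++
            next
        }
        !/^[ACGT]+$/ { print "bad sequence line: " $0 > "/dev/stderr"; bad = 1; exit }
        END { if (bad) exit 1; print n + 0 }
    ' "$1"
}

barcode_count="$(check_barcodes "$barcode_fasta")"
echo "barcodes found: $barcode_count"
```

Point `check_barcodes` at your real barcode file before running lima; a non-zero exit means the file needs fixing first.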

Trim demultiplexed files - trimming is only necessary if sequencing on a Sequel II/IIe. Otherwise, skip this step.

seqtk trimfq -b 16 -e 16 input.fq > trimmed.fq

Gzip your files

pigz -p 8 file.fq

Create a sample sheet for your experiments (.yaml file) - see example

Alternatively, if you have a lot of samples, you can use the following set of commands (for batch/high-throughput sample processing):

#Start with the conda environment needed to demux and trim files. If using PacBio Revio, we don't need to run CCS because these are already HiFi reads.
conda activate PacBio

lima --same --split-named --dump-clips -j 25 --log-level=INFO m84066_231208_174057_s1.hifi_reads.bam  DCF007_barcodes.fa ccs.demux.bam

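#The lima command above demuxes files based on the PCR barcoded primers specified in the barcodes.fa file.

#Use the ccs.demux.report.counts to get a sense of read numbers per demuxed sample.

If you used multiple flow cells and now need to combine samples, first list the BAM file names that appear exactly 3 times, i.e. once in each of the 3 flow-cell directories: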

find . -name "*.bam" -type f -exec basename {} \; | sort | uniq -c | awk '$1 == 3 {print $2}' > common_files.txt
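To see what the command above selects, here is a minimal sketch that runs the same pipeline on a mock directory tree. The directory and sample names are hypothetical, and the hard-coded `3` is replaced by an `n_flowcells` variable (an assumption added here) so the count can follow the number of flow cells.

```shell
set -eu

# Mock layout: three flow-cell directories holding empty placeholder BAMs.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p flowcell1/hifi_reads flowcell2/hifi_reads flowcell3/hifi_reads
touch flowcell1/hifi_reads/sampleA.bam flowcell2/hifi_reads/sampleA.bam flowcell3/hifi_reads/sampleA.bam
touch flowcell1/hifi_reads/sampleB.bam flowcell2/hifi_reads/sampleB.bam   # absent from flowcell3

n_flowcells=3   # must match the number of directories being combined

# Count how often each BAM basename occurs and keep those seen in every flow cell.
find . -name "*.bam" -type f -exec basename {} \; \
    | sort | uniq -c \
    | awk -v n="$n_flowcells" '$1 == n {print $2}' > common_files.txt

cat common_files.txt
```

Only sampleA.bam is listed, because sampleB.bam is missing from one directory. Note that `find .` searches everything under the current directory, so run it before creating combined_bams (or from a directory that contains only the flow-cell folders), and that file names containing spaces would be cut off by `print $2`.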

Merge samples that have the same file name (in this example we have 3 directories/flow cells):

mkdir -p combined_bams  # samtools merge will not create the output directory
while read -r filename; do
    samtools merge -@ <#threads> combined_bams/"$filename" your_directory1/hifi_reads/"$filename" your_directory2/hifi_reads/"$filename" your_directory3/hifi_reads/"$filename"
done < common_files.txt
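If the number of flow cells changes, the merge loop can build its input list from a list of directories instead of spelling out each path. The following is a dry-run sketch under that assumption: it only prints the `samtools merge` commands and checks that every input exists. The directory names, sample name, and thread count are hypothetical.

```shell
set -eu

# Mock inputs so the sketch runs anywhere; replace with your own directories.
workdir="$(mktemp -d)"
cd "$workdir"
flowcell_dirs="run1/hifi_reads run2/hifi_reads run3/hifi_reads"   # no spaces in names
for d in $flowcell_dirs; do
    mkdir -p "$d"
    touch "$d/sampleA.bam"
done
mkdir -p combined_bams
echo "sampleA.bam" > common_files.txt
threads=4

# For each shared file name, collect one input per flow cell, confirm it exists,
# and print the merge command. Remove the leading "echo" to actually run samtools.
while read -r filename; do
    set --                                   # reset the list of inputs
    for d in $flowcell_dirs; do
        [ -f "$d/$filename" ] || { echo "missing: $d/$filename" >&2; exit 1; }
        set -- "$@" "$d/$filename"
    done
    echo samtools merge -@ "$threads" "combined_bams/$filename" "$@"
done < common_files.txt > merge_commands.txt

cat merge_commands.txt
```

Reviewing merge_commands.txt before running anything makes it easy to spot a wrong directory or a missing sample.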

Convert BAM files to FASTQ

mkdir -p fastq_output  # the redirect below fails if this directory is missing
for bam_file in combined_bams/*.bam; do
    output_prefix="fastq_output/$(basename "$bam_file" .bam)"
    samtools bam2fq -@ 25 "$bam_file" > "$output_prefix.fastq"
done
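After conversion it can be worth comparing read counts per FASTQ against the demux counts. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the mock file and its contents are placeholders.

```shell
set -eu

# Mock FASTQ with two four-line records so the sketch runs without real data.
workdir="$(mktemp -d)"
mkdir -p "$workdir/fastq_output"
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$workdir/fastq_output/sampleA.fastq"

# One record spans four lines, so reads = lines / 4.
for fq in "$workdir"/fastq_output/*.fastq; do
    n_lines="$(wc -l < "$fq")"
    echo "$(basename "$fq") $((n_lines / 4))"
done > "$workdir/read_counts.txt"

cat "$workdir/read_counts.txt"
```

For the mock file this prints `sampleA.fastq 2`. A sample whose count is far below its demux count suggests a merge or conversion problem.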

Trimming fastqs and zipping

conda activate SeqTrim
# Run from the directory that holds the .fastq files
mkdir -p trimmed_output_newbc  # Create a directory to store the trimmed FASTQ files

for fq_file in *.fastq; do
    output_file="trimmed_output_newbc/tr_${fq_file}"
    seqtk trimfq -b 16 -e 16 "$fq_file" > "$output_file"
done
# gzip all of the trimmed files
for trimmed_file in trimmed_output_newbc/*.fastq; do
    gzip "$trimmed_file"
done
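Before moving on to alignment, the compressed files can be tested for integrity with `gzip -t`. This is a small sketch on a mock file; the `tr_sampleA.fastq` name is a placeholder that follows the `tr_` prefix used above.

```shell
set -eu

# Mock trimmed FASTQ, compressed the same way as in the loop above.
workdir="$(mktemp -d)"
mkdir -p "$workdir/trimmed_output_newbc"
printf '@read1\nACGT\n+\nIIII\n' > "$workdir/trimmed_output_newbc/tr_sampleA.fastq"
gzip "$workdir/trimmed_output_newbc/tr_sampleA.fastq"

# gzip -t decompresses to nowhere and returns non-zero if the archive is damaged.
failed=0
for gz in "$workdir"/trimmed_output_newbc/*.fastq.gz; do
    if gzip -t "$gz"; then
        echo "ok: $(basename "$gz")"
    else
        echo "corrupt: $(basename "$gz")" >&2
        failed=1
    fi
done
```

If `failed` ends up as 1, re-run the trimming and compression for the files reported as corrupt.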

Time for alignments

In a new conda environment, first install the following 3 repositories from JH (jeffhussmann on GitHub). See dCas9-fusions for next steps. They will start with the following...

git clone git@github.com:jeffhussmann/hits.git
git clone git@github.com:jeffhussmann/knock-knock.git
git clone git@github.com:jeffhussmann/dCas9-fusions.git
#conda activate JH_hits_newer
count_dCas9_fusion_domains parallel --batch DCF007_updated_bc ./ 30
