Snakemake pipeline for 3P-Seq Analysis

This repository contains a Snakemake pipeline for processing and analyzing 3P-Seq sequencing data. The pipeline uses several tools, including ABS-Scan and 3PSeq Analysis, along with custom scripts for post-processing. The sexual and asexual strains are processed separately.

Overview

This pipeline processes 3P-Seq data to analyze polyadenylation events in Schmidtea mediterranea. The workflow includes:

Quality control with fastqc
Read alignment using bowtie
Peak detection for 3' polyadenylation sites
Hexamer sequence analysis and secondary PAS signal detection
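
To make the hexamer step more concrete, here is a minimal sketch of scanning the sequence upstream of a cleavage site for polyadenylation signal (PAS) hexamers. It is not the code used by this pipeline: the function name, the hexamer list (the canonical AATAAA plus the common ATTAAA variant), and the 40-nt window are assumptions chosen for illustration.

```python
# Illustrative only: the hexamer list and the 40-nt window are assumptions,
# not values taken from this pipeline.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")  # canonical signal and a common variant


def find_pas_hexamers(upstream_seq, hexamers=PAS_HEXAMERS, window=40):
    """Return (hexamer, distance) pairs found in the last `window` nucleotides.

    `upstream_seq` is assumed to end at the cleavage site; `distance` is
    counted from the first base of the hexamer to the end of the sequence.
    """
    seq = upstream_seq.upper().replace("U", "T")  # accept RNA or DNA input
    region = seq[-window:]
    hits = []
    for start in range(len(region) - 5):
        kmer = region[start:start + 6]
        if kmer in hexamers:
            hits.append((kmer, len(region) - start))
    return hits


if __name__ == "__main__":
    example = "GCGCGC" + "AATAAA" + "GCTAGCTAGCTAGCTAGCTA"
    print(find_pas_hexamers(example))  # [('AATAAA', 26)]
```

Reporting the distance to the cleavage site alongside each hit makes it easy to separate a primary signal near the site from secondary signals further upstream.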

Installation and Running the Pipeline

Run the following commands to install the dependencies (Homebrew for the command-line tools, pip for the Python packages):

brew install fastqc bowtie bedtools samtools wget
pip3 install joblib scipy numpy

1. Environment Setup

Ensure you have Miniconda or Anaconda installed. Then, create and activate a conda environment for running Snakemake:

# Create the environment (if not already created)
conda create -n snakemake -c bioconda -c conda-forge snakemake

# Activate the environment
conda activate snakemake

(Note: If your environment is already named "snakemake", simply activate it with conda activate snakemake.)

2. Verifying the Installation

Confirm that Snakemake is installed by checking its version:

snakemake --version

A version number should be printed.

3. Configuring the Pipeline

Review the config.yaml file to ensure the sample data, genome paths, and output directories match your setup. Update any paths if needed.
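
A quick way to catch a wrong path before a run is to check the configured paths programmatically. The sketch below assumes the config has already been loaded into a dictionary; the key names genome_fasta and annotation_gff are hypothetical and should be replaced with the keys your config.yaml actually uses.

```python
import os


def missing_paths(config, keys=("genome_fasta", "annotation_gff")):
    """Return the config keys whose path is unset or does not exist on disk.

    The default key names are hypothetical placeholders.
    """
    missing = []
    for key in keys:
        path = config.get(key)
        if not path or not os.path.exists(path):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    config = {"genome_fasta": "genome.fa", "annotation_gff": "annotation.gff"}
    for key in missing_paths(config):
        print(f"config.yaml: path for '{key}' is missing or does not exist")
```

The dictionary can be obtained by loading config.yaml with a YAML parser such as PyYAML's yaml.safe_load.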

4. Testing the Snakefile

Before running the full pipeline, perform a dry-run to detect any issues:

snakemake -n

This command prints the planned jobs without executing them. You can also lint the workflow, which reports Snakemake-specific suggestions for improving it:

snakemake --lint

The following command generates a graphical representation of the pipeline (it requires the dot tool from Graphviz):

snakemake --dag | dot -Tsvg > dag.svg

5. Running the Pipeline

After verifying that the dry-run completes without errors, start the pipeline. Specify the number of cores (adjust based on your system’s capability):

snakemake --cores 4

If your workflow rules use individual conda environments, run:

snakemake --use-conda --cores 4

6. Monitoring and Output

Logs and Output Directories:
The pipeline writes output to directories specified in the config.yaml file (e.g., qc/, data/trimmed/, data/aligned/, results/). Check these directories to monitor the progress and results of your analysis.
Interrupting and Resuming:
You can interrupt the pipeline with Ctrl+C. Resume the analysis by rerunning the same snakemake command; it will pick up from where it left off.

Additional Notes

Make sure all input files (FASTQ, genome FASTA, GFF) are in the correct locations as specified in config.yaml.
Custom scripts (identify_last_exons.py, combine_results.py) and tools (ABS-Scan-master, 3PSeq_analysis-master) should have the proper permissions to execute.
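
The permission check can be automated with a few lines of Python. This is a sketch rather than part of the pipeline; the helper name not_executable is made up for this example, and it assumes the scripts sit in the current working directory.

```python
import os


def not_executable(paths):
    """Return the paths that are missing or lack execute permission for the current user."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.X_OK))]


if __name__ == "__main__":
    # Script names taken from the note above; their location is an assumption.
    scripts = ["identify_last_exons.py", "combine_results.py"]
    for path in not_executable(scripts):
        print(f"fix with: chmod +x {path}")
```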

Making the Bowtie Index

bash bowtie-build --wrapper basic-0 SmedSxl_genome_v4.0.nt.gz bowtie_index/genome

Downloading the FASTQ Files

Go to the 3P-Seq study on the NCBI SRA (SRP070102): https://www.ncbi.nlm.nih.gov/sra/?term=SRP070102

Download the FASTQ files for the following runs:

SRR3168630 - Planaria Asexual Strain - single-end Poly A transcriptomic - Illumina GAIIx
SRR3169012 - Planaria Sexual Strain - single-end Poly A transcriptomic - Illumina GAIIx
SRR3168939 - Planaria Sexual Strain - single-end Poly A transcriptomic - Illumina GAIIx
SRR3168624 - Planaria Sexual Strain - single-end Poly A transcriptomic - Ion Torrent PGM
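
Because the sexual and asexual strains are handled separately, it can help to group the runs by strain before wiring them into the config. The sketch below uses the accessions and strain labels from the list above; the dictionary layout and the function name are assumptions, not the structure this pipeline's config.yaml requires.

```python
from collections import defaultdict

# Accessions and strain labels copied from the run list above.
RUNS = {
    "SRR3168630": "asexual",
    "SRR3169012": "sexual",
    "SRR3168939": "sexual",
    "SRR3168624": "sexual",
}


def runs_by_strain(runs):
    """Group run accessions by strain, preserving the input order."""
    grouped = defaultdict(list)
    for accession, strain in runs.items():
        grouped[strain].append(accession)
    return dict(grouped)


if __name__ == "__main__":
    for strain, accessions in runs_by_strain(RUNS).items():
        print(strain, ", ".join(accessions))
```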

Checking Read Quality with FastQC

brew install fastqc
sudo cp /opt/homebrew/bin/fastqc /usr/local/bin/
mkdir -p qc
fastqc -t 4 *fastq.gz -o qc/
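
FastQC writes a tab-separated summary.txt inside each report archive, with one line per module in the form status, module name, file name. As a rough sketch (the function name and the sample text are made up for illustration and are not results from this data set), the modules that need attention can be pulled out like this:

```python
def flagged_modules(summary_text, statuses=("WARN", "FAIL")):
    """Return (status, module) pairs from a FastQC summary.txt whose status is in `statuses`."""
    flagged = []
    for line in summary_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0] in statuses:
            flagged.append((fields[0], fields[1]))
    return flagged


if __name__ == "__main__":
    # Made-up summary content for illustration only.
    sample = (
        "PASS\tBasic Statistics\texample.fastq.gz\n"
        "FAIL\tPer base sequence quality\texample.fastq.gz\n"
        "WARN\tAdapter Content\texample.fastq.gz\n"
    )
    for status, module in flagged_modules(sample):
        print(f"{status}: {module}")
```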

By following these steps, you can install, test, and run the Snakemake-based workflow successfully.
