This repository contains a Snakemake pipeline for processing and analyzing sequencing data. The pipeline uses several tools, including ABS-Scan and 3PSeq Analysis, along with custom scripts for post-processing. It processes the sexual and asexual strains separately.
This pipeline processes 3P-Seq data to analyze polyadenylation events in Schmidtea mediterranea. The workflow includes:
- Quality control with fastqc
- Read alignment using bowtie
- Peak detection for 3' polyadenylation sites
- Hexamer sequence analysis and secondary PAS signal detection
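As a rough illustration of what hexamer analysis involves, the sketch below counts overlapping 6-mers in sequences upstream of a cleavage site and checks for the canonical AATAAA polyadenylation signal. It is a minimal sketch, not the method used by ABS-Scan or the 3PSeq analysis scripts; the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def count_hexamers(upstream_sequences):
    """Count every overlapping 6-mer across the given upstream sequences."""
    counts = Counter()
    for seq in upstream_sequences:
        seq = seq.upper()
        # A sequence of length n contains n - 5 overlapping hexamers.
        for i in range(len(seq) - 5):
            counts[seq[i:i + 6]] += 1
    return counts


def has_signal(sequence, signal="AATAAA"):
    """Report whether the signal hexamer occurs anywhere in the sequence."""
    return signal in sequence.upper()


if __name__ == "__main__":
    # Toy sequences for illustration only; real input would be the regions
    # upstream of the detected polyadenylation sites.
    toy = ["AATAAAC", "ggAATAAA"]
    print(count_hexamers(toy).most_common(3))
    print([has_signal(s) for s in toy])
```

Ranking the resulting counts is one simple way to spot candidate secondary signals, although a real analysis would also need a background model to judge enrichment.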
Run the following command to install dependencies:
brew install fastqc bowtie bedtools samtools wget
pip3 install joblib scipy numpy
Ensure you have Miniconda or Anaconda installed. Then, create and activate a conda environment for running Snakemake:
# Create the environment (if not already created)
conda create -n snakemake -c bioconda -c conda-forge snakemake
# Activate the environment
conda activate snakemake
(Note: If an environment named "snakemake" already exists, skip the creation step and simply activate it with conda activate snakemake.)
Confirm that Snakemake is installed by checking its version:
snakemake --version
A version number should be printed.
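Beyond the Snakemake version check, it can help to confirm that the command-line tools the pipeline relies on are visible on your PATH. The following is a minimal sketch, assuming the tool names from the install commands in this README (plus bowtie-build, which is used to build the genome index); the function name missing_tools is hypothetical.

```python
import shutil

# Tool names taken from the install and indexing commands in this README;
# adjust the list if your setup differs.
REQUIRED_TOOLS = [
    "fastqc",
    "bowtie",
    "bowtie-build",
    "bedtools",
    "samtools",
    "wget",
    "snakemake",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run it inside the activated conda environment so that the PATH matches the one Snakemake will see.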
Review the config.yaml file to ensure the sample data, genome paths, and output directories match your setup. Update any paths if needed.
Before running the full pipeline, perform a dry-run to detect any issues:
snakemake -n
This command prints the planned jobs without executing them. You can also check the workflow for common problems with:
snakemake --lint
The following command generates a graphical representation of the pipeline (it requires the dot tool from Graphviz):
snakemake --dag | dot -Tsvg > dag.svg
After verifying that the dry-run completes without errors, start the pipeline. Specify the number of cores (adjust based on your system’s capability):
snakemake --cores 4
If your workflow rules use individual conda environments, run:
snakemake --use-conda --cores 4
- Logs and Output Directories: The pipeline writes output to the directories specified in the config.yaml file (e.g., qc/, data/trimmed/, data/aligned/, results/). Check these directories to monitor the progress and results of your analysis.
- Interrupting and Resuming: You can interrupt the pipeline with Ctrl+C. Resume the analysis by rerunning the same snakemake command; it will pick up from where it left off.
- Make sure all input files (FASTQ, genome FASTA, GFF) are in the correct locations as specified in config.yaml.
- Custom scripts (identify_last_exons.py, combine_results.py) and tools (ABS-Scan-master, 3PSeq_analysis-master) should have the proper permissions to execute.
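The two checks above (input files present, scripts executable) can be automated before starting a run. This is a minimal sketch, not part of the pipeline itself: the function name check_inputs is hypothetical, and the paths in the usage section assume the files sit in the current directory, which may not match your config.yaml.

```python
import os


def check_inputs(input_files, executables):
    """Collect readable problem messages for missing inputs and non-executable scripts."""
    problems = []
    for path in input_files:
        if not os.path.isfile(path):
            problems.append(f"missing input: {path}")
    for path in executables:
        if not os.path.exists(path):
            problems.append(f"missing script or tool: {path}")
        elif not os.access(path, os.X_OK):
            problems.append(f"not executable: {path}")
    return problems


if __name__ == "__main__":
    # Assumed locations; replace with the paths from your config.yaml.
    inputs = ["SmedSxl_genome_v4.0.nt.gz"]
    scripts = ["identify_last_exons.py", "combine_results.py"]
    for problem in check_inputs(inputs, scripts):
        print(problem)
```

A script reported as "not executable" can usually be fixed with chmod +x on that file.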
bowtie-build --wrapper basic-0 SmedSxl_genome_v4.0.nt.gz bowtie_index/genome
Go to the 3P-Seq study page on SRA: https://www.ncbi.nlm.nih.gov/sra/?term=SRP070102
Download the FASTQ files for the following runs:
- SRR3168630 - Planaria Asexual Strain - single end Poly A transcriptomic - Illumina GAIIx
- SRR3169012 - Planaria Sexual Strain - single end Poly A transcriptomic - Illumina GAIIx
- SRR3168939 - Planaria Sexual Strain - single end Poly A transcriptomic - Illumina GAIIx
- SRR3168624 - Planaria Sexual Strain - single end Poly A transcriptomic - Ion Torrent PGM
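Because the pipeline treats the sexual and asexual strains separately, it can be convenient to keep the run accessions in a small lookup table and group them by strain. The sketch below uses the accessions and labels from the list above; the dictionary layout and the function name group_by_strain are hypothetical and are not the format that config.yaml expects.

```python
from collections import defaultdict

# Accessions, strains, and platforms copied from the run list in this README.
SAMPLES = {
    "SRR3168630": {"strain": "asexual", "platform": "Illumina GAIIx"},
    "SRR3169012": {"strain": "sexual", "platform": "Illumina GAIIx"},
    "SRR3168939": {"strain": "sexual", "platform": "Illumina GAIIx"},
    "SRR3168624": {"strain": "sexual", "platform": "Ion Torrent PGM"},
}


def group_by_strain(samples):
    """Map each strain to a sorted list of its run accessions."""
    grouped = defaultdict(list)
    for accession, meta in samples.items():
        grouped[meta["strain"]].append(accession)
    return {strain: sorted(runs) for strain, runs in grouped.items()}


if __name__ == "__main__":
    for strain, runs in group_by_strain(SAMPLES).items():
        print(strain, runs)
```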
brew install fastqc
sudo cp /opt/homebrew/bin/fastqc /usr/local/bin/
mkdir -p qc
fastqc -t 4 *fastq.gz -o qc/
By following these steps, you can install, test, and run the Snakemake-based workflow successfully.