Micaella/RNA-seq

RNA-seq Scientific Workflow

Workflow for RNA sequencing using the Parallel Scripting Library - Parsl.

Requirements

To use the RNA-seq workflow, the following tools must be available: Bowtie2, HTSeq, R (for the DESeq2 script) and Parsl.

You can install Bowtie2 by downloading and extracting the prebuilt release archive:

bowtie2-2.3.5.1-linux-x86_64.zip

Or, on a system that uses yum:

sudo yum install bowtie2-2.3.5-linux-x86_64

HTSeq is a native Python library that follows the conventions of many Python packages. You can install it by running:

pip install HTSeq

HTSeq depends on NumPy, Pysam and matplotlib. Make sure these packages are installed.

To use the DESeq2 script, make sure the R language is also installed. On Debian or Ubuntu you can install it by running:

sudo apt install r-base

The recommended way to install Parsl is the approach suggested in Parsl's documentation:

python3 -m pip install parsl

To use Parsl, you need Python 3.5 or above. HTSeq also runs on Python, so load a single Python version and use it for both.
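The Python-side requirements above can be checked before running the workflow. The following is a minimal sketch, not part of this repository: the helper names `missing_modules` and `python_is_supported` are hypothetical, and the module list assumes the usual import names of the packages mentioned above.

```python
import importlib.util
import sys

# Import names of the Python packages mentioned above (assumed, not read
# from the workflow script itself).
REQUIRED_MODULES = ["parsl", "HTSeq", "numpy", "pysam", "matplotlib"]


def missing_modules(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 5)):
    """True if the interpreter meets the minimum version stated above."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.5 or above is required")
    for name in missing_modules(REQUIRED_MODULES):
        print("missing package: {}".format(name))
```

Running the check with the same interpreter that will run the workflow helps confirm that Parsl and HTSeq are installed for one and the same Python version.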

Workflow invocation

First, create a Comma Separated Values (CSV) file. Its first line must be the header: sampleName,fileName,condition. There must be no spaces between the items and the commas. You can use the file "table.csv" in this repository as an example. Your CSV file will look like this:

sampleName,fileName,condition
tissue control 1,SRR5445794.fastq,control
tissue control 2,SRR5445795.fastq,control
tissue control 3,SRR5445796.fastq,control
tissue wntup 1,SRR5445797.fastq,wntup
tissue wntup 2,SRR5445798.fastq,wntup
tissue wntup 3,SRR5445799.fastq,wntup
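The sample table described above can be validated before the workflow starts. This is a minimal sketch using Python's standard `csv` module. The function `read_sample_table` is hypothetical and is not how the workflow itself reads the file. It only checks the header and rejects empty or space-padded items.

```python
import csv
import io

HEADER = ["sampleName", "fileName", "condition"]


def read_sample_table(handle):
    """Parse the sample table and return its rows as a list of dicts."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != HEADER:
        raise ValueError("first line must be: " + ",".join(HEADER))
    rows = []
    for row in reader:
        for key in HEADER:
            value = row[key]
            # Reject missing items and padding around commas,
            # e.g. "a, b.fastq, control".
            if value is None or not value or value != value.strip():
                raise ValueError("bad value in line {}".format(reader.line_num))
        rows.append(dict(row))
    return rows


# In-memory stand-in for a file such as table.csv.
example = io.StringIO(
    "sampleName,fileName,condition\n"
    "tissue control 1,SRR5445794.fastq,control\n"
    "tissue wntup 1,SRR5445797.fastq,wntup\n"
)
rows = read_sample_table(example)
print(rows[0]["fileName"])
```

Spaces inside a sample name, as in "tissue control 1", are accepted. Only spaces next to a comma are treated as errors.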

The command-line arguments passed to the Python script, after the script's name, must be, in this order: the indexed genome, the number of threads for the Bowtie2 task, the FASTQ read files (given as the input directory plus the file-name prefix), the name of the directory where the output files will be placed, the GTF file and, lastly, the DESeq script. Make sure all the files needed to run the workflow are in the same directory, with the FASTQ files in a dedicated folder that serves as the input directory. The command line will look like this:

python3 rna-seq.py ../mm9/mm9 6 ../inputs/SRR ../outputs ../Mus_musculus.NCBIM37.67.gtf ../DESeq.R

In this first version, the workflow searches the input directory for files whose names start with a given prefix, so you need to pass that prefix when running the workflow. In the table above, the prefix is "SRR".
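The argument order and the prefix search can be illustrated with a short sketch. It is an assumption about how such a script could be organised, not the code in rna-seq.py: the `Arguments` fields and the functions `parse_arguments` and `find_reads` are hypothetical names, and the matching shown here is a plain "starts with" glob.

```python
import glob
import os
import tempfile
from collections import namedtuple

# Hypothetical container; the field names are illustrative only.
Arguments = namedtuple(
    "Arguments",
    ["genome_index", "threads", "reads_prefix", "output_dir", "gtf_file", "deseq_script"],
)


def parse_arguments(argv):
    """Map the six positional arguments (after the script name) to named fields."""
    if len(argv) != 7:
        raise SystemExit("expected 6 arguments after the script name")
    index, threads, prefix, out_dir, gtf, deseq = argv[1:]
    return Arguments(index, int(threads), prefix, out_dir, gtf, deseq)


def find_reads(prefix):
    """Return the sorted paths that start with the given prefix."""
    return sorted(glob.glob(glob.escape(prefix) + "*"))


# Demonstration with a temporary input directory and empty placeholder files.
with tempfile.TemporaryDirectory() as inputs:
    for name in ("SRR5445794.fastq", "SRR5445795.fastq", "notes.txt"):
        open(os.path.join(inputs, name), "w").close()
    args = parse_arguments(
        ["rna-seq.py", "../mm9/mm9", "6", os.path.join(inputs, "SRR"),
         "../outputs", "../Mus_musculus.NCBIM37.67.gtf", "../DESeq.R"]
    )
    found = [os.path.basename(path) for path in find_reads(args.reads_prefix)]
    print(found)
```

In this sketch, files in the input directory that do not start with the prefix, such as notes.txt, are ignored.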
