01 Sequence Alignment

Ryan Schubert edited this page Sep 25, 2018 · 4 revisions

Sequence alignment is a necessary step for users beginning with fastq files. Users who already have alignment files (bam/sam) can move on to the next step. The pipeline has two wrappers capable of performing alignment: star_loop and salmon_loop. Salmon is considerably faster than traditional aligners because it does not perform a true alignment. While this pseudo-alignment is very fast, the trade-off is a loss of metadata that other software may need (e.g. leafcutter). Accuracy may also be a concern compared to traditional aligners such as STAR, which is widely regarded as one of the most accurate aligners available. For the software used in this pipeline, salmon is adequate for most analyses.

Alignment with salmon_loop

Alignment with salmon is relatively fast and accurate; however, it only aligns to the transcriptome, not the genome. For the purposes of this pipeline, salmon is preferred over other methods. The wrapper script serves as a lightweight interpreter of user inputs. To trigger alignment-based analysis, one only needs to provide the directory containing fastq files with the -f option. Currently this wrapper is only capable of interpreting paired-end fastq files; single-end files must be fed to STAR. Additionally, one needs to provide salmon with a list of samples (-s, assumed to be paired end), a transcriptome file (-t), an annotation file (-a), and an index. If an index for the transcriptome file has not been generated, the --runindex option will create one prior to alignment. Please see the salmon_loop page for more thorough documentation.

Example

# If paired end (replace /path/to/ with your own locations)
./salmon_loop -t /path/to/transcriptome.fa --runindex -f /path/to/fastqdirectory/ -a /path/to/annotations.gencode.gtf -s /path/to/sample_list.txt
# If single end
./salmon_loop -t /path/to/transcriptome.fa --runindex -f /path/to/fastqdirectory/ -a /path/to/annotations.gencode.gtf -s /path/to/sample_list.txt --single-end
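
The sample list passed with -s can be tedious to write by hand. The sketch below shows one way to generate a candidate list from a fastq directory. It rests on two assumptions that are not confirmed on this page: that the list holds one sample name per line, and that paired-end files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz. The function name list_paired_samples is hypothetical and not part of the pipeline. Check the salmon_loop page for the expected format before relying on the output.

```shell
# Hypothetical helper, not part of salmon_loop.
# Assumes paired-end files are named <sample>_1.fastq.gz / <sample>_2.fastq.gz
# and that the sample list is one sample name per line.
list_paired_samples() {
    local fastq_dir="$1"
    local r1 r2 sample
    for r1 in "$fastq_dir"/*_1.fastq.gz; do
        # Skip the unexpanded glob pattern when no files match.
        [ -e "$r1" ] || continue
        sample="$(basename "$r1" _1.fastq.gz)"
        r2="$fastq_dir/${sample}_2.fastq.gz"
        if [ -e "$r2" ]; then
            # Both mates are present: emit the sample name.
            printf '%s\n' "$sample"
        else
            # Only the first mate exists: warn on stderr and leave it out.
            printf 'warning: no mate file for %s\n' "$sample" >&2
        fi
    done
}

# Usage (paths are placeholders):
#   list_paired_samples /path/to/fastqdirectory > /path/to/sample_list.txt
```

Samples that lack a second mate file are reported on stderr rather than written to the list, so the warnings can be reviewed separately from the generated file.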

Alignment with star_loop

Alignments with STAR have very high fidelity to their reference file. The requirements for aligning with STAR are very similar to those for salmon. star_loop requires a sample list (-s), the directory containing fastq files (--inputdirectory), a genome file (-g), an annotation file (-a), and an index. Please see the star_loop page for more thorough documentation.

Example

# Replace /path/to/ with your own locations
./star_loop -g /path/to/genome.fa -s /path/to/sample_list.txt --inputdirectory /path/to/fastqdirectory/ -a /path/to/annotations.gencode.gtf --runindex
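
Because the wrapper needs several inputs, it can help to confirm they all exist before starting a long run. The following is a minimal sketch of such a pre-flight check. The function name check_star_inputs and its argument order are hypothetical, and the check is not something star_loop is known to perform itself.

```shell
# Hypothetical pre-flight check, not part of star_loop.
# Arguments: genome file, sample list, fastq directory, annotation file.
check_star_inputs() {
    local genome="$1" sample_list="$2" fastq_dir="$3" annotation="$4"
    local f status=0
    for f in "$genome" "$sample_list" "$annotation"; do
        # -s is true only for files that exist and are non-empty.
        if [ ! -s "$f" ]; then
            printf 'missing or empty file: %s\n' "$f" >&2
            status=1
        fi
    done
    if [ ! -d "$fastq_dir" ]; then
        printf 'missing directory: %s\n' "$fastq_dir" >&2
        status=1
    fi
    return "$status"
}

# Usage (paths are placeholders):
#   check_star_inputs /path/to/genome.fa /path/to/sample_list.txt \
#       /path/to/fastqdirectory/ /path/to/annotations.gencode.gtf \
#     && echo "inputs look OK"
```

The function reports every missing input in one pass instead of stopping at the first, which saves repeated attempts when several paths are wrong.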