Indexing for Mapping Based Quantification

Salmon has two modes, one works on aligned reads and one which works directly on fastqs. This capsule is intended for when you wish to run with unmapped reads and output an index for direct quantification.

Input Data

Genome Fasta

Genome for the particular organism being used (DNA sequence). If a genome is not specified, any file matching *.genome*.fa* in the data directory is used.

Transcript Fasta

Transcriptome for the particular organism being used (RNA transcripts). If a transcriptome is not specified any file matching *transcripts*.fa* is used.

Decoy.txt and gentrome.fa.gz

There are two main ways to generate decoys for Salmon. The first is by generating random sequences from a hard masked version of the organism's genome, the second is to just include the entire genome as a decoy sequence (see documentation). In this capsule we use the latter method. decoy.txt contains the chromosome names for the input genome and gentrome.fa.gz contains a concatenated file of the transcripts and genome. If these files are available in the data directory, they are not generated from scratch. This may be preferable since concatenating the transcript and genome together can be time consuming.

App Panel Parameters

Threads

Total threads used by Salmon. If not specified, all available will be used.

Output folder

Output folder for index. Should be located in results.

Transcripts reference

Fasta of all transcripts.

Genome primary assembly reference

Genome reference, use the same reference as transcripts.

Kmers length

Length of sequence to identify transcripts in reads. Longer is more strict. (--kmerLen)

Gencode Format

This flag will expect the input transcript fasta to be in GENCODE format, and will split the transcript name at the first '|' character (--gencode) (Default: True)

Keep Duplicates

Duplicate transcripts that appear in the input will be retained and quantified separately. (--keepDuplicates) (Default: False)

Outputs

output_folder: Output Salmon index

complete_ref_lens.bin
ctable.bin
ctg_offsets.bin
duplicate_clusters.tsv
info.json
mphf.bin
pos.bin
pre_indexing.log
rank.bin
refAccumLengths.bin
reflengths.bin
refseq.bin
seq.bin
versionInfo.json

decoys.txt: Decoys used in Salmon index (only if not found in data before run) gentrome.fa.gz: Input fasta used for index generation, usually (only if not found in data before run)

Source

https://salmon.readthedocs.io/en/latest/salmon.html

Cite

Roberts, Adam, and Lior Pachter. “Streaming fragment assignment for real-time analysis of sequencing experiments.” Nature Methods 10.1 (2013): 71-73.

Roberts, Adam, et al. “Improving RNA-Seq expression estimates by correcting for fragment bias.” Genome Biology 12.3 (2011): 1.

Patro, Rob, et al. “Salmon provides fast and bias-aware quantification of transcript expression.” Nature Methods (2017). Advanced Online Publication. doi: 10.1038/nmeth.4197..

Love, Michael I., Hogenesch, John B., Irizarry, Rafael A. “Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation.” Nature Biotechnology 34.12 (2016). doi: 10.1038/nbt.368.2..

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
code		code
environment		environment
images		images
metadata		metadata
README.md		README.md
salmon_index_help.txt		salmon_index_help.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Indexing for Mapping Based Quantification

Input Data

App Panel Parameters

Outputs

Source

Cite

About

Uh oh!

Releases

Packages

Languages

spirpinias/Salmon-Index

Folders and files

Latest commit

History

Repository files navigation

Indexing for Mapping Based Quantification

Input Data

App Panel Parameters

Outputs

Source

Cite

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages