Pipeline for identifying and quantifying known and novel genes/isoforms in long-read RNA-seq data
link to description with pictures
A Nextflow pipeline for identifying and quantifying known and novel genes/isoforms in long-read RNA-seq data. It currently works with human data from Oxford Nanopore platforms. Support for PacBio data and for mouse data is planned.
Pipeline steps:
- alignment: Minimap2 maps Oxford Nanopore reads to the genome.
- cleaning: TranscriptClean corrects mismatches, microindels, and noncanonical splice junctions in long reads that have been mapped to the genome (this fixes artifactual noncanonical splice junctions).
- gene/isoform search: TALON identifies and quantifies known and novel genes/isoforms in long-read transcriptome data sets.
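A minimal sketch of what the three steps could look like as command lines. It only builds argument lists and runs nothing. The options follow the public documentation of minimap2 (-ax splice for spliced alignment), TranscriptClean (--sam, --genome, --outprefix, --spliceJns, --variants) and TALON (--f, --db, --build, --o). The options actually used in test_tuples.nf are not shown in this README and may differ; the helper function names and file names are hypothetical.

```python
"""Hypothetical sketch of the three pipeline steps as command lines.

Assumption: options follow the minimap2, TranscriptClean and TALON docs;
the real pipeline (test_tuples.nf) may call the tools differently.
TALON database initialisation is omitted.
"""
from typing import List, Optional


def alignment_cmd(genome_fasta: str, reads: str) -> List[str]:
    # minimap2 in spliced-alignment mode; SAM is written to stdout
    return ["minimap2", "-ax", "splice", genome_fasta, reads]


def cleaning_cmd(sam: str, genome_fasta: str, outprefix: str,
                 spl_jnk: Optional[str] = None,
                 known_variants_vcf: Optional[str] = None) -> List[str]:
    # TranscriptClean on the aligned reads
    cmd = ["python", "TranscriptClean.py", "--sam", sam,
           "--genome", genome_fasta, "--outprefix", outprefix]
    if spl_jnk:
        # Needed to correct noncanonical splice junctions
        cmd += ["--spliceJns", spl_jnk]
    if known_variants_vcf:
        # Known variants are protected from being "corrected" to the reference
        cmd += ["--variants", known_variants_vcf]
    return cmd


def talon_cmd(config_csv: str, db: str, build: str, outprefix: str) -> List[str]:
    # TALON annotates cleaned reads against an existing TALON database
    return ["talon", "--f", config_csv, "--db", db,
            "--build", build, "--o", outprefix]


if __name__ == "__main__":
    # Placeholder file names, for illustration only
    for cmd in (alignment_cmd("genome.fa", "LN0001.fastq.gz"),
                cleaning_cmd("LN0001.sam", "genome.fa", "LN0001_clean",
                             spl_jnk="splice_junctions.txt"),
                talon_cmd("talon_config.csv", "talon.db", "hg38", "LN0001")):
        print(" ".join(cmd))
```

Keeping each command as a list rather than a single shell string means it can be passed to subprocess.run without shell quoting problems.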
In the file params.config:
Variables to be changed:
- samples_file: CSV file with a row number (row_id), the sample ID (we use the LN as the ID) and the full path to a read file for every sample. Header: row_id,ln,pathway. One row holds one path to one read file, so a sample with several read files takes several rows. Example: /net/seq/data2/projects/amuravyova/nf-long-reads-align/FETAL/11_20_fetal_with_pathways.csv
- outdir: directory where you want to put the results
- description: description of the data (does not affect the analysis)
- platform: platform that was used to generate the data (does not affect the analysis)
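A minimal sketch for creating and checking a samples_file, assuming one row per read file (a sample with several read files gets several rows with the same ln) and row_id numbered from 1. Only the header row_id,ln,pathway and the full-path requirement come from the description above; the function names, the ID LN0001 and the file names are hypothetical.

```python
"""Sketch: write and check a samples_file with header row_id,ln,pathway.

Assumptions: one row per read file, row_id numbered from 1, and
pathway must be an absolute ("full") path. Names are illustrative.
"""
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

HEADER = ["row_id", "ln", "pathway"]


def write_samples_file(samples: Dict[str, List[str]], out_csv: str) -> None:
    """Write one row per read file; samples maps an LN ID to its read files."""
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(HEADER)
        row_id = 1
        for ln, paths in samples.items():
            for path in paths:
                writer.writerow([row_id, ln, path])
                row_id += 1


def check_samples_file(in_csv: str, require_exists: bool = True) -> List[str]:
    """Return a list of problems; an empty list means the file looks OK."""
    problems: List[str] = []
    with open(in_csv, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)
        if header != HEADER:
            return [f"header should be {','.join(HEADER)}, got {header}"]
        for lineno, row in enumerate(reader, start=2):
            if len(row) != 3:
                problems.append(f"line {lineno}: expected 3 fields, got {len(row)}")
                continue
            row_id, _ln, path = row
            if not row_id.isdigit():
                problems.append(f"line {lineno}: row_id {row_id!r} is not a number")
            if not Path(path).is_absolute():
                problems.append(f"line {lineno}: pathway must be a full path")
            elif require_exists and not Path(path).exists():
                problems.append(f"line {lineno}: file not found: {path}")
    return problems


if __name__ == "__main__":
    # Demo with empty placeholder read files in a temporary directory
    tmp = Path(tempfile.mkdtemp())
    reads = [tmp / "LN0001_run1.fastq.gz", tmp / "LN0001_run2.fastq.gz"]
    for r in reads:
        r.touch()
    samples_csv = tmp / "samples.csv"
    write_samples_file({"LN0001": [str(r) for r in reads]}, str(samples_csv))
    print(samples_csv.read_text())
    print(check_samples_file(str(samples_csv)) or "samples_file looks OK")
```

Running the check before launching Nextflow catches a wrong header or a mistyped path early, instead of partway through a run.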
Variables that can be changed (please do not modify them for now):
- genome_fasta: FASTA file containing the reference genome used for mapping
- genome_gtf: GTF file containing the reference annotation
- spl_jnk: high-confidence splice junction file; this file is necessary if you want to correct noncanonical splice junctions
- known_variants_vcf: VCF file containing known variants
- conda
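A sketch of a pre-flight check for these reference settings before launching a run. It assumes the values have been copied into a Python dict (parsing params.config itself is not attempted) and treats genome_fasta and genome_gtf as required and spl_jnk and known_variants_vcf as optional; that split is an assumption based on the descriptions above. The function name is hypothetical.

```python
"""Sketch: check the reference inputs before a run.

Assumption: genome_fasta and genome_gtf are required; spl_jnk and
known_variants_vcf are optional inputs to the cleaning step.
"""
from pathlib import Path
from typing import Dict, List

REQUIRED = ("genome_fasta", "genome_gtf")
OPTIONAL = ("spl_jnk", "known_variants_vcf")


def check_references(params: Dict[str, str]) -> List[str]:
    """Return errors and warnings for the reference settings."""
    messages: List[str] = []
    for key in REQUIRED:
        value = params.get(key)
        if not value:
            messages.append(f"ERROR: {key} is not set")
        elif not Path(value).is_file():
            messages.append(f"ERROR: {key} not found: {value}")
    for key in OPTIONAL:
        value = params.get(key)
        # Optional files may be unset, but if set they must exist
        if value and not Path(value).is_file():
            messages.append(f"ERROR: {key} not found: {value}")
    if not params.get("spl_jnk"):
        messages.append("WARNING: spl_jnk is empty, so noncanonical splice "
                        "junctions will not be corrected")
    return messages
```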
Steps to run the pipeline:
- Create the samples_file.
- Set the required variable values (samples_file, outdir, description, platform) in the file params.config.
Caution: save your changes to the file!
- Run the pipeline inside a tmux session:
module load nextflow/22.04.3
nextflow run test_tuples.nf -profile Altius -entry tuple
Important: please check that nobody else is running the pipeline at the moment!
The results will be in the folder you set as outdir in the file params.config.

