vierstralab/long-read-RNAseq

long-read-RNAseq

PIPELINE for identifying and quantifying known and novel genes/isoforms in long-read RNA-seq data

link to description with pictures

OVERVIEW

Nextflow pipeline for identifying and quantifying known and novel genes/isoforms in long-read RNA-seq data. It currently works with human data from Oxford Nanopore platforms; support for PacBio and mouse data is planned for the future.

alignment: Minimap2 maps Oxford Nanopore reads to the genome

cleaning: TranscriptClean corrects mismatches, microindels, and noncanonical splice junctions in long reads that have been mapped to the genome (the aim is to fix artifactual noncanonical splice junctions)

gene / isoform searching: TALON identifies and quantifies known and novel genes / isoforms in long-read transcriptome data sets

INPUTS

In file params.config:

Variables to be changed

  • samples_file - CSV file with one row per read file. Columns: row_id (a number), ln (the sample ID; we use the LN as the ID), and pathway (the full path to a read file for that sample). Header: row_id,ln,pathway. Example: /net/seq/data2/projects/amuravyova/nf-long-reads-align/FETAL/11_20_fetal_with_pathways.csv

  • outdir - directory where you want to put the results

  • description - description of the data (does not affect the analysis)

  • platform - platform that was used for generating the data (does not affect the analysis)
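
As an illustration of the samples_file layout described above, here is a minimal Python sketch that produces such a CSV. It is not part of the pipeline. The function name build_samples_csv, the sample IDs, and the read paths are hypothetical, and the sketch assumes that row_id simply counts rows from 1 and that a sample with several read files gets one row per file with the same ln.

```python
import csv
import io
import os


def build_samples_csv(reads_by_sample):
    """Return CSV text with the header row_id,ln,pathway and one row per read file.

    reads_by_sample maps a sample ID (ln) to a list of read file paths.
    Assumption: row_id is a running counter that starts at 1.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["row_id", "ln", "pathway"])
    row_id = 1
    for ln, read_files in reads_by_sample.items():
        for read_file in read_files:
            # The pathway column holds a full path, so relative paths are
            # converted to absolute ones.
            writer.writerow([row_id, ln, os.path.abspath(read_file)])
            row_id += 1
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample IDs and read locations, for illustration only.
    example = {
        "LN00001": [
            "/data/reads/LN00001_run1.fastq.gz",
            "/data/reads/LN00001_run2.fastq.gz",
        ],
        "LN00002": ["/data/reads/LN00002_run1.fastq.gz"],
    }
    print(build_samples_csv(example), end="")
```

Redirecting the printed text to a file gives something that could be used as samples_file, provided the assumptions above match what the pipeline expects.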

Variables that can be changed (please leave them as they are for now)

  • genome_fasta - fasta file containing the reference genome used in mapping
  • genome_gtf - gtf file containing the reference annotation
  • spl_jnk - high-confidence splice junction file. This file is necessary if you want to correct noncanonical splice junctions
  • known_variants_vcf - vcf file containing variants
  • conda

HOW TO RUN

  1. create samples_file
  2. set the required variable values (samples_file, outdir, description, platform) in file params.config

Caution

Save the changes to params.config before running!

  3. run in tmux:
module load nextflow/22.04.3
nextflow run test_tuples.nf -profile Altius -entry tuple

Important

Please check that nobody else is running the pipeline at the same time!

Results are written to the directory set as outdir in params.config
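
Before launching the run, the samples_file created in step 1 can be sanity-checked. The sketch below is not part of the pipeline. The function name check_samples_file and the specific checks are assumptions based only on the format described in INPUTS (header row_id,ln,pathway, numeric row_id, full path to an existing read file). It also assumes the read files are visible from the machine where the check runs.

```python
import csv
import os
import sys

EXPECTED_HEADER = ["row_id", "ln", "pathway"]


def check_samples_file(csv_path):
    """Return a list of problems found in the samples file (empty if none)."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            expected = ",".join(EXPECTED_HEADER)
            return [f"line 1: expected header {expected}, found {header}"]
        for row in reader:
            line_no = reader.line_num
            if not row:
                continue  # ignore blank lines
            if len(row) != 3:
                problems.append(f"line {line_no}: expected 3 columns, found {len(row)}")
                continue
            row_id, ln, pathway = (field.strip() for field in row)
            if not row_id.isdigit():
                problems.append(f"line {line_no}: row_id is not a number: {row_id!r}")
            if not ln:
                problems.append(f"line {line_no}: sample ID (ln) is empty")
            if not os.path.isabs(pathway):
                problems.append(f"line {line_no}: pathway is not a full path: {pathway!r}")
            elif not os.path.isfile(pathway):
                problems.append(f"line {line_no}: read file not found: {pathway!r}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    found = check_samples_file(sys.argv[1])
    for problem in found:
        print(problem)
    print("no problems found" if not found else f"{len(found)} problem(s) found")
```

Saved under a name of your choice, the script takes the path to the samples_file as its only argument and prints one line per problem.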

OUTPUTS

QC description
