isoseq2orf is a pipeline designed to convert long-read sequencing data into a representative transcriptome for a specific cancer type and perform primary sequence characterization of novel open reading frames (ORFs).
The workflow of the pipeline is illustrated below:
The pipeline consists of the following key steps:
- Convert raw sequencing data (`000_ccs2gtf.sh`)
- Predict ORFs and perform quality control (`001_gtf2orf.sh`)
- Quantify the master transcriptome using short-read RNA-seq (`002_gtf2qnt.sh`)
- Validate predicted ORFs using MS/MS (`003_orf2ms.sh`)
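The four steps are meant to run in order, since each one consumes the output of the previous step. A minimal sketch of a driver is shown here. `run_pipeline` is a hypothetical helper that is not part of isoseq2orf, and it assumes that all four scripts sit in one directory and can be started without arguments. The real scripts may require input paths or configuration, so check each one before relying on this.

```shell
#!/usr/bin/env bash
# Hypothetical driver: runs the four isoseq2orf step scripts in order.
# Assumption: the scripts live in a single directory and take no arguments
# here. Check each script for its real inputs before using this.

run_pipeline() {
    local script_dir="${1:-.}"
    local step
    for step in 000_ccs2gtf.sh 001_gtf2orf.sh 002_gtf2qnt.sh 003_orf2ms.sh; do
        if [ ! -f "$script_dir/$step" ]; then
            echo "missing step script: $script_dir/$step" >&2
            return 1
        fi
        echo "running $step" >&2
        # Stop at the first failing step so later steps never run on partial output.
        bash "$script_dir/$step" || {
            echo "step failed: $step" >&2
            return 1
        }
    done
}
```

Calling `run_pipeline path/to/scripts` (the path is a placeholder) runs the steps sequentially and returns a non-zero status as soon as a script is missing or fails.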
Function of `000_ccs2gtf.sh`:
🔹 Converts raw data from the PacBio sequencer into the master transcriptome.
Function of `001_gtf2orf.sh`:
🔹 Performs quality control (QC).
🔹 Predicts open reading frames (ORFs).
🔹 Conducts primary sequence characterization of the master transcriptome.
Function of `002_gtf2qnt.sh`:
🔹 Quantifies the master transcriptome based on external short-read RNA-seq datasets.
Function of `003_orf2ms.sh`:
🔹 Performs MS/MS validation of the predicted novel ORFs.
This pipeline enables comprehensive analysis of novel ORFs derived from long-read sequencing, integrating transcriptomic, proteomic, and functional data.