
Not retrieving full transcripts from GFF3, only coding segments (this is without the -C parameter) #140

@vaneet-lotay


Hello,

I'm not sure whether I'm missing something obvious or using the wrong parameters for gffread, but when I run it to extract transcript sequences from a GFF3 file, it doesn't seem to extract the full transcript from the start coordinate to the end coordinate. I examined a few sequences from the output, and as far as I can tell it is joining only the CDS segments and leaving out the introns in between. The result looks more like the coding sequence, which is not what I want. Here's the command I use (example filenames):

gffread -w transcripts.fa -g genomic_seq.fa gene_models.gff3
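To make the difference concrete, here is a toy Python sketch contrasting a spliced transcript (exon segments joined, introns dropped) with the unspliced genomic span from the first exon start to the last exon end. gffread's usage text describes -w as writing spliced exons for each transcript, so the spliced form is likely what transcripts.fa contains. The helpers fetch, spliced and unspliced and the 16 bp genome are hypothetical illustrations, not part of gffread.

```python
# Toy illustration: spliced transcript vs. unspliced genomic span.
# Coordinates are GFF3-style: 1-based, inclusive on both ends.
# All names here are hypothetical helpers, not gffread functionality.

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch(genome: str, start: int, end: int) -> str:
    """Return bases start..end using 1-based inclusive coordinates."""
    return genome[start - 1:end]


def spliced(genome: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Concatenate exon sequences in genomic order, skipping introns."""
    seq = "".join(fetch(genome, s, e) for s, e in sorted(exons))
    return revcomp(seq) if strand == "-" else seq


def unspliced(genome: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Take everything from the first exon start to the last exon end (introns kept)."""
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)
    seq = fetch(genome, start, end)
    return revcomp(seq) if strand == "-" else seq


if __name__ == "__main__":
    genome = "AAAACCCCGGGGTTTT"           # 16 bp toy "chromosome"
    exons = [(1, 4), (9, 12)]             # two exons; bases 5..8 are the intron
    print(spliced(genome, exons, "+"))    # AAAAGGGG (8 bp)
    print(unspliced(genome, exons, "+"))  # AAAACCCCGGGG (12 bp)
```

On the minus strand both functions return the reverse complement, so the spliced toy transcript would read CCCCTTTT.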

The sequences in transcripts.fa are not the complete transcripts including introns. Is there a particular set of parameters that will give me that output? For example, here's the mRNA/transcript line from the GFF3:

Chr1 Xenbase mRNA 329594 331864 . - . ID=mRNA099831;Name=XM_031895097.1;Dbxref=GeneID:116408318,Genbank:XM_031895097.1;Parent=XBXT10g022928;gbkey=mRNA;gene=bbc3;transcript_id=RefSeq:XM_031895097.1;curie=RefSeq:XM_031895097.1;Ontology_term=SO:0000234

I expected the sequence length to be approximately the distance between those start and end coordinates. Am I interpreting this incorrectly?
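As a quick check of that expectation: GFF3 coordinates are 1-based and inclusive, so the unspliced mRNA above, spanning 329594..331864, covers 331864 - 329594 + 1 = 2271 bp. A spliced (exons-only) sequence would be shorter whenever the transcript has introns. The parser below is a minimal hypothetical sketch; it assumes the real file is tab-separated (the line above is shown with spaces) and does not percent-decode attribute values.

```python
# Parse one GFF3 feature line and compute its genomic span.
# Assumes tab-separated columns; start/end are 1-based and inclusive.
# No percent-decoding of attribute values is done (sketch only).


def parse_gff3_line(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Column 9 is key=value pairs separated by ';'
    attributes = dict(kv.split("=", 1) for kv in attrs.split(";") if kv)
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


record = "\t".join([
    "Chr1", "Xenbase", "mRNA", "329594", "331864", ".", "-", ".",
    "ID=mRNA099831;Name=XM_031895097.1;"
    "Dbxref=GeneID:116408318,Genbank:XM_031895097.1;"
    "Parent=XBXT10g022928;gbkey=mRNA;gene=bbc3;"
    "transcript_id=RefSeq:XM_031895097.1;curie=RefSeq:XM_031895097.1;"
    "Ontology_term=SO:0000234",
])

feat = parse_gff3_line(record)
span = feat["end"] - feat["start"] + 1  # inclusive on both ends
print(feat["attributes"]["ID"], feat["strand"], span)  # mRNA099831 - 2271
# The matching 0-based Python slice of a chromosome string would be
# chrom[feat["start"] - 1 : feat["end"]], reverse-complemented for "-".
```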

Most of the transcripts in the GFF3 have both CDS and exon segments, and these overlap identically except at the start and end of each transcript, where the exons extend beyond the CDS as 'implied UTRs'.
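The 'implied UTR' relationship can be stated precisely: the UTRs are the parts of the exons that no CDS segment covers. The Python sketch below computes those leftover intervals; the exon and CDS coordinates are made up for illustration and are not taken from the GFF3 above.

```python
# Derive UTR segments as the parts of exons not covered by any CDS.
# Intervals are (start, end), 1-based inclusive, as in GFF3.
# Which leftover piece is 5' or 3' depends on the strand:
# on "-" the lowest-coordinate piece is the 3' UTR.

Interval = tuple[int, int]


def subtract(exon: Interval, cds_list: list[Interval]) -> list[Interval]:
    """Return the pieces of `exon` that no CDS interval overlaps."""
    pieces = [exon]
    for cs, ce in cds_list:
        next_pieces = []
        for s, e in pieces:
            if ce < s or cs > e:       # no overlap: keep whole piece
                next_pieces.append((s, e))
                continue
            if s < cs:                 # part left of the CDS
                next_pieces.append((s, cs - 1))
            if e > ce:                 # part right of the CDS
                next_pieces.append((ce + 1, e))
        pieces = next_pieces
    return pieces


def utr_segments(exons: list[Interval], cds: list[Interval]) -> list[Interval]:
    """Collect uncovered exon pieces in genomic order."""
    return [p for ex in sorted(exons) for p in subtract(ex, cds)]


exons = [(100, 200), (300, 400), (500, 650)]  # hypothetical exons
cds = [(150, 200), (300, 400), (500, 600)]    # identical except at the ends
print(utr_segments(exons, cds))  # [(100, 149), (601, 650)]
```

Interior exons that match their CDS exactly contribute nothing, which mirrors the "identical except at the start and end" pattern described above.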

Any help you can provide would be appreciated, thanks!

Vaneet
