
Problems extraction proteins/cds when ? in strand column even though feature is not gene #147

@jd3234


Hi,

We have some GFF3 files that contain ? in the strand column. The affected features are not genes (in our case, oriC in bacteria). Nonetheless, when I try to extract protein and CDS sequences, gffread fails with an error:

gffread r1.gff3 -S -g chromosomes.fasta -y protein.fasta

Error parsing strand (?) from GFF line:
chr1	PRODIGALEXEX	oriC	663246	664269	.	?	.	ID=JMHBPODKGB_1056;Name=origin of replication;product=origin of replication;inference=similar to DNA sequence

Using ? is valid according to the GFF3 specification (https://github.com/the-sequence-ontology/specifications/blob/master/gff3.md):

Column 7: "strand"
The strand of the feature. + for positive strand (relative to the landmark), - for minus strand, and . for features that are not stranded. In addition, ? can be used for features whose strandedness is relevant, but unknown.
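
To make the quoted rule concrete, here is a minimal sketch that maps each of the four permitted column 7 values to its meaning. The helper name classify_strand is hypothetical and only illustrates the spec text; it is not how gffread parses the column.

```python
# Hypothetical helper (not part of gffread): maps a GFF3 column 7 value
# to the meaning given in the GFF3 specification.
STRAND_MEANINGS = {
    "+": "positive strand",
    "-": "minus strand",
    ".": "not stranded",
    "?": "strandedness relevant but unknown",
}


def classify_strand(gff_line: str) -> str:
    """Return the meaning of column 7 of a tab-separated GFF3 feature line."""
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    strand = columns[6]
    if strand not in STRAND_MEANINGS:
        raise ValueError(f"invalid strand value: {strand!r}")
    return STRAND_MEANINGS[strand]


# Shortened version of the oriC line from the error message above.
line = "chr1\tPRODIGALEXEX\toriC\t663246\t664269\t.\t?\t.\tID=JMHBPODKGB_1056"
print(classify_strand(line))
```

A parser that follows the spec would therefore accept the oriC line rather than reject it.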

Would it be possible for gffread to ignore features with ? in the strand column and simply not output protein/CDS for them, or to add a parameter that switches this behaviour on? Or is there already a way to do this that I have missed in the help?
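
In the meantime, one possible workaround is to pre-filter the GFF3 before passing it to gffread. The sketch below is an assumption, not a gffread feature: the function name drop_unknown_strand is hypothetical, and the second sample line is made up for illustration. It drops every feature line whose strand column is ?, on the premise that such features (like oriC here) are not needed for protein/CDS extraction.

```python
def drop_unknown_strand(lines):
    """Yield GFF3 lines, skipping feature lines whose strand column is '?'.

    Comment/directive lines, blank lines, and anything that is not a
    9-column feature line (e.g. an embedded FASTA section) pass through.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[6] == "?":
            continue  # drop the feature with unknown strand
        yield line


sample = [
    "##gff-version 3\n",
    # Shortened oriC line from the error message in this issue.
    "chr1\tPRODIGALEXEX\toriC\t663246\t664269\t.\t?\t.\tID=JMHBPODKGB_1056\n",
    # Made-up CDS line, for illustration only.
    "chr1\texample\tCDS\t100\t399\t.\t+\t0\tID=cds_1\n",
]
kept = list(drop_unknown_strand(sample))
print("".join(kept), end="")
```

In practice the lines would be read from r1.gff3 and written to a new file, which is then given to gffread with the same options as before. Note that if a dropped feature were the Parent of other features, those children would be left without a parent, so this only suits standalone features.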

Many thanks!
