@j23414 j23414 commented Mar 19, 2024

Description of proposed changes

Adds optional "--start" and "--end" arguments that give 0-based start and end positions relative to the "--gene" of interest.

Since GenBank sequences can contain extra sequence beyond the end of the polyprotein, the start and end positions are defined relative to the gene of interest rather than to the full record, which was deemed the more stable behavior.
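To illustrate why gene-relative coordinates are stable, here is a minimal sketch in plain Python. It is not the code in scripts/newreference.py: the function name `subgenic_slice` and its parameters are hypothetical, and the end-exclusive convention is an assumption inferred from the example below, where "--start 0 --end 9" yields 9 bp.

```python
def subgenic_slice(genome, gene_start, gene_end, start=None, end=None):
    """Return a sub-region of a gene (hypothetical helper, for illustration).

    gene_start/gene_end: 0-based, end-exclusive gene coordinates in the genome.
    start/end: 0-based, end-exclusive coordinates relative to the gene (assumed).
    """
    gene = genome[gene_start:gene_end]
    # With no start/end given, fall back to the whole gene.
    start = 0 if start is None else start
    end = len(gene) if end is None else end
    if not 0 <= start < end <= len(gene):
        raise ValueError(f"start/end must satisfy 0 <= start < end <= {len(gene)}")
    return gene[start:end]


# Toy genome: 4 leading bases, a 12-base "gene", 2 trailing bases.
genome = "nnnn" + "atgcgatgcaaa" + "nn"
print(subgenic_slice(genome, 4, 16, start=0, end=9))  # atgcgatgc
```

Because the slice is taken from the gene rather than from the record, appending extra sequence after the gene leaves the result unchanged.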

Example of only pulling out E gene in Dengue (original)

python scripts/newreference.py \
  --reference dengue_reference.gb \
  --output-fasta E.fasta \
  --output-genbank E.gb \
  --gene E

Will generate a reference genbank with features:

FEATURES             Location/Qualifiers
     CDS             1..1485
                     /gene="E"
                     /product="envelope protein E"
                     /protein_id="NP_740317.1"
     source          1..1485
                     /clone="rDEN4"
                     /db_xref="taxon:11070"
                     /mol_type="genomic RNA"
                     /organism="Dengue virus 4"

Example of pulling E subgenic region (New Feature)

Run with new start and end region:

python scripts/newreference.py \
  --reference dengue_reference.gb \
  --output-fasta E.fasta \
  --output-genbank E.gb \
  --gene E \
  --start 0 \
  --end 9

Will result in:

LOCUS       DENV4/NA/REFERENCE/2003    9 bp    DNA              UNK 01-JAN-1980
DEFINITION  Dengue virus 4, complete genome.
ACCESSION   NC_002640
VERSION     NC_002640.1
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     source          1..9
                     /clone="rDEN4"
                     /db_xref="taxon:11070"
                     /mol_type="genomic RNA"
                     /organism="Dengue virus 4"
     CDS             1..9
                     /gene="E_0_9"
ORIGIN
        1 atgcgatgc
//
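The output above suggests two conventions: the sub-region's CDS is labelled with the gene name plus the requested start and end, and the 0-based "--start"/"--end" values become a 1-based, inclusive GenBank location on the new, shorter reference. A small sketch of that mapping follows; the helper name `describe_subregion` is hypothetical, and the naming pattern is inferred only from the "E_0_9" example rather than from the script itself.

```python
def describe_subregion(gene, start, end):
    """Return (label, location) for a gene sub-region (illustrative only).

    start/end are 0-based and end-exclusive, relative to the gene (assumed).
    """
    # Label pattern inferred from the example output: /gene="E_0_9"
    label = f"{gene}_{start}_{end}"
    # The new reference contains only the sub-region, so in GenBank's
    # 1-based inclusive notation the feature spans 1..length.
    length = end - start
    location = f"1..{length}"
    return label, location


print(describe_subregion("E", 0, 9))  # ('E_0_9', '1..9')
```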

Related issue(s)

Checklist

  • Checks pass

…enic phylogenetic trees

Adds "--start" and "--end" arguments to provide 0-based start and end positions
relative to a "--gene" of interest.

Since the GenBank sequences can contain extra sequences off the end of the polyprotein,
making the start and end positions relative to the gene of interest was deemed more stable.
Development

Successfully merging this pull request may close these issues.

Add --start and --end flags to newreferences.py to allow for creating subgenic tree builds
