[FEATURE REQUEST] Different gene caller option(s) for anvi-gen-contigs-database #2298

@ivagljiva

A small project to improve anvi'o, based upon feedback/ideas @FlorianTrigodet and I heard from our colleagues at the QIB in Norwich.

The need

There is interest in using alternative gene calling software in addition to prodigal within anvi'o (i.e., instead of having to run gene calling outside of anvi'o and import the results as external gene calls). We've heard specifically about prodigal-gv, a fork of prodigal with additions that improve gene calling for viruses, and about pyrodigal/pyrodigal-gv, the respective Python modules for using these tools directly in code. However, there could be other gene callers of interest to the community.

The solution

This small project is flexible in scope depending on which gene calling software we want to support and how far you (the developer) want to go with the refactor. Here are some possibilities:

  • Implementing prodigal-gv could be as simple as adding a variable that stores either prodigal or prodigal-gv according to user input, and replacing every place where we call prodigal with this variable. It would use the same driver/parser modules that prodigal uses, and in theory no further changes would be necessary.
  • Incorporating one (or both) of the pyrodigal options would require changes to how we actually run the gene calling step. Instead of a driver program that runs the prodigal binary, we would use the pyrodigal classes directly. Multi-threading and parsing of the results would also have to change to be compatible with those classes (they are thread-safe, but it looks like we would still manage the multi-threading on our own).
  • This could be a good opportunity to refactor the way we store gene call information in the anvi'o databases, to incorporate additional data as suggested in [FEATURE REQUEST] Preserve prodigal metadata for anvi-export-gene-calls #2181 and [FEATURE REQUEST] Refactoring Anvio to be more eukaryote friendly/account for different genetic architectures #2297. That would require much more extensive code changes.
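
To make the first two options a bit more concrete, here is a rough sketch in Python. It is not anvi'o code: `BINARY_GENE_CALLERS`, `build_gene_caller_command`, and `call_genes_in_process` are hypothetical names made up for illustration. The first function assumes prodigal-gv accepts the same `-i` (input FASTA), `-o` (output file), and `-a` (protein translations) flags as prodigal, since it is a fork. The second assumes a pyrodigal-style object with a `find_genes()` method whose results expose `begin`, `end`, and `strand`, and it handles the threading itself, as described above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping from a user-facing option value to an executable name.
BINARY_GENE_CALLERS = {"prodigal": "prodigal", "prodigal-gv": "prodigal-gv"}


def build_gene_caller_command(gene_caller, fasta_path, output_path, peptides_path):
    """Build the command line for a prodigal-compatible binary.

    Assumption: both binaries share the -i / -o / -a flags, so only the
    program name changes between them.
    """
    if gene_caller not in BINARY_GENE_CALLERS:
        known = ", ".join(sorted(BINARY_GENE_CALLERS))
        raise ValueError(f"Unknown gene caller '{gene_caller}' (expected one of: {known})")

    binary = BINARY_GENE_CALLERS[gene_caller]
    return [binary, "-i", fasta_path, "-o", output_path, "-a", peptides_path]


def call_genes_in_process(finder, contigs, num_threads=1):
    """Call genes on each contig with an in-process gene finder.

    `finder` is any object with a find_genes(sequence) method that yields
    genes carrying `begin`, `end`, and `strand` (the assumed pyrodigal-style
    interface). `contigs` maps contig names to sequences. Threads are managed
    here rather than by the finder.
    """
    def call_one(item):
        name, sequence = item
        genes = finder.find_genes(sequence)
        return name, [(gene.begin, gene.end, gene.strand) for gene in genes]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(pool.map(call_one, contigs.items()))
```

With pyrodigal installed, the `finder` argument would presumably be something like `pyrodigal.GeneFinder(meta=True)` or `pyrodigal_gv.ViralGeneFinder(meta=True)`. Because the sketch only relies on `find_genes()`, the rest of the code would not need to know which of the two it was given.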

Beneficiaries

All users of anvi'o, but (in the case of prodigal-gv) especially those who work on viruses.
