
Difference between Pyrodigal 2.0.4 and Prodigal 2.6.3 #28

@pchaumeil


Hello,

I have run another large test comparing Prodigal and Pyrodigal across ~400K genomes.

In this test, ~17K genomes show a difference in gene calling between the two tools.
I have attached the list of all genomes with a difference here:
index_genomes.txt
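
For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of how the differing genes in one genome could be listed from the two protein FASTA outputs. It is not the script used for the test above; the function names `read_fasta` and `differing_genes` and the toy records are hypothetical, and it assumes both tools name genes identically in the first token of the header.

```python
def read_fasta(lines):
    """Return {record_id: sequence} from an iterable of FASTA lines.

    The record id is taken to be the first whitespace-delimited token
    of the header (an assumption that holds for Prodigal-style headers).
    """
    records = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        elif line and current_id is not None:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def differing_genes(lines_a, lines_b):
    """List gene ids whose protein sequence differs or exists in only one input."""
    a, b = read_fasta(lines_a), read_fasta(lines_b)
    return sorted(gid for gid in a.keys() | b.keys() if a.get(gid) != b.get(gid))


# Toy records (hypothetical, not taken from the genomes in this issue).
prodigal_faa = """>toy_1 # 1 # 12 # 1 # ID=1_1
MKL*
>toy_2 # 20 # 31 # -1 # ID=1_2
MSN*
""".splitlines()

pyrodigal_faa = """>toy_1 # 1 # 12 # 1 # ID=1_1
MKL*
>toy_2 # 20 # 40 # -1 # ID=1_2
IGPMSN*
""".splitlines()

print(differing_genes(prodigal_faa, pyrodigal_faa))  # ['toy_2']
```

With real outputs, the same function can be given two open file handles, for example `prodigal_2.6.3_amino.faa` and `pyrodigal_2.0.4_amino.faa`.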

Here are a few examples of these differences:

For GCA_934838455.1:

diff prodigal_2.6.3_amino.faa pyrodigal_2.0.4_amino.faa
>CAKWEX010000332.1_3 # 2936 # 3754 # -1 # ID=332_3;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.602
MSNHFEGLGKTWLTLLNDPEKEVPAVVMQVMKEGKTRDCWQRKDSKEETMVLAWPVETGF
RAGVTVHGNAGDQLRPVSTYPLLEGAPNDMTVNETYLWQNETEGEVSATCNEGANPLWFY
SPFLFRDRENLTPGVRHTFLIAGLAYGLRRALLDEMTITEGVEYERYVAEWLAQNPGKTR
LDVPQLTVDLRGARIVVPGDVASEYQIRVPVTSVEEMHIQNEKIYMLIVEFGLNTPNPLR
FPLYAPERVCKIVPQAGDEIDAIIWLQGRIID*
---
>CAKWEX010000332.1_3 # 2936 # 3763 # -1 # ID=332_3;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.603
IGPMSNHFEGLGKTWLTLLNDPEKEVPAVVMQVMKEGKTRDCWQRKDSKEETMVLAWPVE
TGFRAGVTVHGNAGDQLRPVSTYPLLEGAPNDMTVNETYLWQNETEGEVSATCNEGANPL
WFYSPFLFRDRENLTPGVRHTFLIAGLAYGLRRALLDEMTITEGVEYERYVAEWLAQNPG
KTRLDVPQLTVDLRGARIVVPGDVASEYQIRVPVTSVEEMHIQNEKIYMLIVEFGLNTPN
PLRFPLYAPERVCKIVPQAGDEIDAIIWLQGRIID*

For GCA_934561095.1:

diff prodigal_2.6.3_amino.faa pyrodigal_2.0.4_amino.faa
>CAKTGG010000246.1_2 # 738 # 2552 # -1 # ID=246_2;partial=00;start_type=TTG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.618
MKLSARKDVPVNETWDLSLIFAAEADFEAAVEKTKALADTLEKTYKNALTTPESIAECLA
LYEELEILLYQTTSYTSLAVSVDYTDTEAQKKDAKMSALAAEIGSRLSFIESEIADAPEE
LIRAAMDKTERAKHYLAEILREKPHRLSAETEKVLAALRPVFNAPYDIYHMTKLADMKFD
SFTVNSKEYPLGYSLFEDEYEYEADTDVRRAAFRAFSDKLRQYENTTAATYNTYLTQQRI
MAHQRGFADMFEADLFADHVTREMYDRQIDLITEKLAPAMRKYARLVGKMNKLDRVTFAD
LKLPLDAEFDPRVTIEESREYVRSALSVLGQDYADMVDEAYDKRWIDFARNAGKETGGFC
SSPYGCNSFILLSWNNRMADVFTIAHELGHAGHFRLCNGAQSLFDTNVSGYLIEAPSTMN
ELLLAQDLLRKNTDKRFRRWVLSSLIGHTYYHNFVTHLREAWYQREAMNIIEQGGAVNAE
TLSGIFRRNLETFWGDAVELTEGCELTWMRQPHYYMGLYSYTYSAGLTLATQAALNIAAE
GESAVARWRAMLEAGSTRGPLGLAEIAGIDLSTPDALEHTIAYISDIIDEIAVLTEELDG
ITLD*
---
>CAKTGG010000246.1_2 # 738 # 2564 # -1 # ID=246_2;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.616
EDIHLKLSARKDVPVNETWDLSLIFAAEADFEAAVEKTKALADTLEKTYKNALTTPESIA
ECLALYEELEILLYQTTSYTSLAVSVDYTDTEAQKKDAKMSALAAEIGSRLSFIESEIAD
APEELIRAAMDKTERAKHYLAEILREKPHRLSAETEKVLAALRPVFNAPYDIYHMTKLAD
MKFDSFTVNSKEYPLGYSLFEDEYEYEADTDVRRAAFRAFSDKLRQYENTTAATYNTYLT
QQRIMAHQRGFADMFEADLFADHVTREMYDRQIDLITEKLAPAMRKYARLVGKMNKLDRV
TFADLKLPLDAEFDPRVTIEESREYVRSALSVLGQDYADMVDEAYDKRWIDFARNAGKET
GGFCSSPYGCNSFILLSWNNRMADVFTIAHELGHAGHFRLCNGAQSLFDTNVSGYLIEAP
STMNELLLAQDLLRKNTDKRFRRWVLSSLIGHTYYHNFVTHLREAWYQREAMNIIEQGGA
VNAETLSGIFRRNLETFWGDAVELTEGCELTWMRQPHYYMGLYSYTYSAGLTLATQAALN
IAAEGESAVARWRAMLEAGSTRGPLGLAEIAGIDLSTPDALEHTIAYISDIIDEIAVLTE
ELDGITLD*
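
To make the difference in the examples above easier to quantify, here is a small sketch that parses the two headers for `CAKWEX010000332.1_3` and compares the coordinates. The helper `parse_prodigal_header` is hypothetical (it is not part of Prodigal or Pyrodigal) and assumes the `name # begin # end # strand # key=value;...` layout seen in these headers. Both genes are on the reverse strand, so the larger coordinate is the 5' end of the gene, and `partial=01` marks the right edge as incomplete.

```python
def parse_prodigal_header(header):
    """Split a Prodigal-style FASTA header into coordinates and attributes."""
    name, begin, end, strand, attrs = [
        field.strip() for field in header.lstrip(">").split(" # ")
    ]
    # Attributes look like "ID=332_3;partial=00;start_type=ATG;..."
    fields = dict(item.split("=", 1) for item in attrs.split(";") if item)
    return {
        "name": name,
        "begin": int(begin),
        "end": int(end),
        "strand": int(strand),
        **fields,
    }


# Headers copied from the GCA_934838455.1 example in this issue.
prodigal = parse_prodigal_header(
    ">CAKWEX010000332.1_3 # 2936 # 3754 # -1 # ID=332_3;partial=00;"
    "start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.602"
)
pyrodigal = parse_prodigal_header(
    ">CAKWEX010000332.1_3 # 2936 # 3763 # -1 # ID=332_3;partial=01;"
    "start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.603"
)

# Same stop (begin coordinate on the reverse strand), different start.
extra_nt = pyrodigal["end"] - prodigal["end"]
extra_aa = extra_nt // 3

print(prodigal["start_type"], pyrodigal["start_type"])  # ATG Edge
print(extra_nt, extra_aa)  # 9 3
```

The 9 extra nucleotides correspond to the 3 extra residues (`IGP`) at the start of the Pyrodigal protein. The same arithmetic on the second example gives 2564 - 2552 = 12 nucleotides, matching the 4 extra residues (`EDIH`) before the position where Prodigal placed its TTG start.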

Is it normal to have this many differences across all these genomes? Is Pyrodigal more accurate in this case?

Thank you

Labels: bug (something isn't working), external (issue comes from a dependency or some external code)
