Hello,
I have run another large test comparing Prodigal and Pyrodigal across ~400K genomes.
In this test, ~17K genomes show a difference in gene calling between the two tools.
I have attached the list of all genomes with a difference here:
index_genomes.txt
Here are a few examples of these differences:
for GCA_934838455.1:
diff prodigal_2.6.3_amino.faa pyrodigal_2.0.4_amino.faa
>CAKWEX010000332.1_3 # 2936 # 3754 # -1 # ID=332_3;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.602
MSNHFEGLGKTWLTLLNDPEKEVPAVVMQVMKEGKTRDCWQRKDSKEETMVLAWPVETGF
RAGVTVHGNAGDQLRPVSTYPLLEGAPNDMTVNETYLWQNETEGEVSATCNEGANPLWFY
SPFLFRDRENLTPGVRHTFLIAGLAYGLRRALLDEMTITEGVEYERYVAEWLAQNPGKTR
LDVPQLTVDLRGARIVVPGDVASEYQIRVPVTSVEEMHIQNEKIYMLIVEFGLNTPNPLR
FPLYAPERVCKIVPQAGDEIDAIIWLQGRIID*
---
>CAKWEX010000332.1_3 # 2936 # 3763 # -1 # ID=332_3;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.603
IGPMSNHFEGLGKTWLTLLNDPEKEVPAVVMQVMKEGKTRDCWQRKDSKEETMVLAWPVE
TGFRAGVTVHGNAGDQLRPVSTYPLLEGAPNDMTVNETYLWQNETEGEVSATCNEGANPL
WFYSPFLFRDRENLTPGVRHTFLIAGLAYGLRRALLDEMTITEGVEYERYVAEWLAQNPG
KTRLDVPQLTVDLRGARIVVPGDVASEYQIRVPVTSVEEMHIQNEKIYMLIVEFGLNTPN
PLRFPLYAPERVCKIVPQAGDEIDAIIWLQGRIID*
for GCA_934561095.1:
diff prodigal_2.6.3_amino.faa pyrodigal_2.0.4_amino.faa
>CAKTGG010000246.1_2 # 738 # 2552 # -1 # ID=246_2;partial=00;start_type=TTG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.618
MKLSARKDVPVNETWDLSLIFAAEADFEAAVEKTKALADTLEKTYKNALTTPESIAECLA
LYEELEILLYQTTSYTSLAVSVDYTDTEAQKKDAKMSALAAEIGSRLSFIESEIADAPEE
LIRAAMDKTERAKHYLAEILREKPHRLSAETEKVLAALRPVFNAPYDIYHMTKLADMKFD
SFTVNSKEYPLGYSLFEDEYEYEADTDVRRAAFRAFSDKLRQYENTTAATYNTYLTQQRI
MAHQRGFADMFEADLFADHVTREMYDRQIDLITEKLAPAMRKYARLVGKMNKLDRVTFAD
LKLPLDAEFDPRVTIEESREYVRSALSVLGQDYADMVDEAYDKRWIDFARNAGKETGGFC
SSPYGCNSFILLSWNNRMADVFTIAHELGHAGHFRLCNGAQSLFDTNVSGYLIEAPSTMN
ELLLAQDLLRKNTDKRFRRWVLSSLIGHTYYHNFVTHLREAWYQREAMNIIEQGGAVNAE
TLSGIFRRNLETFWGDAVELTEGCELTWMRQPHYYMGLYSYTYSAGLTLATQAALNIAAE
GESAVARWRAMLEAGSTRGPLGLAEIAGIDLSTPDALEHTIAYISDIIDEIAVLTEELDG
ITLD*
---
>CAKTGG010000246.1_2 # 738 # 2564 # -1 # ID=246_2;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.616
EDIHLKLSARKDVPVNETWDLSLIFAAEADFEAAVEKTKALADTLEKTYKNALTTPESIA
ECLALYEELEILLYQTTSYTSLAVSVDYTDTEAQKKDAKMSALAAEIGSRLSFIESEIAD
APEELIRAAMDKTERAKHYLAEILREKPHRLSAETEKVLAALRPVFNAPYDIYHMTKLAD
MKFDSFTVNSKEYPLGYSLFEDEYEYEADTDVRRAAFRAFSDKLRQYENTTAATYNTYLT
QQRIMAHQRGFADMFEADLFADHVTREMYDRQIDLITEKLAPAMRKYARLVGKMNKLDRV
TFADLKLPLDAEFDPRVTIEESREYVRSALSVLGQDYADMVDEAYDKRWIDFARNAGKET
GGFCSSPYGCNSFILLSWNNRMADVFTIAHELGHAGHFRLCNGAQSLFDTNVSGYLIEAP
STMNELLLAQDLLRKNTDKRFRRWVLSSLIGHTYYHNFVTHLREAWYQREAMNIIEQGGA
VNAETLSGIFRRNLETFWGDAVELTEGCELTWMRQPHYYMGLYSYTYSAGLTLATQAALN
IAAEGESAVARWRAMLEAGSTRGPLGLAEIAGIDLSTPDALEHTIAYISDIIDEIAVLTE
ELDGITLD*
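In both examples the Pyrodigal call is a few codons longer and is flagged partial=01 with start_type=Edge, while the Prodigal call has a regular start codon. To summarize this kind of difference across many genomes, a minimal sketch like the one below could compare the header fields of two records. It assumes the header layout shown above (fields separated by " # ", then ";"-separated key=value pairs); the function names parse_header and diff_headers are hypothetical helpers and not part of either tool.

```python
def parse_header(line):
    """Parse one Prodigal-style FASTA header line into a dict.

    Assumes the layout seen in the examples above:
    >gene_id # start # end # strand # key=value;key=value;...
    """
    gene_id, start, end, strand, attrs = line.lstrip(">").strip().split(" # ")
    record = {
        "id": gene_id,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
    }
    # Remaining attributes (partial, start_type, rbs_motif, ...) stay as strings.
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            record[key] = value
    return record


def diff_headers(old, new, keys=("start", "end", "strand", "partial", "start_type")):
    """Return {field: (old_value, new_value)} for the fields that differ."""
    a, b = parse_header(old), parse_header(new)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Headers copied from the GCA_934838455.1 example above.
prodigal = (">CAKWEX010000332.1_3 # 2936 # 3754 # -1 # ID=332_3;partial=00;"
            "start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.602")
pyrodigal = (">CAKWEX010000332.1_3 # 2936 # 3763 # -1 # ID=332_3;partial=01;"
             "start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.603")

changes = diff_headers(prodigal, pyrodigal)
print(changes)
```

Counting how often each set of changed fields occurs over the ~17K genomes would show whether most differences follow this same edge/partial pattern or whether there are other kinds.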
Is it normal to have this many differences across all these genomes? Is Pyrodigal more accurate in these cases?
Thank you