Non-deterministic behaviour #29

@aaronmussig

Description

Hello,

Firstly, thanks for your work on Pyrodigal! I wasn't able to determine whether Pyrodigal is meant to be deterministic; if it is not, please ignore this ticket.

I have an extremely rare case that took quite some time to identify: Pyrodigal occasionally gives a different result for the same input. It happens both when Pyrodigal is run directly from the shell and when it is launched from Python via os.system, but, strangely, it is more likely in the latter case.

To replicate the issue:

Dockerfile

FROM python:3.10-slim

RUN apt-get update && apt-get install -y \
    curl \
    unzip \
    && rm -rf /var/lib/apt/lists/*

RUN python -m pip install pyrodigal==2.0.4

RUN mkdir -p /data /results /tmp/download

WORKDIR /tmp/download

RUN curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_009700405.1/download?include_annotation_type=GENOME_FASTA&filename=GCA_009700405.1.zip" -H "Accept: application/zip" && \
    unzip GCA_009700405.1.zip && \
    rm GCA_009700405.1.zip && \
    mv ncbi_dataset/data/GCA_009700405.1/GCA_009700405.1_ASM970040v1_genomic.fna /data/genome.fna && \
    rm -rf /tmp/download

WORKDIR /data

Entering container

docker build -t pyrodigal_test . && docker run -it pyrodigal_test /bin/bash

Running Pyrodigal

#!/bin/bash

for i in {1..100}
do
   python -c "import os; os.system('pyrodigal -m -i /data/genome.fna -g 11 -o /dev/null -a /results/python_$i.faa -d /dev/null -p single')";
   pyrodigal -m -i /data/genome.fna -g 11 -o /dev/null -a /results/shell_$i.faa -d /dev/null -p single;
done

Results

Over 200 trials of each command, I get the following results (number of output files per hash):

| Hash | Command line (count) | Python os.system (count) |
|--------|-----|-----|
| a9f114 | 192 | 178 |
| 597610 | 8 | 22 |
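For anyone repeating the experiment, a tally like the one above can be produced by hashing every output file and grouping by launch method. This is a minimal sketch, not the script used for the numbers above: the hash algorithm behind the six-character values is not stated here, so SHA-256 truncated to six hex digits is an assumption and the digests will not necessarily match. The function name `tally_outputs` is hypothetical.

```python
import hashlib
from collections import Counter, defaultdict
from pathlib import Path


def tally_outputs(results_dir):
    """Count identical outputs per launch method.

    Returns {short_digest: Counter({method: count})}, where method is the
    file-name prefix before the first underscore ("python" or "shell").
    """
    tally = defaultdict(Counter)
    for path in sorted(Path(results_dir).glob("*.faa")):
        method = path.stem.split("_", 1)[0]
        # Assumption: SHA-256, shortened to 6 hex digits for display.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()[:6]
        tally[digest][method] += 1
    return tally
```

Calling `tally_outputs("/results")` after the loop has finished gives one entry per distinct output, with a count for each method.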

The differences between the two hashes are:

7082,7090c7082,7090
< >WLMD01000046.1_10 # 9095 # 10405 # -1 # ID=101_10;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.487
< MSADDQLRKQQEFVLRTIEERNIRFVRLWFTDVLGFLKSVAIAPAELANAFDEGIGFDGS
< AIEGFARITESDMLAKPDPSTFSVLPWRTEAPGAARMFCDIVMPDGSASHADPRHVLRRI
< LNKAATMGYTCYTHPEIEFFLFKDRPEIGKRPTPVDQGGYFDHTPAVVGHDFRRTAITML
< EAMGISVEFSHHEGAPGQQEIDLRYADALTTADNIMTFRHVVKEVALDQGFHASFIPKPF
< TDHPGSGMHTHVSLFQGEKNAFYDAKAEYNLSKVGRSFIAGLLRHAPEITAVTNQWVNSY
< KRLHGGGEAPALVNWGHNNRGALVRVPMYKPNNENSTRVEFRSPDSACNPYLAYAVMIAA
< GLKGVEEGYELADSSDATVLPSNLNEAIIAMEKSALVRETLGEHVFEYVLRNKRAEWNDY
< SRQVTAYELDRYLPIL*
---
> >WLMD01000046.1_10 # 9095 # 10414 # -1 # ID=101_10;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.487
> YVPMSADDQLRKQQEFVLRTIEERNIRFVRLWFTDVLGFLKSVAIAPAELANAFDEGIGF
> DGSAIEGFARITESDMLAKPDPSTFSVLPWRTEAPGAARMFCDIVMPDGSASHADPRHVL
> RRILNKAATMGYTCYTHPEIEFFLFKDRPEIGKRPTPVDQGGYFDHTPAVVGHDFRRTAI
> TMLEAMGISVEFSHHEGAPGQQEIDLRYADALTTADNIMTFRHVVKEVALDQGFHASFIP
> KPFTDHPGSGMHTHVSLFQGEKNAFYDAKAEYNLSKVGRSFIAGLLRHAPEITAVTNQWV
> NSYKRLHGGGEAPALVNWGHNNRGALVRVPMYKPNNENSTRVEFRSPDSACNPYLAYAVM
> IAAGLKGVEEGYELADSSDATVLPSNLNEAIIAMEKSALVRETLGEHVFEYVLRNKRAEW
> NDYSRQVTAYELDRYLPIL*

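i.e. in the 597610 output, gene WLMD01000046.1_10 is reported as partial (partial=01, start_type=Edge, no RBS motif) and ends at 10414 instead of 10405, so its translation starts with YVPM... instead of M.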
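The two header lines in the diff can also be compared field by field, which makes the change easier to spot than in the raw text. The sketch below assumes the header layout seen above (name, start, end, strand and a semicolon-separated attribute list, joined by " # "); `parse_header` and `changed_fields` are hypothetical helper names.

```python
def parse_header(line):
    """Split one protein FASTA header, as shown in the diff, into fields."""
    name, start, end, strand, attrs = line.lstrip(">").strip().split(" # ")
    record = {"name": name, "start": int(start), "end": int(end),
              "strand": int(strand)}
    for item in attrs.split(";"):
        if item:
            key, _, value = item.partition("=")
            record[key] = value
    return record


def changed_fields(header_a, header_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = parse_header(header_a), parse_header(header_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```

Applied to the two headers above, this reports differences only in `end`, `partial`, `start_type`, `rbs_motif` and `rbs_spacer`; the start coordinate, strand and GC content are identical.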

Metadata

Labels

bug: Something isn't working
external: Issue comes from a dependency or some external code.