Hello,
Firstly, thanks for your work with Pyrodigal! I wasn't able to determine if Pyrodigal should be deterministic, if it is then ignore this ticket.
I have an extremely rare case that took quite some time to identify, but Pyrodigal will occasionally give a different result when running via Shell vs. Python subprocess. The strange part is that the likelihood of Pyrodigal giving a different result is higher when running via a Python subprocess though.
To replicate the issue:
Dockerfile
FROM python:3.10-slim
RUN apt-get update && apt-get install -y \
curl \
unzip \
&& rm -rf /var/lib/apt/lists/*
RUN python -m pip install pyrodigal==2.0.4
RUN mkdir -p /data /results /tmp/download
WORKDIR /tmp/download
RUN curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_009700405.1/download?include_annotation_type=GENOME_FASTA&filename=GCA_009700405.1.zip" -H "Accept: application/zip" && \
unzip GCA_009700405.1.zip && \
rm GCA_009700405.1.zip && \
mv ncbi_dataset/data/GCA_009700405.1/GCA_009700405.1_ASM970040v1_genomic.fna /data/genome.fna && \
rm -rf /tmp/download
WORKDIR /data
Entering container
docker build -t pyrodigal_test . && docker run -it pyrodigal_test /bin/bash
Running Pyrodigal
#!/bin/bash
for i in {1..100}
do
python -c "import os; os.system('pyrodigal -m -i /data/genome.fna -g 11 -o /dev/null -a /results/python_$i.faa -d /dev/null -p single')";
pyrodigal -m -i /data/genome.fna -g 11 -o /dev/null -a /results/shell_$i.faa -d /dev/null -p single;
done
Results
Over 200 trials I get the following results:
| Hash |
Command Line (count) |
Python os.system (count) |
| a9f114 |
192 |
178 |
| 597610 |
8 |
22 |
The differences between the two hashes are:
7082,7090c7082,7090
< >WLMD01000046.1_10 # 9095 # 10405 # -1 # ID=101_10;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.487
< MSADDQLRKQQEFVLRTIEERNIRFVRLWFTDVLGFLKSVAIAPAELANAFDEGIGFDGS
< AIEGFARITESDMLAKPDPSTFSVLPWRTEAPGAARMFCDIVMPDGSASHADPRHVLRRI
< LNKAATMGYTCYTHPEIEFFLFKDRPEIGKRPTPVDQGGYFDHTPAVVGHDFRRTAITML
< EAMGISVEFSHHEGAPGQQEIDLRYADALTTADNIMTFRHVVKEVALDQGFHASFIPKPF
< TDHPGSGMHTHVSLFQGEKNAFYDAKAEYNLSKVGRSFIAGLLRHAPEITAVTNQWVNSY
< KRLHGGGEAPALVNWGHNNRGALVRVPMYKPNNENSTRVEFRSPDSACNPYLAYAVMIAA
< GLKGVEEGYELADSSDATVLPSNLNEAIIAMEKSALVRETLGEHVFEYVLRNKRAEWNDY
< SRQVTAYELDRYLPIL*
---
> >WLMD01000046.1_10 # 9095 # 10414 # -1 # ID=101_10;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.487
> YVPMSADDQLRKQQEFVLRTIEERNIRFVRLWFTDVLGFLKSVAIAPAELANAFDEGIGF
> DGSAIEGFARITESDMLAKPDPSTFSVLPWRTEAPGAARMFCDIVMPDGSASHADPRHVL
> RRILNKAATMGYTCYTHPEIEFFLFKDRPEIGKRPTPVDQGGYFDHTPAVVGHDFRRTAI
> TMLEAMGISVEFSHHEGAPGQQEIDLRYADALTTADNIMTFRHVVKEVALDQGFHASFIP
> KPFTDHPGSGMHTHVSLFQGEKNAFYDAKAEYNLSKVGRSFIAGLLRHAPEITAVTNQWV
> NSYKRLHGGGEAPALVNWGHNNRGALVRVPMYKPNNENSTRVEFRSPDSACNPYLAYAVM
> IAAGLKGVEEGYELADSSDATVLPSNLNEAIIAMEKSALVRETLGEHVFEYVLRNKRAEW
> NDYSRQVTAYELDRYLPIL*
i.e. the 597610 hash starts with Y instead of M.
Hello,
Firstly, thanks for your work with Pyrodigal! I wasn't able to determine if Pyrodigal should be deterministic, if it is then ignore this ticket.
I have an extremely rare case that took quite some time to identify, but Pyrodigal will occasionally give a different result when running via Shell vs. Python subprocess. The strange part is that the likelihood of Pyrodigal giving a different result is higher when running via a Python subprocess though.
To replicate the issue:
Dockerfile
Entering container
Running Pyrodigal
Results
Over 200 trials I get the following results:
The differences between the two hashes are:
i.e. the 597610 hash starts with Y instead of M.