Verified run-to-run determinism: 50 prompts × 20 runs, temperature=0, two models: 100% bit-identical #1017
NullPointerDepressiveDisorder started this conversation in Show and tell
I ran a systematic determinism test on mlx-lm and wanted to share results since I couldn't find this documented anywhere beyond the parameter description saying temperature 0.0 equals "deterministic."
Setup
Two 4-bit quantized models: mlx-community/Meta-Llama-3.1-8B-Instruct-4bit and mlx-community/Qwen3.5-4B-4bit. 50 prompts × 20 runs per model, temperature=0, single-request mode.
Results
Every prompt produced bit-identical output across all 20 runs for both models. Perfect run-to-run determinism in single-request mode.
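For anyone wanting to reproduce this kind of check without the CLI, the core comparison is simple: hash each run's output and count distinct hashes per prompt. A minimal sketch (the `runs` list here is placeholder data standing in for real model output, not actual generations):

```python
import hashlib

def run_hashes(outputs):
    """SHA-256 each generation so runs can be compared byte-for-byte."""
    return [hashlib.sha256(o.encode("utf-8")).hexdigest() for o in outputs]

# Placeholder: in the real test, each string would come from a separate
# generate() call at temperature=0 with the same prompt.
runs = ["The capital of France is Paris."] * 20
hashes = run_hashes(runs)

# Bit-identical outputs collapse to a single distinct hash.
print(len(set(hashes)))  # 1
```

Comparing hashes rather than raw strings makes it cheap to store results across many prompts and runs, and any single differing byte shows up immediately.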
Why tho?
An August 2024 study found that even with temperature=0 and fixed seeds, production serving engines show considerable output variation: Mixtral-8x7b had a 72 percentage-point accuracy range across 10 runs. The mlx-deterministic project documented that MLX inference can produce different outputs with different batch sizes, due to reduction-order changes in RMSNorm/MatMul/Softmax.
My results are consistent with this distinction: single-request, same batch size, same hardware produces perfect determinism. I have not tested batch-invariant determinism (whether outputs change when processed alongside other requests in a batch). That's a different and harder property.
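The reduction-order point is easy to demonstrate in isolation: float32 addition is not associative, so any kernel that changes its summation order (for example, under a different batch size) can legitimately produce different bits from the same values. A toy illustration:

```python
import numpy as np

a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)

# Same three values, two reduction orders:
left = (a + b) + c   # 0.0 + 1.0            -> 1.0
right = a + (b + c)  # -1e8 + 1 rounds back to -1e8 (ulp at 1e8 is 8.0),
                     # so 1e8 + (-1e8)      -> 0.0

print(left, right)  # 1.0 0.0
```

Same hardware, same inputs, different answer purely from grouping. This is why batch-invariant determinism is a much harder property than run-to-run determinism at a fixed batch size.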
Methodology
Testing done with infer-check (pip install infer-check), a CLI I built for inference correctness testing on MLX engines.
Full writeup: blog post link
Sample size is small (n=50 × 20 = 1,000 runs per model), so treat this as a positive signal rather than comprehensive proof. Would be interested to know if anyone's seen non-determinism in single-request mode under different conditions.
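For anyone running their own variant of this test, the aggregate metric is just the fraction of prompts whose runs all collapsed to one output. A sketch of that tally (the `results` dict here is placeholder data shaped like the test, not real generations):

```python
def determinism_rate(results):
    """results: {prompt: [output string per run]}.
    Returns the fraction of prompts whose runs were all identical."""
    identical = sum(1 for runs in results.values() if len(set(runs)) == 1)
    return identical / len(results)

# Placeholder shaped like this test: 50 prompts x 20 runs per prompt.
results = {f"prompt-{i}": ["same output"] * 20 for i in range(50)}

print(determinism_rate(results))  # 1.0 = perfect run-to-run determinism
```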