Tests like test_conditional_prob_inf_given_vl_dist are flaky because they're non-deterministic: the Monte Carlo simulations used in the baseline_exposure_model fixture and in test_conditional_prob_inf_given_vl_dist are probabilistic, so they occasionally fail by chance. For modelling this is great - but for unit tests it's not 😅 It's a nice feeling to have a 🟢 CI.
All you'd have to do to make this test pass reliably is set the random seed, e.g. np.random.seed(42). However, that would be misleading if the goal (as I suspect) is to ensure the model's accuracy within a certain tolerance.
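For reference, seeding could look like this - a minimal sketch; the newer default_rng style keeps the seed local to one test instead of mutating global state:

```python
import numpy as np

# Legacy style: seeds NumPy's global generator, which affects every test
# in the process that uses np.random.
np.random.seed(42)

# Preferred modern style: a local Generator with an explicit seed, which
# can be passed into the model/fixture without leaking into other tests.
rng = np.random.default_rng(42)
samples = rng.normal(size=1000)  # same draws on every run
```

Either way the test becomes deterministic, but it then only checks one fixed trajectory of the simulation.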
If the goal is to measure the model's accuracy, what if you removed the 3x retry and instead ran the model 10 times, gathering the results each time? Then you could do a statistical analysis against the mean/median/standard deviation/percentiles, etc.
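A sketch of that repeat-and-aggregate idea, with a stand-in run_model - hypothetical, since I'm not tied to the real model's API here:

```python
import numpy as np

def run_model(rng):
    # Stand-in for the real Monte Carlo model: a noisy estimate of a
    # quantity whose true value (0.5) we know, so the check is meaningful.
    return rng.uniform(size=10_000).mean()

rng = np.random.default_rng(0)
results = np.array([run_model(rng) for _ in range(10)])

mean = results.mean()
std = results.std(ddof=1)

# Assert against the aggregate of 10 runs, not a single noisy run.
# The margin here is deliberately generous (many standard errors wide),
# so a failure signals a real regression rather than bad luck.
assert abs(mean - 0.5) < 0.01
```

The same pattern would replace the 3x retry: one deterministic pass/fail decision based on the distribution of results.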
Another alternative is to change the fixed 0.002 tolerance. What if you calculated the tolerance from the number of runs? Could an absolute tolerance of 0.002 be wishful thinking?
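If the per-run spread were known, the tolerance could be derived from the standard error of the mean - a sketch with made-up numbers, not measurements from the actual model:

```python
import numpy as np

def required_tolerance(per_run_std, n_runs, z=3.0):
    # The standard error of the mean shrinks with sqrt(n), so a z-sigma
    # tolerance is: tol = z * sigma / sqrt(n).
    return z * per_run_std / np.sqrt(n_runs)

# Example: with a per-run standard deviation of 0.01, holding a 0.002
# absolute tolerance at the 3-sigma level needs
# (3 * 0.01 / 0.002)^2 = 225 runs.
tol = required_tolerance(0.01, 225)
```

Working this backwards from the model's observed variance would show whether 0.002 is achievable with the current number of samples or is indeed wishful thinking.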
It could also help to log model deviations and store them with timestamps. Then you could monitor how the deviations drift over time.
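A minimal sketch of that logging idea - the file name and CSV schema are made up, not part of the project:

```python
import csv
import datetime
import pathlib

def log_deviation(observed, expected, path="model_deviations.csv"):
    # Append one timestamped row per test run so deviations can be
    # monitored over time (e.g. plotted, or checked for drift in CI).
    row = [
        datetime.datetime.now(datetime.timezone.utc).isoformat(),
        observed,
        expected,
        observed - expected,
    ]
    is_new_file = not pathlib.Path(path).exists()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new_file:
            writer.writerow(["timestamp", "observed", "expected", "deviation"])
        writer.writerow(row)

log_deviation(0.1523, 0.1500)
```

In CI this file could be uploaded as a build artifact, giving a free history of how close the model lands to the expected value on every run.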
I'm happy to fix this test (and get that CI ✅). It's a cool project. Just let me know what the expected behavior is & how I can help out.