Fix flaky tests by making ImmutableLabelInfo ID assignment deterministic #424
lbh930 wants to merge 2 commits into oracle:main from
Conversation
We had to do this for the regression infos some time ago to fix a nasty indexing bug there; however, it required lots of juggling in the models to fix, as different indices cause problems. I don't think this could cause similar problems, but the iteration order here of the new-style ones is guaranteed to be in increasing string sort order, and old ones won't be (as they will still deserialize into a If we do do this then you should use the
Thank you for reviewing! For now, I've updated the PR to use a TreeSet for the sorted keys.
Update: testing with NonDex verifies that this PR also fixes the flakiness in these tests:
NonDex output for these tests is attached.
Description
Modified ImmutableLabelInfo to sort labels lexicographically before assigning IDs. This ensures that the mapping from Label to ID is deterministic and doesn't depend on the iteration order of HashMap. Updated the regression tests TestSGDLinear, TestFMClassification, and TestClassificationEnsembles, which relied on the previous non-deterministic ID assignment, so that the comparisons in the tests are no longer order-dependent.
Motivation
NonDex initially detected test flakiness in 4 tests:
- org.tribuo.classification.mnb.TestMNB.testSingleClassTraining
- org.tribuo.classification.SerializationTest.load431Protobufs
- org.tribuo.classification.sgd.fm.TestFMClassification.loadProtobufModel
- org.tribuo.classification.sgd.linear.TestSGDLinear.testSingleClassTraining
Updated list with 4 more related flaky tests detected by NonDex:
- org.tribuo.reproducibility.ReproUtilTest#testOverrideConfigurableProperty
- org.tribuo.reproducibility.ReproUtilTest#testReproduceFromModel
- org.tribuo.reproducibility.ReproUtilTest#testReproduceFromProvenanceNoSplitter
- org.tribuo.reproducibility.ReproUtilTest#testReproduceFromProvenanceWithSplitter
This PR is verified by NonDex to fix them all. The root cause for all of them was that the order of label IDs depended on HashMap iteration order, which is not guaranteed to be deterministic. When the ID assignment changed, the resulting model parameters and predictions varied. Sorting the labels keeps the ID assignment consistent and makes model training deterministic.
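The idea of sorting before assigning IDs can be illustrated with a minimal sketch. This is not the code in this PR or in Tribuo: the class `SortedLabelIds` and the method `assignIds` are hypothetical names, and plain strings stand in for labels. It only shows that routing the keys through a `TreeSet` makes the label-to-ID mapping independent of the order in which the labels arrive.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; not Tribuo's ImmutableLabelInfo.
final class SortedLabelIds {

    /**
     * Assigns IDs 0..n-1 to the distinct labels in lexicographic order,
     * so the result does not depend on the iteration order of the input.
     */
    public static Map<String, Integer> assignIds(Collection<String> labels) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        int next = 0;
        // TreeSet iterates its elements in natural (sorted) String order.
        for (String label : new TreeSet<>(labels)) {
            ids.put(label, next++);
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = assignIds(List.of("spam", "ham", "eggs"));
        Map<String, Integer> second = assignIds(List.of("eggs", "spam", "ham"));
        System.out.println(first);                // {eggs=0, ham=1, spam=2}
        System.out.println(first.equals(second)); // true
    }
}
```

Iterating a `HashMap` key set directly in place of the `TreeSet` would give an order that the Java specification does not guarantee, which is the kind of variation NonDex is designed to expose.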