reproducibility - sample new levels, testing data #17

I find that even when a seed is set, the extracted testing data predictions are not reproducible if the "sample new levels" option is left at its default of TRUE, at least when the test set contains new group levels and the dataset is large (50k plus). If you set "sample new levels" to FALSE, the predictions _are_ reproducible, as confirmed by the `identical()` function. In the sample problem, which admittedly involves a very small dataset, it doesn't seem to matter which value is used.
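For anyone who wants to run the same kind of check, here is a minimal base-R sketch of the comparison. It does not call the package: `predict_sketch()`, `known_effects`, and `test_groups` are hypothetical stand-ins, and the idea that new levels are filled in with random draws is my assumption about what might be going on, not something I have confirmed in the package code.

```r
# Hypothetical stand-in for a predict method; NOT the package's actual API.
# Assumption: new group levels get a random draw when sample_new_levels is
# TRUE, and a fixed value of 0 otherwise.
predict_sketch <- function(groups, known_effects, sample_new_levels = TRUE) {
  effects <- known_effects[groups]         # NA for levels not seen in training
  is_new <- is.na(effects)
  if (sample_new_levels) {
    effects[is_new] <- rnorm(sum(is_new))  # consumes the RNG stream
  } else {
    effects[is_new] <- 0                   # deterministic fallback
  }
  unname(effects)
}

known_effects <- c(a = 0.5, b = -0.2)      # levels seen in training
test_groups <- c("a", "b", "c", "d")       # "c" and "d" are new levels

# Seed set once: the second call continues the RNG stream, so the draws differ.
set.seed(123)
p1 <- predict_sketch(test_groups, known_effects, sample_new_levels = TRUE)
p2 <- predict_sketch(test_groups, known_effects, sample_new_levels = TRUE)
print(identical(p1, p2))  # FALSE

# Seed reset immediately before each call: the draws match.
set.seed(123)
p3 <- predict_sketch(test_groups, known_effects, sample_new_levels = TRUE)
set.seed(123)
p4 <- predict_sketch(test_groups, known_effects, sample_new_levels = TRUE)
print(identical(p3, p4))  # TRUE

# No sampling for new levels: reproducible regardless of the seed.
p5 <- predict_sketch(test_groups, known_effects, sample_new_levels = FALSE)
p6 <- predict_sketch(test_groups, known_effects, sample_new_levels = FALSE)
print(identical(p5, p6))  # TRUE
```

If the real predictions still differ when the seed is reset immediately before each call, then the randomness is probably coming from somewhere this sketch doesn't capture.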

It makes more sense to me for the testing data predictions to be identical either way, but that may not be what was intended. In any event, I wanted to alert you to it. Thanks again for the great package.
