reproducibility - sample new levels, testing data #17

@bachlaw

I find that even when a seed is set, the predictions extracted for the testing data are not reproducible if the "sample new levels" option is left at its default of TRUE, at least when the test set contains new group levels and the dataset is large (50k plus). If "sample new levels" is set to FALSE, the predictions are reproducible, as confirmed by the identical() function. In the sample problem, which admittedly involves a very small dataset, the setting doesn't seem to matter.
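
To make the kind of check concrete, here is a minimal toy sketch in Python. It is not the package's code and not a diagnosis of the actual cause. It assumes that sampling new levels means drawing a random effect for each unseen group, and every name in it (`predict`, `fitted_effects`, `run`) is hypothetical. It shows only that the sampling path depends on the random number generator's state, while the non-sampling path does not.

```python
import random

def predict(test_groups, fitted_effects, sample_new_levels, rng):
    """Toy predictor: known groups use their fitted effect; unseen groups
    get a random draw if sample_new_levels is True, otherwise 0.0."""
    preds = []
    for g in test_groups:
        if g in fitted_effects:
            preds.append(fitted_effects[g])
        elif sample_new_levels:
            preds.append(rng.gauss(0.0, 1.0))  # consumes RNG state
        else:
            preds.append(0.0)
    return preds

fitted_effects = {"a": 0.5, "b": -0.2}   # groups seen in training (made up)
test_groups = ["a", "b", "c", "d"]       # "c" and "d" are new levels

def run(seed, sample_new_levels):
    # A fresh generator per call, so each run starts from the same state.
    rng = random.Random(seed)
    return predict(test_groups, fitted_effects, sample_new_levels, rng)

# Re-seeding before each run: both settings repeat exactly.
same_seed_true = run(1, True) == run(1, True)
same_seed_false = run(1, False) == run(1, False)

# Sharing one generator across runs: the sampling path no longer repeats,
# because the first run has already advanced the generator.
shared = random.Random(1)
first = predict(test_groups, fitted_effects, True, shared)
second = predict(test_groups, fitted_effects, True, shared)
shared_rng_true = first == second
```

In this toy setup, the sampling path is reproducible only when the generator starts from the same state before each run. Anything that advances the generator between the seed call and the prediction step would change the result.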

It makes more sense to me for the testing data predictions to be identical either way, but that may not be what was intended. In any event, I wanted to alert you to it. Thanks again for the great package.
