Description
For ensemble-based scoring rules, we need flexible handling of missing values in ensemble members.
Currently, ensemble-based metrics such as the CRPS return NaN if one or more ensemble members are NaN. Users may have an ensemble with a few NaNs (e.g. lagged ensembles have NaNs for some timestamps) but still want to get a valid score.
This would also ensure consistency with https://github.com/nci/scores/blob/develop/src/scores/probability/crps_impl.py
Proposed Solution
We plan on allowing users to specify a "NaN policy" using a new argument added to ensemble-based scores.
nan_policy: Controls how NaN values are handled
- "propagate" (default): current behavior; return NaN if any ensemble member is NaN
- "omit": ignore NaN members and compute the score from the remaining members
- "raise": raise an error if NaN values are encountered
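The three policies could look roughly like the sketch below. It is a minimal illustration, not the project's implementation: the function name crps_ensemble, the convention that ensemble members lie on the last axis, and the choice of the plain (non-"fair") kernel estimator CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j| are all assumptions made for this example.

```python
import numpy as np


def crps_ensemble(obs, fct, nan_policy="propagate"):
    """Hypothetical ensemble CRPS with a nan_policy argument.

    obs: array of shape (...); fct: array of shape (..., m), members on the last axis.
    Uses the kernel estimator mean|x_i - y| - 0.5 * mean|x_i - x_j| (divides by m**2).
    """
    obs = np.asarray(obs, dtype=float)
    fct = np.asarray(fct, dtype=float)
    if nan_policy not in ("propagate", "omit", "raise"):
        raise ValueError(f"invalid nan_policy: {nan_policy!r}")
    if nan_policy == "raise" and np.isnan(fct).any():
        raise ValueError("NaN values found in ensemble members")

    if nan_policy in ("propagate", "raise"):
        # Plain means: a single NaN member makes the whole result NaN.
        e_xy = np.mean(np.abs(fct - obs[..., None]), axis=-1)
        e_xx = np.mean(np.abs(fct[..., :, None] - fct[..., None, :]), axis=(-2, -1))
        return e_xy - 0.5 * e_xx

    # "omit": mask out NaN members and average over the remaining ones only.
    valid = ~np.isnan(fct)
    m = valid.sum(axis=-1)  # number of non-NaN members per forecast case
    x = np.where(valid, fct, 0.0)  # placeholder values, excluded by the masks below
    abs_err = np.where(valid, np.abs(x - obs[..., None]), 0.0)
    pair_valid = valid[..., :, None] & valid[..., None, :]
    abs_spread = np.where(pair_valid, np.abs(x[..., :, None] - x[..., None, :]), 0.0)
    # Cases with zero valid members give 0/0 = NaN; silence the warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        e_xy = abs_err.sum(axis=-1) / m
        e_xx = abs_spread.sum(axis=(-2, -1)) / m**2
    return e_xy - 0.5 * e_xx
```

For example, crps_ensemble(0.5, [0.0, 1.0, np.nan]) stays NaN under the default policy, while nan_policy="omit" scores the two remaining members and gives 0.25. Masking (rather than dropping members) keeps the array shape fixed, so cases with different numbers of missing members can still be scored in one vectorised call.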
Optionally (likely in the future, if we realise it would be useful):
min_ensemble_size: Minimum number of non-NaN ensemble members required (only applies when nan_policy="omit")
- Default: 1 (compute the score with any number of non-NaN members)
- If fewer than min_ensemble_size non-NaN members exist, return NaN
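One possible way to apply min_ensemble_size is as a masking step after an "omit" score has been computed, as sketched below. The helper name mask_small_ensembles and the members-on-last-axis layout are hypothetical and only illustrate the proposed rule: keep the score where at least min_ensemble_size members are non-NaN, otherwise return NaN.

```python
import numpy as np


def mask_small_ensembles(score, fct, min_ensemble_size=1):
    """Hypothetical post-processing for nan_policy="omit".

    Sets the score to NaN wherever fewer than min_ensemble_size
    non-NaN ensemble members (last axis of fct) were available.
    """
    if min_ensemble_size < 1:
        raise ValueError("min_ensemble_size must be at least 1")
    fct = np.asarray(fct, dtype=float)
    n_valid = np.sum(~np.isnan(fct), axis=-1)  # non-NaN members per case
    return np.where(n_valid >= min_ensemble_size, score, np.nan)
```

With the default of 1, only cases with no valid members are masked, and those are already NaN under "omit", so the default leaves "omit" results unchanged.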