Flexible NaN-handling  #34

@frazane

For ensemble-based scoring rules, we need flexible handling of missing values in ensemble members.

Currently, ensemble-based metrics such as the CRPS return NaN if any ensemble member is NaN. Users may have an ensemble with a few NaNs (e.g. lagged ensembles have NaNs for some timestamps) and still want a valid score.

This would also ensure consistency with https://github.com/nci/scores/blob/develop/src/scores/probability/crps_impl.py

Proposed Solution

We plan on allowing users to specify a "NaN policy" using a new argument added to ensemble-based scores.

nan_policy: Controls how NaN values are handled

  • "propagate" (default): Current behavior - return NaN if any ensemble member is NaN
  • "omit": Ignore NaN ensemble members and compute the score from the remaining members
  • "raise": Raise an error if NaN values are encountered
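
To make the three policies concrete, here is a minimal sketch for a single forecast, written in plain Python. The function name `crps_ensemble_sketch` and its signature are hypothetical and only illustrate the proposed semantics; this is not the library's implementation. It assumes the common ensemble CRPS estimator E|X - y| - 0.5 E|X - X'|.

```python
import math


def crps_ensemble_sketch(obs, members, nan_policy="propagate"):
    """Hypothetical single-forecast CRPS illustrating the proposed nan_policy."""
    if nan_policy not in ("propagate", "omit", "raise"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    if any(math.isnan(x) for x in members):
        if nan_policy == "raise":
            raise ValueError("NaN values encountered in ensemble members")
        if nan_policy == "propagate":
            # Current behavior: a single NaN member makes the score NaN.
            return math.nan
        # "omit": drop NaN members and score the remaining ones.
        members = [x for x in members if not math.isnan(x)]

    m = len(members)
    if m == 0:
        # Nothing left to score (assumption: return NaN in this case).
        return math.nan

    # E|X - y| minus half the mean absolute difference between member pairs.
    e_xy = sum(abs(x - obs) for x in members) / m
    e_xx = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return e_xy - 0.5 * e_xx


ens = [1.0, 2.0, float("nan"), 3.0]
score_default = crps_ensemble_sketch(2.0, ens)                  # NaN
score_omit = crps_ensemble_sketch(2.0, ens, nan_policy="omit")  # uses 3 members
```

With "omit", the score is computed as if the ensemble had only the three valid members, so the effective ensemble size can differ from one forecast to the next.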

Optionally (likely in the future, if we realise it would be useful):

min_ensemble_size: Minimum number of non-NaN ensemble members required (only applies when nan_policy="omit")

  • Default: 1 (compute score with any number of non-NaN members)
  • If fewer than min_ensemble_size non-NaN members exist, return NaN
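
The optional threshold could look roughly like the sketch below. The wrapper `score_with_min_size` and the stand-in score `abs_error_of_mean` are hypothetical names used for illustration only; the real argument would presumably live on the ensemble-based scores themselves.

```python
import math


def abs_error_of_mean(obs, members):
    """Stand-in score for the example: absolute error of the ensemble mean."""
    return abs(sum(members) / len(members) - obs)


def score_with_min_size(obs, members, score_fn, min_ensemble_size=1):
    """Hypothetical "omit" wrapper enforcing a minimum number of valid members."""
    if min_ensemble_size < 1:
        raise ValueError("min_ensemble_size must be at least 1")
    # Keep only the non-NaN members, as nan_policy="omit" would.
    valid = [x for x in members if not math.isnan(x)]
    if len(valid) < min_ensemble_size:
        # Too few valid members: return NaN instead of an unreliable score.
        return math.nan
    return score_fn(obs, valid)


sparse = [1.0, float("nan"), float("nan")]
lenient = score_with_min_size(2.0, sparse, abs_error_of_mean)  # one member suffices
strict = score_with_min_size(2.0, sparse, abs_error_of_mean, min_ensemble_size=2)  # NaN
```

A default of 1 keeps "omit" as permissive as possible, while a larger value guards against scores computed from only a handful of surviving members.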

Labels: enhancement
