Regenerate forecast_outputs.rda: contains stale data with negative sample values #62

@annakrystalli

Summary

forecast_outputs.rda contains a negative sample value (-2) that no longer exists in the source CSV. The example-complex-forecast-hub source data was fixed in commit 4a75b82 ("Regenerate model output with sample values >= 0"), but the .rda in hubExamples was never regenerated afterwards.

Reproduction

# Packaged data has a negative value:
data <- hubExamples::forecast_outputs
neg <- data[data$output_type == "sample" & data$value < 0, ]
neg
#>            model_id reference_date          target horizon location
#> 73 Flusight-baseline     2022-11-19 wk inc flu hosp       0       25
#>    target_end_date output_type output_type_id value
#> 73      2022-11-19      sample           2101    -2

# But regenerating from source produces no negatives:
hub_path <- "../example-complex-forecast-hub"
fresh <- hubData::connect_hub(hub_path) |>
  dplyr::filter(
    location %in% c("25", "48"),
    output_type == "sample",
    reference_date == "2022-11-19"
  ) |>
  hubData::collect_hub()
any(fresh$value < 0)
#> [1] FALSE

Impact

The hub's tasks.json specifies "minimum": 0 for sample values, so the packaged negative value violates the hub's own configuration. It also breaks downstream scale transformations (e.g., sqrt, log_shift) that expect non-negative inputs.
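
As a minimal base R illustration of the failure mode (plain `sqrt()` and `log()` stand in for the transformations named above; this is not the actual scoring code, and the offset of 1 is an assumed value), the packaged -2 turns into NaN under both transformations:

```r
# -2 is the packaged negative sample value; 0 and 4 are illustrative.
values <- c(-2, 0, 4)

# sqrt() of a negative number yields NaN (R also emits a warning).
sqrt_scaled <- suppressWarnings(sqrt(values))

# A shifted log, log(x + offset), yields NaN whenever x + offset < 0.
# offset = 1 is an assumption chosen for illustration.
offset <- 1
log_scaled <- suppressWarnings(log(values + offset))

is.nan(sqrt_scaled)
#> [1]  TRUE FALSE FALSE
is.nan(log_scaled)
#> [1]  TRUE FALSE FALSE
```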

Fix

Rerun data-raw/generate_example_forecast_data.R to regenerate the .rda file from the current source data.
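
A check along these lines could confirm the regenerated data is clean. This is only a sketch: `find_negative_samples()` is a hypothetical helper, not part of hubExamples, and the toy data frame mimics only the two relevant columns of forecast_outputs.

```r
# Hypothetical helper: return the sample rows whose value is below 0,
# the minimum that tasks.json allows for sample values.
find_negative_samples <- function(data) {
  is_neg <- data$output_type %in% "sample" &
    !is.na(data$value) &
    data$value < 0
  data[is_neg, , drop = FALSE]
}

# Toy data: one negative sample, one valid sample, one non-sample row.
toy <- data.frame(
  output_type = c("sample", "sample", "quantile"),
  value = c(-2, 3, -1)
)

nrow(find_negative_samples(toy))
#> [1] 1
```

Once the .rda has been regenerated, `nrow(find_negative_samples(hubExamples::forecast_outputs))` should be 0.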

Context

Discovered while implementing sample scoring in hubverse-org/hubEvals#94.
