Summary
forecast_outputs.rda contains a negative sample value (-2) that no longer exists in the source CSV. The example-complex-forecast-hub source data was fixed in commit 4a75b82 ("Regenerate model output with sample values >= 0"), but the .rda in hubExamples was never regenerated afterwards.
Reproduction
# Packaged data has a negative value:
data <- hubExamples::forecast_outputs
neg <- data[data$output_type == "sample" & data$value < 0, ]
neg
#>             model_id reference_date          target horizon location
#> 73 Flusight-baseline     2022-11-19 wk inc flu hosp       0       25
#>    target_end_date output_type output_type_id value
#> 73      2022-11-19      sample           2101    -2
# But regenerating from source produces no negatives:
hub_path <- "../example-complex-forecast-hub"
fresh <- hubData::connect_hub(hub_path) |>
  dplyr::filter(
    location %in% c("25", "48"),
    output_type == "sample",
    reference_date == "2022-11-19"
  ) |>
  hubData::collect_hub()
any(fresh$value < 0)
#> [1] FALSE
Impact
The hub's tasks.json specifies "minimum": 0 for sample values, so the negative value shouldn't be present. This causes issues downstream when applying scale transformations (e.g., sqrt, log_shift) that expect non-negative inputs.
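As a rough illustration of the downstream problem, the base R sketch below applies a square root and a shifted log to a small vector that includes the stray -2. The vector and the shift of 1 are assumptions chosen for illustration; they are not taken from hubEvals or from the packaged data.

```r
# Illustrative sample values, including the stray negative (-2).
values <- c(0, 3, -2)

# sqrt() yields NaN (with a warning) for negative input.
sqrt_scaled <- suppressWarnings(sqrt(values))

# A log transform with an assumed shift of 1 also fails, since -2 + 1 < 0.
log_scaled <- suppressWarnings(log(values + 1))

is.nan(sqrt_scaled)
#> [1] FALSE FALSE  TRUE
is.nan(log_scaled)
#> [1] FALSE FALSE  TRUE
```

A single negative sample is therefore enough to introduce NaN into transformed values, which can then propagate into any score computed from them.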
Fix
Rerun data-raw/generate_example_forecast_data.R to regenerate the .rda file from the current source data.
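After regenerating, a quick check along the following lines could confirm that no sample value falls below the minimum. This is only a sketch: find_invalid_samples is a hypothetical helper, not part of hubExamples, and the toy data frame stands in for hubExamples::forecast_outputs.

```r
# Hypothetical helper (not part of hubExamples): returns the rows of `data`
# whose output_type is "sample" and whose value is below `minimum`.
find_invalid_samples <- function(data, minimum = 0) {
  is_invalid <- data$output_type == "sample" & data$value < minimum
  # which() drops NA comparisons so rows with missing values are not selected.
  data[which(is_invalid), , drop = FALSE]
}

# Toy data standing in for hubExamples::forecast_outputs.
toy <- data.frame(
  output_type = c("sample", "sample", "quantile"),
  output_type_id = c("2101", "2102", "0.5"),
  value = c(-2, 5, 10)
)

invalid <- find_invalid_samples(toy)
nrow(invalid)
#> [1] 1

# Against the regenerated package data, one might assert:
# stopifnot(nrow(find_invalid_samples(hubExamples::forecast_outputs)) == 0)
```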
Context
Discovered while implementing sample scoring in hubverse-org/hubEvals#94.