The goal of hubEvals is to provide tools for evaluating infectious disease model outputs. The package is part of the Hubverse project, which develops a suite of open-source tools for infectious disease modeling hubs.
You can install the latest version of hubEvals from the R-universe:
install.packages("hubEvals", repos = c("https://hubverse-org.r-universe.dev", "https://cloud.r-project.org"))If you want to test out new features that have not yet been released, you can install the development version of hubEvals from GitHub with:
# install.packages("remotes")
remotes::install_github("hubverse-org/hubEvals")

Predictions can be evaluated directly using the score_model_out() function in hubEvals, which assumes a hubverse format for the model outputs and target data:
library(hubEvals)
# compute default metrics (in this case, absolute error) for
# median forecasts, summarized by the mean score for each model
median_scores <- score_model_out(
model_out_tbl = hubExamples::forecast_outputs |>
dplyr::filter(output_type == "median"), # only one output type allowed
oracle_output = hubExamples::forecast_oracle_output,
by = "model_id"
)
median_scores
#> model_id ae_point
#> <char> <num>
#> 1: Flusight-baseline 401.875
#> 2: MOBS-GLEAM_FLUH 416.375
#> 3: PSI-DICE 277.000
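The by argument accepts any combination of task ID columns present in the model output table; as a minimal sketch using the same example data, mean absolute error could be summarized for each combination of model and location:

# sketch: summarize absolute error by model and location rather than
# by model alone
median_scores_by_location <- score_model_out(
  model_out_tbl = hubExamples::forecast_outputs |>
    dplyr::filter(output_type == "median"),
  oracle_output = hubExamples::forecast_oracle_output,
  by = c("model_id", "location")
)
median_scores_by_location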
# compute WIS, interval coverage rates at the 80% and 90% levels, and
# pairwise relative WIS skill based on quantile forecasts, summarized by
# the mean score for each model
quantile_scores <- score_model_out(
model_out_tbl = hubExamples::forecast_outputs |>
dplyr::filter(output_type == "quantile"), # only one output type allowed
oracle_output = hubExamples::forecast_oracle_output,
metrics = c("wis", "interval_coverage_80", "interval_coverage_90"),
relative_metrics = "wis",
by = "model_id"
)
quantile_scores
#> Key: <model_id>
#> model_id wis interval_coverage_80 interval_coverage_90 wis_relative_skill
#> <char> <num> <num> <num> <num>
#> 1: Flusight-baseline 329.4545 0.0 0.1250 1.1473659
#> 2: MOBS-GLEAM_FLUH 315.2393 0.5 0.5625 1.0978597
#> 3: PSI-DICE 227.9527 0.5 0.5000 0.7938733
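Because relative_metrics = "wis" was supplied, the results include a wis_relative_skill column giving each model's pairwise relative skill for WIS, where lower values indicate better performance. As a small sketch, models could then be ranked with standard data manipulation tools:

# sketch: order models from best (lowest) to worst relative WIS skill
quantile_scores |>
  dplyr::arrange(wis_relative_skill)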
# compute log scores and ranked probability scores (RPS) based on pmf
# predictions for categorical targets, summarized by the mean score for
# each combination of model, location, and horizon.
# Note: if the model_out_tbl had forecasts for multiple targets using a
# pmf output_type with different bins, it would be necessary to score the
# predictions for those targets separately (see the sketch after the
# output below).
pmf_scores <- score_model_out(
model_out_tbl = hubExamples::forecast_outputs |>
dplyr::filter(output_type == "pmf"), # only one output type allowed
oracle_output = hubExamples::forecast_oracle_output,
metrics = c("log_score", "rps"),
by = c("model_id", "location", "horizon"),
output_type_id_order = c("low", "moderate", "high", "very high")
)
head(pmf_scores)
#> model_id location horizon log_score rps
#> <char> <char> <int> <num> <num>
#> 1: Flusight-baseline 25 0 0.02107606 0.0008531043
#> 2: Flusight-baseline 25 1 6.69652380 0.5029240066
#> 3: Flusight-baseline 25 2 17.73313203 1.0057355863
#> 4: Flusight-baseline 25 3 Inf 1.8665126816
#> 5: Flusight-baseline 48 0 2.18418007 0.4873966597
#> 6: Flusight-baseline 48 1 7.49960792 0.9659026096
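As a purely illustrative sketch of that note, assuming a target task ID column as in the hubExamples data (the target names and bins below are hypothetical placeholders), each pmf target would be filtered and scored on its own, with an output_type_id_order matching its bins:

# hypothetical: pmf targets with different bins are scored separately
pmf_outputs <- hubExamples::forecast_outputs |>
  dplyr::filter(output_type == "pmf")
scores_target_a <- score_model_out(
  model_out_tbl = pmf_outputs |>
    dplyr::filter(target == "target A"), # placeholder target name
  oracle_output = hubExamples::forecast_oracle_output,
  metrics = "log_score",
  by = "model_id",
  output_type_id_order = c("low", "moderate", "high", "very high")
)
scores_target_b <- score_model_out(
  model_out_tbl = pmf_outputs |>
    dplyr::filter(target == "target B"), # placeholder target name
  oracle_output = hubExamples::forecast_oracle_output,
  metrics = "log_score",
  by = "model_id",
  output_type_id_order = c("bin 1", "bin 2", "bin 3") # this target's bins
)

Sample forecasts can be scored marginally (each modeling task scored independently) or jointly using compound scoring: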
# marginal sample scoring with CRPS
sample_scores <- hubExamples::forecast_outputs |>
dplyr::filter(output_type == "sample") |>
score_model_out(
oracle_output = hubExamples::forecast_oracle_output,
metrics = "crps",
by = "model_id"
)
sample_scores
#> model_id crps
#> <char> <num>
#> 1: Flusight-baseline 351.5888
#> 2: MOBS-GLEAM_FLUH 347.1502
#> 3: PSI-DICE 247.3640

Compound scoring uses multivariate metrics, such as the energy score and variogram score, to evaluate the joint distribution across task dimensions that vary within a sample draw. The compound_taskid_set argument specifies which task IDs stay constant within a sample group and can be found by referencing the hub’s tasks.json configuration file. Here, each draw spans all horizons for a given reference date and location (i.e., a trajectory over time).
compound_scores <- hubExamples::forecast_outputs |>
dplyr::filter(output_type == "sample") |>
score_model_out(
oracle_output = hubExamples::forecast_oracle_output,
compound_taskid_set = c("reference_date", "location"),
by = "model_id"
)
compound_scores
#> model_id energy_score variogram_score
#> <char> <num> <num>
#> 1: Flusight-baseline 772.7587 1523.954
#> 2: MOBS-GLEAM_FLUH 811.4625 1695.037
#> 3: PSI-DICE 571.0879 1264.238

Alternatively, users may transform predictions into a forecast object that can be passed directly to scoringutils functions, applying that package's tooling themselves.
median_forecast <- transform_point_model_out(
model_out_tbl = hubExamples::forecast_outputs |>
dplyr::filter(output_type == "median"),
oracle_output = hubExamples::forecast_oracle_output,
output_type = "median"
)
median_forecast
quantile_forecast <- transform_quantile_model_out(
model_out_tbl = hubExamples::forecast_outputs |>
dplyr::filter(output_type == "quantile"),
oracle_output = hubExamples::forecast_oracle_output
)
quantile_forecast
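For example, a minimal sketch (assuming the scoringutils package is installed) that scores the quantile forecast object created above using scoringutils directly:

# sketch: score the transformed forecast object with scoringutils
library(scoringutils)
raw_scores <- score(quantile_forecast)
summarise_scores(raw_scores, by = "model_id")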
pmf_forecasts <- transform_pmf_model_out(
model_out_tbl = hubExamples::forecast_outputs |>
dplyr::filter(output_type == "pmf"),
oracle_output = hubExamples::forecast_oracle_output,
output_type_id_order = c("low", "moderate", "high", "very high")
)
pmf_forecasts
sample_forecast <- hubExamples::forecast_outputs |>
dplyr::filter(output_type == "sample") |>
transform_sample_model_out(
oracle_output = hubExamples::forecast_oracle_output
)
sample_forecast

Please note that the hubEvals package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Interested in contributing back to the open-source Hubverse project? Learn more about how to get involved in the Hubverse Community or how to contribute to the hubEvals package.
