Drop the Canada backtesting example because the data has almost no revisions

dsweber2 · dsweber2 · commit 4723d89f6a58 · 2025-03-03T12:49:51.000-06:00
diff --git a/vignettes/backtesting.Rmd b/vignettes/backtesting.Rmd
@@ -391,129 +391,6 @@ can expect in practice.
 p2
 ```
 
-## Comparing version faithful and faithless in Canada
-
-
-By leveraging the flexibility of `{epipredict}`, we can apply the same
-techniques to data from other sources.
-Since some collaborators are in British Columbia, Canada, we'll do essentially
-the same thing for Canadian Provincial data as we did above.
-
-The [COVID-19 Canada Open Data Working Group](https://opencovid.ca/) collects
-daily time series data on COVID-19 cases, deaths, recoveries, testing and
-vaccinations at the health region and province levels.
-Data are collected from publicly available sources such as government datasets
-and news releases.
-Unfortunately, there is no simple versioned source, so we have created our own
-from the Github commit history and stored it in `epidatasets::can_prov_cases`.
-
-First, we load versioned case rates at the provincial level.
-
-```{r get-can-fc, warning = FALSE}
-aheads <- c(7, 14, 21, 28)
-canada_archive <- epidatasets::can_prov_cases
-revis_can <- epidatasets::can_prov_cases %>% revision_summary()
-revis_can %>% group_by(geo_value) %>% summarize(n_revisions = mean(n_revisions)) %>% print(n=13)
-canada_archive_faux <- epix_as_of(canada_archive, canada_archive$versions_end) %>%
-  mutate(version = time_value) %>%
-  as_epi_archive()
-```
-
-We run more or less the same forecasting method as above, but with the addition
-of 7-day averaging for each snapshot before forecasting (due to highly variable
-provincial reporting mismatches).
-
-```{r smoothing, warning = FALSE}
-smooth_cases <- function(epi_df) {
-  epi_df %>%
-    group_by(geo_value) %>%
-    epi_slide_mean("case_rate", .window_size = 7, na.rm = TRUE, .suffix = "_{.n}dav")
-}
-forecast_dates <- seq.Date(
-  from = min(canada_archive$DT$version),
-  to = max(canada_archive$DT$version),
-  by = "1 month"
-)
-
-canada_version_faithless <-
-  canada_archive_faux %>%
-      epix_slide(
-        ~forecast_wrapper(.x, aheads, "case_rate_7dav", "case_rate_7dav", smooth_cases),
-        .before = 120,
-        .versions = forecast_dates
-      ) %>% 
-      mutate(version_faithful = FALSE)
-canada_version_faithful <-
-  canada_archive %>%
-      epix_slide(
-        ~forecast_wrapper(.x, aheads, "case_rate_7dav", "case_rate_7dav", smooth_cases),
-        .before = 120,
-        .versions = forecast_dates
-      ) %>%
-      mutate(version_faithful = TRUE)
-canada_forecasts <- bind_rows(
-  canada_version_faithless,
-  canada_version_faithful
-)
-```
-
-The figures below shows the results for a single province.
-<details>
-<summary> Plotting </summary>
-First prepping some data to make plotting more informative.
-```{r plot-can-fc-lr-data, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 12}
-geo_choose <- "Saskatchewan"
-forecasts_filtered <- canada_forecasts %>%
-  filter(geo_value == geo_choose) %>%
-  mutate(time_value = version)
-case_rate_data <- bind_rows(
-  # Snapshotted data for the version-faithful forecasts
-  map(
-    forecast_dates,
-    ~ canada_archive %>%
-      epix_as_of(.x) %>%
-      smooth_cases() %>%
-      mutate(case_rate = case_rate_7dav, version = .x)
-  ) %>%
-    bind_rows() %>%
-    mutate(version_faithful = TRUE),
-  # Latest data for the version-faithless forecasts
-  canada_archive %>%
-    epix_as_of(doctor_visits$versions_end) %>%
-    smooth_cases() %>%
-    mutate(case_rate = case_rate_7dav, version_faithful = FALSE)
-) %>%
-  filter(geo_value == geo_choose)
-case_rate_data %>% filter(time_value == "2021-01-13") %>% print(n=12)
-canada_archive %>% revision_summary()
-```
-And actually generating the plot
-```{r plot-can-fc-lr, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 12}
-p3 <-
-  ggplot(data = forecasts_filtered, aes(x = target_date, group = time_value)) +
-  geom_ribbon(aes(ymin = `0.05`, ymax = `0.95`, fill = factor(time_value)), alpha = 0.4) +
-  geom_line(aes(y = .pred, color = factor(time_value)), linetype = 2L) +
-  geom_point(aes(y = .pred, color = factor(time_value)), size = 0.75) +
-  # the forecast date
-  geom_vline(data = case_rate_data %>% filter(geo_value == geo_choose) %>% select(-version_faithful), aes(color = factor(version), xintercept = version), lty = 2) +
-  # the underlying data
-  geom_line(
-    data = case_rate_data %>% filter(geo_value == geo_choose),
-    aes(x = time_value, y = case_rate, color = factor(version)),
-    inherit.aes = FALSE, na.rm = TRUE
-  ) +
-  facet_grid(version_faithful ~ geo_value, scales = "free") +
-  scale_x_date(breaks = "2 months", date_labels = "%b %Y") +
-  scale_y_continuous(expand = expansion(c(0, 0.05))) +
-  labs(x = "Date", y = "smoothed, day of week adjusted covid-like doctors visits") +
-  theme(legend.position = "none")
-```
-</details>
-
-```{r show-can-plot, warning = FALSE, echo = FALSE}
-p3
-```
-
 
 [^1]: For forecasting a single day like this, we could have actually just used
     `doctor_visits %>% epix_as_of(forecast_date)` to get the relevant snapshot, and then fed that into `arx_forecaster()` as we did in the [landing