@@ -391,129 +391,6 @@ can expect in practice.
391391p2
392392```
393393
394- ## Comparing version faithful and faithless in Canada
395-
396-
397- By leveraging the flexibility of ` {epipredict} ` , we can apply the same
398- techniques to data from other sources.
399- Since some collaborators are in British Columbia, Canada, we'll do essentially
400- the same thing for Canadian Provincial data as we did above.
401-
402- The [ COVID-19 Canada Open Data Working Group] ( https://opencovid.ca/ ) collects
403- daily time series data on COVID-19 cases, deaths, recoveries, testing and
404- vaccinations at the health region and province levels.
405- Data are collected from publicly available sources such as government datasets
406- and news releases.
407- Unfortunately, there is no simple versioned source, so we have created our own
408- from the Github commit history and stored it in ` epidatasets::can_prov_cases ` .
409-
410- First, we load versioned case rates at the provincial level.
411-
412- ``` {r get-can-fc, warning = FALSE}
413- aheads <- c(7, 14, 21, 28)
414- canada_archive <- epidatasets::can_prov_cases
415- revis_can <- epidatasets::can_prov_cases %>% revision_summary()
416- revis_can %>% group_by(geo_value) %>% summarize(n_revisions = mean(n_revisions)) %>% print(n=13)
417- canada_archive_faux <- epix_as_of(canada_archive, canada_archive$versions_end) %>%
418- mutate(version = time_value) %>%
419- as_epi_archive()
420- ```
421-
422- We run more or less the same forecasting method as above, but with the addition
423- of 7-day averaging for each snapshot before forecasting (due to highly variable
424- provincial reporting mismatches).
425-
426- ``` {r smoothing, warning = FALSE}
427- smooth_cases <- function(epi_df) {
428- epi_df %>%
429- group_by(geo_value) %>%
430- epi_slide_mean("case_rate", .window_size = 7, na.rm = TRUE, .suffix = "_{.n}dav")
431- }
432- forecast_dates <- seq.Date(
433- from = min(canada_archive$DT$version),
434- to = max(canada_archive$DT$version),
435- by = "1 month"
436- )
437-
438- canada_version_faithless <-
439- canada_archive_faux %>%
440- epix_slide(
441- ~forecast_wrapper(.x, aheads, "case_rate_7dav", "case_rate_7dav", smooth_cases),
442- .before = 120,
443- .versions = forecast_dates
444- ) %>%
445- mutate(version_faithful = FALSE)
446- canada_version_faithful <-
447- canada_archive %>%
448- epix_slide(
449- ~forecast_wrapper(.x, aheads, "case_rate_7dav", "case_rate_7dav", smooth_cases),
450- .before = 120,
451- .versions = forecast_dates
452- ) %>%
453- mutate(version_faithful = TRUE)
454- canada_forecasts <- bind_rows(
455- canada_version_faithless,
456- canada_version_faithful
457- )
458- ```
459-
460- The figures below shows the results for a single province.
461- <details >
462- <summary > Plotting </summary >
463- First prepping some data to make plotting more informative.
464- ``` {r plot-can-fc-lr-data, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 12}
465- geo_choose <- "Saskatchewan"
466- forecasts_filtered <- canada_forecasts %>%
467- filter(geo_value == geo_choose) %>%
468- mutate(time_value = version)
469- case_rate_data <- bind_rows(
470- # Snapshotted data for the version-faithful forecasts
471- map(
472- forecast_dates,
473- ~ canada_archive %>%
474- epix_as_of(.x) %>%
475- smooth_cases() %>%
476- mutate(case_rate = case_rate_7dav, version = .x)
477- ) %>%
478- bind_rows() %>%
479- mutate(version_faithful = TRUE),
480- # Latest data for the version-faithless forecasts
481- canada_archive %>%
482- epix_as_of(doctor_visits$versions_end) %>%
483- smooth_cases() %>%
484- mutate(case_rate = case_rate_7dav, version_faithful = FALSE)
485- ) %>%
486- filter(geo_value == geo_choose)
487- case_rate_data %>% filter(time_value == "2021-01-13") %>% print(n=12)
488- canada_archive %>% revision_summary()
489- ```
490- And actually generating the plot
491- ``` {r plot-can-fc-lr, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 12}
492- p3 <-
493- ggplot(data = forecasts_filtered, aes(x = target_date, group = time_value)) +
494- geom_ribbon(aes(ymin = `0.05`, ymax = `0.95`, fill = factor(time_value)), alpha = 0.4) +
495- geom_line(aes(y = .pred, color = factor(time_value)), linetype = 2L) +
496- geom_point(aes(y = .pred, color = factor(time_value)), size = 0.75) +
497- # the forecast date
498- geom_vline(data = case_rate_data %>% filter(geo_value == geo_choose) %>% select(-version_faithful), aes(color = factor(version), xintercept = version), lty = 2) +
499- # the underlying data
500- geom_line(
501- data = case_rate_data %>% filter(geo_value == geo_choose),
502- aes(x = time_value, y = case_rate, color = factor(version)),
503- inherit.aes = FALSE, na.rm = TRUE
504- ) +
505- facet_grid(version_faithful ~ geo_value, scales = "free") +
506- scale_x_date(breaks = "2 months", date_labels = "%b %Y") +
507- scale_y_continuous(expand = expansion(c(0, 0.05))) +
508- labs(x = "Date", y = "smoothed, day of week adjusted covid-like doctors visits") +
509- theme(legend.position = "none")
510- ```
511- </details >
512-
513- ``` {r show-can-plot, warning = FALSE, echo = FALSE}
514- p3
515- ```
516-
517394
518395[ ^ 1 ] : For forecasting a single day like this, we could have actually just used
519396 ` doctor_visits %>% epix_as_of(forecast_date) ` to get the relevant snapshot, and then fed that into ` arx_forecaster() ` as we did in the [ landing
0 commit comments