 ---
-title: "Get started with `epipredict`"
+title: "Get started with `{epipredict}`"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Get started with `epipredict`}
+  %\VignetteIndexEntry{Get started with `{epipredict}`}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
@@ -26,45 +26,52 @@ used_locations <- c("ca", "ma", "ny", "tx")
 library(epidatr)
 ```
 
-At a high level, the goal of `{epipredict}` is to make running simple machine
-learning / statistical forecasters for epidemiology easy.
-To do this, we have extended several [tidymodels](https://www.tidymodels.org/)
-packages to handle the case of panel time-series data.
-Our hope is that it is easy for users with epi training and some statistics to
-fit baseline models while still allowing those with more nuanced statistical
-understanding to create complicated specializations using the same framework.
-Towards that end, epipredict provides two main classes of tools:
-
-1. A set of basic, easy-to-use "canned" forecasters that work out of the box.
-   For the basic forecasters, we currently provide:
-    * Flatline forecaster: predicts a median that is the last value
-      with increasingly wide quantiles.
-    * Autoregressive forecaster: fits a model (e.g. linear regression) on
-      lagged data to predict quantiles for continuous values.
-    * Autoregressive classifier: fits a model (e.g. logistic regression) on
-      lagged data to predict a binned version of the growth rate.
-    * CDC FluSight flatline forecaster: a variant of the flatline forecaster as
-      used by the CDC in FluSight.
-2. A framework for creating custom forecasters out of modular components, from
-   which the canned forecasters were created. There are three types of
-   components:
-    * Preprocessor: do things to the data before model training, such as convert
-      counts to rates, create smoothed columns, or [any of the recipes
-      steps](https://recipes.tidymodels.org/reference/index.html)
-    * Trainer: train a model on data, resulting in a fitted model object.
-      Examples include linear regression, quantile regression, or [any parsnip
-      engine](https://parsnip.tidymodels.org/).
-    * Postprocessor: unique to this package, and used to do things to the
-      predictions after the model has been fit, such as
-      - generate quantiles from purely point-prediction models,
-      - undo operations done in the steps, such as convert back to counts from
-        rates
-      - generally adapt the format of the prediction to it's eventual use.
-
-The rest of the getting started will focus on using and modifying the canned forecasters.
-If you need a more complicated model, check out the [Guts
-vignette](preprocessing-and-models) for examples of using the forecaster
-framework.
+At a high level, the goal of `{epipredict}` is to make it easy to run simple machine
+learning and statistical forecasters for epidemiological data.
+To do this, we have extended the [tidymodels](https://www.tidymodels.org/)
+framework to handle the case of panel time-series data.
+
+Our hope is that it is easy for users with epidemiological training and some statistical knowledge to
+fit baseline models, while also allowing those with more nuanced statistical
+understanding to create complex custom models using the same framework.
+Towards that end, `{epipredict}` provides two main classes of tools:
+
+## Canned forecasters
+
+A set of basic, easy-to-use "canned" forecasters that work out of the box.
+We currently provide the following basic forecasters:
+
+* _Flatline forecaster_: predicts as the median the most recently seen value
+  with increasingly wide quantiles.
+* _Autoregressive forecaster_: fits a model (e.g. linear regression) on
+  lagged data to predict quantiles for continuous values.
+* _Autoregressive classifier_: fits a model (e.g. logistic regression) on
+  lagged data to predict a binned version of the growth rate.
+* _CDC FluSight flatline forecaster_: a variant of the flatline forecaster that is
+  used as a baseline in the CDC's [FluSight forecasting competition](https://www.cdc.gov/flu-forecasting/about/index.html).
+
+## Forecasting framework
+
+A framework for creating custom forecasters out of modular components, from
+which the canned forecasters were created. There are three types of
+components:
+
+* _Preprocessor_: transform the data before model training, such as converting
+  counts to rates, creating smoothed columns, or [any `{recipes}`
+  `step`](https://recipes.tidymodels.org/reference/index.html)
+* _Trainer_: train a model on data, resulting in a fitted model object.
+  Examples include linear regression, quantile regression, or [any `{parsnip}`
+  engine](https://parsnip.tidymodels.org/reference/index.html).
+* _Postprocessor_: unique to `{epipredict}`; used to transform the
+  predictions after the model has been fit, such as
+    - generating quantiles from purely point-prediction models,
+    - reverting operations done in the `step`s, such as converting from
+      rates back to counts
+    - generally adapting the format of the prediction to its eventual use.
+
+The rest of the "getting started" vignette will focus on using and modifying the canned forecasters.
+Check out the ["guts" vignette](preprocessing-and-models) for examples of using the forecaster
+framework to make more complex, custom forecasters.
 
 If you are interested in time series in a non-panel data context, you may also
 want to look at `{timetk}` and `{modeltime}` for some related techniques.
@@ -76,24 +83,26 @@ For a more in-depth treatment with some practical applications, see also the
 ## Example data
 
 The forecasting methods in this package are designed to work with panel time
-series data, specifically in the form of an `epi_df` from the `{epiprocess}`
+series data in `epi_df` format as made available in the `{epiprocess}`
 package.
-This is a collection of one or more time-series indexed by one or more
+An `epi_df` is a collection of one or more time-series indexed by one or more
 categorical variables.
-For example, on the landing page:
+The [`{epidatasets}`](https://cmu-delphi.github.io/epidatasets/) package makes several
+pre-compiled example datasets available.
+Let's look at an example `epi_df`:
 
 ```{r data_ex}
 covid_case_death_rates
 ```
 
-`geo_value` is the only key for this dataset, while there are two separate
-time-series, `case_rate` and `death_rate`.
-The keys are represented in "long" format (so separate columns for the key and
-the value), while separate time series are represented in "wide" format (so each
-time-series has a separate column).
+This dataset uses a single key, `geo_value`, and two separate
+time series, `case_rate` and `death_rate`.
+The keys are represented in "long" format, with separate columns for the key and
+the value, while separate time series are represented in "wide" format with each
+time series stored in a separate column.
 
 `{epiprocess}` is designed to handle data that always has a geographic key, and
-potentially other key values, such as age, ethnicity or other demographic
+potentially other key values, such as age, ethnicity, or other demographic
 information.
 For example, `grad_employ_subset` from `{epidatasets}` also has both `age_group`
 and `edu_qual` as additional keys:
@@ -102,20 +111,20 @@ and `edu_qual` as additional keys:
 grad_employ_subset
 ```
 
-See `{epiprocess}` for more details
-on the format.
+See `{epiprocess}` for [more details on the `epi_df` format](https://cmu-delphi.github.io/epiprocess/articles/epi_df.html).
 
 Panel time series are ubiquitous in epidemiology, but are also common in
 economics, psychology, sociology, and many other areas.
 While this package was designed with epidemiology in mind, many of the
 techniques are more broadly applicable.
 
 ## Customizing `arx_forecaster()`
-Moving on from the example on the [landing
-page](../index.html#motivating-example), let's adjust some parameters for
+Let's expand on the basic example presented on the [landing
+page](../index.html#motivating-example), starting with adjusting some parameters in
 `arx_forecaster()`.
-`trainer` allows us to set a different fitting engines, either one of the
-included ones, such as `quantile_reg()`, or one of the relevant [parsnip
+
+The `trainer` argument allows us to set the fitting engine. We can use either one of the
+included engines, such as `quantile_reg()`, or one of the relevant [parsnip
 models](https://www.tidymodels.org/find/parsnip/):
 
 ```{r make-forecasts, warning=FALSE}
@@ -135,9 +144,9 @@ hardhat::extract_fit_engine(two_week_ahead$epi_workflow)
 The default trainer is `parsnip::linear_reg()`, which generates quantiles after
 the fact in the post-processing layers, rather than as part of the model.
 While this does work, it is generally preferable to use `quantile_reg()`, as the
-quantiles generated in this manner can be poorly behaved.
-`quantile_reg()` on the other hand directly estimates different linear models
-for each quantile, reflected in the 3 different columns for `tau` above.
+quantiles generated in post-processing can be poorly behaved.
+`quantile_reg()` on the other hand directly estimates a different linear model
+for each quantile, reflected in the several different columns for `tau` above.
 
 Because of the flexibility of `{parsnip}`, there are a whole host of models
 available to us[^5]; as an example, we could have just as easily substituted a
@@ -177,16 +186,15 @@ two_week_ahead <- arx_forecaster(
 hardhat::extract_fit_engine(two_week_ahead$epi_workflow)
 ```
 
-See the function documentation of `arx_args_list()` for more examples of the modifications available.
-If you are looking for even further modifications, you will want a custom
-workflow, for which you should see the [guts
-vignette](preprocessing-and-models).
+See the function documentation for `arx_args_list()` for more examples of the modifications available.
+If you want to make further modifications, you will need a custom
+workflow; see the ["guts" vignette](custom_epiworkflows) for details.
 
 ## Generating multiple aheads
 Frequently, one doesn't want just a forecast for a single day, but a trajectory
 of forecasts for several weeks.
-The way to do this using `arx_forecaster()` is by looping over aheads; for
-example, to predict every day over 4 weeks:
+We can do this with `arx_forecaster()` by looping over aheads; for
+example, to predict every day over a 4-week time period:
 
 ```{r temp-thing}
 all_canned_results <- lapply(
@@ -428,7 +436,7 @@ An `epi_workflow()` consists of 3 parts:
    5 of as these well. You can inspect the layers more closely by running
    `epipredict::extract_layers(four_week_ahead$epi_workflow)`.
 
-See the [Guts vignette](custom_epiworkflows) for recreating and then
+See the ["guts" vignette](custom_epiworkflows) for recreating and then
 extending `four_week_ahead` using the custom forecaster framework.
 
 ## Mathematical description