@@ -111,20 +111,16 @@ We evaluate the following four models:
 
 $$
 \begin{aligned}
-&\text{Cases:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
-&\text{Cases + Facebook:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) \\
-&\text{Cases + Google:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(G_{\ell,t-7j}) \\
-&\text{Cases + Facebook + Google:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) +
 \sum_{j=0}^2 \tau_j h(G_{\ell,t-7j}).
 \end{aligned}
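Reviewer note on the hunk above: a minimal Python sketch of how the regressors in the four equations could be assembled for one location. It is an illustration only, not the vignette's own code. The names `lagged`, `design_row`, and `response` are hypothetical, the signals are assumed to be plain daily lists indexed by day, and `h` is passed in as a placeholder because the text leaves the transformation unspecified at this point.

```python
from typing import Callable, Dict, List, Sequence

LAGS = (0, 7, 14)  # t - 7j for j = 0, 1, 2

# Which extra signals each model uses beyond lagged cases (names follow the diff).
MODEL_SIGNALS: Dict[str, List[str]] = {
    "Cases": [],
    "Cases + Facebook": ["F"],
    "Cases + Google": ["G"],
    "Cases + Facebook + Google": ["F", "G"],
}


def lagged(series: Sequence[float], t: int,
           h: Callable[[float], float]) -> List[float]:
    """Return [h(x_t), h(x_{t-7}), h(x_{t-14})] for one location's series."""
    if t - max(LAGS) < 0:
        raise ValueError("need at least 14 days of history before t")
    return [h(series[t - lag]) for lag in LAGS]


def design_row(model: str, t: int, signals: Dict[str, Sequence[float]],
               h: Callable[[float], float]) -> List[float]:
    """Intercept, lagged cases Y, then lagged F and/or G as the model requires."""
    row = [1.0] + lagged(signals["Y"], t, h)
    for name in MODEL_SIGNALS[model]:
        row += lagged(signals[name], t, h)
    return row


def response(cases: Sequence[float], t: int, d: int,
             h: Callable[[float], float]) -> float:
    """Left-hand side h(Y_{t+d}), with d = 7 or d = 14."""
    return h(cases[t + d])


if __name__ == "__main__":
    identity = lambda x: x  # placeholder for h (assumption)
    days = 40
    data = {
        "Y": [float(i) for i in range(days)],
        "F": [10.0 + i for i in range(days)],
        "G": [100.0 + i for i in range(days)],
    }
    for name in MODEL_SIGNALS:
        print(name, design_row(name, 20, data, identity))
```

Each row has 4, 7, 7, or 10 entries depending on the model, matching the intercept plus three coefficients per signal in the equations.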
@@ -134,14 +130,15 @@ Here $d=7$ or $d=14$, depending on the target value
 (number of days we predict ahead),
 and $h$ is a transformation to be specified later.
 
-Informally, the first model bases its predictions of future case rates
-on the following three features:
+Informally, the first model, which we'll call the "Cases" model,
+bases its predictions of future case rates on the following three features:
 current COVID-19 case rates, and those 1 and 2 weeks back.
-The second model additionally incorporates the current Facebook signal,
-and the Facebook signal from 1 and 2 weeks back.
-The third model is exactly same but substitutes the Google signal
-instead of the Facebook one.
-Finally, the fourth model uses both Facebook and Google signals.
+The second model, "Cases + Facebook", additionally incorporates the
+current Facebook signal, and the Facebook signal from 1 and 2 weeks back.
+The third model, "Cases + Google", is exactly the same but substitutes the
+Google signal instead of the Facebook one.
+Finally, the fourth model, "Cases + Facebook + Google",
+uses both Facebook and Google signals.
 For each model, in order to make a forecast at time $t_0$
 (to predict case rates at time $t_0+d$),
 we fit a linear model using least absolute deviations (LAD) regression,
@@ -293,8 +290,8 @@ is much bigger but still below 0.01.
 test](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test)
 (for paired data, as we have here) is more popular,
 because it tends to be more powerful than the sign test.
-Applied here, it does indeed give smaller p-values pretty much across the board.
-However, it assumes symmetry of the distribution in question
+Applied here, it does indeed give smaller p-values pretty much across the
+board. However, it assumes symmetry of the distribution in question
 (in our case, the difference in scaled errors),
 whereas the sign test does not, and thus we show results from the latter.
 
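Reviewer note on the hunk above: a small Python sketch of an exact one-sided sign test on paired differences, to make concrete why it needs no symmetry assumption. It is illustrative only and not the vignette's code. The function name `sign_test_pvalue` and the choice of alternative (differences tend to be positive) are assumptions made for this example.

```python
from math import comb
from typing import Sequence


def sign_test_pvalue(diffs: Sequence[float]) -> float:
    """Exact one-sided sign test.

    Null: each nonzero difference is positive with probability 1/2.
    Alternative (assumed here): positive differences are more likely.
    Only the signs are used, so the shape of the distribution of the
    differences (symmetric or not) does not matter.
    """
    nonzero = [x for x in diffs if x != 0]  # ties carry no sign information
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for x in nonzero if x > 0)
    # P(Binomial(n, 1/2) >= k)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


if __name__ == "__main__":
    diffs = [1.0, 2.0, 3.0, -1.0, 0.5, 0.0]
    print(sign_test_pvalue(diffs))
    # Inflating one difference leaves the p-value unchanged: only signs count.
    print(sign_test_pvalue([1000.0, 2.0, 3.0, -1.0, 0.5, 0.0]))
```

Because the magnitudes are discarded, a single very large difference cannot move the p-value. The same property is what costs the sign test power relative to the signed-rank test when symmetry does hold.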