Commit 884e93e

Merge pull request #412 from cmu-delphi/docs-contingency-tables
Add page to symptom surveys block to document contingency tables
2 parents 4b9768f + b1ad619 commit 884e93e

7 files changed, with 146 additions and 21 deletions.

docs/symptom-survey/coding.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,7 @@
11
---
22
title: Questions and Coding
33
parent: COVID Symptom Survey
4-
nav_order: 5
4+
nav_order: 6
55
---
66

77
# Questions and Coding

docs/symptom-survey/contingency-tables.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
1+
---
2+
title: Contingency Tables
3+
parent: COVID Symptom Survey
4+
nav_order: 4
5+
---
6+
7+
# Contingency Tables
8+
{: .no_toc}
9+
10+
This documentation describes the fine-resolution contingency tables produced by
11+
grouping [COVID Symptom Survey](./index.md) individual responses by various
12+
demographic features:
13+
14+
* [Weekly files](https://cmu.box.com/s/xwjulq0pteen52d4upni9ikagu7d8bl2)
15+
* [Monthly files](https://cmu.box.com/s/vh4gs3j541tt9pqn2pn72bktu0op8tpo)
16+
17+
These contingency tables provide demographic breakdowns of COVID-related topics such as
18+
vaccine uptake and acceptance. They are more detailed than the
19+
[coarse aggregates reported in the COVIDcast Epidata API](../api/covidcast-signals/fb-survey.md),
20+
which are grouped only by geographic region. [Individual response data](survey-files.md)
21+
for the survey is available, but only to academic or nonprofit researchers who
22+
sign a Data Use Agreement, whereas these contingency tables are available to the
23+
general public.
24+
25+
Important updates for data users, including corrections to data or updates on
26+
data processing delays, are posted as `OUTAGES.txt` in the directory where the
27+
data is hosted.
28+
29+
## Table of contents
30+
{: .no_toc .text-delta}
31+
32+
1. TOC
33+
{:toc}
34+
35+
## Available Data Files
36+
37+
We provide two types of data files, weekly and monthly. Users who need the most
38+
up-to-date data or are interested in timeseries should use the weekly files,
39+
while those who want to study groups with smaller sample sizes should use the
40+
monthly files. Because monthly aggregates include more responses, they have
41+
lower missingness when grouping by several features at a time.
42+
43+
## Dates
44+
45+
The included files provide estimates for various metrics of interest over a
46+
period of either a full epiweek (or [MMWR
47+
week](https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf), a
48+
standardized numbering of weeks throughout the year) or a full month.
49+
50+
## Aggregation
51+
52+
The aggregates are filtered to only include estimates for a particular group if
53+
that group includes 100 or more responses. Especially in the weekly aggregates,
54+
many of the state-level groups have been filtered out due to low sample size. In
55+
such cases, the state marginal files, which group by a single demographic of
56+
interest at a time, will likely provide more coverage.
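
The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the stated threshold (100 or more responses), not the pipeline's actual code; the record layout, the key names, and the function `reportable` are assumptions made for the example.

```python
MIN_RESPONSES = 100  # threshold stated in the Aggregation section


def reportable(groups, min_responses=MIN_RESPONSES):
    """Keep only groups with at least `min_responses` survey responses."""
    return [g for g in groups if g["sample_size"] >= min_responses]


# Made-up groups; the keys and values are illustrative, not real data.
groups = [
    {"state": "pa", "age": "18-24", "sample_size": 99},
    {"state": "pa", "age": "25-34", "sample_size": 100},
    {"state": "pa", "age": "35-44", "sample_size": 412},
]
kept = reportable(groups)
print([g["age"] for g in kept])
```

A group with exactly 100 responses is kept, since the text says "100 or more".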
57+
58+
## File Format
59+
60+
### Naming
61+
62+
Each CSV is named as follows:
63+
64+
{date}_{region}_{vars}.csv
65+
66+
Dates are of the form `YYYYmmdd`. `date` refers to the first day of the time
67+
period survey responses were aggregated over, in the Pacific time zone
68+
(UTC-7). Unless noted otherwise, the time period is always a complete month or
69+
epiweek. `region` is the geographic level responses were aggregated over. At the
70+
moment, only nation-wide and state groupings are available. `vars` is a list of all
71+
other grouping variables used in the aggregation, ordered alphabetically.
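
As a rough sketch of the naming scheme, the following Python splits a file name into its three documented parts. The function `parse_table_name` and the example file name are hypothetical. The page does not say how the variables inside `vars` are separated, so the sketch leaves `vars` as a raw string and assumes that `region` itself contains no underscore.

```python
from collections import namedtuple
from datetime import datetime

TableName = namedtuple("TableName", ["start", "region", "group_vars"])


def parse_table_name(filename):
    """Split '{date}_{region}_{vars}.csv' into its three documented parts."""
    if not filename.endswith(".csv"):
        raise ValueError("expected a .csv file name: " + filename)
    stem = filename[: -len(".csv")]
    # Split on the first two underscores only, so everything after the
    # region stays together as the raw `vars` string (separator unspecified).
    date_part, region, group_vars = stem.split("_", 2)
    # Dates are documented as YYYYmmdd: the first day of the period.
    start = datetime.strptime(date_part, "%Y%m%d").date()
    return TableName(start, region, group_vars)


# Hypothetical file name, used only to exercise the parser.
parsed = parse_table_name("20210103_state_age_gender.csv")
print(parsed.start.isoformat(), parsed.region, parsed.group_vars)
```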
72+
73+
### Columns
74+
75+
Within a CSV, the first few columns are the grouping variables, ordered
76+
alphabetically. Each aggregate reports four columns (unrounded):
77+
78+
* `val_<indicator name>`: the main value of interest, e.g., percent, average, or
79+
count, estimated using the [survey weights](weights.md) to better match state
80+
demographics
81+
* `se_<indicator name>`: the standard error of `val_<indicator name>`
82+
* `sample_size_<indicator name>`: the number of survey responses used to
83+
calculate `val_<indicator name>`
84+
* `represented_<indicator name>`: the number of people in the population that
85+
`val_<indicator name>` represents over all days in the given time period. This
86+
is the sum of [survey weights](./weights.md) for all survey responses
87+
used.
88+
89+
All aggregates using the same set of grouping variables appear in a single CSV.
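
To illustrate the column layout, here is a minimal Python sketch that reads the four columns for one indicator and attaches an approximate 95% interval. The sample rows are made up, the grouping-column names `age` and `gender` are assumptions, and the normal-approximation interval (`val` plus or minus 1.96 times `se`) is a common convention rather than something this documentation prescribes.

```python
import csv
import io

STATS = ("val", "se", "sample_size", "represented")
STAT_PREFIXES = tuple(stat + "_" for stat in STATS)

# Made-up rows for illustration only; `age` and `gender` are assumed
# grouping-column names, and none of the numbers are real estimates.
SAMPLE_CSV = (
    "age,gender,val_pct_vaccinated,se_pct_vaccinated,"
    "sample_size_pct_vaccinated,represented_pct_vaccinated\n"
    "18-24,female,10.0,1.0,400,52000.0\n"
    "65plus,male,40.0,2.0,250,31000.0\n"
)


def indicator_rows(csv_text, indicator, z=1.96):
    """Yield one dict per group with the indicator's stats and an interval."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The four documented columns: <stat>_<indicator name>.
        stats = {s: float(row[s + "_" + indicator]) for s in STATS}
        # Every column that is not a statistic is treated as a grouping variable.
        group = {k: v for k, v in row.items() if not k.startswith(STAT_PREFIXES)}
        yield {
            "group": group,
            "ci_low": stats["val"] - z * stats["se"],
            "ci_high": stats["val"] + z * stats["se"],
            **stats,
        }


for result in indicator_rows(SAMPLE_CSV, "pct_vaccinated"):
    print(result["group"], result["val"], result["sample_size"])
```

Passing the CSV text rather than a path keeps the sketch self-contained; for a downloaded file, the same loop could wrap an opened file object instead of `io.StringIO`.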
90+
91+
## Indicators
92+
93+
The files contain [weighted
94+
estimates](../api/covidcast-signals/fb-survey.md#survey-weighting) of percent of
95+
respondents who fulfill one or several criteria. Estimates are broken out by
96+
state, age, gender, race, and ethnicity.
97+
98+
| Indicator | Description | Survey Item |
99+
| --- | --- | --- |
100+
| `pct_vaccinated` | Estimated percentage of respondents who have already received a COVID vaccine. <br/> **Earliest date available:** 2021-01-01 | V1 |
101+
| `pct_accepting` | Estimated percentage of respondents who would definitely or probably choose to get vaccinated, if a vaccine were offered to them today, among respondents who have not yet been vaccinated. <br/> **Earliest date available:** 2021-01-01 | V3 |
102+
| `pct_concerned_sideeffects` | Estimated percentage of respondents who are very or moderately concerned that they would "experience a side effect from a COVID-19 vaccination." (Asked of all respondents, including those who have already received one or more doses of a COVID-19 vaccine.) <br/> **Earliest date available:** 2021-01-01 | V9 |
103+
| `pct_hesitant_sideeffects` | Estimated percentage of respondents who are very or moderately concerned that they would "experience a side effect from a COVID-19 vaccination" *and* would "definitely not" or "probably not" get a COVID-19 vaccine if offered. <br/> **Earliest date available:** 2021-01-01 | V9 and V3 |
104+
| `pct_trust_fam` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by friends and family, among respondents who have not yet been vaccinated. <br/> **Earliest date available:** 2021-01-01 | V4 |
105+
| `pct_trust_healthcare` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by local health workers, among respondents who have not yet been vaccinated. <br/> **Earliest date available:** 2021-01-01 | V4 |
106+
| `pct_trust_who` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by the World Health Organization, among respondents who have not yet been vaccinated. <br/> **Earliest date available:** 2021-01-01 | V4 |
107+
| `pct_trust_govt` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by government health officials, among respondents who have not yet been vaccinated. <br/> **Earliest date available:** 2021-01-01 | V4 |
108+
| `pct_trust_politicians` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by politicians, among respondents who have not yet been vaccinated. <br/> **Earliest date available:** 2021-01-01 | V4 |
109+
| `pct_hesitant_trust_fam` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by friends and family, among respondents who have not yet been vaccinated *and* would "definitely not" or "probably not" get a COVID-19 vaccine if offered. <br/> **Earliest date available:** 2021-01-01 | V3 and V4 |
110+
| `pct_hesitant_trust_healthcare` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by local health workers, among respondents who have not yet been vaccinated *and* would "definitely not" or "probably not" get a COVID-19 vaccine if offered. <br/> **Earliest date available:** 2021-01-01 | V3 and V4 |
111+
| `pct_hesitant_trust_who` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by the World Health Organization, among respondents who have not yet been vaccinated *and* would "definitely not" or "probably not" get a COVID-19 vaccine if offered. <br/> **Earliest date available:** 2021-01-01 | V3 and V4 |
112+
| `pct_hesitant_trust_govt` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by government health officials, among respondents who have not yet been vaccinated *and* would "definitely not" or "probably not" get a COVID-19 vaccine if offered. <br/> **Earliest date available:** 2021-01-01 | V3 and V4 |
113+
| `pct_hesitant_trust_politicians` | Estimated percentage of respondents who would be more likely to get a COVID-19 vaccine if it were recommended to them by politicians, among respondents who have not yet been vaccinated *and* would "definitely not" or "probably not" get a COVID-19 vaccine if offered. <br/> **Earliest date available:** 2021-01-01 | V3 and V4 |
114+
115+
Note: CSVs for the month of January 2021 only use data from January 6-31 due to
116+
a [definitional change in a major vaccine item on January 6](./coding.md#new-items-2).
117+
Indicators based on [item V9 use data starting January 12](./coding.md#new-items-2).

docs/symptom-survey/data-access.md

Lines changed: 6 additions & 1 deletion
@@ -12,11 +12,16 @@ spread of COVID-19 and its effects on public health and well-being. This may
1212
help improve our local and national responses to the pandemic and our
1313
understanding of how it has affected society.
1414

15-
De-identified data can be made available to researchers associated with
15+
De-identified individual survey responses can be made available to researchers associated with
1616
universities or non-profit organizations. To request access to the data please
1717
submit the information requested in [Facebook's page on obtaining data
1818
access](https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/),
1919
which sets out the basic conditions and provides a form to request access. An
2020
[international version of the COVID Symptom Survey](https://covidmap.umd.edu/)
2121
is conducted by the University of Maryland (UMD) and access can be requested
2222
through the same form.
23+
24+
[High-level aggregates](../api/covidcast.md) of select survey items are
25+
publicly available in the [COVIDcast API](../api/covidcast-signals/fb-survey.md).
26+
[Finer aggregates](./contingency-tables.md) grouped by various demographic
27+
characteristics are available for download.

docs/symptom-survey/index.md

Lines changed: 11 additions & 10 deletions
@@ -13,14 +13,16 @@ social distancing), mental health, and economic and health impacts they have
1313
experienced as a result of the pandemic. A high-level overview of the survey is
1414
posted [on the COVIDcast website](https://delphi.cmu.edu/covidcast/surveys/).
1515

16-
Aggregate data from this survey is available through the [COVIDcast API](../api/covidcast.md)
17-
as the [`fb-survey` data source](../api/covidcast-signals/fb-survey.md).
16+
Geographically aggregated data from this survey is publicly available through
17+
the [COVIDcast API](../api/covidcast.md) as the [`fb-survey` data source](../api/covidcast-signals/fb-survey.md).
18+
Demographic breakdowns of survey data are publicly available as
19+
[downloadable contingency tables](contingency-tables.md).
1820

19-
This documentation is for users who have a signed Data Use Agreement to receive
20-
individual response data from the survey. It describes the survey items, data
21-
coding, data distribution, and the survey weights computed by Facebook. If you
22-
are a researcher and would like to get access to the data, see our page on
23-
getting [data access](data-access.md).
21+
This documentation describes the survey items, data coding, data distribution,
22+
and the survey weights computed by Facebook. It also documents the individual
23+
response data, which is available to researchers with a signed Data Use
24+
Agreement. If you are a researcher and would like to get access to the data, see
25+
our page on getting [data access](data-access.md).
2426

2527
If you have questions about the survey or getting access to data, contact us at
2628
<delphi-survey-info@lists.andrew.cmu.edu>.
@@ -49,9 +51,8 @@ and others. If you are interested in getting involved, see our
4951

5052
## Citing the Survey
5153

52-
Researchers who use the survey microdata for research are asked to credit and
53-
cite the survey in publications based on the data. Specifically, we ask that
54-
you:
54+
Researchers who use the survey data for research are asked to credit and cite
55+
the survey in publications based on the data. Specifically, we ask that you:
5556

5657
1. Include the acknowledgment "This research is based on survey results from
5758
Carnegie Mellon University’s Delphi Group."

docs/symptom-survey/server-access.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ nav_order: 2
99
Researchers with data use agreements to access the raw data from the COVID-19
1010
symptom survey can access the data over SFTP. (If you do not have a data use
1111
agreement, see the [main survey page](index.md) for information about getting
12-
access.)
12+
access and about aggregate data that is available for public download.)
1313

1414
If you're not familiar with SFTP, it is a protocol for securely accessing and downloading
1515
large amounts of data from remote servers. The instructions below explain how to

docs/symptom-survey/survey-files.md

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ are stored. To connect to the server, see the [server access documentation](serv
1313
This documentation describes the survey data available on that server.
1414

1515
You must sign a Data Use Agreement with Facebook and with CMU to gain
16-
access to the individual survey responses. If you have not done so, aggregate data is available
17-
[through the COVIDcast API](../api/covidcast-signals/fb-survey.md).
16+
access to the individual survey responses. If you have not done so, aggregate
17+
data is publicly available; see the [survey overview for details](index.md).
1818

1919
Important updates for data users, including corrections to data or updates on
2020
data processing delays, are posted as `OUTAGES.txt` in the SFTP server directory

docs/symptom-survey/weights.md

Lines changed: 8 additions & 6 deletions
@@ -1,15 +1,17 @@
11
---
22
title: Survey Weights
33
parent: COVID Symptom Survey
4-
nav_order: 4
4+
nav_order: 5
55
---
66

77
# Survey Weights
88
{: .no_toc}
99

1010
The symptom survey individual response files contain survey weights calculated
11-
by Facebook. Facebook has provided documentation to describe the calculation and
12-
usage of these weights, [available here](symptom-survey-weights.pdf). This
13-
documentation explains the weight methodology, gives examples of how to use the
14-
weights when calculating estimates, and states the known limitations of the
15-
weights.
11+
by Facebook. These weights are also used to produce our [public contingency tables](contingency-tables.md)
12+
and the geographic aggregates [in the COVIDcast Epidata API](../api/covidcast-signals/fb-survey.md).
13+
14+
Facebook has provided documentation to describe the calculation and usage of
15+
these weights, [available here](symptom-survey-weights.pdf). This documentation
16+
explains the weight methodology, gives examples of how to use the weights when
17+
calculating estimates, and states the known limitations of the weights.
