Commit a3aa123

Jingjing Tang authored and committed
add code
1 parent cf6f4c1 commit a3aa123

75 files changed, with 12419 additions and 0 deletions

google_symptoms/.gitignore

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
1+
# You should hard commit a prototype for this file, but we
2+
# want to avoid accidental adding of API tokens and other
3+
# private data parameters
4+
params.json
5+
6+
# Do not commit output files
7+
receiving/*.csv
8+
tests/receiving/*.csv
9+
10+
# Remove macOS files
11+
.DS_Store
12+
13+
# virtual environment
14+
dview/
15+
16+
# Byte-compiled / optimized / DLL files
17+
__pycache__/
18+
*.py[cod]
19+
*$py.class
20+
21+
# C extensions
22+
*.so
23+
24+
# Distribution / packaging
25+
coverage.xml
26+
.Python
27+
build/
28+
develop-eggs/
29+
dist/
30+
downloads/
31+
eggs/
32+
.eggs/
33+
lib/
34+
lib64/
35+
parts/
36+
sdist/
37+
var/
38+
wheels/
39+
*.egg-info/
40+
.installed.cfg
41+
*.egg
42+
MANIFEST
43+
44+
# PyInstaller
45+
# Usually these files are written by a python script from a template
46+
# before PyInstaller builds the exe, so as to inject date/other infos into it.
47+
*.manifest
48+
*.spec
49+
50+
# Installer logs
51+
pip-log.txt
52+
pip-delete-this-directory.txt
53+
54+
# Unit test / coverage reports
55+
htmlcov/
56+
.tox/
57+
.coverage
58+
.coverage.*
59+
.cache
60+
nosetests.xml
61+
coverage.xml
62+
*.cover
63+
.hypothesis/
64+
.pytest_cache/
65+
66+
# Translations
67+
*.mo
68+
*.pot
69+
70+
# Django stuff:
71+
*.log
72+
.static_storage/
73+
.media/
74+
local_settings.py
75+
76+
# Flask stuff:
77+
instance/
78+
.webassets-cache
79+
80+
# Scrapy stuff:
81+
.scrapy
82+
83+
# Sphinx documentation
84+
docs/_build/
85+
86+
# PyBuilder
87+
target/
88+
89+
# Jupyter Notebook
90+
.ipynb_checkpoints
91+
92+
# pyenv
93+
.python-version
94+
95+
# celery beat schedule file
96+
celerybeat-schedule
97+
98+
# SageMath parsed files
99+
*.sage.py
100+
101+
# Environments
102+
.env
103+
.venv
104+
env/
105+
venv/
106+
ENV/
107+
env.bak/
108+
venv.bak/
109+
110+
# Spyder project settings
111+
.spyderproject
112+
.spyproject
113+
114+
# Rope project settings
115+
.ropeproject
116+
117+
# mkdocs documentation
118+
/site
119+
120+
# mypy
121+
.mypy_cache/

google_symptoms/.pylintrc

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
[DESIGN]
2+
3+
min-public-methods=1
4+
5+
6+
[MESSAGES CONTROL]
7+
8+
disable=R0801, C0330, E1101, E0611, C0114, C0116, C0103, R0913, R0914, W0702, W0707

google_symptoms/DETAILS.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
1+
# USA Facts Cases and Deaths
2+
3+
We import the confirmed case and death data from the USA Facts website and export
4+
the county-level data as-is. We also aggregate the data to the MSA, HRR, and
5+
State levels.
6+
7+
In order to avoid confusing public consumers of the data, we maintain
8+
consistency with how USA Facts reports the data; please refer to [Exceptions](#Exceptions).
9+
10+
## Geographical Levels (`geo`)
11+
* `county`: reported using zero-padded FIPS codes. There are some exceptions
12+
that lead to inconsistency with the other COVIDcast data (but are necessary
13+
for internal consistency), noted below.
14+
* `msa`: reported using CBSA ID (consistent with all other COVIDcast sensors)
15+
* `hrr`: reported using HRR number (consistent with all other COVIDcast sensors)
16+
* `state`: reported using two-letter postal code
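
As a small illustration of the zero-padded county codes in the list above, a FIPS value that was read as an integer can be restored to its five-character form. This is only a sketch: the helper name `pad_fips` is hypothetical and is not taken from the module.

```python
def pad_fips(fips):
    """Return a county FIPS code as a zero-padded, five-character string."""
    # Hypothetical helper for illustration; not part of the module.
    return str(int(fips)).zfill(5)


print(pad_fips(2158))     # an Alaska code that loses its leading zero as an integer
print(pad_fips("36061"))  # a code that is already five characters long
```

Keeping the codes as strings avoids losing the leading zero again when the data is written out.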
17+
18+
## Metrics, Level 1 (`m1`)
19+
* `confirmed`: Confirmed cases
20+
* `deaths`
21+
22+
Recoveries are _not_ reported.
23+
24+
## Metrics, Level 2 (`m2`)
25+
* `new_counts`: number of new {confirmed cases, deaths} on a given day
26+
* `cumulative_counts`: total number of {confirmed cases, deaths} from the
27+
first day of data (January 22nd) up to and including a given day
28+
* `incidence`: `new_counts` / population * 100000
29+
30+
All three `m2` are ultimately derived from `cumulative_counts`, which is first
31+
available on January 22nd. In constructing `new_counts`, we take the first
32+
discrete difference of `cumulative_counts`, and assume that the
33+
`cumulative_counts` for January 21st is uniformly zero. This should not be a
34+
problem, because there is only one county with a nonzero
35+
`cumulative_count` on January 22nd, with a value of 1.
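
The derivation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the module's implementation: the function names `new_counts` and `incidence` are chosen to mirror the metric names, and the cumulative series and population are made up.

```python
def new_counts(cumulative):
    """First discrete difference, assuming a count of zero the day before the series starts."""
    previous = [0] + list(cumulative[:-1])
    return [today - before for today, before in zip(cumulative, previous)]


def incidence(new, population):
    """New counts per 100,000 people."""
    return [n / population * 100000 for n in new]


# Made-up cumulative counts for one county, starting on January 22nd.
cumulative = [1, 1, 3, 6, 5]
daily = new_counts(cumulative)
print(daily)
print(incidence(daily, population=50000))
```

The last daily value is negative because the made-up cumulative series was revised downward on its final day, so the corresponding incidence is negative as well.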
36+
37+
For deriving `incidence`, we use the estimated 2019 county population values
38+
from the US Census Bureau. https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html
39+
40+
## Exceptions
41+
42+
At the County (FIPS) level, we report the data _exactly_ as USA Facts reports their
43+
data, to prevent confusing public consumers of the data.
44+
The visualization and modeling teams should take note of these exceptions.
45+
46+
### New York City
47+
48+
New York City comprises five boroughs:
49+
50+
|Borough Name |County Name |FIPS Code |
51+
|-------------------|-------------------|---------------|
52+
|Manhattan |New York County |36061 |
53+
|The Bronx |Bronx County |36005 |
54+
|Brooklyn |Kings County |36047 |
55+
|Queens |Queens County |36081 |
56+
|Staten Island |Richmond County |36085 |
57+
58+
**USA Facts reports New York City unallocated cases/deaths separately.** We split them evenly among the five NYC FIPS codes, which can produce non-integer counts.
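
The even split can be sketched as follows. The FIPS codes come from the table above; the function name `split_unallocated` and the counts are hypothetical and only illustrate the arithmetic.

```python
NYC_FIPS = ["36061", "36005", "36047", "36081", "36085"]


def split_unallocated(counts, unallocated):
    """Add an equal share of the unallocated count to each NYC county."""
    share = unallocated / len(NYC_FIPS)
    return {fips: counts.get(fips, 0) + share for fips in NYC_FIPS}


# Made-up counts; 7 unallocated cases become 1.4 per borough.
counts = {"36061": 10, "36005": 8, "36047": 12, "36081": 9, "36085": 3}
print(split_unallocated(counts, unallocated=7))
```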
59+
60+
All NYC counts are mapped to the MSA with CBSA ID 35620, which encompasses
61+
all five boroughs. All NYC counts are mapped to HRR 303, which intersects
62+
all five boroughs (297 also intersects the Bronx, 301 also intersects
63+
Brooklyn and Queens, but absent additional information, we are leaving all
64+
counts in 303).
65+
66+
67+
### Mismatched FIPS Codes
68+
69+
There are two FIPS codes that were changed in 2015, leading to a
70+
mismatch between us and USA Facts. We report the data using the FIPS code used
71+
by USA Facts, again to promote consistency and avoid confusion by external users
72+
of the dataset. For the mapping to MSA and HRR, these two counties are
73+
included properly.
74+
75+
|County Name |State |"Our" FIPS |USA Facts FIPS |
76+
|-------------------|---------------|-------------------|---------------|
77+
|Oglala Lakota |South Dakota |46113 |46102 |
78+
|Kusilvak |Alaska |02270 |02158 \& 02270 |
79+
80+
Documentation for the changes made by the US Census Bureau in 2015:
81+
https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.html
82+
83+
In addition, USA Facts reports Wade Hampton Census Area and Kusilvak Census Area under FIPS 02270 and 02158 respectively, though the cases/deaths reported for Wade Hampton Census Area (02270) are always 0. According to the US Census Bureau, Wade Hampton Census Area, Alaska (02270) was renamed and recoded to Kusilvak Census Area, Alaska (02158) effective July 1, 2015.
84+
https://www.census.gov/quickfacts/kusilvakcensusareaalaska
85+
86+
### Grand Princess Cruise Ship
87+
Data from the Grand Princess cruise ship is given its own dedicated line, with FIPS code 6000. We ignore these cases/deaths.
88+
89+
90+
91+
92+
## Negative incidence
93+
94+
Negative incidence is possible because figures are sometimes revised
95+
downwards, e.g., when a public health authority moves cases from County X
96+
to County Y, County X may have negative incidence.
97+
98+
## Non-integral counts
99+
100+
Because the MSA and HRR numbers are computed by taking population-weighted
101+
averages, the count data at those geographical levels may be non-integral.
102+
103+
## Counties not in our canonical dataset
104+
105+
Some FIPS codes do not appear as the primary FIPS for any ZIP code in our
106+
canonical `02_20_uszips.csv`; they appear in the `county` exported files, but
107+
for the MSA/HRR mapping, we disburse them equally to the counties with which
108+
they appear as a secondary FIPS code. The identification of such "secondary"
109+
FIPS codes is documented in `notebooks/create-mappings.ipynb`. The full list
110+
of `secondary, [mapped]` is:
111+
112+
```
113+
SECONDARY_FIPS = [ # generated by notebooks/create-mappings.ipynb
114+
('51620', ['51093', '51175']),
115+
('51685', ['51153']),
116+
('28039', ['28059', '28041', '28131', '28045', '28059', '28109',
117+
'28047']),
118+
('51690', ['51089', '51067']),
119+
('51595', ['51081', '51025', '51175', '51183']),
120+
('51600', ['51059', '51059', '51059']),
121+
('51580', ['51005']),
122+
('51678', ['51163']),
123+
]
124+
```
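
One way to picture the equal disbursement is the sketch below. It is an assumption-laden illustration rather than the module's code: the function name `disburse` is hypothetical, only two entries from the list above are used, the counts are made up, and each entry in a mapped list is assumed to receive one equal share.

```python
from collections import defaultdict

# Two entries copied from the list above.
SECONDARY_FIPS = [
    ("51620", ["51093", "51175"]),
    ("51685", ["51153"]),
]


def disburse(counts):
    """Move counts of secondary FIPS codes onto their mapped counties in equal shares."""
    secondary = dict(SECONDARY_FIPS)
    result = defaultdict(float)
    for fips, value in counts.items():
        # Codes that are not secondary keep their own count.
        targets = secondary.get(fips, [fips])
        for target in targets:
            result[target] += value / len(targets)
    return dict(result)


# Made-up counts for illustration.
print(disburse({"51620": 4, "51093": 10, "51153": 2, "51685": 3}))
```

Because shares are divided, the result holds floats, which is one reason counts at the aggregated levels may be non-integral.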

google_symptoms/README.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
# Google Symptoms
2+
3+
We import the confirmed case and death data from Google Research's
4+
Open COVID-19 Data project and export the county-level and state-level data
5+
as-is. For detailed information see the file `DETAILS.md` contained
6+
in this directory.
7+
8+
## Running the Indicator
9+
10+
The indicator is run by directly executing the Python module contained in this
11+
directory. The safest way to do this is to create a virtual environment,
12+
install the common DELPHI tools, and then install the module and its
13+
dependencies. To do this, run the following code from this directory:
14+
15+
```
16+
python -m venv env
17+
source env/bin/activate
18+
pip install ../_delphi_utils_python/.
19+
pip install .
20+
```
21+
22+
All of the user-changeable parameters are stored in `params.json`. (NOTE: In
23+
production we specify `"export_start_date": "latest",`). To execute the module
24+
and produce the output datasets (by default, in `receiving`), run the following.
25+
26+
```
27+
env/bin/python -m delphi_google_symptoms
28+
```
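
For readers who want to inspect the parameters before running the module, a parameter file of this kind can be read with the standard library. This is a hedged sketch, not the module's own loader: the function name `load_params` is hypothetical, and `export_start_date` is the only key taken from the text above.

```python
import json
from pathlib import Path


def load_params(path="params.json"):
    """Read the parameter file and return its contents as a dictionary."""
    return json.loads(Path(path).read_text())


# Only print the setting if a parameter file is present in this directory.
if Path("params.json").exists():
    params = load_params()
    print(params.get("export_start_date"))
```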
29+
30+
Once you are finished with the code, you can deactivate the virtual environment
31+
and (optionally) remove the environment itself.
32+
33+
```
34+
deactivate
35+
rm -r env
36+
```
37+
38+
## Testing the code
39+
40+
To do a static test of the code style, it is recommended to run **pylint** on
41+
the module. To do this, run the following from the main module directory:
42+
43+
```
44+
env/bin/pylint delphi_google_symptoms
45+
```
46+
47+
The most aggressive checks are turned off; only relatively important issues
48+
should be raised and they should be manually checked (or better, fixed).
49+
50+
Unit tests are also included in the module. To execute these, run the following
51+
command from this directory:
52+
53+
```
54+
(cd tests && ../env/bin/pytest --cov=delphi_google_symptoms --cov-report=term-missing)
55+
```
56+
57+
The output will show the number of unit tests that passed and failed, along
58+
with the percentage of code covered by the tests. None of the tests should
59+
fail, and the amount of code not covered by unit tests should be small and
60+
should not include critical sub-routines.
61+
62+
- Jenkins test #1

google_symptoms/REVIEW.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
## Code Review (Python)
2+
3+
A code review of this module should include a careful look at the code and the
4+
output. To assist in the process, but certainly not in place of it, please
5+
check the following items.
6+
7+
**Documentation**
8+
9+
- [ ] the README.md file template is filled out and currently accurate; it is
10+
possible to load and test the code using only the instructions given
11+
- [ ] minimal docstrings (one line describing what the function does) are
12+
included for all functions; full docstrings describing the inputs and expected
13+
outputs should be given for non-trivial functions
14+
15+
**Structure**
16+
17+
- [ ] code should use 4 spaces for indentation; other style decisions are
18+
flexible, but be consistent within a module
19+
- [ ] any required metadata files are checked into the repository and placed
20+
within the directory `static`
21+
- [ ] any intermediate files that are created and stored by the module should
22+
be placed in the directory `cache`
23+
- [ ] final expected output files to be uploaded to the API are placed in the
24+
`receiving` directory; output files should not be committed to the repository
25+
- [ ] all options and API keys are passed through the file `params.json`
26+
- [ ] template parameter file (`params.json.template`) is checked into the
27+
code; no personal (e.g., usernames) or private (e.g., API keys) information is
28+
included in this template file
29+
30+
**Testing**
31+
32+
- [ ] module can be installed in a new virtual environment
33+
- [ ] pylint with the default `.pylintrc` settings run over the module produces
34+
minimal warnings; warnings that do exist have been confirmed as false positives
35+
- [ ] reasonably high level of unit test coverage covering all of the main logic
36+
of the code (e.g., missing coverage for raised errors that do not currently seem
37+
possible to reach is okay; missing coverage for options that will be needed is
38+
not)
39+
- [ ] all unit tests run without errors

google_symptoms/cache/.gitignore

Whitespace-only changes.
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
# -*- coding: utf-8 -*-
2+
"""Module to pull and clean indicators from the Google Research's Open
3+
COVID-19 Data project.
4+
This file defines the functions that are made public by the module. As the
5+
module is intended to be executed through the main method, these are primarily
6+
for testing.
7+
"""
8+
9+
from __future__ import absolute_import
10+
11+
from . import pull
12+
from . import run
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
# -*- coding: utf-8 -*-
2+
"""Call the function run_module when executed.
3+
4+
This file indicates that calling the module (`python -m MODULE_NAME`) will
5+
call the function `run_module` found within the run.py file. There should be
6+
no need to change this template.
7+
"""
8+
9+
from .run import run_module # pragma: no cover
10+
11+
run_module() # pragma: no cover
