cmu-delphi
diff --git a/google_symptoms/.gitignore b/google_symptoms/.gitignore
Lines changed: 121 additions & 0 deletions
diff --git a/google_symptoms/.pylintrc b/google_symptoms/.pylintrc
Lines changed: 8 additions & 0 deletions
diff --git a/google_symptoms/DETAILS.md b/google_symptoms/DETAILS.md
Lines changed: 124 additions & 0 deletions
diff --git a/google_symptoms/README.md b/google_symptoms/README.md
Lines changed: 62 additions & 0 deletions
diff --git a/google_symptoms/REVIEW.md b/google_symptoms/REVIEW.md
Lines changed: 39 additions & 0 deletions
diff --git a/google_symptoms/cache/.gitignore b/google_symptoms/cache/.gitignore
diff --git a/google_symptoms/delphi_google_symptoms/__init__.py b/google_symptoms/delphi_google_symptoms/__init__.py
Lines changed: 12 additions & 0 deletions
diff --git a/google_symptoms/delphi_google_symptoms/__main__.py b/google_symptoms/delphi_google_symptoms/__main__.py
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,121 @@
+# You should commit a prototype for this file, but we
+# want to avoid accidentally adding API tokens and other
+# private parameters
+params.json
+
+# Do not commit output files
+receiving/*.csv
+tests/receiving/*.csv
+
+# Remove macOS files
+.DS_Store
+
+# virtual environment
+dview/
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+coverage.xml
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+.static_storage/
+.media/
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
@@ -0,0 +1,8 @@
+[DESIGN]
+
+min-public-methods=1
+
+
+[MESSAGES CONTROL]
+
+disable=R0801, C0330, E1101, E0611, C0114, C0116, C0103, R0913, R0914, W0702, W0707
@@ -0,0 +1,124 @@
+# USA Facts Cases and Deaths
+
+We import the confirmed case and deaths data from the USA Facts website and export
+the county-level data as-is.  We also aggregate the data to the MSA, HRR, and
+State levels.
+
+In order to avoid confusing public consumers of the data, we stay consistent
+with how USA Facts reports the data; please refer to [Exceptions](#Exceptions).
+
+## Geographical Levels (`geo`)
+* `county`: reported using zero-padded FIPS codes.  There are some exceptions
+  that lead to inconsistency with the other COVIDcast data (but are necessary
+  for internal consistency), noted below.  
+* `msa`: reported using cbsa (consistent with all other COVIDcast sensors)
+* `hrr`: reported using HRR number (consistent with all other COVIDcast sensors)
+* `state`: reported using two-letter postal code
+
+## Metrics, Level 1 (`m1`)
+* `confirmed`: Confirmed cases
+* `deaths`
+
+Recoveries are _not_ reported.
+
+## Metrics, Level 2 (`m2`)
+* `new_counts`: number of new {confirmed cases, deaths} on a given day
+* `cumulative_counts`: total number of {confirmed cases, deaths} from the
+  first day of data (January 22nd) up to and including a given day
+* `incidence`: `new_counts` / population * 100000
+
+All three `m2` are ultimately derived from `cumulative_counts`, which is first
+available on January 22nd.  In constructing `new_counts`, we take the first
+discrete difference of `cumulative_counts`, and assume that the
+`cumulative_counts` for January 21st is uniformly zero.  This should not be a
+problem, because there is only one county with a nonzero
+`cumulative_counts` on January 22nd, with a value of 1.
+
+For deriving `incidence`, we use the estimated 2019 county population values
+from the US Census Bureau.  https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html
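The derivation of `new_counts` and `incidence` described above can be sketched as follows. This is only an illustration under stated assumptions, not the module's actual implementation: the function names, the sample series, and the population value are all hypothetical.

```python
def new_counts(cumulative):
    """First discrete difference of a cumulative series.

    Assumes the cumulative count on the day before the series starts
    (January 21st in the text) is zero.
    """
    previous = [0] + list(cumulative[:-1])
    return [today - before for today, before in zip(cumulative, previous)]


def incidence(new, population):
    """New counts per 100,000 residents.

    Same ratio as new_counts / population * 100000; the multiplication is
    done first so the small example below stays exact in floating point.
    """
    return [count * 100000 / population for count in new]


# Hypothetical cumulative series for one county, starting on January 22nd.
# The drop from 3 to 2 stands in for a downward revision.
cumulative = [1, 1, 3, 2, 5]
daily = new_counts(cumulative)              # [1, 0, 2, -1, 3]
rates = incidence(daily, population=50000)  # [2.0, 0.0, 4.0, -2.0, 6.0]
```

Note that the downward revision produces a negative `new_counts` value and therefore a negative `incidence` for that day.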
+
+## Exceptions
+
+At the County (FIPS) level, we report the data _exactly_ as USA Facts reports their
+data, to prevent confusing public consumers of the data.
+The visualization and modeling teams should take note of these exceptions.
+
+### New York City
+
+New York City comprises five boroughs:
+
+|Borough Name       |County Name        |FIPS Code      |
+|-------------------|-------------------|---------------|
+|Manhattan          |New York County    |36061          |
+|The Bronx          |Bronx County       |36005          |
+|Brooklyn           |Kings County       |36047          |
+|Queens             |Queens County      |36081          |
+|Staten Island      |Richmond County    |36085          |
+
+**New York City Unallocated cases/deaths are reported by USA Facts independently.** We split them evenly among the five NYC FIPS codes, which results in non-integer values.
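A minimal sketch of the even split described above, assuming the per-county counts are held in a plain dictionary keyed by FIPS code. The helper name `split_unallocated` and the sample counts are hypothetical; only the five FIPS codes come from the table.

```python
# FIPS codes of the five NYC boroughs, taken from the table above.
NYC_FIPS = ["36061", "36005", "36047", "36081", "36085"]


def split_unallocated(county_counts, unallocated):
    """Return a copy of county_counts with the unallocated NYC count
    divided equally among the five borough FIPS codes."""
    share = unallocated / len(NYC_FIPS)
    updated = dict(county_counts)
    for fips in NYC_FIPS:
        updated[fips] = updated.get(fips, 0) + share
    return updated


# Hypothetical counts for one day, with 3 unallocated cases.
counts = {"36061": 10, "36005": 4, "36047": 7, "36081": 6, "36085": 1}
result = split_unallocated(counts, unallocated=3)
# Each borough gains 3 / 5 = 0.6, so the values are no longer integers,
# while the citywide total is preserved up to floating-point rounding.
```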
+
+All NYC counts are mapped to the MSA with CBSA ID 35620, which encompasses
+all five boroughs.  All NYC counts are mapped to HRR 303, which intersects
+all five boroughs (297 also intersects the Bronx, 301 also intersects
+Brooklyn and Queens, but absent additional information, we are leaving all
+counts in 303).
+
+
+### Mismatched FIPS Codes
+
+There are two FIPS codes that were changed in 2015, leading to a
+mismatch between our data and USA Facts.  We report the data using the FIPS code used
+by USA Facts, again to promote consistency and avoid confusion by external users
+of the dataset.  For the mapping to MSA, HRR, these two counties are
+included properly.
+
+|County Name        |State          |"Our" FIPS         |USA Facts FIPS     |
+|-------------------|---------------|-------------------|-------------------|
+|Oglala Lakota      |South Dakota   |46113              |46102              |
+|Kusilvak           |Alaska         |02270              |02158 & 02270      |
+
+Documentation for the changes made by the US Census Bureau in 2015:
+https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.html
+
+In addition, USA Facts reports Wade Hampton Census Area and Kusilvak Census Area with FIPS 02270 and 02158 respectively, though 0 cases/deaths are always reported for Wade Hampton Census Area (02270). According to the US Census Bureau, Wade Hampton Census Area, Alaska (02270) was renamed and recoded to Kusilvak Census Area, Alaska (02158), effective July 1, 2015.
+https://www.census.gov/quickfacts/kusilvakcensusareaalaska
+
+### Grand Princess Cruise Ship
+Data from the Grand Princess Cruise Ship is given its own dedicated line, with FIPS code 6000. We ignore these cases/deaths.
+
+
+
+
+## Negative incidence
+
+Negative incidence is possible because figures are sometimes revised
+downwards, e.g., when a public health authority moves cases from County X
+to County Y, County X may have negative incidence.
+
+## Non-integral counts
+
+Because the MSA and HRR numbers are computed by taking population-weighted
+averages, the count data at those geographical levels may be non-integral.
+
+## Counties not in our canonical dataset
+
+Some FIPS codes do not appear as the primary FIPS for any ZIP code in our
+canonical `02_20_uszips.csv`; they appear in the `county` exported files, but
+for the MSA/HRR mapping, we disburse them equally to the counties with which
+they appear as a secondary FIPS code.  The identification of such "secondary"
+FIPS codes is documented in `notebooks/create-mappings.ipynb`.  The full list
+of `secondary, [mapped]` is:
+
+```
+SECONDARY_FIPS = [   # generated by notebooks/create-mappings.ipynb
+    ('51620', ['51093', '51175']),
+    ('51685', ['51153']),
+    ('28039', ['28059', '28041', '28131', '28045', '28059', '28109',
+               '28047']),
+    ('51690', ['51089', '51067']),
+    ('51595', ['51081', '51025', '51175', '51183']),
+    ('51600', ['51059', '51059', '51059']),
+    ('51580', ['51005']),
+    ('51678', ['51163']),
+]
+```
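One way the equal disbursement of secondary FIPS counts could work is sketched below. This is an assumption-laden illustration rather than the module's code: the `disburse` helper and the sample counts are hypothetical, and only the two mapping entries are taken from the list above. The text does not say whether repeated entries in a mapped list should receive repeated shares; this sketch simply splits over the list entries as written.

```python
from collections import defaultdict

# Two entries copied from SECONDARY_FIPS above, for illustration only.
SECONDARY_FIPS = [
    ('51620', ['51093', '51175']),
    ('51685', ['51153']),
]


def disburse(county_counts, secondary_fips):
    """Move counts of secondary FIPS codes onto their mapped counties,
    splitting each count equally over the mapped list."""
    mapping = dict(secondary_fips)
    totals = defaultdict(float)
    for fips, count in county_counts.items():
        # Counties that are not secondary keep their own count.
        targets = mapping.get(fips, [fips])
        for target in targets:
            totals[target] += count / len(targets)
    return dict(totals)


# Hypothetical county-level counts for one day.
counts = {'51620': 4, '51093': 10, '51685': 3}
adjusted = disburse(counts, SECONDARY_FIPS)
# {'51093': 12.0, '51175': 2.0, '51153': 3.0}
```

The adjusted dictionary would then feed the MSA/HRR aggregation, while the `county` export keeps the original codes.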
@@ -0,0 +1,62 @@
+# Google Symptoms
+
+We import the confirmed case and deaths data from Google Research's
+Open COVID-19 Data project and export the county-level and state-level data
+as-is.  For detailed information, see the file `DETAILS.md` contained
+in this directory.
+
+## Running the Indicator
+
+The indicator is run by directly executing the Python module contained in this
+directory. The safest way to do this is to create a virtual environment,
+install the common DELPHI tools, and then install the module and its
+dependencies. To do this, run the following code from this directory:
+
+```
+python -m venv env
+source env/bin/activate
+pip install ../_delphi_utils_python/.
+pip install .
+```
+
+All of the user-changeable parameters are stored in `params.json`. (NOTE: In
+production we specify `"export_start_date": "latest",`). To execute the module
+and produce the output datasets (by default, in `receiving`), run the following.
+
+```
+env/bin/python -m delphi_google_symptoms
+```
+
+Once you are finished with the code, you can deactivate the virtual environment
+and (optionally) remove the environment itself.
+
+```
+deactivate
+rm -r env
+```
+
+## Testing the code
+
+To do a static test of the code style, it is recommended to run **pylint** on
+the module. To do this, run the following from the main module directory:
+
+```
+env/bin/pylint delphi_google_symptoms
+```
+
+The most aggressive checks are turned off; only relatively important issues
+should be raised and they should be manually checked (or better, fixed).
+
+Unit tests are also included in the module. To execute these, run the following
+command from this directory:
+
+```
+(cd tests && ../env/bin/pytest --cov=delphi_google_symptoms --cov-report=term-missing)
+```
+
+The output will show the number of unit tests that passed and failed, along
+with the percentage of code covered by the tests. None of the tests should
+fail and the number of code lines that are not covered by unit tests should be
+small and should not include critical sub-routines.
+
+- Jenkins test #1
@@ -0,0 +1,39 @@
+## Code Review (Python)
+
+A code review of this module should include a careful look at the code and the
+output. To assist in the process, but certainly not in place of it, please
+check the following items.
+
+**Documentation**
+
+- [ ] the README.md file template is filled out and currently accurate; it is
+possible to load and test the code using only the instructions given
+- [ ] minimal docstrings (one line describing what the function does) are
+included for all functions; full docstrings describing the inputs and expected
+outputs should be given for non-trivial functions
+
+**Structure**
+
+- [ ] code should use 4 spaces for indentation; other style decisions are
+flexible, but be consistent within a module
+- [ ] any required metadata files are checked into the repository and placed
+within the directory `static`
+- [ ] any intermediate files that are created and stored by the module should
+be placed in the directory `cache`
+- [ ] final expected output files to be uploaded to the API are placed in the
+`receiving` directory; output files should not be committed to the repository
+- [ ] all options and API keys are passed through the file `params.json`
+- [ ] template parameter file (`params.json.template`) is checked into the
+code; no personal (e.g., usernames) or private (e.g., API keys) information is
+included in this template file
+
+**Testing**
+
+- [ ] module can be installed in a new virtual environment
+- [ ] pylint with the default `.pylintrc` settings run over the module produces
+minimal warnings; warnings that do exist have been confirmed as false positives
+- [ ] reasonably high level of unit test coverage covering all of the main logic
+of the code (e.g., missing coverage for raised errors that do not currently seem
+possible to reach is okay; missing coverage for options that will be needed is
+not)
+- [ ] all unit tests run without errors
@@ -0,0 +1,12 @@
+# -*- coding: utf-8 -*-
+"""Module to pull and clean indicators from Google Research's Open
+   COVID-19 Data project.
+This file defines the functions that are made public by the module. As the
+module is intended to be executed through the main method, these are primarily
+for testing.
+"""
+
+from __future__ import absolute_import
+
+from . import pull
+from . import run
@@ -0,0 +1,11 @@
+# -*- coding: utf-8 -*-
+"""Call the function run_module when executed.
+
+This file indicates that calling the module (`python -m MODULE_NAME`) will
+call the function `run_module` found within the run.py file. There should be
+no need to change this template.
+"""
+
+from .run import run_module  # pragma: no cover
+
+run_module()  # pragma: no cover