A repo to coordinate scripts to link data in the RODIS project.
There are three main scripts in this repo:
- A script that connects authorized persons to a restricted UW Google Drive to facilitate data downloads (download_from_google_drive.R),
- A script that preps data for subsequent linkages in AWS Glue (prep_for_glue.R), and
- A helper script sourced by the other two scripts (helpers.R).
NOTES:
- The linkage_fields directory contains a list of fields that are used to streamline the unioning of source data across multiple years and schemas.
- The raw_data directory contains a file structure that is expected by the various scripts. Restricted data will be downloaded into these directories when the scripts are run by authorized persons. DO NOT remove the .gitignore files from these directories! If you don't know why, DO NOT push to this repo until you do.
- The prep_for_glue.R script only produces unioned files for parents and children pulled from birth and chars data. This logic should be repeated with Famlink and death data once we are satisfied with the matches across birth and chars data.
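
As a rough illustration of how a field list can streamline unioning across years and schemas, here is a minimal base R sketch. It is not the code in prep_for_glue.R, and it does not reflect the actual format of the files in linkage_fields. The field map, the schema names (schema_2003, schema_2017), the column names, and the harmonize() helper are all hypothetical, and the records are made up.

```r
# Hypothetical field map: one canonical name per row, plus the column name
# used for that field in each (made-up) source schema.
field_map <- data.frame(
  canonical   = c("first_name", "last_name", "birth_date"),
  schema_2003 = c("fname", "lname", "dob"),
  schema_2017 = c("name_first", "name_last", "date_of_birth"),
  stringsAsFactors = FALSE
)

# Keep only the mapped columns from one source table and rename them to the
# canonical names, so tables from different schemas can be stacked.
harmonize <- function(df, schema, map = field_map) {
  out <- df[, map[[schema]], drop = FALSE]
  names(out) <- map$canonical
  out
}

# Made-up example records in two different schemas.
births_2003 <- data.frame(
  fname = "Ann", lname = "Lee", dob = "2003-01-15", extra = 1,
  stringsAsFactors = FALSE
)
births_2017 <- data.frame(
  name_first = "Bo", name_last = "Kim", date_of_birth = "2017-06-30",
  stringsAsFactors = FALSE
)

# Union the harmonized tables into one data frame with canonical columns.
unioned <- rbind(
  harmonize(births_2003, "schema_2003"),
  harmonize(births_2017, "schema_2017")
)
```

With the mapping kept in a table rather than in code, adding another year or schema only requires a new column in the field map.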