PTAXSIM 2024 data update #59

@kyrasturgill

Description

2024 PTAXSIM Data List and Sources/Notes

Start with PIN and equalizer, which are the only data we have so far.

Data list from the 2023 update (#39), with notes on where sources will or might change due to the iasWorld migration:

  • Tax code agency rates - Clerk (format could change)
  • Agency tax rates - Clerk (format could change)
  • CPI - IDOR (unchanged)
  • Equalization factors - IDOR (unchanged)
  • TIF data - Clerk (format could change)
  • PIN-level AV and exemptions - CCAO/Clerk (CLERKVALUES internal DB table) (new source from iasWorld)
  • PIN-level total tax bills - CCAO/Treasurer (TAXBILLAMOUNTS internal DB table) (new source from iasWorld)
  • Parcel geometries - Cook County GIS (unknown)

Additionally, update the unit/integration tests and PTAXSIM vignettes:

  • Update vignettes with 2024 data (where necessary)
  • Update unit tests with 2024 data
  • Update README with new package/DB version
  • Bump DB and package version
  • Pull at least 10 real second-installment 2024 tax bills to use for the integration test

Some work to account for changes to the tax bill process in 2024:

Misc nitpicks

  • exe_abate is named like an exemption but is actually an abatement. I've been told (but haven't verified) that the number stored is actually the dollar amount of tax savings (unlike the other exe columns, for which the number is the exemption EAV reduction). Let's confirm this is true and decide whether to address.
  • Update the Authors config defined in the DESCRIPTION file and the pkgdown config file to include contact info for the current maintainers

Database Release Checklist

  • Make sure all PRs are merged into the 2024-data-update branch
  • Make any necessary updates to the raw data. If necessary, force add the raw data files if they are ignored by git. Be sure to update .gitattributes such that the raw data files are tracked by git LFS
  • Run the raw data scripts (anything in data-raw/) that prepare and clean the data. These scripts will save the cleaned data to a staging area in S3. Ensure that the relevant S3 keys in the PTAXSIM bucket are updated using the AWS console or API
  • Inside data-raw/create_db.R, increment the db_version variable following the database versioning schema (<TAX_YEAR>.<MAJOR VERSION>.<MINOR VERSION>)
  • If necessary, also increment the requires_pkg_version variable in data-raw/create_db.R
  • Increment the database versions in the DESCRIPTION file:
    • Config/Requires_DB_Version: This is the minimum database version required for this version of the package. It should be incremented whenever there is a breaking change
    • Config/Wants_DB_Version: This is the maximum database version required for this version of the package. It is the version of the database pulled from S3 during CI/testing on GitHub
  • If necessary, be sure to update the SQL statements in data-raw/create_db.sql. These statements define the structure of the database
  • Run the database generation script data-raw/create_db.R. This will create the SQLite database file by pulling data from S3. The file will be generated in a temporary directory (usually /tmp/Rtmp...), then compressed using pbzip2 (required for this script)
  • Using the command line, grab the final compressed database file from the temporary directory (found at db_path after running data-raw/create_db.R) and move it to the project directory. Rename the file ptaxsim-<TAX_YEAR>.<MAJOR VERSION>.<MINOR VERSION>.db.bz2
  • Decompress the database file for local testing using pbzip2. The typical command will be something like pbzip2 -d -k ptaxsim-2021.0.2.db.bz2
  • Rename the decompressed local database file to ptaxsim.db for local testing. This is the file name that the unit tests and vignettes expect
  • Use sqldiff or a similar tool to compare the new database file to the previous version. Ensure that the changes are expected
  • Restart R. Then run the unit tests (devtools::test() in the console) and vignettes (pkgdown::build_site() in the console) locally
  • Knit the README.Rmd file to update the database link at the top of the README. The link is pulled from the ptaxsim.db file's metadata table
  • If necessary, update the database diagrams in the README with any new fields or tables
  • Move the compressed database file to S3 for public distribution. The typical command will be something like aws s3 mv ptaxsim-2021.0.2.db.bz2 s3://ccao-data-public-us-east-1/ptaxsim/ptaxsim-2021.0.2.db.bz2
  • Use the S3 console (or API) to make the database file public via an ACL
  • Push the code updates on GitHub. Wait for the resulting CI pipeline to finish
  • If there are no pipeline errors, merge 2024-data-update to master
    • ⚠️ Important note: While the default configuration for PRs in this repo is to squash merges, we should instead choose the "merge commit" option, so that we can preserve our PR history in the commit history of this repo.
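
The file-handling steps in the checklist above (rename, decompress, rename for local testing, upload) can be sketched as a dry-run shell helper. This is only a sketch under stated assumptions: the function names db_filename and print_release_commands are hypothetical, the first argument is assumed to be the full path of the compressed file in the temporary directory, and the script only prints the commands rather than running them. The sqldiff comparison, unit tests, and vignette build still happen between the local rename and the upload.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the checklist's file-handling steps.
# Nothing is moved, decompressed, or uploaded; commands are only printed.
# Function names are hypothetical and not part of the PTAXSIM repo.

# Build the release file name:
#   ptaxsim-<TAX_YEAR>.<MAJOR VERSION>.<MINOR VERSION>.db.bz2
db_filename() {
  local tax_year="$1" major="$2" minor="$3"
  printf 'ptaxsim-%s.%s.%s.db.bz2\n' "$tax_year" "$major" "$minor"
}

# Print the commands for one release, in checklist order.
#   $1 = path of the compressed database in the temp directory
#        (assumed to be a full file path)
#   $2, $3, $4 = tax year, major version, minor version
print_release_commands() {
  local compressed_path="$1" tax_year="$2" major="$3" minor="$4"
  local archive decompressed
  archive="$(db_filename "$tax_year" "$major" "$minor")"
  decompressed="${archive%.bz2}"  # pbzip2 -d strips the .bz2 suffix

  # 1. Move the compressed file into the project directory under its release name
  echo "mv $compressed_path ./$archive"
  # 2. Decompress for local testing, keeping the archive (-k)
  echo "pbzip2 -d -k $archive"
  # 3. Rename to the file name the unit tests and vignettes expect
  echo "mv $decompressed ptaxsim.db"
  # 4. After local checks pass, move the archive to S3 for distribution
  echo "aws s3 mv $archive s3://ccao-data-public-us-east-1/ptaxsim/$archive"
}

# Example call using the version number from the checklist's sample commands;
# the input path is a placeholder.
print_release_commands /path/to/compressed.db.bz2 2021 0 2
```

Printing instead of executing keeps the sketch safe to run anywhere; each printed line can be reviewed and then run by hand at the matching checklist step.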
