diff --git a/DESCRIPTION b/DESCRIPTION index 274b04f..e7562df 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -3,7 +3,7 @@ Type: Package Title: Calculate Cook County Property Tax Bills and Simulate Scenarios Version: 1.1.0 Authors@R: c( - person(given = "Kyra", family = "Sturgill", email = "kyra.sturgill@cookcountyil.gov", role = c("aut", "cre")), + person(given = "Kyra", family = "Sturgill", email = "Assessor.Data@cookcountyil.gov", role = c("aut", "cre")), person(given = "Dan", family = "Snow", role = c("aut")), person(given = "Jean", family = "Cochrane", role = c("aut")), person(given = "Rob", family = "Ross", role = c("aut")), @@ -67,4 +67,4 @@ Remotes: paleolimbot/geoarrow, ropensci/tabulapdf Config/Requires_DB_Version: 2024.0.0 -Config/Wants_DB_Version: 2024.0.0-alpha.2 +Config/Wants_DB_Version: 2024.0.0-alpha.3 diff --git a/NEWS.md b/NEWS.md index 5734a69..1fa1171 100644 --- a/NEWS.md +++ b/NEWS.md @@ -41,11 +41,15 @@ The main changes that the Clerk and the Treasurer made in 2024 include: - General assistance funds - Road and bridge funds - Library funds - - To see the full list of agencies that changed in this manner in 2024, - run the following SQL query against the 2024 PTAXSIM database: + - To see the full list of agencies and funds that changed in this manner in + 2024, run the following SQL queries against the 2024 PTAXSIM database: ```sql -SELECT * FROM agency_info WHERE agency_change_24 +-- See all agencies that have changed to funds +SELECT * FROM agency_crosswalk; + +-- See the same change at the fund level +SELECT * FROM agency_fund_crosswalk; ``` - **Switched from three-digit to six-digit fund numbers to add a greater level @@ -96,25 +100,19 @@ database and functions to handle these changes in the source data. vignette](https://ccao-data.github.io/ptaxsim/articles/tifs.html), which we have updated to include a TIF counterfactual with data for tax year 2024. 
-- **Added new `agency_info.agency_*_24` columns to handle agencies that have - changed to funds in 2024**. You can use these columns to construct a crosswalk - to analyze agencies over time, even if they changed to a fund in 2024. - - The new columns include: - - `agency_info.agency_change_24` (boolean, required): Whether the agency's - number changed in 2024, due to becoming a fund. - - `agency_info.agency_num_24` (string, optional): The agency's new number - starting in 2024. Null if the agency number did not change in 2024. - - `agency_info.agency_name_24` (string, optional): The agency's name - starting in 2024. Null if the agency number did not change in 2024. +- **Added new tables `agency_crosswalk` and `agency_fund_crosswalk` to support + tracking agencies that have changed to funds in 2024**. You can use these + tables to analyze agencies over time, even if the Clerk switched to reporting + them as funds in 2024. - **How this change affects you**: If you maintain code that analyzes - agencies over time, and you want to update your code to include 2024 data, - you should use the `agency_info.agency_change_24` column to determine - whether the Clerk changed any of the agencies you analyze to funds in - 2024. If any of your agencies have changed to funds, you will need to use - the `agency_num_24` column to join pre- and post-2024 data. See [this - vignette](https://ccao-data.github.io/ptaxsim/articles/agencies.html) - for an example using the City of Chicago Library Fund to show how to - handle this type of change. + agencies or funds over time, and you want to update your code to include + 2024 data, you should use the crosswalk tables to determine whether the + Clerk changed any of the agencies or funds that interest you in 2024. If + any of your agencies or funds have changed, you will need to use + the `agency_num_final` and `fund_num_final` columns to join pre- and + post-2024 data. 
For an example using the City of Chicago Library Fund to + show how to handle this type of change, see the vignette [Tracking taxing + agency revenue over time](https://ccao-data.github.io/ptaxsim/articles/agencies.html). - **Added a new column `agency_fund.fund_type_num` to handle changing fund numbers in 2024**. In 2024, the Clerk changed their fund numbers so that they consist of six digits instead of three, and they are no longer @@ -205,8 +203,8 @@ database and functions to handle these changes in the source data. ([#77](https://github.com/ccao-data/ptaxsim/pull/77)). - **How this change affects you**: You should read the latest version of the vignette if you use PTAXSIM for TIF counterfactuals. -- **Added [a new - vignette](https://ccao-data.github.io/ptaxsim/articles/agencies.html) +- **Added a new vignette [Tracking taxing agency revenue over + time](https://ccao-data.github.io/ptaxsim/articles/agencies.html) to demonstrate the correct way to analyze agencies and funds over time given the 2024 change that switched some agencies to funds**. ([#84](https://github.com/ccao-data/ptaxsim/pull/84)). 
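The NEWS.md entry above says to join pre- and post-2024 data through `agency_num_final`. A minimal sketch of that join follows. It assumes toy data frames standing in for the `agency_crosswalk` and `agency` tables, and the levy amounts are made up for illustration. The two agency numbers are the City of Chicago and City of Chicago Library Fund values quoted in the vignette. This is not the package's own implementation, only an illustration of the pattern.

```r
library(dplyr)

# Toy stand-in for the `agency_crosswalk` table: one row per agency that the
# Clerk began reporting as a fund in 2024
agency_crosswalk <- data.frame(
  agency_num = "030210001", # CITY OF CHICAGO LIBRARY FUND (pre-2024 agency)
  agency_num_final = "030210000" # CITY OF CHICAGO (parent agency from 2024)
)

# Toy stand-in for the `agency` table. The levy amounts are hypothetical.
agency <- data.frame(
  year = c(2023, 2023, 2024),
  agency_num = c("030210000", "030210001", "030210000"),
  total_final_levy = c(100, 10, 115)
)

agency_harmonized <- agency %>%
  # Rows without a crosswalk match get NA for agency_num_final
  left_join(agency_crosswalk, by = "agency_num") %>%
  # Replace the legacy agency number with the final one where it exists
  mutate(agency_num = coalesce(agency_num_final, agency_num)) %>%
  # Fold the former agency's levy into its parent for each year
  group_by(year, agency_num) %>%
  summarize(total_final_levy = sum(total_final_levy), .groups = "drop")

print(agency_harmonized)
```

With the real database, the two data frames would instead come from `DBI::dbGetQuery()` calls against `agency_crosswalk` and `agency`, as the updated vignette does.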
diff --git a/README.Rmd b/README.Rmd index 4d676f6..c3feb4d 100644 --- a/README.Rmd +++ b/README.Rmd @@ -240,20 +240,22 @@ The PTAXSIM backend database contains cleaned data from the Cook County Clerk, T ### Data sources -| Table Name | Source Agency | Source Link | Ingest Script | Contains | -|------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|-------------------------------------------------------------------| -| agency | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Taxing district extensions, limits, and base EAV | -| agency_info | Clerk + imputed | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Taxing district name, type, and subtype | -| agency_fund | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Funds and line-items that contribute to each district's extension | -| agency_fund_info | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Fund name and whether the fund is statutorily capped | -| cpi | IDOR | [History of CPI's Used for the PTELL](https://tax.illinois.gov/localgovernments/property/cpihistory.html) | [data-raw/cpi/cpi.R](data-raw/cpi/cpi.R) | CPI-U used to calculate PTELL limits | -| eq_factor | IDOR | Manually created from [IDOR press releases](https://tax.illinois.gov/research/press-releases-archive.html) | 
[data-raw/eq_factor/eq_factor.R](data-raw/eq_factor/eq_factor.R) | Equalization factor applied to AV to get EAV | -| pin | Clerk + Treasurer | CLERKVALUES and TAXBILLAMOUNTS internal SQL tables | [data-raw/pin/pin.R](data-raw/pin/pin.R) | PIN-level tax code, AV, and exemptions | -| tax_code | Clerk | [Tax Extension - Tax Code Agency Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/tax_code/tax_code.R](data-raw/tax_code/tax_code.R) | Crosswalk of tax codes by district | -| tif | Clerk | [TIF Reports - Cook County Summary Reports](https://www.cookcountyclerkil.gov/property-taxes/tifs-tax-increment-financing/tif-reports) | [data-raw/tif/tif.R](data-raw/tif/tif.R) | TIF revenue, start year, and cancellation year | -| tif_crosswalk | Clerk | Manually created from TIF summary and distribution reports | [data-raw/tif/tif.R](data-raw/tif/tif.R) | Fix for data issue identified in #39 | -| tif_distribution | Clerk | [TIF Reports - Tax Increment Agency Distribution Reports](https://www.cookcountyclerkil.gov/property-taxes/tifs-tax-increment-financing/tif-reports) | [data-raw/tif/tif.R](data-raw/tif/tif.R) | TIF EAV, frozen EAV, and distribution percentage by tax code | -| pin_tif_distribution | Clerk | [TIF Reports - Tax Increment Agency Distribution Reports](https://www.cookcountyclerkil.gov/property-taxes/tifs-tax-increment-financing/tif-reports) | [data-raw/tif/tif.R](data-raw/tif/tif.R) | TIF EAV, frozen EAV, and distribution percentage by PIN | +| Table Name | Source Agency | Source Link | Ingest Script | Contains | +|-----------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|-------------------------------------------------------------------| +| agency | Clerk | [Tax Extension - Agency Tax Rate 
Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Taxing district extensions, limits, and base EAV | +| agency_info | Clerk + imputed | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Taxing district name, type, and subtype | +| agency_crosswalk | Clerk + imputed | [Tax Extension - Tax Code Agency Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Mapping to handle changes to agency numbers over time | +| agency_fund | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Funds and line-items that contribute to each district's extension | +| agency_fund_info | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Fund name and whether the fund is statutorily capped | +| agency_fund_crosswalk | Clerk + imputed | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Mapping to handle changes to fund numbers over time | +| cpi | IDOR | [History of CPI's Used for the PTELL](https://tax.illinois.gov/localgovernments/property/cpihistory.html) | [data-raw/cpi/cpi.R](data-raw/cpi/cpi.R) | CPI-U used to calculate PTELL limits | +| eq_factor | IDOR | Manually created from [IDOR press releases](https://tax.illinois.gov/research/press-releases-archive.html) | [data-raw/eq_factor/eq_factor.R](data-raw/eq_factor/eq_factor.R) | Equalization factor applied to AV to get EAV | +| pin | Clerk + Treasurer | 
CLERKVALUES and TAXBILLAMOUNTS internal SQL tables | [data-raw/pin/pin.R](data-raw/pin/pin.R) | PIN-level tax code, AV, and exemptions | +| tax_code | Clerk | [Tax Extension - Tax Code Agency Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/tax_code/tax_code.R](data-raw/tax_code/tax_code.R) | Crosswalk of tax codes by district | +| tif | Clerk | [TIF Reports - Cook County Summary Reports](https://www.cookcountyclerkil.gov/property-taxes/tifs-tax-increment-financing/tif-reports) | [data-raw/tif/tif.R](data-raw/tif/tif.R) | TIF revenue, start year, and cancellation year | +| tif_crosswalk | Clerk | Manually created from TIF summary and distribution reports | [data-raw/tif/tif.R](data-raw/tif/tif.R) | Fix for data issue identified in #39 | +| tif_distribution | Clerk | [TIF Reports - Tax Increment Agency Distribution Reports](https://www.cookcountyclerkil.gov/property-taxes/tifs-tax-increment-financing/tif-reports) | [data-raw/tif/tif.R](data-raw/tif/tif.R) | TIF EAV, frozen EAV, and distribution percentage by tax code | +| pin_tif_distribution | Clerk | [TIF Reports - Tax Increment Agency Distribution Reports](https://www.cookcountyclerkil.gov/property-taxes/tifs-tax-increment-financing/tif-reports) | [data-raw/tif/tif.R](data-raw/tif/tif.R) | TIF EAV, frozen EAV, and distribution percentage by PIN | ### Database diagram @@ -267,7 +269,7 @@ The PTAXSIM backend database contains cleaned data from the Cook County Clerk, T ## Notes and caveats -- PTAXSIM's tax year 2024 update required significant changes to the database and package. Please see the PTAXSIM [changelog](https://ccao-data.github.io/ptaxsim/news) for more details. +- PTAXSIM's tax year 2024 update required significant changes to the database and package. Please see the PTAXSIM [changelog](https://ccao-data.github.io/ptaxsim/news) for more details. 
- The per-district tax calculation using `tax_bill(simplify = TRUE)` for properties in transit TIFs do not match the amounts that the Treasurer reports on their tax bills. We believe the amounts we report are correct, however. See issues [#4](https://github.com/ccao-data/ptaxsim/issues/4) and [#56](https://github.com/ccao-data/ptaxsim/issues/56) for more information, as well as PR [#58](https://github.com/ccao-data/ptaxsim/pull/58). - Special Service Area (SSA) rates must be calculated manually when creating counterfactual bills. See issue [#3](https://github.com/ccao-data/ptaxsim/issues/3) for more information. - In rare instances, a TIF can have multiple `agency_num` identifiers (usually there's only one per TIF). The `tif_crosswalk` table determines what the "main" `agency_num` is for each TIF and pulls the name and TIF information using that identifier. See issue [GitLab #39](https://gitlab.com/ccao-data-science---modeling/packages/ptaxsim/-/issues/39) for more information. diff --git a/README.md b/README.md index 15b7ebe..db88a38 100644 --- a/README.md +++ b/README.md @@ -36,8 +36,8 @@ Table of Contents > installation](#database-installation) for details. > > [**Link to PTAXSIM -> database**](https://ccao-data-public-us-east-1.s3.amazonaws.com/ptaxsim/ptaxsim-2024.0.0-alpha.2.db.bz2) -> (DB version: 2024.0.0; Last updated: 2026-04-14 22:42:59) +> database**](https://ccao-data-public-us-east-1.s3.amazonaws.com/ptaxsim/ptaxsim-2024.0.0-alpha.3.db.bz2) +> (DB version: 2024.0.0; Last updated: 2026-04-30 15:24:09) PTAXSIM is an R package/database to approximate Cook County property tax bills. It uses real assessment, exemption, TIF, and levy data to @@ -172,7 +172,7 @@ database: 1. Download the compressed database file from the CCAO’s public S3 bucket. [Link - here](https://ccao-data-public-us-east-1.s3.amazonaws.com/ptaxsim/ptaxsim-2024.0.0-alpha.2.db.bz2). + here](https://ccao-data-public-us-east-1.s3.amazonaws.com/ptaxsim/ptaxsim-2024.0.0-alpha.3.db.bz2). 2. 
(Optional) Rename the downloaded database file by removing the version number, i.e. ptaxsim-2024.0.0.db.bz2 becomes `ptaxsim.db.bz2`. @@ -660,8 +660,10 @@ data was available in mid-2020. |----|----|----|----|----| | agency | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Taxing district extensions, limits, and base EAV | | agency_info | Clerk + imputed | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Taxing district name, type, and subtype | +| agency_crosswalk | Clerk + imputed | [Tax Extension - Tax Code Agency Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Mapping to handle changes to agency numbers over time | | agency_fund | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Funds and line-items that contribute to each district’s extension | | agency_fund_info | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Fund name and whether the fund is statutorily capped | +| agency_fund_crosswalk | Clerk + imputed | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Mapping to handle changes to fund numbers over time | | cpi | IDOR | [History of CPI’s Used for the PTELL](https://tax.illinois.gov/localgovernments/property/cpihistory.html) | [data-raw/cpi/cpi.R](data-raw/cpi/cpi.R) | CPI-U used to calculate PTELL limits | | eq_factor | IDOR | 
Manually created from [IDOR press releases](https://tax.illinois.gov/research/press-releases-archive.html) | [data-raw/eq_factor/eq_factor.R](data-raw/eq_factor/eq_factor.R) | Equalization factor applied to AV to get EAV | | pin | Clerk + Treasurer | CLERKVALUES and TAXBILLAMOUNTS internal SQL tables | [data-raw/pin/pin.R](data-raw/pin/pin.R) | PIN-level tax code, AV, and exemptions | diff --git a/data-raw/agency/agency.R b/data-raw/agency/agency.R index 1a08efb..fe1714d 100644 --- a/data-raw/agency/agency.R +++ b/data-raw/agency/agency.R @@ -28,12 +28,18 @@ remote_path_agency <- file.path( remote_path_agency_info <- file.path( remote_bucket, "agency_info", "part-0.parquet" ) +remote_path_agency_crosswalk <- file.path( + remote_bucket, "agency_crosswalk", "part-0.parquet" +) remote_path_agency_fund <- file.path( remote_bucket, "agency_fund", "part-0.parquet" ) remote_path_agency_fund_info <- file.path( remote_bucket, "agency_fund_info", "part-0.parquet" ) +remote_path_agency_fund_crosswalk <- file.path( + remote_bucket, "agency_fund_crosswalk", "part-0.parquet" +) # Get a list of all levy report spreadsheets file_names <- list.files( @@ -576,59 +582,91 @@ agency_info <- agency_info %>% ) ) -# Load 2024 tax code agency rate file to import legacy-new agency_num crosswalk -agency_legacy_cw <- +# Write both data sets to S3 +arrow::write_parquet( + x = agency %>% select(-agency_name), + sink = remote_path_agency, + compression = "zstd" +) +arrow::write_parquet( + x = agency_info, + sink = remote_path_agency_info, + compression = "zstd" +) + + +# agency_crosswalk ------------------------------------------------------------ + +agency_crosswalk <- openxlsx::read.xlsx( "data-raw/tax_code/2024-tax-code-agency-rate-file.xlsx" ) %>% set_names(snakecase::to_snake_case(names(.))) %>% + mutate(year = as.character(ty_2024)) %>% select( - agency_num_24 = agency, + year, agency_num = legacy_num, - authority_num = authority, - agency_name_24 = authority_name + agency_num_final = 
agency ) %>% - unique() %>% - # Account for error in Clerk's report which lists Village of Skokie Library - # Fund twice - filter(!(agency_num == "031170001" & agency_num_24 == "031170000")) - - -agency_info <- agency_info %>% - left_join(agency_legacy_cw, by = "agency_num") %>% - mutate( - agency_change_24 = coalesce(agency_num != agency_num_24, FALSE), - agency_num_24 = - ifelse(agency_change_24, - agency_num_24, - NA - ), - agency_name_24 = - ifelse(agency_change_24, - agency_name_24, - NA - ) + distinct() %>% + filter(agency_num != agency_num_final) %>% + # Account for error in Clerk's report which lists the Skokie Library fund + # levy in the legacy fund line. Because of this, it continues to follow the + # pre-2024 pattern for library fund reporting, so we need to remove it from + # the crosswalk + filter(!(agency_num == "031170001" & agency_num_final == "031170000")) + + +# agency_fund_crosswalk -------------------------------------------------------- + +changed_funds <- agency_fund_info %>% + left_join( + agency_info %>% + select(agency_num, agency_name, minor_type), + by = "agency_num" ) %>% - select( - agency_num, - agency_name, - agency_name_short, - agency_name_original, - major_type, - minor_type, - agency_num_24, - agency_name_24, - agency_change_24 + left_join(agency_crosswalk, by = "agency_num") %>% + filter(!is.na(agency_num_final)) %>% + mutate( + fund_num_final = case_when( + # Levy adjustments (408) have the same fund numbers across all years, so + # handle them separately + fund_type_num == "408" ~ fund_num, + minor_type == "LIBRARY" ~ paste0(fund_type_num, "001"), + minor_type == "GEN ASST" ~ paste0(fund_type_num, "002"), + minor_type == "INFRA" ~ paste0(fund_type_num, "003"), + minor_type == "HEALTH" & + str_detect(agency_name, "MENTAL") ~ paste0(fund_type_num, "004"), + minor_type == "HEALTH" & + str_detect(agency_name, "PUBLIC") ~ paste0(fund_type_num, "005") + ) + ) + +# Perform some quick data integrity checks +if 
(any(is.na(changed_funds$agency_name))) { + stop( + "Could not join fund to agency info when constructing agency_fund_crosswalk" ) +} +if (any(is.na(changed_funds$fund_num_final))) { + stop( + "Could not parse final fund number for some funds in agency_fund_crosswalk" + ) +} + +agency_fund_crosswalk <- changed_funds %>% + mutate(year = "2024") %>% + select(year, agency_num, agency_num_final, fund_num, fund_num_final) # Write both data sets to S3 arrow::write_parquet( - x = agency %>% select(-agency_name), - sink = remote_path_agency, + x = agency_crosswalk, + sink = remote_path_agency_crosswalk, compression = "zstd" ) + arrow::write_parquet( - x = agency_info, - sink = remote_path_agency_info, + x = agency_fund_crosswalk, + sink = remote_path_agency_fund_crosswalk, compression = "zstd" ) diff --git a/data-raw/create_db.R b/data-raw/create_db.R index 0ea9591..e4c42e7 100644 --- a/data-raw/create_db.R +++ b/data-raw/create_db.R @@ -36,17 +36,18 @@ db_send_queries <- function(conn, sql) { # Set the database version. This gets incremented manually whenever the database # changes. This is checked against Config/Requires_DB_Version in the DESCRIPTION # file via check_db_version(). Schema is: -# "MAX_YEAR_OF_DATA.MAJOR_VERSION.MINOR_VERSION-PRE_RELEASE_VERSION" +# "MAX_YEAR_OF_DATA.MAJOR_VERSION.MINOR_VERSION" db_version <- "2024.0.0" # Optional pre-release identifier. Informational only, not compared. # Set this to an empty string for a public release, or to a string like -# "alpha.1" for a release candidate -db_pre_release_version <- "alpha.2" +# "alpha.1" for a release candidate. Setting this to a non-empty string will +# append it to `db_version`, separated by a hyphen +db_pre_release_version <- "alpha.3" # Set the package version required to use this database. This is checked against # Version in the DESCRIPTION file. Basically, we have a two-way check so that # both the package version and DB version are synced. Schema is SemVer. 
-requires_pkg_version <- "0.6.0" +requires_pkg_version <- "1.1.0" # Create tables ---------------------------------------------------------------- @@ -125,6 +126,7 @@ DBI::dbAppendTable(conn, "metadata", metadata_df) # Load tables contained in a single file files <- c( "agency", "agency_info", "agency_fund", "agency_fund_info", + "agency_crosswalk", "agency_fund_crosswalk", "cpi", "eq_factor", "tif", "tif_crosswalk", "tif_distribution", "pin_tif_distribution" ) @@ -139,10 +141,15 @@ datasets <- c("pin", "tax_code") for (dataset in datasets) { message("Now loading: ", dataset) df <- collect(arrow::open_dataset(file.path(remote_bucket, dataset), - # Starting in 2024, there are some major changes regarding the columns - # that are present in these data files. That means we need to unify the - # schemas across files, since otherwise arrow will take the schema from - # the first file it finds in the dataset + # Starting in 2024, the data source for the `pin` table has changed. + # The current package maintainers do not have access to the old data source, + # which was a snapshot mirror of a mainframe system that has been + # decommissioned. To facilitate future updates, we copied over pre-2024 + # `pin` files without edits. These legacy files are missing some columns + # that we added in 2024, so we need to unify the schemas across files, since + # otherwise arrow will take the schema from the first file it finds in the + # dataset. 
In the future, we could also consider editing the old files to + # add empty values for the new columns unify_schemas = TRUE )) DBI::dbAppendTable(conn, dataset, df) diff --git a/data-raw/create_db.sql b/data-raw/create_db.sql index 59cb516..835cce4 100644 --- a/data-raw/create_db.sql +++ b/data-raw/create_db.sql @@ -46,9 +46,6 @@ CREATE TABLE agency_info ( agency_name_original varchar NOT NULL, major_type varchar(21) NOT NULL, minor_type varchar(10) NOT NULL, - agency_num_24 varchar(9) , - agency_name_24 varchar , - agency_change_24 boolean NOT NULL, PRIMARY KEY (agency_num) ) WITHOUT ROWID; @@ -56,6 +53,15 @@ CREATE INDEX ix_agency_info_major_type ON agency_info(major_type); CREATE INDEX ix_agency_info_minor_type ON agency_info(minor_type); +/** agency_crosswalk **/ +CREATE TABLE agency_crosswalk ( + year int NOT NULL, + agency_num varchar(9) NOT NULL, + agency_num_final varchar(9) NOT NULL, + PRIMARY KEY (agency_num) +) WITHOUT ROWID; + + /** agency_fund **/ CREATE TABLE agency_fund ( year int NOT NULL, @@ -94,6 +100,17 @@ CREATE TABLE agency_fund_info ( CREATE INDEX ix_agency_fund_info_capped_ind ON agency_fund_info(capped_ind); +/** agency_fund_crosswalk **/ +CREATE TABLE agency_fund_crosswalk ( + year int NOT NULL, + agency_num varchar(9) NOT NULL, + agency_num_final varchar(9) NOT NULL, + fund_num varchar(6) NOT NULL, + fund_num_final varchar(6) NOT NULL, + PRIMARY KEY (agency_num, fund_num) +) WITHOUT ROWID; + + /** cpi **/ CREATE TABLE cpi ( year int NOT NULL, diff --git a/inst/mermaid/er-diagram-big.mmd b/inst/mermaid/er-diagram-big.mmd index fd240a1..0983df5 100644 --- a/inst/mermaid/er-diagram-big.mmd +++ b/inst/mermaid/er-diagram-big.mmd @@ -32,9 +32,12 @@ erDiagram varchar agency_name_original varchar major_type varchar minor_type - varchar agency_num_24 - varchar agency_name_24 - boolean agency_change_24 + } + + agency_crosswalk { + int year + varchar agency_num PK + varchar agency_num_final } agency_fund { @@ -61,6 +64,14 @@ erDiagram boolean 
capped_ind } + agency_fund_crosswalk { + int year + varchar agency_num PK + varchar agency_num_final + varchar fund_num PK + varchar fund_num_final + } + cpi { int year double cpi @@ -186,7 +197,9 @@ erDiagram tax_code }|--|| agency : "has" tax_code ||--o| tif_distribution : "may have" agency ||--|{ agency_fund : "contains" + agency ||--o| agency_crosswalk : "may have" agency_info ||--|{ agency : "describes" + agency_fund ||--o| agency_fund_crosswalk : "may have" agency_fund_info ||--|{ agency_fund : "describes" tif ||--|| tif_crosswalk : "in" tif_distribution }|--|| tif_crosswalk : "in" diff --git a/inst/mermaid/er-diagram-small.mmd b/inst/mermaid/er-diagram-small.mmd index 169ceef..7ef764e 100644 --- a/inst/mermaid/er-diagram-small.mmd +++ b/inst/mermaid/er-diagram-small.mmd @@ -14,6 +14,12 @@ erDiagram varchar major_type } + agency_crosswalk { + int year + varchar agency_num PK + varchar agency_num_final + } + agency_fund { int year PK varchar agency_num PK @@ -29,6 +35,14 @@ erDiagram boolean capped_ind } + agency_fund_crosswalk { + int year + varchar agency_num PK + varchar agency_num_final + varchar fund_num PK + varchar fund_num_final + } + cpi { int year double cpi @@ -106,7 +120,9 @@ erDiagram tax_code }|--|| agency : "has" tax_code ||--o| tif_distribution : "may have" agency ||--|{ agency_fund : "contains" + agency ||--o| agency_crosswalk : "may have" agency_info ||--|{ agency : "describes" + agency_fund ||--o| agency_fund_crosswalk : "may have" agency_fund_info ||--|{ agency_fund : "describes" tif ||--|| tif_crosswalk : "in" tif_distribution }|--|| tif_crosswalk : "in" diff --git a/vignettes/agencies.Rmd b/vignettes/agencies.Rmd index 22b63c4..1c2affa 100644 --- a/vignettes/agencies.Rmd +++ b/vignettes/agencies.Rmd @@ -32,6 +32,7 @@ library(DT) library(here) library(ggplot2) library(ptaxsim) +library(stringr) library(tidyr) ptaxsim_db_conn <- DBI::dbConnect(RSQLite::SQLite(), here("./ptaxsim.db")) @@ -47,74 +48,126 @@ ptaxsim_db_conn <- DBI::dbConnect( 
## Accounting for 2024 changes to agency fund reporting -Before we query the relevant agency data, we first will need to check if any of the agencies of interest were impacted by the Clerk's 2024 updates to the data structure. We added new fields to the `agency_info` table to document these changes - we'll query this table and select only the agencies where the field `agency_change_24 = TRUE`. +In 2024, the Clerk switched to reporting 78 agencies as funds underneath a separate agency. These agencies had always represented funds in the real world, but the Clerk reported them as independent taxing agencies prior to 2024. We need to account for this change when analyzing agencies and funds over time. + +The following types of funds were affected by this change: + +- Library funds +- General assistance funds +- Infrastructure funds (road and bridge) +- Mental health and public health funds +- Levy adjustments + +Most tax codes contain at least one of these types of agencies. + +### Using the agency crosswalk + +Before we query the relevant agency data for our analysis, we first will need to check if any of the agencies of interest changed to funds in 2024. To do this, we can query the `agency_crosswalk` table: ```{r} -# Query agency_info table for all agencies with the 2024 update -agency_cw_24 <- DBI::dbGetQuery( +agency_crosswalk <- DBI::dbGetQuery( ptaxsim_db_conn, " - SELECT * - FROM agency_info + SELECT agency_num, agency_num_final + FROM agency_crosswalk " -) %>% - select(agency_num, agency_name, agency_num_24, agency_name_24) +) -datatable(agency_cw_24 %>% - filter(!is.na(agency_num_24))) +datatable(agency_crosswalk) ``` -From this table it appears that the updated agencies are specific funds for various municipalities - the Clerk had previously reported certain municipal fund levies as separate taxing agencies. 
This updated data structure begins in 2024 while years prior to 2024 remain the same, meaning users will need to account for this discrepancy if ever analyzing these taxing agency data over time. +The crosswalk contains the following columns: -To make this process possible, we added new fields to the `agency_info` table in the PTAXSIM database which identify the agencies that have been folded into their parent agencies. `agency_num_24` and `agency_name_24` contain the agency info that the "sub-agency" has been merged into. With this table we can create a crosswalk, as we did above, calling it `agency_cw_24`. Note that the user can still see details about these former agencies, now funds, by querying the `agency_fund` table. +- **`year`**: The year that the agency changed + - This is not relevant to our analysis, so we exclude it from our query; however, it may be useful for other types of analysis +- **`agency_num`**: The agency number prior to the change, when the fund was reported as an independent agency +- **`agency_num_final`**: The agency number after the change, representing the agency under which the fund is now reported -A quick search of the crosswalk shows that the `CITY OF CHICAGO LIBRARY FUND` was previously defined as an independent agency and has now been folded into the `CITY OF CHICAGO`. This is in fact aligned with how the City of Chicago reports its property tax levy in its own budget [documentation](https://chicityclerk.s3.us-west-2.amazonaws.com/s3fs-public/O2023-0005291_Tax_Levy.pdf). When we eventually query the agency data for `CITY OF CHICAGO`, we'll want to include the `CITY OF CHICAGO LIBRARY FUND` as well. - -Next, we'll search the database table `agency_info` to find the right `agency_num` key assigned to the taxing agencies we want to learn more about. The table below displays all taxing agencies and TIFs with `CHICAGO` present in the `agency_name`. 
+Note that the user can still see details about these former agencies, now funds, by querying the `agency_fund` and `agency_fund_info` tables. Let's join the agency crosswalk to `agency_info` to confirm that the `CITY OF CHICAGO LIBRARY FUND` was previously defined as an independent agency and has now been folded into the `CITY OF CHICAGO` agency: ```{r} -chi_agency_nums <- DBI::dbGetQuery( +agency_info <- DBI::dbGetQuery( ptaxsim_db_conn, " SELECT agency_num, agency_name FROM agency_info - WHERE agency_name LIKE '%CHICAGO%' " ) -datatable(chi_agency_nums) +agency_crosswalk_info <- agency_crosswalk %>% + left_join( + agency_info, + by = c("agency_num" = "agency_num") + ) %>% + rename(agency_name_prev = agency_name) %>% + left_join( + agency_info, + by = c("agency_num_final" = "agency_num") + ) %>% + rename(agency_name_final = agency_name) %>% + rename(agency_num_prev = agency_num) %>% + select( + agency_num_prev, agency_name_prev, + agency_num_final, agency_name_final + ) + +agency_crosswalk_info %>% + filter(str_detect(agency_name_prev, "CITY OF CHICAGO LIBRARY FUND")) %>% + datatable() ``` -Using the table above, we learn that the `agency_num` values for the agencies `CITY OF CHICAGO`, `CITY OF CHICAGO LIBRARY FUND`, `BOARD OF EDUCATION`, and `CHICAGO PARK DISTRICT` are `030210000`, `030210001`, `050200000`, and `044060000` respectively. We'll query all fields from the `agency` table and filter by their `agency_num`. +Now let's see how to use this crosswalk to track Chicago agencies over time. + +## Tracking Chicago taxing agencies over time + +To track Chicago taxing agencies over time, let's first filter `agency_info` to find the agency numbers for the agencies that we are interested in[^1]: + +[^1]: If you don't know the names of the agencies you are interested in, you can browse `agency_info.agency_name` to find them. 
```{r} -chi_agencies <- DBI::dbGetQuery( +chi_agency_info <- agency_info %>% + filter( + agency_name %in% c( + "CITY OF CHICAGO", + "CITY OF CHICAGO LIBRARY FUND", + "BOARD OF EDUCATION", + "CHICAGO PARK DISTRICT" + ) + ) + +datatable(chi_agency_info) +``` + +Now that we have the agency numbers for the agencies we are interested in, we'll use them to extract levies and extensions[^2] from the `agency` table: + +[^2]: *Terminology note*: The **levy** (called `total_final_levy` in the database) is the amount of money a local government budgets for and requests to receive from property taxes. Many taxing agencies are subject to limits imposed by Illinois state law. The **extension** (called `total_ext` in the database) is the total, final amount a taxing body is allowed to receive which is calculated and confirmed by the Cook County Clerk. + +```{r} +agency <- DBI::dbGetQuery( ptaxsim_db_conn, " - SELECT DISTINCT * + SELECT * FROM agency - WHERE agency_num IN ('030210000', '030210001', '050200000', '044060000') " ) -``` -Before we do anything, we need to fold the `CITY OF CHICAGO LIBRARY FUND` into the `CITY OF CHICAGO` levy total for all years prior to 2024. We can do this by joining `agency_cw_24` by `agency_num` to `chi_agencies` and replacing the old `agency_num` with `agency_num_24`, then grouping by agency and year and then summing the fields `total_final_levy` and `total_ext`.[^1] +chi_agencies_raw <- chi_agency_info %>% + left_join(agency, by = "agency_num") -[^1]: *Terminology note*: The **levy** is the amount of money a local government budgets for and requests to receive from property taxes. Many taxing agencies are subject to limits imposed by State law. The **extension** (called `final_ext` in the database) is the total, final amount a taxing body is allowed to receive which is calculated and confirmed by the Cook County Clerk. 
+chi_agencies_raw %>% + select(year, agency_num, agency_name, total_final_levy, total_ext) %>% + datatable() +``` + +Before we do anything with these levies and extensions, we need to fold the `CITY OF CHICAGO LIBRARY FUND` into the `CITY OF CHICAGO` levy total for all years prior to 2024. We can do this using the agency crosswalk and a simple summation: ```{r} -chi_agencies <- chi_agencies %>% +chi_agencies <- chi_agencies_raw %>% # Join the agency crosswalk to get the parent agency number for pre-2024 years - left_join(agency_cw_24, "agency_num") %>% - # for the agencies that did have an agency number change in 2024, replace the + left_join(agency_crosswalk, "agency_num") %>% + # For the agencies that did have an agency number change in 2024, replace the # old agency_num with the new one - mutate( - agency_num = - ifelse(!is.na(agency_num_24), - agency_num_24, - agency_num - ) - ) %>% + mutate(agency_num = coalesce(agency_num_final, agency_num)) %>% group_by(year, agency_num) %>% summarize( total_final_levy = sum(total_final_levy), @@ -126,9 +179,9 @@ chi_agencies <- chi_agencies %>% Now that we have the correct total levies for the City of Chicago, Chicago Public Schools and Chicago Park District across all years, we can look at how those levies have changed from 2006 to 2024. -CPS and CPKD are both subject to PTELL (Property Tax Extension Law Limit), which ensures certain taxing agencies do not increase their levies beyond the rate of inflation (with some exceptions[^2]). The City of Chicago is not by virtue of being a non-home rule agency. However, the City of Chicago imposes [its own limits](https://codelibrary.amlegal.com/codes/chicago/latest/chicago_il/0-0-0-2608573) which mirror PTELL's and prohibit a taxing agency from increasing its levy more than the rate of inflation or 5%, whichever is less. 
+CPS and CPKD are both subject to PTELL (Property Tax Extension Limitation Law), which ensures certain taxing agencies do not increase their levies beyond the rate of inflation (with some exceptions[^3]). The City of Chicago is not, by virtue of being a home rule agency. However, the City of Chicago imposes [its own limits](https://codelibrary.amlegal.com/codes/chicago/latest/chicago_il/0-0-0-2608573), which mirror PTELL's and prohibit a taxing agency from increasing its levy more than the rate of inflation or 5%, whichever is less. -[^2]: PTELL, as well as Chicago's property tax limitation ordinance, contain loopholes that allow taxing agencies to [increase their levies beyond the rate of inflation](https://civicfed.org/press/new-report-finds-illinois-property-tax-cap-law-not-working-intended). Let's see how Chicago's levy has fared compared to inflation since 2006. +[^3]: PTELL, as well as Chicago's property tax limitation ordinance, contains loopholes that allow taxing agencies to [increase their levies beyond the rate of inflation](https://civicfed.org/press/new-report-finds-illinois-property-tax-cap-law-not-working-intended). Let's see how Chicago's levy has fared compared to inflation since 2006. To do so, we'll calculate the rate at which the levies have grown compared to the [CPI-U](https://www.bls.gov/cpi/). Fortunately, CPI data, as reported by the Illinois Department of Revenue (IDOR) for the purpose of PTELL calculations, is available in the PTAXSIM database in the `cpi` table. @@ -295,18 +348,16 @@ chi_levy_plot_2 <- ggplot(plot_2_df, aes(year, total_ext, group = agency_num)) + chi_levy_plot_2 ``` -## Agency fund data updates and query demo - -The PTAXSIM database also contains information related to taxing agency's property tax funds so we can understand in greater detail what they intend to spend their property tax revenue on. This information can be found in the `agency_fund` and `agency_fund_info` tables.
This data is not utilized by any of PTAXSIM's functions, but it is available to be queried an analyzed. +## Tracking specific fund revenue over time -In 2024, the Cook County Clerk's agency fund identifier keys were updated, now showing a greater level of detail than in prior years. You can learn more about this change and how we account for it in the PTAXSIM database by reading out 2024 update [changelog](https://ccao-data.github.io/ptaxsim/news/). +The PTAXSIM database also contains information related to taxing agencies' property tax funds so we can understand in greater detail what they intend to spend their property tax revenue on. This information can be found in the `agency_fund` and `agency_fund_info` tables. This data is not utilized by any of PTAXSIM's functions, but it is available to be queried and analyzed. To demonstrate working with the agency fund data, we will query the fund information for the `CITY OF CHICAGO` and the `CITY OF CHICAGO LIBRARY FUND`. The level of detail provided in `fund_name` varies across years and agencies, which can make analyzing or plotting the data tricky. To simplify the `agency_fund` data further, we add broader fund categories below. In this case, we'll tag any fund with "A & B" (Annuities and Benefits) in the name as a pension fund. We'll label funds related to bond and interest payments, as well as note redemption, as "Bond Payments".
```{r} -chi_agency_fund <- DBI::dbGetQuery( +chi_agency_fund_raw <- DBI::dbGetQuery( ptaxsim_db_conn, " SELECT * @@ -316,7 +367,7 @@ chi_agency_fund <- DBI::dbGetQuery( " ) -chi_agency_fund_info <- DBI::dbGetQuery( +chi_agency_fund_info_raw <- DBI::dbGetQuery( ptaxsim_db_conn, " SELECT * @@ -326,8 +377,8 @@ chi_agency_fund_info <- DBI::dbGetQuery( " ) -chi_agency_fund <- chi_agency_fund %>% - left_join(chi_agency_fund_info) %>% +chi_agency_fund <- chi_agency_fund_raw %>% + left_join(chi_agency_fund_info_raw) %>% # Remove funds that levy equal to 0 filter( final_levy > 0, @@ -462,4 +513,113 @@ fund_type_table %>% suffix = "%" ) ``` + +### Using the fund crosswalk to track funds over time + +The PTAXSIM database includes a table called `agency_fund_crosswalk` that allows you to track specific funds over time, similar to `agency_crosswalk`. If you know the specific agency and fund numbers that you want to analyze, you can use this crosswalk to handle the 2024 change that turned some agencies into funds. + +As an example, let's look at the City of Chicago Library Fund again. We know that the agency number for this fund prior to 2024 was `030210001`. Let's look at the pension fund specifically: + +```{r} +chi_library_pension_fund_num <- chi_agency_fund_raw %>% + filter(agency_num == "030210001") %>% + select(year, agency_num, fund_num) %>% + left_join(chi_agency_fund_info_raw, by = c("agency_num", "fund_num")) %>% + filter(str_detect(fund_name, "A & B")) %>% + select(year, agency_num, fund_num, fund_name) + +datatable( + chi_library_pension_fund_num, + options = list(pageLength = nrow(chi_library_pension_fund_num)) +) +``` + +This table shows us that the fund number for the library pension fund was `128000` prior to 2024. + +Let's use the agency fund crosswalk to handle the Clerk's 2024 change that switched from reporting the City of Chicago Library Fund as an independent agency to reporting it as a fund. 
We can query the agency fund crosswalk to see that it looks quite similar to `agency_crosswalk`: + +```{r} +agency_fund_crosswalk <- DBI::dbGetQuery( + ptaxsim_db_conn, + " + SELECT agency_num, agency_num_final, fund_num, fund_num_final + FROM agency_fund_crosswalk + " +) + +datatable(agency_fund_crosswalk) +``` + +The crosswalk contains the following columns, similar to `agency_crosswalk`: + +- **`year`**: The year that the fund changed + - This is not relevant to our analysis, so we exclude it from our query; however, it may be useful for other types of analysis +- **`agency_num`**: The agency number prior to the change, when the fund was reported as an independent agency +- **`agency_num_final`**: The agency number after the change, representing the agency under which the fund is now reported +- **`fund_num`**: The fund number prior to the change +- **`fund_num_final`**: The fund number after the change + +Using the agency fund crosswalk, we can find the final fund number for the pension fund: + +```{r} +chi_library_pension_agency_fund_crosswalk <- agency_fund_crosswalk %>% + filter(agency_num == "030210001", fund_num == "128000") + +datatable(chi_library_pension_agency_fund_crosswalk) +``` + +Then, we can track the pension fund across all years using its post-2024 agency and fund number. Note the change to the agency and fund numbers in 2024: + +```{r} +chi_library_pension_fund <- chi_agency_fund_raw %>% + # Left join to the full agency fund crosswalk by `agency_num` and + # `fund_num` in order to resolve the final agency/fund numbers for all funds + left_join(agency_fund_crosswalk, by = c("agency_num", "fund_num")) %>% + mutate( + # In most use cases, it makes more sense to resolve the final agency number + # to the `agency_num` column.
However, for the purposes of this demo, we + # want to show the original agency numbers, so we resolve the final number + # to `agency_num_final` instead + agency_num_final = coalesce(agency_num_final, agency_num), + fund_num_final = coalesce(fund_num_final, fund_num) + ) %>% + # Inner join to the version of the agency fund crosswalk that only contains + # the CPL pension fund so that we can filter for only that fund + inner_join( + chi_library_pension_agency_fund_crosswalk %>% + select(agency_num_final, fund_num_final), + by = c("agency_num_final", "fund_num_final") + ) %>% + select(year, agency_num, fund_num, final_levy) %>% + arrange(year) + +datatable( + chi_library_pension_fund, + options = list(pageLength = nrow(chi_library_pension_fund)) +) +``` + +Here's a chart showing the same data: + +```{r} +chi_library_pension_fund_plot <- chi_library_pension_fund %>% + ggplot(aes(x = as.integer(year), y = final_levy)) + + geom_line(linewidth = 0.5, color = "black") + + geom_point(color = "black") + + scale_x_continuous(n.breaks = 10) + + scale_y_continuous( + labels = scales::label_dollar(scale = 1e-6, suffix = "M"), + limits = c(5e6, NA) + ) + + labs( + x = NULL, + y = "Final Levy (Millions)" + ) + + theme_minimal(base_size = 12) + +chi_library_pension_fund_plot +``` + +## Conclusion + The code examples provided in this vignette are simple tutorials on how to query taxing agency data from the PTAXSIM database. We hope that access to a free and open data source, which for the first time aggregates data from multiple disparate sources, enables rigorous analysis of the Cook County property tax system!