Draft
103 commits
d780d76
First draft of sales script
wrridgeway Jun 6, 2024
00909fd
File renaming
wrridgeway Jun 6, 2024
2ac5982
Cleaner for loop
wrridgeway Jun 6, 2024
2107d2a
First draft taxes and exemptions table
wrridgeway Jun 11, 2024
c56aaaf
Wrap assessment_roll
wrridgeway Jun 12, 2024
6c81308
Correct size, count calculations
wrridgeway Jun 13, 2024
1bf9b9c
Wrap sales table
wrridgeway Jun 18, 2024
0a9e1f3
Correct stage grouping, counting
wrridgeway Jun 18, 2024
030a7c5
Fix assessment roll stage grouping
wrridgeway Jun 18, 2024
1c2adae
Clean output before writing
wrridgeway Jun 18, 2024
672bd1e
Begin dbt building
wrridgeway Jun 18, 2024
0c42e23
Merge branch 'master' into 387-reporting-sot
wrridgeway Jun 18, 2024
3f60a77
Attempt to build assessment_roll table
wrridgeway Jun 20, 2024
fdff457
Testing build on smaller input
wrridgeway Jun 20, 2024
6abd074
Trying to build on limited sample
wrridgeway Jun 20, 2024
fd342b6
Try to build sales table
wrridgeway Jun 20, 2024
cccf8e1
Try to build taxes and exemptions table
wrridgeway Jun 20, 2024
3656964
Try to build taxes and exemptions table
wrridgeway Jun 20, 2024
8b0f95f
Try to build taxes table
wrridgeway Jun 20, 2024
9383bdc
Try to build ratio stats table
wrridgeway Jun 20, 2024
08d3bd6
Add assesspy to ratio_stats table
wrridgeway Jun 20, 2024
d2cac22
ratio_stats builds in dbt, excluding assesspy funcs
wrridgeway Jun 24, 2024
f559753
sot_ratio_stats table building in dbt
wrridgeway Jun 26, 2024
1f8ad1f
Add res_other group
wrridgeway Jun 26, 2024
063591c
Add reassessment year indicator for assessment roll
wrridgeway Jun 27, 2024
a9ffc64
Retry assessment_year indicator
wrridgeway Jun 27, 2024
62dd68e
Assessment_roll should run with reassessment year indicator
wrridgeway Jun 28, 2024
c185e81
Add schema to assessment_roll table
wrridgeway Jun 28, 2024
d08bc3d
Correct output from sales and taxes tables
wrridgeway Jun 28, 2024
4808aa4
Add table schemas
wrridgeway Jun 28, 2024
08c8d53
Fix schemas
wrridgeway Jun 28, 2024
2f8dc3d
Resolve sales table column type issues
wrridgeway Jul 1, 2024
88ce049
Add exe_total to exemptions table
wrridgeway Jul 2, 2024
271576d
Add more ratio stats
wrridgeway Jul 2, 2024
c39a2d8
Clean sales table columns
wrridgeway Jul 2, 2024
20c9bd6
Clean taxes table columns
wrridgeway Jul 3, 2024
adc16ea
Clean assessment_roll columns
wrridgeway Jul 7, 2024
f8b87ab
Fix delta columns
wrridgeway Jul 7, 2024
54ebab8
Clean ratio table columns
wrridgeway Jul 7, 2024
d2dddab
Attempt to fix pin_n_tot type error that doesn't trigger locally
wrridgeway Jul 7, 2024
00e790c
Try again to fix pin_n_tot
wrridgeway Jul 7, 2024
408de56
Change ass roll sample to be able to compare across stages
wrridgeway Jul 7, 2024
fd95fcb
Add commenting for input tables, try to partition assessment_roll table
wrridgeway Jul 7, 2024
f296292
Comment python scripts
wrridgeway Jul 7, 2024
a23ff72
Clean up ratio_stats script
wrridgeway Jul 8, 2024
07f6dfe
Back to fixing pin_n_tot
wrridgeway Jul 8, 2024
b78a072
Replace nan with None
wrridgeway Jul 8, 2024
337954e
Partition input tables by year
wrridgeway Jul 8, 2024
1031144
Fix year partitioning
wrridgeway Jul 8, 2024
45ea305
Use double for nullable columns
wrridgeway Jul 8, 2024
ca139f3
Move data year specification to dbt seed
wrridgeway Jul 9, 2024
788f971
Formatting
wrridgeway Jul 9, 2024
4ea6718
Merge branch 'master' into 387-reporting-sot
wrridgeway Jul 9, 2024
5449d8c
Improve diff and pct_change syntax
wrridgeway Jul 9, 2024
c87713f
Simplify reassessment year syntax
wrridgeway Jul 9, 2024
d1079f0
More commenting
wrridgeway Jul 10, 2024
b4316b2
Merge branch 'master' into 387-reporting-sot
wrridgeway Mar 18, 2025
28ba90c
Lint
wrridgeway Mar 18, 2025
cb50a51
Clean up
wrridgeway Mar 18, 2025
978ad93
Use new assesspy inputs
wrridgeway Mar 18, 2025
a471a42
Update assesspy version
wrridgeway Mar 18, 2025
d28f02c
Add back documentation
wrridgeway Mar 18, 2025
f8258cf
Improve documentation
wrridgeway Mar 19, 2025
e213a32
Add outlier sales filtering
wrridgeway Mar 19, 2025
1c8f1b3
Count outlier sales
wrridgeway Mar 19, 2025
d26fed0
Exclude outliers from sales char stats
wrridgeway Mar 19, 2025
91c5040
Clarify bldg and land sf
wrridgeway Mar 19, 2025
c3cc7ba
Improve schema declaration
wrridgeway Mar 19, 2025
c8230ea
Update schema declarations
wrridgeway Mar 19, 2025
34010b8
Merge branch 'master' into 387-reporting-sot
wrridgeway Mar 25, 2025
023c341
Store testing
wrridgeway Mar 26, 2025
57e4cc3
Remove test script
wrridgeway Apr 30, 2025
68d8c2d
Merge branch 'master' into 387-reporting-sot
wrridgeway Apr 30, 2025
b0d9ad7
Test script back to working
wrridgeway May 1, 2025
b7ee5bd
Merge branch 'master' into 387-reporting-sot
wrridgeway Jun 12, 2025
f056835
Temp changes
wrridgeway Jun 17, 2025
744ddd6
Everything but delta cols
wrridgeway Jun 18, 2025
ae79bc3
Remove vestigial objects
wrridgeway Jun 18, 2025
080d5b7
Simplify schema creation
wrridgeway Jun 18, 2025
de27eb8
Aggregate spark dfs
wrridgeway Jun 18, 2025
98f1bed
Merge branch 'master' into 387-reporting-sot
wrridgeway Jun 18, 2025
214ec7a
Remove temp limit on ass roll table
wrridgeway Jun 18, 2025
26f00e3
Try table build with spark
wrridgeway Jun 18, 2025
2509e40
Remove old table, rerun build to gen error log
Jun 23, 2025
23c6fb8
Debugging input pyspark dataframe
Jun 23, 2025
763915e
Pass geography to aggregate
wrridgeway Jun 23, 2025
29c15a1
Reduce input size to test runner memory limits
wrridgeway Jun 24, 2025
b99dd24
Really reduce input size
wrridgeway Jun 24, 2025
76a7df5
Further reduce input size
wrridgeway Jun 24, 2025
8fa0e4e
Try a really small input
wrridgeway Jun 24, 2025
8f1ee19
Change int type for pyarrow
wrridgeway Jun 25, 2025
b8bdf39
Try coercing expected string columns
wrridgeway Jun 25, 2025
f33a2e4
Remove string coercion for output table
wrridgeway Jun 26, 2025
2264e1d
Try to increase max driver result for spark session
wrridgeway Jun 26, 2025
dfb7d1d
Change spark driver config access
wrridgeway Jun 26, 2025
2323aea
One more driver attempt
wrridgeway Jun 26, 2025
8cfc713
Try new engine config
wrridgeway Jun 26, 2025
db21c18
Test smaller amount of collection
wrridgeway Jun 26, 2025
e6681fe
Remove config without permission
wrridgeway Jun 26, 2025
34fa863
Test using entire input
wrridgeway Jun 27, 2025
9b5d48f
Revert for now
wrridgeway Jun 27, 2025
e4aeb0e
Remove limit again for testing
wrridgeway Jun 28, 2025
e146066
Attempt to collect more often
wrridgeway Jun 28, 2025
2 changes: 2 additions & 0 deletions dbt/dbt_project.yml
@@ -78,5 +78,7 @@ seeds:
      +schema: location
    model:
      +schema: model
    reporting:
      +schema: reporting
    spatial:
      +schema: spatial
64 changes: 64 additions & 0 deletions dbt/models/reporting/docs.md
@@ -80,6 +80,70 @@ for every possible geography and reporting group combination.
**Primary Key**: `pin`, `year`
{% enddocs %}

# sot_assessment_roll
{% docs table_sot_assessment_roll %}
Aggregated summary stats of assessed values across a number of geographies,
class combinations, and time.

**Primary Key**: `year`, `stage_name`, `geography_id`, `group_id`
{% enddocs %}

# sot_assessment_roll_input
{% docs table_sot_assessment_roll_input %}
Table to feed the Python dbt job that creates the
`reporting.sot_assessment_roll` table. Feeds public reporting assets.

**Primary Key**: `year`, `stage_name`, `geography_id`, `group_id`
{% enddocs %}

# sot_ratio_stat
{% docs table_sot_ratio_stat %}
Aggregated summary stats of sales ratios across a number of geographies, class
combinations, and time.

**Primary Key**: `year`, `stage_name`, `geography_id`, `group_id`
{% enddocs %}

# sot_ratio_stat_input
{% docs table_sot_ratio_stat_input %}
Table to feed the Python dbt job that creates the
`reporting.sot_ratio_stats` table. Feeds public reporting assets.

**Primary Key**: `year`, `stage_name`, `geography_id`, `group_id`
{% enddocs %}

# sot_sale
{% docs table_sot_sale %}
Aggregated summary stats of sales across a number of geographies, class
combinations, and time.

**Primary Key**: `year`, `geography_id`, `group_id`
{% enddocs %}

# sot_sale_input
{% docs table_sot_sale_input %}
Table to feed the Python dbt job that creates the
`reporting.sot_sale` table. Feeds public reporting assets.

**Primary Key**: `year`, `geography_id`, `group_id`
{% enddocs %}

# sot_taxes_exemptions
{% docs table_sot_taxes_exemptions %}
Aggregated summary stats of taxes and exemptions data across a number of
geographies, class combinations, and time.

**Primary Key**: `year`, `geography_id`, `group_id`
{% enddocs %}

# sot_taxes_exemptions_input
{% docs table_sot_taxes_exemptions_input %}
Table to feed the Python dbt job that creates the
`reporting.sot_taxes_exemptions` table. Feeds public reporting assets.

**Primary Key**: `year`, `geography_id`, `group_id`
{% enddocs %}

# vw_assessment_roll

{% docs view_vw_assessment_roll %}
165 changes: 165 additions & 0 deletions dbt/models/reporting/reporting.sot_assessment_roll.py
@@ -0,0 +1,165 @@
# This script generates aggregated summary stats on assessed values across a
# number of geographies, class combinations, and time.

# Import libraries

import pandas as pd
from pyspark.sql.functions import lit


# Define aggregation functions. These are just wrappers for basic python
# functions that make them easier to use with pandas.agg().
def q10(x):
    return x.quantile(0.1)


def q25(x):
    return x.quantile(0.25)


def q75(x):
    return x.quantile(0.75)


def q90(x):
    return x.quantile(0.9)


def first(x):
    if len(x) >= 1:
        output = x.iloc[0]
    else:
        output = None

    return output


def reassessment_year(year, geography, triad):
    if geography in ["triad", "township", "nbhd"]:
        year = int(year) % 3

        if (
            ((year == 0) & (triad == "North"))
            | ((year == 1) & (triad == "South"))
            | ((year == 2) & (triad == "City"))
        ):
            out = "Yes"
        else:
            out = "No"
    else:
        out = ""

    return out


def aggregate_geography(geography):
    def aggregate(key, pdf):
        columns = ["av_tot", "av_bldg", "av_land"]

        out = ()
        out += (
            reassessment_year(pdf["year"][0], geography, pdf["triad"][0]),
            first(pdf[years[geography]]),
            len(pdf["av_tot"]),
            pdf["av_tot"].count(),
            pdf["av_tot"].count() / pdf["av_tot"].size,
        )
        for column in columns:
            out += (
                pdf[column].min(),
                q10(pdf[column]),
                q25(pdf[column]),
                pdf[column].median(),
                q75(pdf[column]),
                q90(pdf[column]),
                pdf[column].max(),
                pdf[column].mean(),
                pdf[column].sum(),
            )

        return pd.DataFrame([key + out])

    return aggregate


groups = [
    "res_other",
    "major_class",
    "no_group",
    "class",
    "modeling_group",
]

years = {
    "county": "year",
    "triad": "year",
    "township": "year",
    "nbhd": "year",
    "tax_code": "year",
    "zip_code": "year",
    "community_area": "community_area_data_year",
    "census_place": "census_data_year",
    "census_tract": "census_data_year",
    "census_congressional_district": "census_data_year",
    "census_zcta": "census_data_year",
    "cook_board_of_review_district": "cook_board_of_review_district_data_year",
    "cook_commissioner_district": "cook_commissioner_district_data_year",
    "cook_judicial_district": "cook_judicial_district_data_year",
    "ward_num": "ward_data_year",
    "police_district": "police_district_data_year",
    "school_elementary_district": "school_data_year",
    "school_secondary_district": "school_data_year",
    "school_unified_district": "school_data_year",
    "tax_municipality": "tax_data_year",
    "tax_park_district": "tax_data_year",
    "tax_library_district": "tax_data_year",
    "tax_fire_protection_district": "tax_data_year",
    "tax_community_college_district": "tax_data_year",
    "tax_sanitation_district": "tax_data_year",
    "tax_special_service_area": "tax_data_year",
    "tax_tif_district": "tax_data_year",
    "central_business_district": "central_business_district_data_year",
}

geographies = list(years.keys())
geographies = [
    geographies[0]
]  # For testing purposes, only use the first geography

output_schema = "stage_name string, group_id string, geography_id string, year string, reassessment_year string, geography_data_year string, pin_n_tot bigint, pin_n_w_value bigint, pin_pct_w_value double, min_av_tot double, q10_av_tot double, q25_av_tot double, median_av_tot double, q75_av_tot double, q90_av_tot double, max_av_tot double, mean_av_tot double, sum_av_tot double, min_av_bldg double, q10_av_bldg double, q25_av_bldg double, median_av_bldg double, q75_av_bldg double, q90_av_bldg double, max_av_bldg double, mean_av_bldg double, sum_av_bldg double, min_av_land double, q10_av_land double, q25_av_land double, median_av_land double, q75_av_land double, q90_av_land double, max_av_land double, mean_av_land double, sum_av_land double"


def model(dbt, spark_session):
    dbt.config(
        materialized="table",
        engine_config={
            "MaxConcurrentDpus": 40,
        },
    )

    athena_user_logger.info("Loading assessment roll input table")

    input = dbt.ref("reporting.sot_assessment_roll_input")

    athena_user_logger.info("Dope stuff is happening... maybe?")

    output = []
    for group in groups:
        for geography in geographies:
            output += [
                input.groupby(["stage_name", group, geography, "year"])
                .applyInPandas(
                    aggregate_geography(geography),
                    schema=output_schema,
                )
                .select(
                    "*",
                    lit(group).alias("group_type"),
                    lit(geography).alias("geography_type"),
                )
                .toPandas()
            ]

    df = pd.concat(output)

    return df
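
The per-group summary that `aggregate` builds can be checked locally without Spark. The following is a minimal sketch, assuming a toy pandas frame: the `summarize` helper, the `toy` data, and the township labels are hypothetical and not part of this PR. It mirrors the column naming in `output_schema` for a single value column.

```python
import pandas as pd


def summarize(pdf: pd.DataFrame, column: str) -> dict:
    """Hypothetical helper: summary stats for one value column of one group."""
    values = pdf[column]
    return {
        "pin_n_tot": len(values),  # all rows, including missing values
        "pin_n_w_value": int(values.count()),  # rows with a non-null value
        "pin_pct_w_value": values.count() / values.size,
        f"min_{column}": values.min(),
        f"q10_{column}": values.quantile(0.1),
        f"q25_{column}": values.quantile(0.25),
        f"median_{column}": values.median(),
        f"q75_{column}": values.quantile(0.75),
        f"q90_{column}": values.quantile(0.9),
        f"max_{column}": values.max(),
        f"mean_{column}": values.mean(),
        f"sum_{column}": values.sum(),
    }


# Toy input (assumed): two townships, one parcel with a missing value.
toy = pd.DataFrame(
    {
        "township": ["A", "A", "A", "B", "B"],
        "av_tot": [10.0, 20.0, None, 40.0, 60.0],
    }
)

# One output row per township, analogous to one applyInPandas call per group.
summary = pd.DataFrame(
    [
        {"geography_id": name, **summarize(group, "av_tot")}
        for name, group in toy.groupby("township")
    ]
).set_index("geography_id")
```

Running the same statistics on a small frame like this is one way to sanity-check `pin_n_tot` against `pin_n_w_value` before building the full table.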
104 changes: 104 additions & 0 deletions dbt/models/reporting/reporting.sot_assessment_roll_input.sql
@@ -0,0 +1,104 @@
-- This script gathers parcel-level geographies and joins them to values and
-- class groupings. It exists solely to feed reporting.sot_assessment_roll
-- and should not be used otherwise.
{{
config(
materialized='table',
partitioned_by=['year']
)
}}

/* Ensure every municipality/class/year has a row for every stage through
cross-joining. This is to make sure that combinations that do not yet
exist in iasworld.asmt_all for the current year will exist in the table, but
have largely empty columns. For example: even if no class 4s in the City of
Chicago have been mailed yet for the current assessment year, we would still
like an empty City of Chicago/class 4 row to exist for the mailed stage. */
WITH stages AS (

SELECT 'MAILED' AS stage_name
UNION
SELECT 'ASSESSOR CERTIFIED' AS stage_name
UNION
SELECT 'BOR CERTIFIED' AS stage_name

),

-- Universe of all parcels as defined by iasworld.pardat, expanded with
-- assessment stages.
uni AS (
SELECT
vw_pin_universe.*,
stages.*
FROM {{ ref('default.vw_pin_universe') }}
CROSS JOIN stages
)

SELECT
uni.stage_name,
uni.class,
CAST(vals.tot AS INT) AS av_tot,
CAST(vals.bldg AS INT) AS av_bldg,
CAST(vals.land AS INT) AS av_land,
'Cook' AS county,
uni.triad_name AS triad,
uni.township_name AS township,
uni.nbhd_code AS nbhd,
uni.tax_code,
uni.zip_code,
uni.chicago_community_area_name AS community_area,
uni.census_place_geoid AS census_place,
uni.census_tract_geoid AS census_tract,
uni.census_congressional_district_geoid
AS
census_congressional_district,
uni.census_zcta_geoid AS census_zcta,
uni.cook_board_of_review_district_num AS cook_board_of_review_district,
uni.cook_commissioner_district_num AS cook_commissioner_district,
uni.cook_judicial_district_num AS cook_judicial_district,
uni.ward_num,
uni.chicago_police_district_num AS police_district,
uni.school_elementary_district_geoid AS school_elementary_district,
uni.school_secondary_district_geoid AS school_secondary_district,
uni.school_unified_district_geoid AS school_unified_district,
ARRAY_JOIN(uni.tax_municipality_name, ', ') AS tax_municipality,
ARRAY_JOIN(uni.tax_park_district_name, ', ') AS tax_park_district,
ARRAY_JOIN(uni.tax_library_district_name, ', ') AS tax_library_district,
ARRAY_JOIN(uni.tax_fire_protection_district_name, ', ')
AS tax_fire_protection_district,
ARRAY_JOIN(uni.tax_community_college_district_name, ', ')
AS
tax_community_college_district,
ARRAY_JOIN(uni.tax_sanitation_district_name, ', ')
AS tax_sanitation_district,
ARRAY_JOIN(uni.tax_special_service_area_name, ', ')
AS tax_special_service_area,
ARRAY_JOIN(uni.tax_tif_district_name, ', ') AS tax_tif_district,
uni.econ_central_business_district_num AS central_business_district,
uni.census_data_year,
uni.cook_board_of_review_district_data_year,
uni.cook_commissioner_district_data_year,
uni.cook_judicial_district_data_year,
COALESCE(
uni.ward_chicago_data_year, uni.ward_evanston_data_year) AS
ward_data_year,
uni.chicago_community_area_data_year AS community_area_data_year,
uni.chicago_police_district_data_year AS police_district_data_year,
uni.econ_central_business_district_data_year
AS
central_business_district_data_year,
uni.school_data_year,
uni.tax_data_year,
'no_group' AS no_group,
class_dict.major_class_type AS major_class,
class_dict.modeling_group,
CASE WHEN class_dict.major_class_code = '2' THEN 'RES' ELSE 'OTHER' END
AS res_other,
uni.year
FROM uni
LEFT JOIN {{ ref('reporting.vw_pin_value_long') }} AS vals
ON uni.pin = vals.pin
AND uni.year = vals.year
AND uni.stage_name = vals.stage_name
LEFT JOIN {{ ref('ccao.class_dict') }}
ON uni.class = class_dict.class_code
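
The effect of the `stages` and `uni` CTEs in `reporting.sot_assessment_roll_input` can be illustrated in pandas. This is a minimal sketch, assuming toy `parcels` and `values` frames (both hypothetical, standing in for `default.vw_pin_universe` and `reporting.vw_pin_value_long`). It shows that a parcel with no value for a given stage still gets a row for that stage, with an empty `av_tot`.

```python
import pandas as pd

# The three assessment stages, as in the `stages` CTE.
stages = pd.DataFrame(
    {"stage_name": ["MAILED", "ASSESSOR CERTIFIED", "BOR CERTIFIED"]}
)

# Hypothetical parcel universe: two PINs in one year.
parcels = pd.DataFrame({"pin": ["001", "002"], "year": ["2024", "2024"]})

# Hypothetical values: PIN 002 has only been mailed so far.
values = pd.DataFrame(
    {
        "pin": ["001", "001", "002"],
        "year": ["2024", "2024", "2024"],
        "stage_name": ["MAILED", "ASSESSOR CERTIFIED", "MAILED"],
        "av_tot": [100, 110, 200],
    }
)

# CROSS JOIN: every parcel gets one row per stage.
uni = parcels.merge(stages, how="cross")

# LEFT JOIN: stages without a value keep their row, with av_tot missing.
expanded = uni.merge(values, how="left", on=["pin", "year", "stage_name"])
```

Here `expanded` has six rows (two parcels times three stages), and the three parcel/stage combinations with no value carry a missing `av_tot`, which is the behavior the SQL comment describes.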