
Adds CAD database tables and related ETLs #2013

Open

johnclary wants to merge 49 commits into main from john/27728-init-cad-table

Conversation

Member

@johnclary johnclary commented Apr 3, 2026

Associated issues

This PR brings CAD data into the database! We currently have a live daily extract of CAD files accumulating in our shared network drive at /mnt/vision_zero_cad on our 02 server. We've also been provided with backfill files containing ~200 MB of records going back to 2015. We'll process those mega files post-go-live.

Todos outside the scope of this PR:

  • Create Airflow DAG for daily processing
  • Backfill incident records

Testing

URL to test: Local


Tests involve adding and removing files in S3, so it would be good to give folks a heads-up in Slack to avoid conflicts.

  1. Start your local stack and apply migrations and metadata.

  2. Set up your env file by copying env_template and filling in the blanks. If you already have a .env file for the afd_ems_import ETL, you can use that (just make sure it uses dev variables, not prod).

  3. Acquire local files: in AWS S3, navigate to atd-vision-zero/dev/cad_incidents/archive and download all of the files from this archive directory, saving them to the ./test_data directory in this repo.

  4. Now we will test the incidents_to_s3.py script, starting with a dry run. This command uses the docker-compose.local.yml override file to mount your ./test_data directory into the container's network drive mount point.

docker compose -f docker-compose.yml -f docker-compose.local.yml run import incidents_to_s3.py --dry-run

Confirm that the output lists files oldest to newest based on the timestamp in the filename, with the WithGroupID files always processed after the file without that string in the filename (see the sketch after the sample output). The output should look something like this:

2026-05-07 17:30:37,071 INFO DRY RUN enabled — no files will be uploaded or deleted.
2026-05-07 17:30:37,076 INFO Found 6 file(s) to process.
2026-05-07 17:30:37,076 INFO [DRY RUN] Would upload s3://atd-vision-zero/dev/cad_incidents/inbox/TPWCADTrafficSafetyDaily_20260505.CSV
2026-05-07 17:30:37,076 INFO [DRY RUN] Would upload s3://atd-vision-zero/dev/cad_incidents/inbox/TPWCADTrafficSafetyWithGroupIDDaily_20260505.CSV
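For reference, this ordering can be sketched in Python like so (a hypothetical illustration, not the script's actual code; the filename pattern is inferred from the samples above):

import re

def sort_key(filename):
    # Order primarily by the YYYYMMDD stamp in the filename; within the same
    # date the plain daily file sorts first, because False sorts before True.
    match = re.search(r"_(\d{8})\.CSV$", filename, re.IGNORECASE)
    date_stamp = match.group(1) if match else ""
    return (date_stamp, "WithGroupID" in filename)

files = [
    "TPWCADTrafficSafetyWithGroupIDDaily_20260505.CSV",
    "TPWCADTrafficSafetyDaily_20260505.CSV",
    "TPWCADTrafficSafetyDaily_20260504.CSV",
]
print(sorted(files, key=sort_key))
# ['TPWCADTrafficSafetyDaily_20260504.CSV',
#  'TPWCADTrafficSafetyDaily_20260505.CSV',
#  'TPWCADTrafficSafetyWithGroupIDDaily_20260505.CSV']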
  5. Remove the --dry-run flag and run the script again:
docker compose -f docker-compose.yml -f docker-compose.local.yml run import incidents_to_s3.py

The output should look similar to the previous step and display the name of each file that was uploaded to S3. In the AWS console, navigate to the CAD incidents inbox (atd-vision-zero/dev/cad_incidents/inbox) and observe that files have been uploaded there. The timestamps shown in the Last modified column should all be very recent.

  6. Run the script again, adding the --remove flag. This will cause the processed files to be deleted from your filesystem.
docker compose -f docker-compose.yml -f docker-compose.local.yml run import incidents_to_s3.py --remove

Confirm the files have been deleted from your ./test_data directory.

  7. Moving on to incidents_import.py, we will now process the files from the S3 inbox into our local database. Start with a dry run:
docker compose -f docker-compose.yml -f docker-compose.local.yml run import incidents_import.py --dry-run

Confirm that the output lists files oldest to newest based on the timestamp in the filename, with the WithGroupID files always processed after the file without that string in the filename. It should look something like this:

INFO:root:Running CAD incident import
INFO:root:Getting list of files in S3 inbox
INFO:root:6 S3 files to process
INFO:root:Downloading: dev/cad_incidents/inbox/TPWCADTrafficSafetyDaily_20260505.CSV
INFO:root:1,247 total records to upsert
INFO:root:Would upsert 1000
INFO:root:Would upsert 247
INFO:root:Downloading: dev/cad_incidents/inbox/TPWCADTrafficSafetyWithGroupIDDaily_20260505.CSV
INFO:root:1,034 total records to upsert
INFO:root:Would upsert 1000
INFO:root:Would upsert 34
  8. Run the script again without the --dry-run flag:
docker compose -f docker-compose.yml -f docker-compose.local.yml run import incidents_import.py

Once the script completes, use your SQL client to inspect the records in our new tables. Verify that the location_id and in_austin_full_purpose columns are populating on the cad_incidents table.

select * from cad_incidents;
select * from cad_incident_groups;
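If you want a quick spot-check of how many rows are missing those values, something like this works (hypothetical queries, not part of the test plan):

select count(*) from cad_incidents where location_id is null;
select count(*) from cad_incidents where in_austin_full_purpose is null;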
  9. Lastly, run the script again using the --archive flag.
docker compose -f docker-compose.yml -f docker-compose.local.yml run import incidents_import.py --archive

Use your SQL client to confirm that the records in the cad_incidents table now have different timestamps in the created_at and updated_at columns.
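One quick way to check (a hypothetical query, assuming updated_at is bumped on upsert while created_at is preserved):

select count(*) from cad_incidents where updated_at > created_at;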

Head to the CAD incidents inbox (atd-vision-zero/dev/cad_incidents/inbox) and confirm that files have been removed from this directory, and that the files in the /archive directory have recent timestamps.


Ship list

  • Check migrations for any conflicts with latest migrations in main branch
  • Confirm Hasura role permissions for necessary access
  • Code reviewed
  • Product manager approved


netlify Bot commented Apr 3, 2026

Deploy Preview for atd-vze-staging canceled.

🔨 Latest commit: 95f43bc
🔍 Latest deploy log: https://app.netlify.com/projects/atd-vze-staging/deploys/69de722d5af967000831e558


# Query the view definition and append to the file
- run_psql -v ON_ERROR_STOP=1 -A -t -c "SELECT 'CREATE OR REPLACE VIEW ' || '$VIEW_NAME' || ' AS ' || pg_get_viewdef('$VIEW_NAME'::regclass, true);" >> database/views/$VIEW_NAME.sql
+ run_psql -v ON_ERROR_STOP=1 -A -t -c "SELECT 'CREATE OR REPLACE VIEW ' || '$VIEW_NAME' || ' AS' || chr(10) || pg_get_viewdef('$VIEW_NAME'::regclass, true);" >> database/views/$VIEW_NAME.sql
Member Author

These changes ensure that we always have a newline character after the AS portion of the view output.

As far as I can tell, sqruff does not have a setting controlling this, and it was leading to the bot making repeated changes to view files.

To follow along, see:

  • 19198ca, the bot's first commit to this PR, in which it undid changes to views that only it had touched.
  • 95f43bc, the bot reverting most of those changes after I introduced the newline (chr(10)) change above

My hope is that after this commit goes through we will stop seeing so much noise from the bot 🤞
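For illustration, with a hypothetical view name, the chr(10) version emits view files shaped like this, with a newline guaranteed after the AS:

CREATE OR REPLACE VIEW my_view AS
 SELECT ...
   FROM ...;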


netlify Bot commented May 1, 2026

Deploy Preview for atd-vze-staging canceled.

🔨 Latest commit: bb857e7
🔍 Latest deploy log: https://app.netlify.com/projects/atd-vze-staging/deploys/69fe028d5a78440008c0830f

@johnclary johnclary changed the title Adds cad_incidents table to the DB Adds CAD data and related ETLs May 4, 2026
@johnclary johnclary changed the title Adds CAD data and related ETLs Adds CAD database tables and related ETLs May 5, 2026
UNIQUE (incident_group_id, master_incident_id)
);

comment on table cad_incident_groups is 'Table showing linkages of CAD incident groups, which are a poorly understood grouping of related incidents generated by the CAD system. Not all incident IDs in this table will have a corresponding record in the cad_incidents table, because some of those incidents may fall outside the scope of our CAD data query (which includes crash related records only)';
Member Author
@johnclary johnclary May 7, 2026

TBD how useful this table will be. I still don't understand how, when, or why the CAD system generates these incident groups.

It's further complicated by the fact that some master_incident_ids referenced here will not be in our database, because they relate to a non-crash incident category. This is why I did not establish a foreign key relationship between cad_incident_groups and cad_incidents.
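As an illustration of that gap, a query along these lines (hypothetical; it assumes cad_incidents is keyed by master_incident_id, which I'm inferring from the unique constraint above) would surface group members that have no incident record:

select g.incident_group_id, g.master_incident_id
from cad_incident_groups g
left join cad_incidents i on i.master_incident_id = g.master_incident_id
where i.master_incident_id is null;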

Member

Interesting. At first I thought the comment on the table was also lifted from the APD public dataset and I thought "poorly understood" was oddly honest. I can't really picture what this table is, but it sounds like I am not alone in that.


CREATE INDEX idx_cad_incidents_response_date ON cad_incidents (response_date);

comment ON TABLE cad_incidents IS 'This dataset contains information on both 911 calls (usually referred to as Calls for Service or Dispatched Incidents) and officer-initiated incidents related to traffic crashes as recorded in the Austin public safety Computer Aided Dispatch (CAD) system. Data is provided by the public safety enterprise data team after approval by AFD, ATCEMS, and APD';


@Charlie-Henry Charlie-Henry left a comment


On step 8:

Verify that the location_id and in_austin_full_purpose columns are populating on the cad_incidents table.

I did see some 129 records missing a location_id; I dunno if that is expected.

Everything else is pretty minor stuff/questions. I did have a small fix for the env_template for anyone testing this themselves. After that, everything worked well.

Comment thread etl/cad_incidents_import/env_template Outdated
Contributor
@Charlie-Henry Charlie-Henry May 7, 2026

I'm curious why we aren't retrieving the data directly from the warehouse? Sorry, I probably lost some context along the way.

List: List of S3 object keys, sorted oldest to newest
"""
prefix = f"{BUCKET_ENV}/cad_incidents/{subdir}"
response = s3_client.list_objects(
Contributor

A thing I'm always reminded of: list_objects has a limit of 1,000 objects returned per call. Maybe something to be aware of when doing the backfill of data.
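For what it's worth, boto3's paginator handles that cap transparently. A minimal sketch (the bucket and prefix are assumed from the test plan above, not taken from this script):

import boto3

s3_client = boto3.client("s3")
paginator = s3_client.get_paginator("list_objects_v2")

keys = []
for page in paginator.paginate(
    Bucket="atd-vision-zero", Prefix="dev/cad_incidents/inbox"
):
    # Each page holds at most 1,000 keys; the paginator follows the
    # continuation token until the listing is exhausted.
    keys.extend(obj["Key"] for obj in page.get("Contents", []))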

Comment on lines +32 to +34
We expect object keys like:
- some/path/TPWCADTrafficSafetyWithGroupIDDaily_20260410.CSV
- some/path/TPWCADTrafficSafetyDaily_20260410.CSV
Contributor

Would it be possible that we'd end up with files in our inbox that don't follow this format?

Member Author

Unintentionally, for sure. I noticed I forgot to uncomment the file name check in is_file_to_process(), which is looking out for this.

Member

We check is_file_to_process() when getting files locally; should we also check the file names from the S3 bucket?

"""
for row in data:
for source_key, target_key in cols_to_rename.items():
row[target_key] = row.pop(source_key)
Contributor

If the CSV file had a new column added that we were not expecting, I think this would not drop it?
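One possible way to make that explicit (a sketch, not the script's current behavior) is to rebuild each row from the mapping, so any column outside cols_to_rename is dropped:

def rename_and_filter(data, cols_to_rename):
    # Keep only the mapped columns; unexpected columns are discarded
    # rather than carried through to the upsert.
    return [
        {target: row[source] for source, target in cols_to_rename.items()}
        for row in data
    ]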


We process two distinct files on a daily basis:

- The CAD incident incident file, in which each row is a crash-related CAD incident responded to by AFD, EMS, or APD. Sample filename: `TPWCADTrafficSafetyDaily_20260410.CSV`
Contributor

double incident

Member
@chiaberry chiaberry left a comment

Tested and saw the files move the way I expected. When do you anticipate we would import the files directly from local vs uploading to S3 and then importing from there?

Comment on lines +135 to +136
csv_content = download_file_s3(file_obj_key_or_path)
csv_content = download_file_s3(file_obj_key_or_path)
Member

Why is this called twice?


date_field_names (str[]): list of field names which hold date values to update
date_format (str): the format of the input date string, which will be use to parse the string
into a datetime object
tz (string): The IANA time zone name of the input time value. Defaluts to America/Chicago
Member

Two tiny typos: line 64 sting/string, and Defaluts/defaults.

)

if not files_todo:
    raise Exception("No CAD files found in S3 inbox")
Member

I think this would only happen if there were no files in local_files_to_process, since get_s3_files_todo throws an error if there are no files.
