Conversation
✅ Deploy Preview for atd-vze-staging canceled.
```diff
  # Query the view definition and append to the file
- run_psql -v ON_ERROR_STOP=1 -A -t -c "SELECT 'CREATE OR REPLACE VIEW ' || '$VIEW_NAME' || ' AS ' || pg_get_viewdef('$VIEW_NAME'::regclass, true);" >> database/views/$VIEW_NAME.sql
+ run_psql -v ON_ERROR_STOP=1 -A -t -c "SELECT 'CREATE OR REPLACE VIEW ' || '$VIEW_NAME' || ' AS' || chr(10) || pg_get_viewdef('$VIEW_NAME'::regclass, true);" >> database/views/$VIEW_NAME.sql
```
These changes ensure that we always have a newline character after the `AS` portion of the view output.
As far as I can tell, sqruff does not have a setting controlling this, and it was leading to the bot making repeated changes to view files.
To follow along, see:
- 19198ca, the bot's first commit to this PR, in which it undid changes to views that only it had touched.
- 95f43bc, the bot reverting most of those changes after I introduced the newline (`chr(10)`) change above.
My hope is that after this commit goes through we will stop seeing so much noise from the bot 🤞
```sql
    UNIQUE (incident_group_id, master_incident_id)
);

comment on table cad_incident_groups is 'Table showing linkages of CAD incident groups, which are a poorly understood grouping of related incidents generated by the CAD system. Not all incident IDs in this table will have a corresponding record in the cad_incidents table, because some of those incidents may fall outside the scope of our CAD data query (which includes crash related records only)';
```
TBD how useful this table will be. I still don't understand how/when/why the CAD system generates these incident groups.
It's further complicated by the fact that some `master_incident_id`s referenced here will not be in our database, because they relate to a non-crash incident category. This is why I did not establish a foreign key relationship between cad_incident_groups and cad_incidents.
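A toy sketch of that orphan situation (hypothetical data and names, not code from this PR): with a foreign key in place, group rows that reference out-of-scope incidents could not be inserted at all.

```python
# Incident-group rows as they might arrive from the extract (hypothetical sample)
group_rows = [
    {"incident_group_id": "G1", "master_incident_id": "I100"},
    {"incident_group_id": "G1", "master_incident_id": "I999"},  # non-crash incident
]

# master_incident_id values we actually hold in cad_incidents
known_incident_ids = {"I100"}

# Without a foreign key, the linkage to I999 is preserved even though that
# incident falls outside our crash-only CAD query; with one, the insert fails.
orphans = [
    row for row in group_rows
    if row["master_incident_id"] not in known_incident_ids
]
print(orphans)  # [{'incident_group_id': 'G1', 'master_incident_id': 'I999'}]
```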
Interesting. At first I thought the comment on the table was also lifted from the APD public dataset and I thought "poorly understood" was oddly honest. I can't really picture what this table is, but it sounds like I am not alone in that.
```sql
CREATE INDEX idx_cad_incidents_response_date ON cad_incidents (response_date);

comment ON TABLE cad_incidents IS 'This dataset contains information on both 911 calls (usually referred to as Calls for Service or Dispatched Incidents) and officer-initiated incidents related to traffic crashes as recorded in the Austin public safety Computer Aided Dispatch (CAD) system. Data is provided by the public safety enterprise data team after approval by AFD, ATCEMS, and APD';
```
I sourced most of the comments from APD's public dataset:
https://data.austintexas.gov/Public-Safety/APD-Computer-Aided-Dispatch-Incidents/22de-7rzg/explore
Charlie-Henry left a comment:
On step 8:

> Verify that the `location_id` and `in_austin_full_purpose` columns are populating on the `cad_incidents` table.

I did see some 129 records missing a `location_id`; I dunno if that is expected.
Everything else is pretty minor stuff/questions. I did have a small fix for the `env_template` for anyone testing this themselves. After that, everything worked well.
I'm curious why we aren't retrieving the data directly from the warehouse? Sorry, probably lost some context along the way.
```python
        List: List of S3 object keys, sorted oldest to newest
    """
    prefix = f"{BUCKET_ENV}/cad_incidents/{subdir}"
    response = s3_client.list_objects(
```
A thing I'm always reminded of is that list_objects has a limit of 1k objects being returned. Maybe something to be aware of when doing the backfill of data.
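If that becomes a problem during the backfill, one option is to switch to a paginator. A sketch assuming boto3 and the `s3_client`/`BUCKET_ENV` names from the diff; the bucket name in the usage comment is a placeholder:

```python
import boto3

s3_client = boto3.client("s3")

def list_all_object_keys(bucket, prefix):
    """Yield every key under a prefix; each list_objects_v2 page is capped
    at 1,000 keys, so let the paginator walk all of the pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]

# e.g.: keys = list(list_all_object_keys("atd-vision-zero", f"{BUCKET_ENV}/cad_incidents/inbox"))
```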
```python
    We expect object keys like:
    - some/path/TPWCADTrafficSafetyWithGroupIDDaily_20260410.CSV
    - some/path/TPWCADTrafficSafetyDaily_20260410.CSV
```
would it be possible we'd end up with files in our inbox not following this format?
Unintentionally, for sure. I noticed I forgot to uncomment the file name check in `is_file_to_process()`, which is looking out for this.
We check `is_file_to_process()` when getting files locally; should we also check the file names from the S3 bucket?
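A sketch of what such a check might look like on the S3 side; the pattern is a guess based on the sample keys above, and the real logic in `is_file_to_process()` may differ:

```python
import re

# Assumed pattern, derived from the documented sample filenames
CAD_FILE_PATTERN = re.compile(r"TPWCADTrafficSafety(WithGroupID)?Daily_\d{8}\.CSV$")

def looks_like_cad_file(object_key):
    """True if the S3 object key ends in an expected CAD extract filename."""
    return CAD_FILE_PATTERN.search(object_key) is not None

assert looks_like_cad_file("some/path/TPWCADTrafficSafetyDaily_20260410.CSV")
assert not looks_like_cad_file("some/path/notes.txt")
```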
| """ | ||
| for row in data: | ||
| for source_key, target_key in cols_to_rename.items(): | ||
| row[target_key] = row.pop(source_key) |
If the CSV file had a new column added that we were not expecting, I think this would not drop it?
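Right: `row.pop(source_key)` only moves mapped keys, so anything unmapped stays on the row. One possible tightening (a sketch, not the PR's code) is to rebuild each row from the mapping, so unexpected columns are dropped:

```python
def rename_columns(data, cols_to_rename):
    """Rename columns via the mapping and drop any column the mapping
    doesn't know about, so a surprise CSV column can't reach the insert."""
    for i, row in enumerate(data):
        data[i] = {
            target_key: row[source_key]
            for source_key, target_key in cols_to_rename.items()
        }
```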
> We process two distinct files on a daily basis:
>
> - The CAD incident file, in which each row is a crash-related CAD incident responded to by AFD, EMS, or APD. Sample filename: `TPWCADTrafficSafetyDaily_20260410.CSV`
chiaberry left a comment:
Tested and saw the files move the way I expected. When do you anticipate we would import the files directly from local vs uploading to S3 and then importing from there?
```python
        date_field_names (str[]): list of field names which hold date values to update
        date_format (str): the format of the input date string, which will be use to parse the string
            into a datetime object
        tz (string): The IANA time zone name of the input time value. Defaluts to America/Chicago
```
Two tiny typos: line 64 sting/string, and Defaluts/defaults.
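For reference, a minimal sketch of the behavior the docstring describes, assuming stdlib `zoneinfo` (Python 3.9+); the function name and shape are illustrative, not the ETL's actual implementation:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def localize_date_fields(row, date_field_names, date_format, tz="America/Chicago"):
    """Parse each naive date string, attach the given IANA time zone, and
    store it back on the row as a UTC ISO-8601 string."""
    for field in date_field_names:
        if row.get(field):
            naive = datetime.strptime(row[field], date_format)
            aware = naive.replace(tzinfo=ZoneInfo(tz))
            row[field] = aware.astimezone(ZoneInfo("UTC")).isoformat()

row = {"response_date": "04/10/2026 13:05:00"}
localize_date_fields(row, ["response_date"], "%m/%d/%Y %H:%M:%S")
print(row["response_date"])  # 2026-04-10T18:05:00+00:00 (CDT is UTC-5)
```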
```python
    )

    if not files_todo:
        raise Exception("No CAD files found in S3 inbox")
```
I think this would only happen if there were no files in `local_files_to_process`, since `get_s3_files_todo` throws an error if there are no files.
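A stubbed sketch of the control flow described here (function names come from the diff and this thread; the bodies are assumptions):

```python
def get_s3_files_todo():
    """Stub of the real function: it raises when the S3 inbox is empty."""
    s3_keys = []  # pretend the inbox listing came back empty
    if not s3_keys:
        raise Exception("No CAD files found in S3 inbox")
    return s3_keys

def get_files_todo(local_files_to_process=None):
    """Only the local path can hand an empty list back to the caller's
    `if not files_todo` guard; the S3 path raises before reaching it."""
    if local_files_to_process is not None:
        return local_files_to_process
    return get_s3_files_todo()

print(get_files_todo(local_files_to_process=[]))  # prints [], so the guard only fires here
```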
Associated issues

This PR brings CAD data into the database! We currently have a live daily extract of CAD files accumulating in our shared network drive at `/mnt/vision_zero_cad` on our `02` server. We've also been provided with backfill files containing ~200 MB of records going back to 2015. We'll process those mega files post-go-live.

Todos outside the scope of this PR:
Testing

URL to test: Local

Tests involve adding and removing files in S3, so it would be good to give folks a heads-up in Slack to avoid conflicts.

1. Start your local stack and apply migrations and metadata.
2. Set up your env file by copying `env_template` and filling in the blanks. If you already have a `.env` file for the `afd_ems_import` ETL, you can use that (just make sure it uses dev variables, not prod).
3. Acquire local files. In AWS S3, navigate to `atd-vision-zero/dev/cad_incidents/archive` and download all of the files from this archive directory, saving them to the `./test_data` directory in this repo.
4. Now we will test the `incidents_to_s3.py` script, starting with a dry run; the command uses the docker-compose.local override file to mount your `./test_data` directory into the container's network drive mount point. Confirm that the output lists files oldest to newest based on the timestamp in the filename, with the `WithGroupID` files always processed after the file without that string in the filename.
5. Remove the `--dry-run` flag and run the script again. The output should look similar to the previous step and display the name of each file that was uploaded to S3. In the AWS console, navigate to the CAD incidents inbox (`atd-vision-zero/dev/cad_incidents/inbox`) and observe that files have been uploaded there. The timestamps shown in the Last modified column should all be very recent.
6. Run the script again with the `--remove` directive. This will cause the processed files to be deleted from your filesystem. Confirm the files have been deleted from your `./test_data` directory.
7. Moving on to `incident_import.py`, we will now process the files from the S3 inbox into our local database. Start with a dry run. Confirm that the output lists files oldest to newest based on the timestamp in the filename, with the `WithGroupID` files always processed after the file without that string in the filename.
8. Run the script again without the `--dry-run` flag. Once the script completes, use your SQL client to inspect the records in our new tables. Verify that the `location_id` and `in_austin_full_purpose` columns are populating on the `cad_incidents` table.
9. Run the script again with the `--archive` flag. Use your SQL client to confirm that the records in the `cad_incidents` table now have different timestamps in the `created_at` and `updated_at` columns. Head to the CAD incidents inbox (`atd-vision-zero/dev/cad_incidents/inbox`) and confirm that files have been removed from this directory, and that the files in the `/archive` directory have recent timestamps.

Ship list

- `main` branch