
fix: address 5 code quality issues found by AI analysis #7

Open

shanebarakat wants to merge 3 commits into openstreetmap-polska:master from shanebarakat:fix/code-quality-1771026312

Conversation

@shanebarakat

AI-Detected Code Quality Improvements

This PR addresses code quality issues detected by automated AI analysis using Claude Code CLI.

Issues Fixed

| # | File | Severity | Category | Description |
|---|------|----------|----------|-------------|
| 1 | `report.py:28` | high | bug | `overpass_to_dataframe()` destructively mutates the input dict by deleting `'tags'` from each element (`del elem['tags']`) and merging the tag keys into the element. Since the same `overpass_data` dict is passed to both `overpass_to_dataframe()` (via `create_report_md`) and `osm_cache.update()`, and the dict is also backed up via `json.dump`, calling order in `main()` matters. Currently `backup()` is called before `generate_report()`, so the backup is safe, but `osm_cache.update()` is called before `generate_report()` and iterates `elem['tags']` — if the call order were ever changed, it would crash. More critically, if `create_report_md` is called and the data is then reused (e.g., for debugging or future features), the original dict is silently corrupted. This is a latent data-corruption bug. |
| 2 | `report.py:16` | medium | bug | `current_datetime`, `current_date`, and `current_date_str` are computed at module import time (lines 16–19), not at report generation time. Since the module is imported once when the process starts, if the process runs across midnight (or the module is cached in a long-running process), the report will show stale date/time values. In the CI context this is unlikely to matter, but it's a real correctness bug — the 'As at' timestamp in plots reflects when the module was imported, not when the report was actually generated. |
| 3 | `report.py:77` | medium | bug | `current_year_aed_scatter_plot()` accesses `df_first_day_of_year.iloc[0]` (line 77), which raises an `IndexError` if there is no data entry for January 1st of the current year. This can happen if the project was started mid-year, if data collection began after Jan 1, or if no AED changes occurred on Jan 1. The function filters `df_year` for rows where `date == first_day_of_year`, but there is no guarantee such a row exists. |
| 4 | `main.py:29` | high | bug | `download_data()` uses `requests.get()` with no `timeout` parameter. The Overpass API can hang indefinitely, and without a timeout the process will block forever. The `TIMEOUT` constant (30 s) is only used as a sleep delay between retries, not as a request timeout. In a CI/CD environment (GitHub Actions), this could cause the workflow to hang until the job timeout (6 hours by default). |
| 5 | `osmcache.py:63` | medium | error-handling | In `_fetch_object_history()`, the HTTP response is never checked for errors. If the OSM API returns a 404, 429 (rate limit), or 500 error, `response.json()` will either raise a `JSONDecodeError` or return unexpected data, and the subsequent `['elements']` access will raise a `KeyError`. The error is caught by the retry loop, but the generic exception handler silently swallows the root cause (the HTTP status), making debugging difficult. Additionally, there is no timeout on the `session.get()` call, so it can hang indefinitely. |
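The non-mutating fix for issue 1 can be sketched as below. The helper name and the `pd.DataFrame(rows)` wrapping are illustrative assumptions; only the `elements`/`tags` structure comes from the description above:

```python
def flatten_overpass_elements(overpass_data):
    """Flatten each element's 'tags' into top-level keys on a copy.

    Unlike the original `del elem['tags']` approach, this leaves the
    caller's overpass_data untouched, so osm_cache.update() and the
    JSON backup still see the 'tags' key regardless of call order.
    """
    rows = []
    for elem in overpass_data.get("elements", []):
        row = {k: v for k, v in elem.items() if k != "tags"}  # shallow copy minus tags
        row.update(elem.get("tags", {}))                      # merge tags into the copy
        rows.append(row)
    return rows  # report.py would then wrap this in pd.DataFrame(rows)
```

Because only copies are mutated, `generate_report()` can run before or after `backup()` and `osm_cache.update()` without corrupting shared state.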

This PR was automatically generated. Please review the changes carefully.
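For issue 3, the `IndexError` can be avoided with an emptiness check before `.iloc[0]`. A minimal sketch, assuming a `count` column and a return of `None` as the fallback policy (both are illustrative, not the repo's actual names):

```python
import pandas as pd

def first_day_baseline(df_year: pd.DataFrame, first_day_of_year):
    """Return the AED count on Jan 1, or None when no such row exists.

    Guards the df_first_day_of_year.iloc[0] access that raised
    IndexError when data collection started after January 1st.
    """
    df_first = df_year[df_year["date"] == first_day_of_year]
    if df_first.empty:
        return None  # caller can skip the year-start annotation in the plot
    return df_first.iloc[0]["count"]
```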

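Issues 4 and 5 share the same remedy: pass an explicit `timeout` and call `raise_for_status()` before parsing. A hedged sketch under assumed names (the constants, retry count, and function signature are illustrative, not the repo's actual values):

```python
import time
import requests

REQUEST_TIMEOUT = (10, 180)  # (connect, read) seconds: bounds every attempt
RETRY_SLEEP = 30             # the existing TIMEOUT constant: delay between retries

def fetch_json(session, url, params=None, retries=3):
    """GET with a hard timeout and an explicit HTTP status check."""
    for attempt in range(retries):
        try:
            response = session.get(url, params=params, timeout=REQUEST_TIMEOUT)
            response.raise_for_status()  # surface 404/429/500 here, not as a later KeyError
            return response.json()
        except requests.RequestException as exc:
            if attempt == retries - 1:
                raise  # keep the root cause instead of swallowing it
            print(f"attempt {attempt + 1} failed: {exc!r}; retrying")
            time.sleep(RETRY_SLEEP)
```

With `raise_for_status()`, the retry loop's log now names the real failure (e.g. a 429 rate limit) instead of a downstream `KeyError`, and the tuple timeout caps both the connect and read phases so a CI job can never hang for hours.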
@shanebarakat
Author

Hey @openstreetmap-polska!

We noticed you don't have automated code reviews on this repo. We ran Paragon on this PR and it caught some real issues — check the inline comments above.

Found 5 issues including bugs and security fixes.

We'd love to help your project out. Shoot an email to shane@polarity.so and we can chat!


The fixes in this PR are real — feel free to merge, modify, or close.

