Bugfix/18 fix kartverket api error on cloud #38
Pull Request Overview
This pull request refactors county polygon retrieval to add blob storage caching, improving performance by reducing redundant API calls to the Kartverket service. The main changes include renaming the retrieval method for clarity, implementing a caching layer using Parquet files in blob storage, and updating dependency injection to support the new architecture.
- Renamed `get_county_wkb_by_id` to `get_county_polygons_by_id` for better clarity
- Implemented blob storage caching with Parquet files to store county polygon data
- Updated dependency injection to provide blob storage and bytes services to CountyService
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/infra/infrastructure/services/county_service.py | Added caching logic with helper methods to read from and write county polygons to blob storage |
| src/infra/infrastructure/containers.py | Updated CountyService dependency injection to include blob_storage_service and bytes_service |
| src/config.py | Added COUNTY_FILE_NAME constant and type annotation for RELEASE_FILE_NAME |
| src/application/contracts/county_service_interface.py | Renamed method from get_county_wkb_by_id to get_county_polygons_by_id |
| src/presentation/entrypoints/release_pipeline.py | Updated method call to use renamed get_county_polygons_by_id |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
```python
def get_county_polygons_by_id(self, county_id: str, epsg_code: EPSGCode) -> tuple[bytes, dict[str, Any]]:
    geometries = self.__get_county_polygons_from_blob_storage(region=county_id)

    if geometries is not None:
        return geometries

    wkb, geo_json = self.__fetch_county_polygons_from_api(region=county_id, epsg_code=epsg_code)
    self.__write_county_to_blob_storage(region=county_id, wkb=wkb, geo_json=geo_json)

    return wkb, geo_json
```
Copilot AI commented on Nov 17, 2025:
The caching implementation doesn't account for the epsg_code parameter. When cached data is returned from blob storage (line 40-41), it may be in a different coordinate system than requested. The cache should either:
- Store the EPSG code along with the polygon data and check it matches before returning cached data, or
- Document that only WGS84 is supported and validate the parameter
This could lead to incorrect spatial operations if the method is ever called with a different EPSG code.
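The first option the reviewer raises can be sketched as follows. This is an illustrative standalone function, not the project's actual code: the `epsg` column and the `read_cached_polygons` helper are assumptions about how an EPSG-aware cache schema might look.

```python
import json

import pandas as pd


def read_cached_polygons(county_df: pd.DataFrame, region: str, epsg_code: str):
    """Return (wkb, geo_json) only when a cached row matches both region AND EPSG code.

    Assumes the cache frame carries an extra 'epsg' column alongside
    'region', 'wkb', and 'json' -- a hypothetical schema, not the PR's.
    """
    match = county_df[(county_df["region"] == region) & (county_df["epsg"] == epsg_code)]
    if match.empty:
        return None  # cache miss (or EPSG mismatch) -> caller falls back to the API
    row = match.iloc[0]
    return row["wkb"], json.loads(row["json"])


# A cache populated for EPSG:4326 does not satisfy an EPSG:25833 request.
df = pd.DataFrame({
    "region": ["03"],
    "epsg": ["EPSG:4326"],
    "wkb": [b"\x01\x02"],
    "json": [json.dumps({"type": "FeatureCollection"})],
})
print(read_cached_polygons(df, "03", "EPSG:25833"))  # None -> refetch from API
print(read_cached_polygons(df, "03", "EPSG:4326"))
```

Storing the EPSG code per row keeps the cache correct if the method is ever called with a different coordinate system, at the cost of one extra column in the Parquet file.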
```python
else pd.DataFrame(columns=["region", "wkb", "json"])

json_string = json.dumps(geo_json)
if region in county_df["region"].values:
    mask = county_df["region"] == region
    county_df.loc[mask, "wkb"] = wkb
    county_df.loc[mask, "json"] = json_string
else:
    new_row = pd.DataFrame({"region": [region], "wkb": [wkb], "json": [json_string]})
```
Copilot AI commented on Nov 17, 2025:
Assigning bytes directly to DataFrame cells using .loc may not work correctly with all pandas versions. The bytes object might be stored as a reference rather than properly serialized. Consider using .at for scalar assignment or explicitly setting the dtype to object when creating the DataFrame to ensure bytes are stored correctly.
Suggested change:

```diff
-else pd.DataFrame(columns=["region", "wkb", "json"])
+else pd.DataFrame(columns=["region", "wkb", "json"]).astype({"wkb": object})
+
+# Ensure 'wkb' column is dtype object for bytes storage
+if "wkb" in county_df.columns:
+    county_df["wkb"] = county_df["wkb"].astype(object)
 json_string = json.dumps(geo_json)
 if region in county_df["region"].values:
     mask = county_df["region"] == region
     county_df.loc[mask, "wkb"] = wkb
     county_df.loc[mask, "json"] = json_string
 else:
-    new_row = pd.DataFrame({"region": [region], "wkb": [wkb], "json": [json_string]})
+    new_row = pd.DataFrame({"region": [region], "wkb": [wkb], "json": [json_string]}).astype({"wkb": object})
```
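The reviewer's point can be demonstrated in isolation. This is a minimal sketch, not the project's code: it forces the `wkb` column to object dtype and uses `.at` for scalar assignment so the bytes value is stored intact rather than coerced.

```python
import pandas as pd

# Empty cache frame; 'wkb' is forced to object dtype so raw bytes are stored as-is.
county_df = pd.DataFrame(columns=["region", "wkb", "json"]).astype({"wkb": object})

# Appending a row: build it with the same object dtype before concatenating.
new_row = pd.DataFrame({"region": ["03"], "wkb": [b"\x01"], "json": ["{}"]}).astype({"wkb": object})
county_df = pd.concat([county_df, new_row], ignore_index=True)

# Updating a single cell: .at does scalar assignment without broadcasting.
county_df.at[0, "wkb"] = b"\x01\x02\x03"

assert isinstance(county_df.at[0, "wkb"], bytes)
assert county_df["wkb"].dtype == object
```

With an object-dtype column, a Parquet writer such as pyarrow maps the bytes values to a binary column, so they round-trip through the blob-stored file without being mangled.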
```python
county_df = self.__bytes_service.convert_parquet_bytes_to_df(county_bytes)
if region not in county_df["region"].values:
    return None

row = county_df.loc[county_df["region"] == region].iloc[0]
return row["wkb"], json.loads(row["json"])
```
Copilot AI commented on Nov 17, 2025:
This method lacks error handling for potential failures when parsing the parquet file (line 76) or deserializing JSON (line 81). If the cached file is corrupted or in an unexpected format, the method will raise an unhandled exception instead of falling back to the API. Consider wrapping these operations in a try-except block and returning None on error to allow fallback to the API fetch.
Suggested change:

```diff
-county_df = self.__bytes_service.convert_parquet_bytes_to_df(county_bytes)
-if region not in county_df["region"].values:
-    return None
-row = county_df.loc[county_df["region"] == region].iloc[0]
-return row["wkb"], json.loads(row["json"])
+try:
+    county_df = self.__bytes_service.convert_parquet_bytes_to_df(county_bytes)
+    if region not in county_df["region"].values:
+        return None
+    row = county_df.loc[county_df["region"] == region].iloc[0]
+    return row["wkb"], json.loads(row["json"])
+except Exception:
+    return None
```
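The fallback behaviour the reviewer asks for can be shown with a self-contained sketch. The `parse_parquet` parameter stands in for the project's `convert_parquet_bytes_to_df` dependency; the function name and signature here are illustrative, not the real API.

```python
import json


def read_county_from_cache(county_bytes: bytes, region: str, parse_parquet):
    """Defensive cache read: any parse or decode failure counts as a cache miss.

    `parse_parquet` is a stand-in for a bytes-to-DataFrame service; returning
    None lets the caller fall back to fetching from the API.
    """
    try:
        county_df = parse_parquet(county_bytes)
        if region not in county_df["region"].values:
            return None
        row = county_df.loc[county_df["region"] == region].iloc[0]
        return row["wkb"], json.loads(row["json"])
    except Exception:
        return None  # corrupted blob or bad JSON -> fall back to the API


def broken_parser(_: bytes):
    # Simulates a corrupted cached file.
    raise ValueError("not a parquet file")


print(read_county_from_cache(b"garbage", "03", broken_parser))  # None
```

Swallowing the exception is acceptable here precisely because a fresh API fetch follows every cache miss; logging the error before returning None would preserve observability.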
This pull request refactors the county polygon retrieval logic to add blob storage caching and improve maintainability. The main changes include renaming the county polygon retrieval method, introducing caching of county polygons in blob storage, and updating dependency injection to support the new services.
County polygon retrieval and caching improvements:

- `get_county_wkb_by_id` in both the interface and implementation has been renamed to `get_county_polygons_by_id` for clarity and consistency. [1] [2] [3]
- County polygon data is now cached in blob storage as a Parquet file (`counties.parquet`). If not present, it fetches from the API, stores the result in blob storage, and returns the data.

Dependency injection and configuration updates:

- The DI container (`Containers`) has been updated to provide the required blob storage and bytes services to `CountyService`. [1] [2]
- A new configuration constant `COUNTY_FILE_NAME` was added for the counties Parquet file.

These changes improve performance by reducing redundant API calls and make the county polygon retrieval logic more robust and maintainable.
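The read-through caching flow described above can be sketched end to end. The class and parameter names below (`CountyCache`, `fetch_from_api`, the dict standing in for blob storage) are illustrative stand-ins, not the project's actual services.

```python
class CountyCache:
    """Read-through cache sketch: check storage first, fetch and write back on miss."""

    def __init__(self, fetch_from_api, blob: dict):
        self._fetch = fetch_from_api
        self._blob = blob  # simple dict standing in for blob storage

    def get_county_polygons_by_id(self, county_id: str):
        if county_id in self._blob:           # 1. try the cache first
            return self._blob[county_id]
        result = self._fetch(county_id)       # 2. miss -> call the Kartverket API
        self._blob[county_id] = result        # 3. write back for next time
        return result


# Count API calls to show the second lookup is served from the cache.
calls = []

def fake_api(county_id):
    calls.append(county_id)
    return (b"wkb-bytes", {"type": "FeatureCollection"})

cache = CountyCache(fake_api, {})
cache.get_county_polygons_by_id("03")
cache.get_county_polygons_by_id("03")
print(len(calls))  # 1 -> the second call never hit the API
```

This is the essence of the PR: repeated pipeline runs for the same county pay the API cost once, then read from blob storage.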