Skip to content

Conversation

@jathavaan
Copy link
Collaborator

This pull request refactors the county polygon retrieval logic to add blob storage caching and improve maintainability. The main changes include renaming the county polygon retrieval method, introducing caching of county polygons in blob storage, and updating dependency injection to support the new services.

County polygon retrieval and caching improvements:

  • The method get_county_wkb_by_id in both the interface and implementation has been renamed to get_county_polygons_by_id for clarity and consistency. [1] [2] [3]
  • County polygons are now cached in blob storage: when a county's polygons are requested, the service first checks blob storage (counties.parquet). If not present, it fetches from the API, stores the result in blob storage, and returns the data.
  • Helper methods were added to handle reading from and writing to blob storage, using Parquet files and supporting services for byte and blob operations.

Dependency injection and configuration updates:

  • The dependency injection container (Containers) has been updated to provide the required blob storage and bytes services to CountyService. [1] [2]
  • A new config constant COUNTY_FILE_NAME was added for the counties Parquet file.

These changes improve performance by reducing redundant API calls and make the county polygon retrieval logic more robust and maintainable.

@jathavaan jathavaan self-assigned this Nov 17, 2025
Copilot AI review requested due to automatic review settings November 17, 2025 18:39
@jathavaan jathavaan linked an issue Nov 17, 2025 that may be closed by this pull request
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request refactors county polygon retrieval to add blob storage caching, improving performance by reducing redundant API calls to the Kartverket service. The main changes include renaming the retrieval method for clarity, implementing a caching layer using Parquet files in blob storage, and updating dependency injection to support the new architecture.

  • Renamed get_county_wkb_by_id to get_county_polygons_by_id for better clarity
  • Implemented blob storage caching with Parquet files to store county polygon data
  • Updated dependency injection to provide blob storage and bytes services to CountyService

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/infra/infrastructure/services/county_service.py Added caching logic with helper methods to read from and write county polygons to blob storage
src/infra/infrastructure/containers.py Updated CountyService dependency injection to include blob_storage_service and bytes_service
src/config.py Added COUNTY_FILE_NAME constant and type annotation for RELEASE_FILE_NAME
src/application/contracts/county_service_interface.py Renamed method from get_county_wkb_by_id to get_county_polygons_by_id
src/presentation/entrypoints/release_pipeline.py Updated method call to use renamed get_county_polygons_by_id

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings November 17, 2025 18:55
@jathavaan jathavaan merged commit 5ce479d into main Nov 17, 2025
2 checks passed
@jathavaan jathavaan deleted the bugfix/18-fix-kartverket-api-error-on-cloud branch November 17, 2025 18:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +37 to +46
def get_county_polygons_by_id(self, county_id: str, epsg_code: EPSGCode) -> tuple[bytes, dict[str, Any]]:
geometries = self.__get_county_polygons_from_blob_storage(region=county_id)

if geometries is not None:
return geometries

wkb, geo_json = self.__fetch_county_polygons_from_api(region=county_id, epsg_code=epsg_code)
self.__write_county_to_blob_storage(region=county_id, wkb=wkb, geo_json=geo_json)

return wkb, geo_json
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caching implementation doesn't account for the epsg_code parameter. When cached data is returned from blob storage (line 40-41), it may be in a different coordinate system than requested. The cache should either:

  1. Store the EPSG code along with the polygon data and check it matches before returning cached data, or
  2. Document that only WGS84 is supported and validate the parameter

This could lead to incorrect spatial operations if the method is ever called with a different EPSG code.

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +99
else pd.DataFrame(columns=["region", "wkb", "json"])

json_string = json.dumps(geo_json)
if region in county_df["region"].values:
mask = county_df["region"] == region
county_df.loc[mask, "wkb"] = wkb
county_df.loc[mask, "json"] = json_string
else:
new_row = pd.DataFrame({"region": [region], "wkb": [wkb], "json": [json_string]})
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning bytes directly to DataFrame cells using .loc may not work correctly with all pandas versions. The bytes object might be stored as a reference rather than properly serialized. Consider using .at for scalar assignment or explicitly setting the dtype to object when creating the DataFrame to ensure bytes are stored correctly.

Suggested change
else pd.DataFrame(columns=["region", "wkb", "json"])
json_string = json.dumps(geo_json)
if region in county_df["region"].values:
mask = county_df["region"] == region
county_df.loc[mask, "wkb"] = wkb
county_df.loc[mask, "json"] = json_string
else:
new_row = pd.DataFrame({"region": [region], "wkb": [wkb], "json": [json_string]})
else pd.DataFrame(columns=["region", "wkb", "json"]).astype({"wkb": object})
# Ensure 'wkb' column is dtype object for bytes storage
if "wkb" in county_df.columns:
county_df["wkb"] = county_df["wkb"].astype(object)
json_string = json.dumps(geo_json)
if region in county_df["region"].values:
mask = county_df["region"] == region
county_df.loc[mask, "wkb"] = wkb
county_df.loc[mask, "json"] = json_string
else:
new_row = pd.DataFrame({"region": [region], "wkb": [wkb], "json": [json_string]}).astype({"wkb": object})

Copilot uses AI. Check for mistakes.
Comment on lines +76 to +82
county_df = self.__bytes_service.convert_parquet_bytes_to_df(county_bytes)
if region not in county_df["region"].values:
return None

row = county_df.loc[county_df["region"] == region].iloc[0]
return row["wkb"], json.loads(row["json"])

Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method lacks error handling for potential failures when parsing the parquet file (line 76) or deserializing JSON (line 81). If the cached file is corrupted or in an unexpected format, the method will raise an unhandled exception instead of falling back to the API. Consider wrapping these operations in a try-except block and returning None on error to allow fallback to the API fetch.

Suggested change
county_df = self.__bytes_service.convert_parquet_bytes_to_df(county_bytes)
if region not in county_df["region"].values:
return None
row = county_df.loc[county_df["region"] == region].iloc[0]
return row["wkb"], json.loads(row["json"])
try:
county_df = self.__bytes_service.convert_parquet_bytes_to_df(county_bytes)
if region not in county_df["region"].values:
return None
row = county_df.loc[county_df["region"] == region].iloc[0]
return row["wkb"], json.loads(row["json"])
except Exception:
return None

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix Kartverket API error on cloud

1 participant