Conversation
|
@mbarlow12 I'm assuming you'll want to make a bunch of changes on this. Feel free to take over the branch, but I'd prefer if we keep the signature on the new POST (i.e. takes an array of resource IDs, returns a Resource object). |
There was a problem hiding this comment.
Pull request overview
This pull request adds a new /resources/merge endpoint to the API that implements stub functionality for merging multiple resources. The endpoint accepts a list of resource IDs and repoints resources[1:] to resources[0] as the canonical target. This is explicitly noted as a springboard for future entity-linkage service development and intentionally lacks automated tests.
Changes:
- Added POST
/resources/mergeendpoint with MergeRequest model for accepting resource IDs - Implemented
merge_resourcesfunction in resource_actions that updates repointed_id field on resources - Added seed data (2 resources, 2 sources, 2 memberships) to support merge testing scenarios
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| deployments/api/src/stitch/api/routers/resources.py | Adds merge endpoint with validation, logging, and error handling wrapper around resource_actions.merge_resources |
| deployments/api/src/stitch/api/db/resource_actions.py | Implements merge_resources function that validates resource existence and updates repointed_id fields |
| deployments/api/src/stitch/api/db/init_job.py | Adds test/seed data for merge scenarios: "Merge Target" and "Merge Consumed" resources with associated sources |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| session: AsyncSession, | ||
| user: CurrentUser, | ||
| ids: Sequence[int], |
There was a problem hiding this comment.
The user parameter is accepted but never used in the merge_resources function. If this is intended for future authorization checks or audit logging (e.g., updating last_updated_by_id on modified resources), consider adding a TODO comment. Otherwise, the parameter can be removed to reduce confusion.
There was a problem hiding this comment.
@mbarlow12 my local IDE kept warning me about this too (User isn't consumed in the resource_actions). I'm guessing that when we start actually using permissions to filter the reesources, that's when we'll start using it?
| if not ids: | ||
| raise HTTPException(status_code=400, detail="No resource IDs provided.") | ||
| # preserve order but drop duplicates | ||
| unique_ids = list(dict.fromkeys(ids)) | ||
| if len(unique_ids) < 2: | ||
| raise HTTPException( | ||
| status_code=400, detail="Provide at least 2 unique resource IDs." | ||
| ) |
There was a problem hiding this comment.
The validation logic in lines 171-178 duplicates the validation already performed in the router endpoint (lines 58-61 of resources.py). Since the endpoint already validates that there are at least 2 unique resource IDs before calling this function, this redundant validation can be removed. The check for empty IDs list at line 171-172 is also unnecessary since the endpoint guarantees the list is non-empty after deduplication.
| if not ids: | |
| raise HTTPException(status_code=400, detail="No resource IDs provided.") | |
| # preserve order but drop duplicates | |
| unique_ids = list(dict.fromkeys(ids)) | |
| if len(unique_ids) < 2: | |
| raise HTTPException( | |
| status_code=400, detail="Provide at least 2 unique resource IDs." | |
| ) | |
| # preserve order but drop duplicates | |
| unique_ids = list(dict.fromkeys(ids)) |
| except HTTPException: | ||
| raise | ||
| except Exception as exc: | ||
| logger.exception("Error while merging resources %s: %s", unique_ids, exc) |
There was a problem hiding this comment.
The logger.exception call already includes the full exception traceback and the exception message automatically. Including the exception in the log message string as well (with %s, exc) results in duplicate exception information in the logs. Consider removing exc from the message format and just using: logger.exception("Error while merging resources %s", unique_ids)
| logger.exception("Error while merging resources %s: %s", unique_ids, exc) | |
| logger.exception("Error while merging resources %s", unique_ids) |
|
|
||
| logger.info( | ||
| "Merge requested by user=%s for resource_ids=%s", | ||
| getattr(user, "sub", "<anon>"), |
There was a problem hiding this comment.
The use of getattr(user, "sub", "") is unnecessary. According to the User entity definition (entities.py line 167), the 'sub' field is a required string field (not optional). The user parameter is typed as CurrentUser which is a User entity with sub always present. This can be simplified to just user.sub.
| getattr(user, "sub", "<anon>"), | |
| user.sub, |
| for rid in other_ids: | ||
| model = await session.get(ResourceModel, rid) | ||
| if model is None: | ||
| raise HTTPException( | ||
| status_code=HTTP_404_NOT_FOUND, | ||
| detail=f"No Resource with id `{rid}` found.", | ||
| ) | ||
| model.repointed_id = target_id |
There was a problem hiding this comment.
The merge_resources function modifies multiple resource records (lines 192-199) but doesn't wrap them in a database transaction beyond the implicit session transaction. If one of the resource IDs doesn't exist, the function will raise an HTTPException after potentially having already modified some resources. Consider checking all resource existence upfront before making any modifications, or ensure the transaction semantics are clear. The current implementation fetches and validates each resource sequentially, which could lead to partial modifications if a later resource is not found.
| @router.post("/merge", response_model=Resource) | ||
| async def merge_resources_endpoint( | ||
| *, uow: UnitOfWorkDep, user: CurrentUser, payload: MergeRequest | ||
| ) -> Resource: | ||
| """ | ||
| Merge multiple resources into one (STUB): | ||
| repoint resource_ids[1:] -> resource_ids[0] | ||
| """ | ||
| ids = payload.resource_ids | ||
| # preserve order but drop duplicates | ||
| unique_ids = list(dict.fromkeys(ids)) | ||
| if len(unique_ids) < 2: | ||
| raise HTTPException( | ||
| status_code=400, detail="Provide at least 2 unique resource IDs" | ||
| ) | ||
|
|
||
| logger.info( | ||
| "Merge requested by user=%s for resource_ids=%s", | ||
| getattr(user, "sub", "<anon>"), | ||
| unique_ids, | ||
| ) | ||
|
|
||
| try: | ||
| return await resource_actions.merge_resources( | ||
| session=uow.session, | ||
| user=user, | ||
| ids=unique_ids, | ||
| ) | ||
| except HTTPException: | ||
| raise | ||
| except Exception as exc: | ||
| logger.exception("Error while merging resources %s: %s", unique_ids, exc) | ||
| raise HTTPException(status_code=500, detail="Internal error during merge") |
There was a problem hiding this comment.
The new merge endpoint lacks test coverage. The repository has comprehensive test infrastructure for resources endpoints (both unit and integration tests in tests/routers/), and all other endpoints have corresponding tests. Consider adding at least:
- Unit tests covering the validation logic (empty list, single ID, duplicate IDs)
- Unit tests for error handling (non-existent resource IDs, HTTPException propagation)
- Integration tests verifying the actual merge behavior (repointing, target resource returned)
| # Ensure target exists | ||
| target_model = await session.get(ResourceModel, target_id) | ||
| if target_model is None: | ||
| raise HTTPException( | ||
| status_code=HTTP_404_NOT_FOUND, | ||
| detail=f"No Resource with id `{target_id}` found.", | ||
| ) | ||
|
|
||
| # Ensure all others exist, then repoint them | ||
| for rid in other_ids: | ||
| model = await session.get(ResourceModel, rid) | ||
| if model is None: | ||
| raise HTTPException( | ||
| status_code=HTTP_404_NOT_FOUND, | ||
| detail=f"No Resource with id `{rid}` found.", | ||
| ) |
There was a problem hiding this comment.
The merge logic doesn't check if any of the resources being merged (ids[1:]) already have a repointed_id set. If a resource is already repointed to another resource, this creates a chain of repointing rather than updating the entire chain. Consider either:
- Preventing merges of already-repointed resources (validate that all resources in ids[1:] have repointed_id=None)
- Resolving the full repointing chain before merging
- Updating all resources in the chain to point to the new target
Additionally, there's no check to prevent a resource from being merged with itself (if the same ID appears in different positions after deduplication at the router level).
| # Ensure target exists | |
| target_model = await session.get(ResourceModel, target_id) | |
| if target_model is None: | |
| raise HTTPException( | |
| status_code=HTTP_404_NOT_FOUND, | |
| detail=f"No Resource with id `{target_id}` found.", | |
| ) | |
| # Ensure all others exist, then repoint them | |
| for rid in other_ids: | |
| model = await session.get(ResourceModel, rid) | |
| if model is None: | |
| raise HTTPException( | |
| status_code=HTTP_404_NOT_FOUND, | |
| detail=f"No Resource with id `{rid}` found.", | |
| ) | |
| # Ensure target exists and is not already repointed | |
| target_model = await session.get(ResourceModel, target_id) | |
| if target_model is None: | |
| raise HTTPException( | |
| status_code=HTTP_404_NOT_FOUND, | |
| detail=f"No Resource with id `{target_id}` found.", | |
| ) | |
| if getattr(target_model, "repointed_id", None) is not None: | |
| raise HTTPException( | |
| status_code=400, | |
| detail=f"Cannot merge into Resource `{target_id}` because it has already been merged/repointed.", | |
| ) | |
| # Ensure all others exist and are not already repointed, then repoint them | |
| for rid in other_ids: | |
| # Explicit guard against self-merge, even if IDs were de-duplicated earlier | |
| if rid == target_id: | |
| raise HTTPException( | |
| status_code=400, | |
| detail="Cannot merge a resource into itself.", | |
| ) | |
| model = await session.get(ResourceModel, rid) | |
| if model is None: | |
| raise HTTPException( | |
| status_code=HTTP_404_NOT_FOUND, | |
| detail=f"No Resource with id `{rid}` found.", | |
| ) | |
| if getattr(model, "repointed_id", None) is not None: | |
| raise HTTPException( | |
| status_code=400, | |
| detail=f"Resource with id `{rid}` has already been merged/repointed.", | |
| ) |
| class MergeRequest(BaseModel): | ||
| resource_ids: list[int] |
There was a problem hiding this comment.
The MergeRequest model lacks validation constraints. Consider adding:
- A minimum length validator to ensure at least 2 IDs are provided (currently only validated in the endpoint logic)
- Documentation about the merge behavior (that ids[0] becomes the target)
This would make the API contract clearer and move validation into the request model where it belongs.
|
@mbarlow12 Copilot brought up a bunch of issues with my implementation (which all seem fair), but I'm probably not the best person to figure out what the "right" answer is. You mentioned in meeting yesterday wanting to use what's already available in I'm fine if you want to either take over this PR, or close it. From a design perspective, I'd like to have a merge endpoint at |
Adds a
/resources/mergeendpoint to API.Note: This has not added any automated tests, and exists primarily as a springboard for adding an entity-linkage service.