Conversation
There was a problem hiding this comment.
Pull request overview
This PR migrates the API deployment toward using the workspace stitch-models package and simultaneously simplifies the API’s “resources” domain to a domain-agnostic shape (removing source-data handling across the API, DB actions, seeds, and tests).
Changes:
- Add
stitch-modelsas a workspace dependency for the API deployment. - Introduce domain-agnostic request/response models for the resources API and refactor router + DB actions accordingly.
- Remove source-table modeling/seed logic and update API tests to match the simplified resource model.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
uv.lock |
Adds stitch-models to the workspace lock dependencies. |
deployments/api/pyproject.toml |
Declares stitch-models as a workspace dependency for the API. |
packages/stitch-models/src/stitch/models/__init__.py |
Exports EmptySourcePayload and a BaseResource type alias for domain-agnostic usage. |
deployments/api/src/stitch/api/resources/entities.py |
Adds new domain-agnostic CreateResource/Resource API schemas. |
deployments/api/src/stitch/api/routers/resources.py |
Switches the resources router to the new domain-agnostic schemas. |
deployments/api/src/stitch/api/entities.py |
Removes the old resource/source-related entities, leaving User. |
deployments/api/src/stitch/api/db/resource_actions.py |
Simplifies create/get/get_all to be domain-agnostic and removes source/membership creation. |
deployments/api/src/stitch/api/db/model/sources.py |
Removes source-table ORM models and related helpers. |
deployments/api/src/stitch/api/db/model/resource.py |
Adjusts membership typing and switches IdType import to stitch-models. |
deployments/api/src/stitch/api/db/model/mixins.py |
Broadens payload typing to BaseModel and changes how source is set from payload. |
deployments/api/src/stitch/api/db/model/__init__.py |
Stops exporting removed source models. |
deployments/api/src/stitch/api/db/init_job.py |
Removes seed sources/memberships and updates dev seeding for simplified resources. |
deployments/api/tests/utils.py |
Simplifies test factories to produce the new minimal CreateResource payload. |
deployments/api/tests/routers/test_resources_unit.py |
Updates unit tests to use the new API schemas and simplified payload validation. |
deployments/api/tests/routers/test_resources_integration.py |
Updates integration tests to match the new response/request shapes (no country/source assertions). |
deployments/api/tests/db/test_resource_actions.py |
Rewrites DB integration tests for the simplified resource actions and adds a get_all test. |
deployments/api/tests/conftest.py |
Removes fixtures tied to removed source models and their defaults. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from stitch.api.db import resource_actions | ||
| from stitch.api.db.config import UnitOfWorkDep | ||
| from stitch.api.auth import CurrentUser | ||
| from stitch.api.entities import CreateResource, Resource | ||
| from stitch.api.resources.entities import CreateResource, Resource | ||
|
|
There was a problem hiding this comment.
PR description says the API deployment model is migrated to the workspace package stitch-models, but the request/response schemas used by the resources router are still defined locally in stitch.api.resources.entities (only IdType was moved to stitch-models). Either update the PR description to reflect the actual scope, or move/re-export these API models from stitch-models to match the stated goal.
| async def get_all(session: AsyncSession) -> Sequence[Resource]: | ||
| stmt = ( | ||
| select(ResourceModel) | ||
| .where(ResourceModel.repointed_id.is_(None)) | ||
| .options(selectinload(ResourceModel.memberships)) | ||
| ) | ||
| stmt = select(ResourceModel).where(ResourceModel.repointed_id.is_(None)) | ||
| models = (await session.scalars(stmt)).all() | ||
| fn = partial(resource_model_to_entity, session) | ||
| return await asyncio.gather(*[fn(m) for m in models]) |
There was a problem hiding this comment.
get_all() uses asyncio.gather() to call resource_model_to_entity() concurrently while sharing the same AsyncSession. SQLAlchemy AsyncSession does not support concurrent IO, so this can raise runtime errors (and it also introduces an N+1 query pattern via get_constituents_by_root_id). Prefer iterating sequentially, or refactor to prefetch constituents in bulk without per-resource queries (or use separate sessions per task if you truly need concurrency).
| created=str(model.created) if getattr(model, "created", None) else None, | ||
| updated=str(model.updated) if getattr(model, "updated", None) else None, | ||
| ) | ||
|
|
||
|
|
There was a problem hiding this comment.
resource_model_to_entity() passes created/updated into Resource(...), but the Resource model currently has no such fields. This is dead code today (extras are ignored) and will break if Resource is later configured with extra='forbid'. Either add these fields to the Resource schema (if they’re part of the API contract) or stop including them here.
| created=str(model.created) if getattr(model, "created", None) else None, | |
| updated=str(model.updated) if getattr(model, "updated", None) else None, | |
| ) | |
| ) |
| def create( | ||
| cls, | ||
| created_by: UserEntity, | ||
| resource: "ResourceModel", | ||
| source: SourceKey, | ||
| source: str, | ||
| source_pk: IdType, | ||
| status: MembershipStatus = MembershipStatus.ACTIVE, | ||
| ): | ||
| model = cls( | ||
| resource_id=resource.id, | ||
| source=source, | ||
| source_pk=str(source_pk), | ||
| source_pk=int(source_pk), | ||
| status=status, |
There was a problem hiding this comment.
MembershipModel.create() accepts source_pk: IdType (which includes UUID via stitch.models.types.IdType) but then coerces it with int(source_pk) into a BIGINT column. int(UUID) yields a 128-bit integer that will overflow PostgreSQL BIGINT, causing failures at insert time. Either constrain source_pk to int here (and in callers), or change the storage type/serialization to safely handle UUIDs (e.g., store UUID as text).
| # DB ResourceModel may store `.name`; tolerate either while refactor settles. | ||
| assert getattr(db_resource, "name", None) in (None, "Test Label") |
There was a problem hiding this comment.
This test assertion is currently tolerant of the resource name not being persisted (getattr(..., 'name', None) in (None, 'Test Label')), which can let real regressions slip through. Since resource_actions.create() sets name, the test should assert the expected persisted value (or assert the specific transitional behavior you actually intend), rather than accepting both outcomes.
| # DB ResourceModel may store `.name`; tolerate either while refactor settles. | |
| assert getattr(db_resource, "name", None) in (None, "Test Label") | |
| # DB ResourceModel should persist the resource name set by create(). | |
| assert db_resource.name == "Test Label" |
|
|
||
|
|
There was a problem hiding this comment.
EmptySourcePayload is documented as “no sources”, but it inherits Pydantic’s default extra='ignore' behavior. That means callers can pass arbitrary keys (e.g., source data) and it will be silently dropped, which is surprising for an “empty payload”. Consider setting model_config = ConfigDict(extra='forbid') on EmptySourcePayload (or on SourcePayload) to enforce the invariant.
| # Enforce that this "empty" payload cannot receive unexpected fields. | |
| model_config: ClassVar[ConfigDict] = ConfigDict(from_attributes=True, extra="forbid") |
| from __future__ import annotations | ||
|
|
||
| from pydantic import BaseModel, ConfigDict, Field | ||
|
|
||
|
|
There was a problem hiding this comment.
stitch.api.resources is introduced without an __init__.py, which makes it a namespace package. The rest of this repo’s Python package structure (e.g., stitch.api.db, stitch.api.routers) uses explicit packages with __init__.py, so this inconsistency can cause tooling/import edge cases. Consider adding deployments/api/src/stitch/api/resources/__init__.py for consistency.
mbarlow12
left a comment
There was a problem hiding this comment.
I'm not sure I entirely understand this PR. It looks like there's a lot of functioning code that's been removed, which isn't necessarily a requirement for integrating stitch-ogsi and/or stitch-models (though I'm pretty sure we won't need a direct dependency on stitch-models).
The basic shapes/pydantic models are all almost identical and can be swapped in with minimal additional changes.
I have some other misc comments, but the big one is really about removing working functionality.
| source: Mapped[SourceKey] = mapped_column( | ||
| String(10), nullable=False | ||
| ) # "gem" | "wm" | ||
| source: Mapped[str] = mapped_column(String(10), nullable=False) # "gem" | "wm" |
There was a problem hiding this comment.
This does loosen the restrictions somewhat on the valid values, allowing any string under 10 characters.
If an invalid value was somehow saved to the db, we'd get an error when trying to call model_validate when fetching it from the db.
| created_by: UserEntity, | ||
| resource: "ResourceModel", | ||
| source: SourceKey, | ||
| source: str, |
There was a problem hiding this comment.
This was a place that helps getting editor support and early errors. With the SrcKey (now OGSISrcKey from stitch-ogsi), you'll get an error diagnostic if you try to call Resource.create(source="some_source", ...).
| class EmptySourcePayload(SourcePayload): | ||
| """Domain-agnostic source payload container (no sources).""" | ||
|
|
||
|
|
||
| # A concrete, domain-agnostic Resource you can use everywhere. | ||
| # Domains can replace `EmptySourcePayload` with their own payload type. | ||
| type BaseResource[TResId: IdType] = Resource[TResId, EmptySourcePayload] |
There was a problem hiding this comment.
This seems redundant as it's already captured within the Resource class. This also means that inheritors of BaseResource must subclass EmptySourcePayload specifically. I think it muddies the package API about whether/when to use SourcePayload or EmtpySourcePayload.
| class CreateResource(BaseModel): | ||
| """Domain-agnostic create model.""" | ||
|
|
||
| model_config = ConfigDict(extra="forbid") | ||
| name: str | None = None | ||
|
|
||
|
|
||
| class Resource(BaseModel): | ||
| """Domain-agnostic read model.""" | ||
|
|
||
| id: int | ||
| name: str | None = None | ||
| repointed_to: int | None = None | ||
| constituents: frozenset[int] = Field(default_factory=frozenset) |
There was a problem hiding this comment.
Are these meant to be temporary placeholders?
| model_config = ConfigDict(extra="forbid") | ||
| name: str | None = None |
There was a problem hiding this comment.
This is a really tight constraint and will reject anything that's not {"name": ...}.
When I originally designed the resource entities, I allowed for a lot of flexibility in the shape of the create request:
- resource with no source data
- resource with new source data
- resource with mix of new and existing (via ids) source data
| SourceModelData, | ||
| ) | ||
| from stitch.api.entities import IdType, User as UserEntity | ||
| from stitch.models.types import IdType |
There was a problem hiding this comment.
I think most (if not all) of the imports will actually end up coming from stitch-ogsi.
| class SourceBase(Base, Generic[TModelIn, TModelOut]): | ||
| __abstract__ = True | ||
| __entity_class_in__: type[TModelIn] | ||
| __entity_class_out__: type[TModelOut] | ||
|
|
||
| id: Mapped[int] = mapped_column( | ||
| PORTABLE_BIGINT, primary_key=True, autoincrement=True | ||
| ) | ||
|
|
||
| def __init_subclass__(cls, **kwargs) -> None: | ||
| super().__init_subclass__(**kwargs) | ||
| for base in getattr(cls, "__orig_bases__", ()): | ||
| if get_origin(base) is SourceBase: | ||
| args = get_args(base) | ||
| if len(args) >= 2: | ||
| if isinstance(args[0], type): | ||
| cls.__entity_class_in__ = args[0] | ||
| if isinstance(args[1], type): | ||
| cls.__entity_class_out__ = args[1] | ||
| break | ||
|
|
||
| def as_entity(self): | ||
| return self.__entity_class_out__.model_validate(self) | ||
|
|
||
| @classmethod | ||
| def from_entity(cls, entity: TModelIn) -> Self: | ||
| mapper = inspect(cls) | ||
| column_keys = {col.key for col in mapper.columns} | ||
| filtered = {k: v for k, v in entity.model_dump().items() if k in column_keys} | ||
| return cls(**filtered) | ||
|
|
||
|
|
There was a problem hiding this comment.
I would strongly recommend against deleting this file. All these mechanics will work the same with the new models in stitch-ogsi.
Unless we're planning to have the db schema wildly different from the model schema, this will all still function as intended when we integrate stitch-ogsi.
There was a problem hiding this comment.
In fact, this gets potentially simplified, too. With id as optional now, we don't need 2 classes to distinguish between creational & stored/managed structures.
Updates the data model used in the API deployment to be the workspace package
stitch-models, rather than a pydantic model defined in the deployment.Note that the
Userobject is still defined in the api deployment.ChatGPT used as explainer while making the migration, and wrote some code snippets.