Description
Is there an existing issue for this?
- I have searched the existing issues
Summary
As the query tool (or an interested API user), I want to see dataset-level information to display to my user, especially:
- current level of data access
- Authors
- contact info
- etc
TODO
- Read in the dataset metadata JSON created in recipes#163 (Add entrypoint script to process dataset metadata in JSONLD files) on API startup
- Introduce env var for the datasets metadata JSON path (can be configured for dev; value to be hardcoded in the docker-compose.yml as part of recipes#164, [ENH] Add service to validate and extract dataset info from JSONLDs)
- error out if file not found
- Add startup event to load datasets metadata JSON into memory
- Expand model for only a datasets-level response from POST /datasets to:
  - add new dataset attributes
  - remove dataset_portal_uri
Lines 120 to 131 in 94c41f0
    class DatasetQueryResponse(BaseModel):
        """Data model for metadata of datasets matching a query."""

        dataset_uuid: str
        # dataset_file_path: str  # TODO: Revisit this field once we have datasets without imaging info/sessions.
        dataset_name: str
        dataset_portal_uri: Optional[str]
        dataset_total_subjects: int
        records_protected: bool
        num_matching_subjects: int
        image_modals: list
        available_pipelines: dict
- Separate out response model for legacy GET /query endpoint so it is not tied to other response models
- Update SPARQL query to no longer return ?dataset_name or ?dataset_portal_uri
- Remove all dataset attributes from /subjects endpoint other than dataset_uuid
- Update /datasets endpoint to append dataset attributes from the dataset metadata JSON to the response object, matching datasets based on the UUID
- Update test data
- [ ] Return node mode --> info already available via records_protected field in /datasets response
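The loading and matching steps in the TODO list could look roughly like the sketch below. It is only an illustration under stated assumptions: the env var name DATASETS_METADATA_PATH is hypothetical (the issue does not name one), and the metadata JSON is assumed to be an object keyed by dataset UUID, which may not match the file produced in recipes#163.

```python
import json
import os
from pathlib import Path

# Hypothetical env var name; the issue does not specify one.
METADATA_PATH_ENV_VAR = "DATASETS_METADATA_PATH"


def load_datasets_metadata(env=os.environ) -> dict:
    """Load the datasets metadata JSON, erroring out if the file is missing."""
    path_value = env.get(METADATA_PATH_ENV_VAR)
    if not path_value:
        raise RuntimeError(f"{METADATA_PATH_ENV_VAR} is not set")
    path = Path(path_value)
    if not path.is_file():
        raise FileNotFoundError(f"Datasets metadata file not found: {path}")
    with path.open(encoding="utf-8") as f:
        # Assumed layout: {"<dataset_uuid>": {"authors": [...], ...}, ...}
        return json.load(f)


def append_dataset_attributes(query_results: list, metadata: dict) -> list:
    """Add static dataset attributes to each query result, matched on dataset_uuid."""
    merged = []
    for result in query_results:
        static_attrs = metadata.get(result["dataset_uuid"], {})
        # Query-derived fields take precedence if a key appears in both.
        merged.append({**static_attrs, **result})
    return merged
```

In the API, load_datasets_metadata would presumably be called once from a startup handler and the result kept in memory, so that the /datasets endpoint only performs the in-memory merge per request.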
Considerations
- SPARQL query for legacy /query endpoint still looks for nb:hasPortalURI, which will be deprecated in future JSONLDs, so the dataset_portal_uri field of the response may always be empty
- Separate (unoptimized) SPARQL queries for POST /subjects vs. GET /subjects?
Option 1: rename hasPortalURI edge in SPARQL query but update response models/shaping so that GET /subjects no longer returns dataset-level metadata
- Pros:
- Minimal changes to query template
- Full SPARQL query template is now up to date with latest JSONLD model
- Cons:
- Less optimized SPARQL query since we're returning more info than we need
- Confusing to maintain since we would be using different internal and exposed field names, etc.
- Maintains coupling between /subjects and /query via the SPARQL query
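As a rough illustration of the response shaping Option 1 implies: the shared SPARQL query keeps returning dataset-level columns, and the API drops them before building the /subjects response. The function name and the sub_id key are hypothetical; only dataset_uuid, dataset_name, and dataset_portal_uri come from this issue.

```python
# Dataset-level keys the shared query may still return under Option 1.
# dataset_uuid is deliberately not listed, so it survives the shaping step.
DATASET_LEVEL_KEYS = {"dataset_name", "dataset_portal_uri"}


def shape_subject_rows(rows: list[dict]) -> list[dict]:
    """Drop dataset-level metadata from query rows, keeping dataset_uuid."""
    return [
        {key: value for key, value in row.items() if key not in DATASET_LEVEL_KEYS}
        for row in rows
    ]
```

This is where the "different internal and exposed field names" cost shows up: the query result and the response no longer share a shape, so the filter has to be kept in sync with the query template by hand.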
Option 2: deprecate /query endpoint entirely
- Pros:
- We no longer need to maintain endpoint
- Also simple, just need to update one SPARQL query
- Cons:
- Bad for any tools consuming the API directly or dedicated, outdated query tool + f-API still using the /query endpoint
- Need to communicate to nodes (maybe: BrainLife? ReproLake? MicroGigs?)
Option 3: Use slightly different SPARQL queries for each. Can either use two functions or one function with conditionals.
- Pros:
- We can fully split logic for endpoints - no more interdependency, no rush to retire query endpoint
- Cons:
- Potential duplication or branching logic = more maintenance
- Visual clutter
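For comparison, the "one function with conditionals" variant of Option 3 might look something like the sketch below. It is not the project's actual query template: the function name, the ?subject variable, and the nb:hasSamples and nb:hasLabel predicates are placeholders, and prefix declarations are omitted. Only nb:hasPortalURI, ?dataset_name, and ?dataset_portal_uri are taken from this issue.

```python
def build_query(include_dataset_metadata: bool) -> str:
    """Build a simplified SPARQL query, optionally selecting dataset-level fields."""
    select_vars = ["?dataset_uuid", "?subject"]
    # Placeholder triple pattern; the real template is considerably larger.
    patterns = ["?dataset_uuid nb:hasSamples ?subject ."]

    if include_dataset_metadata:
        # Only the legacy /query endpoint would ask for these columns.
        select_vars += ["?dataset_name", "?dataset_portal_uri"]
        patterns += [
            "OPTIONAL { ?dataset_uuid nb:hasLabel ?dataset_name . }",
            "OPTIONAL { ?dataset_uuid nb:hasPortalURI ?dataset_portal_uri . }",
        ]

    select_clause = " ".join(select_vars)
    body = "\n    ".join(patterns)
    return f"SELECT DISTINCT {select_clause}\nWHERE {{\n    {body}\n}}"
```

Even in this toy form the branching is visible, which is the maintenance and clutter cost listed under the cons.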
Decision: We will implement Option 1 in this PR as a temporary fix, and deprecate the /query tool endpoint soon in #520.
Documentation update(s) needed
No response