n-API can return dataset level information

### Is there an existing issue for this?

- [x] I have searched the existing issues

### Summary

As the query tool (or an interested API user), I want to see dataset-level information that I can display to my users, especially:

- the current level of data access
- authors
- contact information
- etc.

### TODO
- [x] Read in the dataset metadata JSON created in https://github.com/neurobagel/recipes/issues/163 on API startup
  - [x] Introduce env var for the datasets metadata JSON path (can be configured for dev, value to be hardcoded in the docker-compose.yml as part of https://github.com/neurobagel/recipes/pull/164)
    - [x] error out if file not found
  - [x] Add startup event to load datasets metadata JSON into memory
- [x] Expand the model for the datasets-level-only response from `POST /datasets` to:
  - [x] add new dataset attributes
  - [x] remove `dataset_portal_uri`
https://github.com/neurobagel/api/blob/94c41f0dfc06802a199ae9a1855eef96d0c4a709/app/api/models.py#L120-L131
- [x] Separate out the response model for the legacy `GET /query` endpoint so it is not tied to other response models
- [x] Update SPARQL query to no longer return `?dataset_name` or `?dataset_portal_uri` 
- [x] Remove all dataset attributes from `/subjects` endpoint other than `dataset_uuid`
- [x] Update `/datasets` endpoint to append dataset attributes from the dataset metadata JSON to the response object, matching datasets based on the UUID
- [x] Update test data
- ~[ ] Return node mode~ --> info already available via `records_protected` field in `/datasets` response
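
The startup-loading and UUID-matching steps above could look roughly like the following. This is a minimal sketch, not the actual implementation: the env var name `NB_DATASETS_METADATA_PATH`, both function names, and the assumed JSON shape (a mapping from dataset UUID to a dict of attributes) are assumptions made for illustration only.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

# Hypothetical env var name; the real name is not specified in this issue.
METADATA_PATH_ENV_VAR = "NB_DATASETS_METADATA_PATH"


def load_datasets_metadata(env: dict | None = None) -> dict:
    """Load the datasets metadata JSON; error out if the path is unset or missing."""
    env = os.environ if env is None else env
    path_value = env.get(METADATA_PATH_ENV_VAR)
    if not path_value:
        raise RuntimeError(f"{METADATA_PATH_ENV_VAR} is not set")
    path = Path(path_value)
    if not path.is_file():
        raise FileNotFoundError(f"Datasets metadata file not found: {path}")
    with path.open(encoding="utf-8") as f:
        # Assumed shape: {"<dataset_uuid>": {"authors": [...], ...}, ...}
        return json.load(f)


def append_dataset_attributes(query_results: list[dict], metadata: dict) -> list[dict]:
    """Add static attributes to each query result, matching on dataset_uuid."""
    merged = []
    for result in query_results:
        static_attrs = metadata.get(result["dataset_uuid"], {})
        # Query-derived fields win over static metadata if keys clash.
        merged.append({**static_attrs, **result})
    return merged
```

In a FastAPI app, `load_datasets_metadata()` would presumably be called once from the startup event and its result kept in memory, so that `/datasets` requests only pay for the dictionary lookup.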

### Considerations
- The SPARQL query for the legacy `/query` endpoint still looks for `nb:hasPortalURI`, which will be deprecated in future JSONLDs, so the `dataset_portal_uri` field of the response may always be empty
- Separate (unoptimized) SPARQL queries for `POST /subjects` vs. `GET /subjects`?

Option 1: rename `hasPortalURI` edge in SPARQL query but update response models/shaping so that `GET /subjects` no longer returns dataset-level metadata
- Pros:
  - Minimal changes to query template
  - Full SPARQL query template is now up to date with latest JSONLD model
- Cons:
  - Less optimized SPARQL query since we're returning more info than we need
  - Confusing to maintain since we would be using different internal and exposed field names, etc.
  - Maintains coupling between `/subjects` and `/query` via the SPARQL query

Option 2: deprecate `/query` endpoint entirely
- Pros:
  - We no longer need to maintain endpoint
  - Also simple, just need to update one SPARQL query
- Cons:
  - Breaks any tools that consume the API directly, as well as dedicated, outdated query tool and f-API deployments that still use the `/query` endpoint
  - Need to communicate to nodes (maybe: BrainLife? ReproLake? MicroGigs?)

Option 3: Use slightly different SPARQL queries for each. Can either use two functions or one function with conditionals.
- Pros:
  - We can fully split logic for endpoints - no more interdependency, no rush to retire query endpoint
- Cons:
  - Potential duplication or branching logic = more maintenance
  - Visual clutter
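
For illustration, the "one function with conditionals" variant of Option 3 might be sketched as below. Everything here is hypothetical: the function name, the `ex:` predicates, and the query shape are placeholders rather than the real query template. Only `nb:hasPortalURI`, `?dataset_name`, and `?dataset_portal_uri` come from this issue, and prefix declarations are omitted.

```python
def build_subject_query(include_dataset_metadata: bool = False) -> str:
    """Build a SPARQL query string, adding dataset-level fields only on request."""
    # Placeholder core pattern shared by both endpoints.
    select_vars = ["?dataset_uuid", "?subject"]
    patterns = ["?dataset_uuid ex:hasSubject ?subject ."]

    if include_dataset_metadata:
        # Only the legacy `/query` endpoint would ask for these fields.
        select_vars += ["?dataset_name", "?dataset_portal_uri"]
        patterns += [
            "?dataset_uuid ex:hasName ?dataset_name .",
            "OPTIONAL { ?dataset_uuid nb:hasPortalURI ?dataset_portal_uri . }",
        ]

    body = "\n    ".join(patterns)
    return f"SELECT DISTINCT {' '.join(select_vars)}\nWHERE {{\n    {body}\n}}"
```

The branching stays in one place, which limits duplication, but each new endpoint-specific field adds another conditional, which is the maintenance cost listed above.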

**Decision:** We will implement Option 1 in this PR as a temporary fix, and deprecate the `/query` tool endpoint soon in https://github.com/neurobagel/api/issues/520.

### Documentation update(s) needed

_No response_

	from typing import Optional

	from pydantic import BaseModel


	class DatasetQueryResponse(BaseModel):
	    """Data model for metadata of datasets matching a query."""

	    dataset_uuid: str
	    # dataset_file_path: str # TODO: Revisit this field once we have datasets without imaging info/sessions.
	    dataset_name: str
	    dataset_portal_uri: Optional[str]
	    dataset_total_subjects: int
	    records_protected: bool
	    num_matching_subjects: int
	    image_modals: list
	    available_pipelines: dict

n-API can return dataset level information #509
