n-API can return dataset level information #509

@surchs

Is there an existing issue for this?

  • I have searched the existing issues

Summary

As the query tool (or an interested API user), I want to see dataset-level information that I can display to my users, especially:

  • current level of access to the data
  • authors
  • contact info
  • etc.

TODO

  • On API startup, read in the dataset metadata JSON created in recipes#163 ("Add entrypoint script to process dataset metadata in JSONLD files")
  • Expand the model used for the datasets-level-only response from POST /datasets to:
    • add new dataset attributes
    • remove dataset_portal_uri

      api/app/api/models.py

      Lines 120 to 131 in 94c41f0

      class DatasetQueryResponse(BaseModel):
          """Data model for metadata of datasets matching a query."""

          dataset_uuid: str
          # dataset_file_path: str # TODO: Revisit this field once we have datasets without imaging info/sessions.
          dataset_name: str
          dataset_portal_uri: Optional[str]
          dataset_total_subjects: int
          records_protected: bool
          num_matching_subjects: int
          image_modals: list
          available_pipelines: dict
  • Separate out the response model for the legacy GET /query endpoint so it is not tied to other response models
  • Update SPARQL query to no longer return ?dataset_name or ?dataset_portal_uri
  • Remove all dataset attributes from /subjects endpoint other than dataset_uuid
  • Update /datasets endpoint to append dataset attributes from the dataset metadata JSON to the response object, matching datasets based on the UUID
  • Update test data
  • Return node mode (not needed: this info is already available via the records_protected field in the /datasets response)
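
The load-and-merge steps above could look roughly like the following sketch. It uses only the standard library rather than the API's Pydantic models. The file layout (a JSON object keyed by dataset UUID) and the helper names `load_dataset_metadata` and `append_dataset_attributes` are assumptions for illustration, not the actual implementation.

```python
import json
from pathlib import Path
from typing import Any


def load_dataset_metadata(path: Path) -> dict[str, dict[str, Any]]:
    """Read the dataset metadata JSON once, e.g. on API startup.

    Assumption: the file is a JSON object mapping dataset UUID -> attributes.
    """
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def append_dataset_attributes(
    query_results: list[dict[str, Any]],
    metadata_by_uuid: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Merge dataset-level attributes into each query result, matched by UUID.

    Query-derived fields win on key collisions, and datasets without a
    metadata entry pass through unchanged.
    """
    merged = []
    for result in query_results:
        extra = metadata_by_uuid.get(result["dataset_uuid"], {})
        merged.append({**extra, **result})
    return merged
```

Keeping the metadata in an in-memory dict keyed by UUID means the /datasets handler only needs a lookup per matching dataset, with no extra graph query.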

Considerations

  • The SPARQL query for the legacy /query endpoint still looks for nb:hasPortalURI, which will be deprecated in future JSONLDs, so the dataset_portal_uri field of the response may always be empty
  • Separate (unoptimized) SPARQL queries for POST /subjects vs. GET /subjects?

Option 1: rename hasPortalURI edge in SPARQL query but update response models/shaping so that GET /subjects no longer returns dataset-level metadata

  • Pros:
    • Minimal changes to query template
    • Full SPARQL query template is now up to date with latest JSONLD model
  • Cons:
    • Less optimized SPARQL query since we're returning more info than we need
    • Confusing to maintain since we would be using different internal and exposed field names, etc.
    • Maintains coupling between /subjects and /query via the SPARQL query
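
For illustration, the response shaping in Option 1 could be sketched as below: the shared SPARQL query still returns dataset-level columns, and the subject-level response simply drops them. The row format (a list of dicts), the `sub_id` key in the usage comment, and the helper name `shape_subject_rows` are assumptions, not the actual code.

```python
from typing import Any

# Dataset-level columns that the shared query would still return under
# Option 1 but that the subject-level response should no longer expose.
# dataset_uuid is deliberately kept so results can be matched to datasets.
DATASET_LEVEL_FIELDS = {"dataset_name", "dataset_portal_uri"}


def shape_subject_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the query rows without dataset-level metadata."""
    return [
        {key: value for key, value in row.items() if key not in DATASET_LEVEL_FIELDS}
        for row in rows
    ]
```

This makes the "returning more info than we need" con concrete: the unused columns are still fetched from the graph and only discarded afterwards.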

Option 2: deprecate /query endpoint entirely

  • Pros:
    • We no longer need to maintain the endpoint
    • Also simple: we only need to update one SPARQL query
  • Cons:
    • Breaks any tools that consume the API directly, as well as dedicated, outdated deployments of the query tool + f-API that still use the /query endpoint
    • Need to communicate to nodes (maybe: BrainLife? ReproLake? MicroGigs?)

Option 3: Use slightly different SPARQL queries for each. Can either use two functions or one function with conditionals.

  • Pros:
    • We can fully split the logic for the endpoints: no more interdependency, and no rush to retire the /query endpoint
  • Cons:
    • Potential duplication or branching logic = more maintenance
    • Visual clutter
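
The "one function with conditionals" variant of Option 3 might look like this sketch. The query is heavily simplified: `nb:hasSamples` and `nb:hasLabel` are placeholder predicates assumed for illustration, prefix declarations are omitted, and `build_sparql_query` is a hypothetical name. Only `nb:hasPortalURI`, `?dataset_name`, and `?dataset_portal_uri` come from this issue.

```python
def build_sparql_query(include_dataset_metadata: bool) -> str:
    """Build a simplified query string.

    Only the legacy variant (include_dataset_metadata=True) asks the graph
    for dataset-level fields.
    """
    select_vars = ["?dataset_uuid", "?subject"]
    # Placeholder predicate, assumed for illustration only.
    patterns = ["?dataset_uuid nb:hasSamples ?subject ."]

    if include_dataset_metadata:
        select_vars += ["?dataset_name", "?dataset_portal_uri"]
        patterns += [
            # nb:hasLabel is a placeholder; nb:hasPortalURI is from the issue.
            "OPTIONAL { ?dataset_uuid nb:hasLabel ?dataset_name . }",
            "OPTIONAL { ?dataset_uuid nb:hasPortalURI ?dataset_portal_uri . }",
        ]

    body = "\n".join(f"    {pattern}" for pattern in patterns)
    return f"SELECT DISTINCT {' '.join(select_vars)}\nWHERE {{\n{body}\n}}"
```

Here the legacy /query endpoint would call `build_sparql_query(True)` and the newer endpoints `build_sparql_query(False)`. The branching is exactly the maintenance cost listed under Cons.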

Decision: We will implement Option 1 in this PR as a temporary fix, and deprecate the /query endpoint soon after in #520.

Documentation update(s) needed

No response
