[Issue]: Recommended Ontologies and Vocabularies #80

@kcranston

Issue Title

Recommended Ontologies and Vocabularies for GA4GH

Issue Type

Product Harmonization

Problem Statement

Standardized vocabularies and ontologies help with data alignment, but the large number of existing standards means that aligning them across projects takes considerable work. Currently, each GA4GH product development team independently chooses the vocabularies and ontologies to use in its product. This creates downstream interoperability challenges when combining data or workflows that use multiple GA4GH products.

A list of GA4GH-approved ontologies and vocabularies would ease the interoperability burden. It would also shorten development time for GA4GH products, because a lengthy process of evaluating and choosing vocabularies would no longer be required.

Scope Validation

✅ Harmonization Impact: How does this aid harmonization of GA4GH products?
Enables harmonization by providing a centralized location where workstreams can look up the preferred GA4GH vocabularies for shared concepts to ensure that all products use the same term for the same concept.

✅ Barrier Reduction: What barriers to organization-wide harmonization does this address?
Makes it easier for users of GA4GH products to combine data and workflows; simplifies development of GA4GH products by eliminating the step of evaluating and choosing vocabularies.

✅ Alignment Challenges: Which specific alignment challenges does this solve?
This helps to avoid the need to map vocabulary terms across workstreams.

✅ Cross-Work Stream: Does this require cross-work stream development?
Yes. Any workstreams developing products that use vocabularies in the same domain would need to harmonize with each other.

Proposed Solution(s)

DaMaSC has reviewed the vocabularies and ontologies used in GA4GH products and produced a draft document that includes a table of existing usage:

https://docs.google.com/document/d/1h1Qimi-7unzfHJ6zZBkLm8E-y0MjkQQfQFfPXDe8RN4/edit?tab=t.0

This document was informed by a survey in 2024, a "Roadshow" series of presentations at GA4GH workstreams in 2025 and a GA4GH Connect session in 2025. Notes from the roadshow presentations:

https://docs.google.com/document/d/1F7CN1SOcJ7qnvGWmjXJVxNH_1qrIUKCQi69SQV2vlsY/edit?tab=t.0

Estimated Effort Level

Unknown (needs further assessment)

Success Criteria

Publication of an approved list of GA4GH vocabularies and ontologies, with guidance on whether each is recommended, preferred, or required, and on how GA4GH product approval intersects with use of vocabularies on the list.

Key metrics:

  • Demonstrated use of the same vocabulary terms by at least 3 GA4GH products that use the same concept.
  • Successful inclusion of terms from GA4GH use cases into external vocabularies.
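
The first metric could be checked mechanically once product-to-term usage is tabulated. The following is only a minimal sketch under stated assumptions: the product names, concepts, and `EX:` term identifiers are hypothetical placeholders, not actual GA4GH usage, and the function `shared_terms` reads the metric as "some single term for a concept is used by at least 3 products", which is one possible interpretation.

```python
from collections import defaultdict

# Hypothetical usage records: (product, concept, vocabulary term).
# All names and identifiers below are placeholders for illustration only.
USAGE = [
    ("product_a", "sex", "EX:0001"),
    ("product_b", "sex", "EX:0001"),
    ("product_c", "sex", "EX:0001"),
    ("product_a", "disease", "EX:0100"),
    ("product_b", "disease", "EX:0200"),
    ("product_c", "disease", "EX:0100"),
]


def shared_terms(usage, min_products=3):
    """Map each concept to a term used by at least `min_products` products.

    Concepts where no single term reaches the threshold are omitted.
    """
    products_by_term = defaultdict(set)
    for product, concept, term in usage:
        products_by_term[(concept, term)].add(product)

    return {
        concept: term
        for (concept, term), products in products_by_term.items()
        if len(products) >= min_products
    }


if __name__ == "__main__":
    print(shared_terms(USAGE))
```

In this toy data, "sex" meets the threshold because all three placeholder products use the same term, while "disease" does not because the products are split across two terms.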

Work Streams Raising This Issue

  • Clinical & Phenotypic Data (Clin/Pheno)
  • Cloud Work Stream
  • Data Security
  • Data Use & Researcher IDs (DURI)
  • Discovery
  • Genomic Knowledge Standards (GKS)
  • Large Scale Genomics (LSG)
  • Regulatory & Ethics (REWS)
  • Data Models & Schemas Committee (DaMaSC)
  • Genomic Implementation Forum (GIF)
  • Technical Team
  • Other (specify below)

Other Groups Raising This Issue

No response

Work Streams That Will Be Impacted

  • Clinical & Phenotypic Data (Clin/Pheno)
  • Cloud Work Stream
  • Data Security
  • Data Use & Researcher IDs (DURI)
  • Discovery
  • Genomic Knowledge Standards (GKS)
  • Large Scale Genomics (LSG)
  • Regulatory & Ethics (REWS)
  • Data Models & Schemas Committee (DaMaSC)
  • Genomic Implementation Forum (GIF)
  • Technical Team
  • Other (specify below)

Other Groups That Will Be Impacted

No response

Key Stakeholders to Consult

No response

Products affected

Please list them here. It does not need to be exhaustive

Additional Context

No response

Priority Level

None

Additional Tags

  • Documentation
  • API
  • Schema
  • Security
  • Performance
  • Interoperability
  • Compliance
  • User Experience
  • Infrastructure
  • Testing
