[Issue]: Recommended Ontologies and Vocabularies #80
Description
Issue Title
Recommended Ontologies and Vocabularies for GA4GH
Issue Type
Product Harmonization
Problem Statement
Standardized vocabularies and ontologies help with data alignment, but the large number of existing standards means that aligning them across projects can take considerable work. Currently, each GA4GH product development team independently chooses the vocabularies and ontologies to use in its product. This creates downstream interoperability challenges when combining data or workflows that use multiple GA4GH products.
A list of GA4GH-approved ontologies and vocabularies would ease the interoperability burden. It would also shorten development time for GA4GH products, because a lengthy process of evaluating and choosing vocabularies would no longer be required.
Scope Validation
✅ Harmonization Impact: How does this aid harmonization of GA4GH products?
Enables harmonization by providing a centralized location where workstreams can look up the preferred GA4GH vocabularies for shared concepts to ensure that all products use the same term for the same concept.
✅ Barrier Reduction: What barriers to organization-wide harmonization does this address?
Makes it easier for users of GA4GH products to combine data and workflows; simplifies development of GA4GH products by eliminating the step of evaluating and choosing vocabularies.
✅ Alignment Challenges: Which specific alignment challenges does this solve?
This helps to avoid the need to map vocabulary terms across workstreams.
✅ Cross-Work Stream: Does this require cross-work stream development?
Yes. Any workstreams developing products that use vocabularies in the same domain would want to harmonize with each other.
Proposed Solution(s)
DaMaSC has reviewed the vocabularies and ontologies used in GA4GH products and produced a draft document that includes a table of the existing usage:
https://docs.google.com/document/d/1h1Qimi-7unzfHJ6zZBkLm8E-y0MjkQQfQFfPXDe8RN4/edit?tab=t.0
This document was informed by a survey in 2024, a "Roadshow" series of presentations at GA4GH workstreams in 2025 and a GA4GH Connect session in 2025. Notes from the roadshow presentations:
https://docs.google.com/document/d/1F7CN1SOcJ7qnvGWmjXJVxNH_1qrIUKCQi69SQV2vlsY/edit?tab=t.0
Estimated Effort Level
Unknown (needs further assessment)
Success Criteria
Publication of an approved list of GA4GH vocabularies and ontologies, with guidance on whether each is recommended, preferred, or required, and on how GA4GH product approval intersects with use of vocabularies on the list.
Key metrics:
- For at least 3 GA4GH products using the same concept, demonstrated use of the same vocabulary terms.
- Successful inclusion of terms from GA4GH use cases into external vocabularies
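As a rough illustration of how the first metric might be checked, the sketch below counts, for each concept, how many distinct products use an identical term. Everything in it is an assumption made for illustration: the record layout, the function name `concepts_meeting_metric`, the product names, and the `EX:` term identifiers are hypothetical placeholders, not actual GA4GH products or ontology terms.

```python
from collections import defaultdict

# Hypothetical usage records: (product, concept, term identifier).
# Product names and "EX:" identifiers are placeholders, not real data.
usage = [
    ("product_a", "sex", "EX:0001"),
    ("product_b", "sex", "EX:0001"),
    ("product_c", "sex", "EX:0001"),
    ("product_d", "sex", "EX:0002"),
    ("product_a", "disease", "EX:0100"),
    ("product_b", "disease", "EX:0200"),
    ("product_c", "disease", "EX:0100"),
]


def concepts_meeting_metric(records, min_products=3):
    """Map each concept to the terms shared by at least `min_products` distinct products."""
    # Collect the set of products that use each (concept, term) pair.
    products_by_term = defaultdict(set)
    for product, concept, term in records:
        products_by_term[(concept, term)].add(product)

    # Keep only the pairs used by enough distinct products.
    result = {}
    for (concept, term), products in products_by_term.items():
        if len(products) >= min_products:
            result.setdefault(concept, []).append(term)
    return result


print(concepts_meeting_metric(usage))
```

With these placeholder records, only the "sex" concept meets the threshold of three products; "disease" does not, because its most widely shared term is used by only two products.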
Work Streams Raising This Issue
- Clinical & Phenotypic Data (Clin/Pheno)
- Cloud Work Stream
- Data Security
- Data Use & Researcher IDs (DURI)
- Discovery
- Genomic Knowledge Standards (GKS)
- Large Scale Genomics (LSG)
- Regulatory & Ethics (REWS)
- Data Models & Schemas Committee (DaMaSC)
- Genomic Implementation Forum (GIF)
- Technical Team
- Other (specify below)
Other Groups Raising This Issue
No response
Work Streams That Will Be Impacted
- Clinical & Phenotypic Data (Clin/Pheno)
- Cloud Work Stream
- Data Security
- Data Use & Researcher IDs (DURI)
- Discovery
- Genomic Knowledge Standards (GKS)
- Large Scale Genomics (LSG)
- Regulatory & Ethics (REWS)
- Data Models & Schemas Committee (DaMaSC)
- Genomic Implementation Forum (GIF)
- Technical Team
- Other (specify below)
Other Groups That Will Be Impacted
No response
Key Stakeholders to Consult
No response
Products affected
Please list them here. It does not need to be exhaustive
Additional Context
No response
Priority Level
None
Additional Tags
- Documentation
- API
- Schema
- Security
- Performance
- Interoperability
- Compliance
- User Experience
- Infrastructure
- Testing