Add Cross-Property Completeness Correlation Analysis #23

@marvinm2

Description

Analyze whether AOPs/KEs that have certain properties (e.g., OECD status, has_evidence) are more likely to have other properties completed. This reveals curation patterns and property interdependencies, and can identify "completeness clusters": well-curated entities with many properties filled in, as opposed to sparse ones.

Visualization Types

  • Latest snapshot: Correlation matrix and scatter plots

Value & Priority

  • Priority: Phase 2 - Medium Priority
  • Value: High (advanced quality insights)
  • Complexity: Medium

Implementation Details

Key Data Requirements

  • Calculate completeness scores across property categories (Essential, Metadata, Content, Assessment, Context)
  • Compute correlation between:
    • Property category completeness scores
    • OECD status vs. overall completeness
    • Evidence quality vs. metadata completeness
    • Essential properties vs. optional properties
  • Group by entity type (AOP/KE/KER) for comparison
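
The data requirements above could be sketched roughly as follows, using only the standard library. The property names and their assignment to categories in `CATEGORIES` are hypothetical placeholders, and a property counts as "filled" here when its value is truthy, which is an assumption rather than the project's definition.

```python
from itertools import combinations
from math import sqrt

# Hypothetical mapping of property names to categories; the real
# assignment would come from the project's own property configuration.
CATEGORIES = {
    "Essential": ["title", "description"],
    "Metadata": ["creator", "modified"],
    "Content": ["abstract", "has_evidence"],
}

def category_scores(entity):
    """Fraction of filled properties per category for one entity (a dict).

    Assumption: a property is "filled" when its value is truthy, so empty
    strings, None, and False all count as missing.
    """
    return {
        cat: sum(1 for p in props if entity.get(p)) / len(props)
        for cat, props in CATEGORIES.items()
    }

def pearson(xs, ys):
    """Pearson correlation; returns None when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def correlation_matrix(entities):
    """Pairwise correlations between category completeness scores."""
    scores = [category_scores(e) for e in entities]
    cats = list(CATEGORIES)
    matrix = {(c, c): 1.0 for c in cats}  # diagonal set to 1.0 by convention
    for a, b in combinations(cats, 2):
        r = pearson([s[a] for s in scores], [s[b] for s in scores])
        matrix[(a, b)] = matrix[(b, a)] = r
    return matrix
```

To compare entity types, the same function could be called once per group (AOP, KE, KER) on the corresponding subset of entities.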

Visualization Format

  • Correlation heatmap: Matrix showing correlations between property categories
  • Scatter plot matrix: Pairwise comparisons of completeness dimensions
  • Grouped analysis: Separate views by OECD status to show maturity patterns
  • Box plots: Distribution of completeness scores grouped by key properties
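
As one possible way to prepare data for the grouped views and box plots, the sketch below computes a five-number summary of completeness per group. The record fields `oecd_status` and `completeness` are assumed names for illustration, not fields confirmed by the data model.

```python
from collections import defaultdict
from statistics import quantiles

def box_stats_by_group(records, group_key="oecd_status", score_key="completeness"):
    """Five-number summary of completeness scores per group.

    `records` is a list of dicts; records without `group_key` are
    collected under "unknown". Field names are hypothetical.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(group_key, "unknown")].append(rec[score_key])

    stats = {}
    for name, values in groups.items():
        values.sort()
        if len(values) >= 2:
            # "inclusive" treats the data as the full population
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
        else:
            # a single observation has no spread
            q1 = median = q3 = values[0]
        stats[name] = {
            "min": values[0], "q1": q1, "median": median,
            "q3": q3, "max": values[-1], "n": len(values),
        }
    return stats
```

The resulting dictionaries can be handed to whatever plotting library the dashboard uses; keeping the statistics separate from the rendering makes them easy to test.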

Expected Insights

  • Discover which properties tend to be filled together (curation patterns)
  • Identify whether OECD-endorsed AOPs are better documented
  • Understand if evidence-rich KERs also have better metadata
  • Guide curation priorities based on property interdependencies
  • Detect "completeness clusters" for quality tiers
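
A very simple stand-in for detecting quality tiers is fixed-threshold binning of the overall completeness score. This is not a clustering algorithm, and the cut points and tier names below are hypothetical; real tiers would more likely be derived from the observed score distribution.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points on an overall completeness score in [0, 1].
TIER_BOUNDS = [0.4, 0.75]
TIER_NAMES = ["sparse", "partial", "well-curated"]

def tier(score):
    """Map a score to a tier; a score equal to a bound falls in the upper tier."""
    return TIER_NAMES[bisect_right(TIER_BOUNDS, score)]

def tier_counts(scores):
    """Count entities per tier; `scores` maps entity id -> completeness score."""
    return Counter(tier(s) for s in scores.values())
```

Comparing the tier counts between OECD-endorsed and other AOPs would give a first, coarse answer to whether endorsed AOPs are better documented.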

Performance Notes

  • Medium complexity - requires multiple property queries and correlation calculations
  • May need to compute completeness scores in Python rather than SPARQL
  • Consider caching property data for correlation analysis
  • Potentially expensive if done across all historical versions
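
One way to approach the caching note is a small on-disk cache keyed by snapshot version, so that the property queries run only once per version. The function name, the `fetch` callback, and the cache layout are all assumptions made for this sketch; `fetch` stands in for whatever code runs the actual SPARQL queries.

```python
import json
from pathlib import Path

def cached_properties(version, fetch, cache_dir="cache"):
    """Return property data for one snapshot version, fetching only on a cache miss.

    `fetch(version)` is a hypothetical callback that must return
    JSON-serializable data. `version` is used in the file name, so it
    should not contain path separators.
    """
    path = Path(cache_dir) / f"properties_{version}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(version)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Historical versions are presumably immutable once published, which would make this cache safe without invalidation; the latest snapshot may need to bypass it or expire.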

Metadata

  • Assignees: none
  • Labels: enhancement (New feature or request)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests