Implement Per-Primitive Confidence-Based Epistemic Metric #2

@anormang1992

Summary

Introduce a per-primitive epistemic confidence score that quantifies how well-grounded each individual concept is, based on its depth coverage, relata density, and provenance quality.

Problem Statement

The epistemic gate currently produces a binary grounded/not-grounded result at the query level. This loses per-concept nuance. Two primitives can both be "grounded at D3" but differ significantly in quality: File may have rich relata with authored provenance across all depths, while Permission has sparse relata and a single learned depth. The binary result treats them identically.

A per-primitive confidence score would allow the agent to identify its weakest epistemic links, enable policies to gate on individual concept strength, and give the user visibility into which parts of a grounding trace are solid vs. thin.

Proposed Solution

Add a confidence score computed per Primitive during grounding, considering:

  1. Depth coverage — ratio of populated depths to required depth (e.g. D0–D3 all present = 1.0, D2 missing = 0.75)
  2. Relata density — ratio of actual relata to expected/typical relata at each depth
  3. Provenance weighting — authored provenance scores higher than learned, which scores higher than conversational (requires Ensure Provenance Is a First-Class Attribute on Primitives and Relata #6)
  4. Epistemic confidence decay — relata with stale timestamps contribute less confidence (requires timestamp metadata)

The score would attach to each primitive in the epistemic trace, and the GroundingResult could optionally surface a minimum or aggregate as a convenience.
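As a sketch of how the first two factors might combine (the `Primitive` shape, the helper names, and the 0.6/0.4 weights are all illustrative assumptions, not existing VRE API):

```python
from dataclasses import dataclass, field

# Hypothetical weights -- the issue leaves the exact weighting open (see Open Questions).
W_DEPTH = 0.6
W_DENSITY = 0.4

@dataclass
class Primitive:
    """Sketch only: stands in for the real VRE Primitive."""
    name: str
    relata_by_depth: dict = field(default_factory=dict)  # depth index -> list of relata

def depth_coverage(p: Primitive, required_depth: int) -> float:
    """Ratio of populated depths to the required depth range (D0..Dn)."""
    populated = sum(1 for d in range(required_depth + 1) if p.relata_by_depth.get(d))
    return populated / (required_depth + 1)

def relata_density(p: Primitive, expected_per_depth: int = 3) -> float:
    """Mean ratio of actual to typical relata per populated depth, capped at 1.0."""
    if not p.relata_by_depth:
        return 0.0
    ratios = [min(len(r) / expected_per_depth, 1.0) for r in p.relata_by_depth.values()]
    return sum(ratios) / len(ratios)

def confidence(p: Primitive, required_depth: int = 3) -> float:
    """Weighted blend of depth coverage and relata density, in [0.0, 1.0]."""
    return W_DEPTH * depth_coverage(p, required_depth) + W_DENSITY * relata_density(p)

# The File vs. Permission contrast from the problem statement:
rich = Primitive("File", {0: ["a", "b", "c"], 1: ["d", "e", "f"],
                          2: ["g", "h", "i"], 3: ["j", "k", "l"]})
sparse = Primitive("Permission", {0: ["x"]})
```

With these placeholder weights, both primitives can be "grounded at D3" in the binary sense, yet the rich one scores 1.0 while the sparse one lands around 0.28, which is exactly the nuance the binary result loses.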

VRE Design Alignment

  • Does not change the grounding contract: Grounding remains binary (grounded or not). Confidence is an advisory layer — it informs policy decisions and user-facing communication but does not override grounding.
  • Per-primitive, not per-query: This aligns with VRE's philosophy that knowledge is represented at the concept level. Each node knows how strong its own grounding is.
  • No new node or relation types: Confidence is computed from existing graph structure (depths, relata, metadata).
  • Preserves epistemic honesty: The metric makes the agent more honest by surfacing which concepts are well-understood vs. thinly grounded.

Acceptance Criteria

  • Per-primitive confidence score (0.0–1.0) computed during grounding
  • Score factors in depth coverage and relata density (provenance and decay are stretch goals pending Ensure Provenance Is a First-Class Attribute on Primitives and Relata #6)
  • Score is accessible on each primitive in the epistemic trace
  • GroundingResult.__str__ includes per-primitive confidence when present
  • PolicyGate can optionally reference individual primitive confidence in policy evaluation
  • Unit tests for confidence calculation across primitives with varying depth/relata structures
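To make the PolicyGate criterion concrete, a weakest-link check might look like the following (the `TracedPrimitive` shape and `policy_allows` are hypothetical stand-ins; the real epistemic trace and PolicyGate APIs may differ):

```python
from dataclasses import dataclass

@dataclass
class TracedPrimitive:
    """Hypothetical trace entry carrying the per-primitive score."""
    name: str
    confidence: float  # 0.0-1.0, attached during grounding

def min_confidence(trace: list) -> float:
    """Weakest-link aggregate a GroundingResult could optionally surface."""
    return min((p.confidence for p in trace), default=0.0)

def policy_allows(trace: list, threshold: float = 0.5) -> bool:
    """Advisory only: grounding stays binary; this merely informs policy."""
    return min_confidence(trace) >= threshold

trace = [TracedPrimitive("File", 0.95), TracedPrimitive("Permission", 0.40)]
```

Here `policy_allows(trace)` is False because Permission (0.40) falls below the 0.5 threshold, even though the query as a whole may be grounded, which keeps confidence advisory rather than overriding the grounding contract.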

Open Questions

  • Should confidence be stored on the graph node itself (persisted) or computed at query time (ephemeral)?
  • What is the right weighting between depth coverage, relata density, and provenance? Should this be configurable?
  • Should confidence decay be time-based (wall clock) or event-based (queries since last validation)?
  • Should GroundingResult expose a min/mean/weighted aggregate, or leave aggregation to consumers?
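On the decay question, the two candidate models can be sketched side by side (the half-life and per-query rate below are placeholders, not proposed defaults):

```python
HALF_LIFE_DAYS = 30.0  # hypothetical half-life for wall-clock decay

def time_decay(confidence: float, age_days: float) -> float:
    """Wall-clock decay: confidence halves every HALF_LIFE_DAYS."""
    return confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)

def event_decay(confidence: float, queries_since_validation: int,
                rate: float = 0.02) -> float:
    """Event-based decay: each unvalidated query shaves a fixed fraction."""
    return confidence * (1.0 - rate) ** queries_since_validation
```

The trade-off: wall-clock decay erodes untouched knowledge uniformly over time, while event-based decay only charges confidence when a concept is actually exercised without revalidation.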

Dependencies
