Description
Summary
Introduce a per-primitive epistemic confidence score that quantifies how well-grounded each individual concept is, based on its depth coverage, relata density, and provenance quality.
Problem Statement
The epistemic gate currently produces a binary grounded/not-grounded result at the query level. This loses per-concept nuance. Two primitives can both be "grounded at D3" but differ significantly in quality: File may have rich relata with authored provenance across all depths, while Permission has sparse relata and a single learned depth. The binary result treats them identically.
A per-primitive confidence score would allow the agent to identify its weakest epistemic links, enable policies to gate on individual concept strength, and give the user visibility into which parts of a grounding trace are solid vs. thin.
Proposed Solution
Add a confidence score computed per Primitive during grounding, considering:
- Depth coverage: ratio of populated depths to required depth (e.g. D0–D3 all present = 1.0, D2 missing = 0.75)
- Relata density: ratio of actual relata to expected/typical relata at each depth
- Provenance weighting: `authored` provenance scores higher than `learned`, which scores higher than `conversational` (requires Ensure Provenance Is a First-Class Attribute on Primitives and Relata #6)
- Epistemic confidence decay: relata with stale timestamps contribute less confidence (requires timestamp metadata)
The score would attach to each primitive in the epistemic trace, and the GroundingResult could optionally surface a minimum or aggregate as a convenience.
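One way the first three factors could combine is sketched below. Everything in it is an assumption made for illustration: the `Depth` dataclass, its `expected_relata` field, the provenance weight values, and the 0.4/0.4/0.2 blend are hypothetical and not part of VRE's existing API. A depth is treated as populated when it is present in the mapping, and decay is left out.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the issue only fixes the ordering
# authored > learned > conversational, not the values.
PROVENANCE_WEIGHT = {"authored": 1.0, "learned": 0.7, "conversational": 0.4}


@dataclass
class Depth:
    """Illustrative stand-in for one depth level of a primitive."""
    relata_provenance: list[str] = field(default_factory=list)  # one entry per relatum
    expected_relata: int = 1  # assumed "typical" relata count at this depth


def depth_coverage(depths: dict[int, Depth], required_depth: int) -> float:
    """Fraction of levels D0..required_depth that are present."""
    levels = range(required_depth + 1)
    return sum(1 for d in levels if d in depths) / len(levels)


def relata_density(depths: dict[int, Depth], required_depth: int) -> float:
    """Mean of actual/expected relata per level, capped at 1.0 per level."""
    levels = range(required_depth + 1)
    total = 0.0
    for d in levels:
        depth = depths.get(d)
        if depth is not None and depth.expected_relata > 0:
            total += min(1.0, len(depth.relata_provenance) / depth.expected_relata)
    return total / len(levels)


def provenance_quality(depths: dict[int, Depth]) -> float:
    """Mean provenance weight over all relata; unknown labels count as 0."""
    weights = [
        PROVENANCE_WEIGHT.get(label, 0.0)
        for depth in depths.values()
        for label in depth.relata_provenance
    ]
    return sum(weights) / len(weights) if weights else 0.0


def confidence(depths: dict[int, Depth], required_depth: int,
               blend: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Blend the three factors into a score clamped to [0.0, 1.0]."""
    w_cov, w_den, w_prov = blend
    score = (
        w_cov * depth_coverage(depths, required_depth)
        + w_den * relata_density(depths, required_depth)
        + w_prov * provenance_quality(depths)
    )
    return max(0.0, min(1.0, score / sum(blend)))


# The File vs. Permission contrast from the problem statement, with made-up data.
file_depths = {d: Depth(["authored", "authored"], expected_relata=2) for d in range(4)}
permission_depths = {d: Depth(["learned"], expected_relata=2) for d in (0, 1, 3)}

print(confidence(file_depths, required_depth=3))
print(confidence(permission_depths, required_depth=3))
```

Both primitives would still count as grounded; only the advisory score differs. Passing `blend` as a parameter is one way to keep the weighting configurable while that question is still open.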
VRE Design Alignment
- Does not change the grounding contract: Grounding remains binary (grounded or not). Confidence is an advisory layer — it informs policy decisions and user-facing communication but does not override grounding.
- Per-primitive, not per-query: This aligns with VRE's philosophy that knowledge is represented at the concept level. Each node knows how strong its own grounding is.
- No new node or relation types: Confidence is computed from existing graph structure (depths, relata, metadata).
- Preserves epistemic honesty: The metric makes the agent more honest by surfacing which concepts are well-understood vs. thinly grounded.
Acceptance Criteria
- Per-primitive confidence score (0.0–1.0) computed during grounding
- Score factors in depth coverage and relata density (provenance and decay are stretch goals pending Ensure Provenance Is a First-Class Attribute on Primitives and Relata #6)
- Score is accessible on each primitive in the epistemic trace
- `GroundingResult.__str__` includes per-primitive confidence when present
- `PolicyGate` can optionally reference individual primitive confidence in policy evaluation
- Unit tests for confidence calculation across primitives with varying depth/relata structures
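To make the trace and policy criteria concrete, here is a rough sketch of what they might look like. `PrimitiveTrace`, `GroundingResultSketch`, and `weak_primitives` are hypothetical stand-ins written for this issue, not the existing `GroundingResult` or `PolicyGate` classes, and the output format is only a suggestion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrimitiveTrace:
    """Hypothetical per-primitive entry in the epistemic trace."""
    name: str
    confidence: Optional[float] = None  # advisory; None when not computed


@dataclass
class GroundingResultSketch:
    """Stand-in result: `grounded` stays binary, confidence is extra detail."""
    grounded: bool
    primitives: list[PrimitiveTrace] = field(default_factory=list)

    def __str__(self) -> str:
        lines = [f"grounded={self.grounded}"]
        for p in self.primitives:
            # Only mention confidence when it was actually computed.
            suffix = "" if p.confidence is None else f" (confidence {p.confidence:.2f})"
            lines.append(f"  {p.name}{suffix}")
        return "\n".join(lines)


def weak_primitives(result: GroundingResultSketch, threshold: float) -> list[str]:
    """Names of primitives whose computed confidence is below the threshold."""
    return [
        p.name
        for p in result.primitives
        if p.confidence is not None and p.confidence < threshold
    ]


result = GroundingResultSketch(
    grounded=True,
    primitives=[
        PrimitiveTrace("File", 0.92),
        PrimitiveTrace("Permission", 0.41),
        PrimitiveTrace("Owner"),  # no confidence computed
    ],
)
print(result)
print(weak_primitives(result, threshold=0.5))
```

A policy could use something like `weak_primitives` to ask for confirmation when any concept is thin, without touching the binary `grounded` flag.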
Open Questions
- Should confidence be stored on the graph node itself (persisted) or computed at query time (ephemeral)?
- What is the right weighting between depth coverage, relata density, and provenance? Should this be configurable?
- Should confidence decay be time-based (wall clock) or event-based (queries since last validation)?
- Should `GroundingResult` expose a min/mean/weighted aggregate, or leave aggregation to consumers?
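For the aggregation question, the three candidates are cheap to compute side by side, which may help when comparing them. The helper below is a hypothetical illustration; `aggregate_confidence` and its `weights` argument (a per-primitive importance, defaulting to 1.0) are assumptions, not a proposed signature.

```python
from statistics import fmean
from typing import Optional


def aggregate_confidence(scores: dict[str, float],
                         weights: Optional[dict[str, float]] = None) -> dict[str, float]:
    """Return min, mean, and weighted-mean of per-primitive confidence scores."""
    if not scores:
        raise ValueError("no primitive scores to aggregate")
    # Primitives without an explicit weight count with weight 1.0.
    w = {name: (weights or {}).get(name, 1.0) for name in scores}
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        "min": min(scores.values()),
        "mean": fmean(scores.values()),
        "weighted": sum(scores[name] * w[name] for name in scores) / total,
    }


scores = {"File": 0.9, "Permission": 0.3}
print(aggregate_confidence(scores))
print(aggregate_confidence(scores, weights={"Permission": 3.0}))
```

The minimum surfaces the weakest link directly, while the mean can hide a single thin primitive behind several strong ones; that trade-off is part of what this question needs to settle.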
Dependencies
- Ensure Provenance Is a First-Class Attribute on Primitives and Relata #6 (Provenance as first-class attribute) — required for provenance weighting factor
- Timestamp metadata on relata — required for confidence decay
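Once timestamp metadata exists, the two decay options from the open questions could look roughly like this. Exponential half-life decay is my assumption (the issue does not pick a decay curve), and `time_decay`, `event_decay`, and their parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def time_decay(last_validated: datetime, now: datetime, half_life: timedelta) -> float:
    """Wall-clock decay: the factor halves every `half_life` since validation."""
    age_seconds = max((now - last_validated).total_seconds(), 0.0)
    return 0.5 ** (age_seconds / half_life.total_seconds())


def event_decay(queries_since_validation: int, half_life_queries: int) -> float:
    """Event-based decay: the factor halves every `half_life_queries` queries."""
    return 0.5 ** (max(queries_since_validation, 0) / half_life_queries)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
month = timedelta(days=30)

print(time_decay(now - month, now, half_life=month))
print(event_decay(20, half_life_queries=10))
```

Either factor would multiply a relatum's contribution before the per-primitive score is blended, so a freshly validated relatum (factor 1.0) is unaffected.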