
Question: Where are ΔCov / ΔCon / RCR / RLC / CQA implemented in the repo? #5

@CHERRY-ui8

Description


Hi — thanks for releasing KARMA!

I'm reproducing the experiments described in the paper and I'm looking for the implementation of the following evaluation metrics mentioned in the manuscript:

  • Coverage gain (ΔCov)
  • Connectivity gain (ΔCon)
  • Conflict rate (RCR)
  • LLM-based correctness (RLC, using a hold-out verifier model)
  • QA consistency (CQA)

What I've checked so far:

  • relationship_extraction sets confidence/clarity/relevance on KnowledgeTriple (sources for MCon/MCla/MRel).
  • evaluator implements an integration score = 0.5*confidence + 0.25*clarity + 0.25*relevance and threshold filtering.
  • conflict_resolution implements contradiction detection and a simple keep-higher-confidence resolution, but resolve_conflicts currently returns (final_triples, 0, 0, 0.0) (placeholders) — I couldn't find conflict counts or a computed conflict rate.
  • KnowledgeGraph.get_statistics() returns entity_count, triple_count, unique_relations, avg_confidence, but there's no incremental ΔCov/ΔCon computation (no before/after KG snapshot comparison or network connectivity metrics).
  • I couldn't find an implementation of a hold-out LLM verifier (RLC) or a KG-based QA consistency (CQA) evaluation in the repo.
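For context, here is a minimal stand-alone sketch of the weighted scoring and threshold filtering as I understand them from the evaluator agent. The dataclass and the threshold value are placeholders of mine, not the repo's actual types or defaults:

```python
from dataclasses import dataclass

@dataclass
class KnowledgeTriple:
    # Stand-in for karma.core.data_structures.KnowledgeTriple;
    # only the three score fields mentioned above are modeled.
    subject: str
    relation: str
    obj: str
    confidence: float
    clarity: float
    relevance: float

def integration_score(t: KnowledgeTriple) -> float:
    # Weighted combination as described for the evaluator agent.
    return 0.5 * t.confidence + 0.25 * t.clarity + 0.25 * t.relevance

def filter_triples(triples, threshold=0.8):
    # threshold=0.8 is an assumption, not a value taken from the repo
    return [t for t in triples if integration_score(t) >= threshold]

t = KnowledgeTriple("aspirin", "treats", "headache", 0.9, 0.8, 0.7)
print(integration_score(t))  # 0.825
```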

Files I inspected:

  • karma/agents/conflict_resolution/agent.py
  • karma/agents/relationship_extraction/agent.py
  • karma/agents/evaluator/agent.py
  • karma/core/data_structures.py
  • karma/core/pipeline.py
  • main.py
  • examples/basic_usage.py

Could you please clarify:

  1. Are ΔCov / ΔCon / RCR / RLC / CQA implemented somewhere in this repository? If so, could you point to the exact files/functions or provide an example of how to run them?
  2. If not included, is there a separate evaluation scripts repository or planned location for these metrics? Alternatively, could you advise on where in the pipeline these metrics are expected to be computed (e.g., KG snapshots before/after integration for ΔCov/ΔCon; conflict counts in ConflictResolutionAgent for RCR; a VerifierAgent or Evaluator extension for RLC; a QA evaluation script for CQA)?
  3. I'm happy to contribute patches (e.g., record conflict stats in resolve_conflicts & expose via agent.get_metrics or IntermediateOutput.metrics). Are contributions welcome for these additions, and is there a preferred output schema?
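To make question 2 concrete, here is the kind of snapshot-based computation I have in mind for ΔCov/ΔCon, plus a trivial RCR helper. The metric definitions below are my guesses (triple growth for coverage, largest-connected-component fraction for connectivity), not definitions taken from the paper, and all function names are hypothetical:

```python
def snapshot(triples):
    """Summarize a KG state: (entity set, triple set). triples: iterable of (s, r, o)."""
    entities, edges = set(), set()
    for s, r, o in triples:
        entities.update((s, o))
        edges.add((s, r, o))
    return entities, edges

def largest_component_fraction(entities, edges):
    """Fraction of entities in the largest connected component (one connectivity proxy)."""
    adj = {e: set() for e in entities}
    for s, _, o in edges:
        adj[s].add(o)
        adj[o].add(s)
    seen, best = set(), 0
    for start in entities:
        if start in seen:
            continue
        # iterative DFS over one component
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best / len(entities) if entities else 0.0

def delta_cov(before, after):
    """Coverage gain as relative growth in distinct triples (my reading of ΔCov)."""
    (_, eb), (_, ea) = before, after
    return (len(ea) - len(eb)) / max(len(eb), 1)

def delta_con(before, after):
    """Connectivity gain as change in largest-component fraction (my reading of ΔCon)."""
    return largest_component_fraction(*after) - largest_component_fraction(*before)

def conflict_rate(n_conflicts, n_triples):
    """RCR candidate: conflicts detected per candidate triple."""
    return n_conflicts / max(n_triples, 1)

before = snapshot([("a", "r", "b"), ("c", "r", "d")])
after = snapshot([("a", "r", "b"), ("c", "r", "d"), ("b", "r", "c")])
print(delta_cov(before, after))  # 0.5 (2 -> 3 triples)
print(delta_con(before, after))  # 0.5 (largest component 0.5 -> 1.0)
```

If something like this matches the intended definitions, I can wire `snapshot` calls into the pipeline before/after integration and have `resolve_conflicts` return real counts instead of the current placeholders.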

Thanks in advance — I can provide my local search logs if helpful.
