tech-debt: _write_per_file_extras stats counters mix DB-wide counts and len() — diverge in incremental mode #220

@egouilliard-leyton

Context

Discovered during review of #97 (skip untouched files in per-file extras).

Description

In loader.py:_write_per_file_extras, edge stats are counted two different ways:

  • READS_ATOM / WRITES_ATOM (lines 835, 844): use session.run("MATCH ()-[r:X]->() RETURN count(r)"), which counts every edge of that type in the entire DB.
  • READS_ENV / HANDLES_EVENT / EMITS_EVENT (lines 853, 861, 869): use len(local_list), which counts only the edges from the current batch.

In full-index mode (DB wiped first), these produce the same result. In incremental mode (touched_files set), they diverge: DB-wide counts include pre-existing edges from prior runs, while len() only counts edges from touched files.
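The divergence can be reproduced without a database. The sketch below is a simplified model, not the actual loader code: the list `db` stands in for the graph store, `write_edges` is a hypothetical helper, and the model only appends edges (it ignores any deletion of stale edges that the real incremental path may perform).

```python
from collections import Counter


def write_edges(db, batch):
    """Append this run's edges to the store and return both counting styles.

    `db` and `batch` are lists of (edge_type, source, target) tuples.
    Returns (db_wide_counts, this_run_counts).
    """
    db.extend(batch)
    db_wide = Counter(edge_type for edge_type, _src, _dst in db)
    this_run = Counter(edge_type for edge_type, _src, _dst in batch)
    return db_wide, this_run


# Run 1: full index. The store starts empty, so both conventions agree.
db = []
first_batch = [
    ("READS_ATOM", "a.py", "atomX"),
    ("READS_ENV", "a.py", "HOME"),
]
db_wide, this_run = write_edges(db, first_batch)
full_index_agrees = db_wide == this_run

# Run 2: incremental. Only b.py was touched; run 1's edges are still stored.
second_batch = [
    ("READS_ATOM", "b.py", "atomY"),
    ("READS_ENV", "b.py", "PATH"),
]
db_wide, this_run = write_edges(db, second_batch)

# Mixing conventions, as described above: DB-wide for READS_ATOM,
# len()-style for READS_ENV.
mixed_stats = {
    "reads_atom": db_wide["READS_ATOM"],
    "reads_env": this_run["READS_ENV"],
}
```

Both edge types received exactly one new edge in the second run, yet the mixed stats report different numbers for them, which is the inconsistency described above.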

Neither approach is wrong per se, but mixing them means the counts in the stats object are not comparable across edge types in incremental mode.

Suggested approach

Pick one convention and apply it uniformly:

  • Option A: Use len() everywhere (fast, counts only what was written this run).
  • Option B: Use DB-wide count(r) everywhere (accurate total, but requires extra queries).

Option A is simpler and consistent with what callers likely expect from an incremental run ("how many edges did this run produce?").
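A minimal sketch of Option A, assuming the per-type edge lists are already available in memory when the stats are assembled. The function name `build_edge_stats`, the `*_edges` key format, and the dict-of-lists input are illustrative assumptions, not the existing shape of the stats object in loader.py.

```python
def build_edge_stats(edges_by_type):
    """Count edges written in this run, using len() for every edge type.

    `edges_by_type` maps an edge type name (e.g. "READS_ATOM") to the list
    of edges collected for that type during the current batch.
    """
    return {
        f"{edge_type.lower()}_edges": len(edges)
        for edge_type, edges in edges_by_type.items()
    }


# Hypothetical batch: the tuples stand in for whatever rows the loader builds.
batch = {
    "READS_ATOM": [("a.py", "atomX"), ("a.py", "atomY")],
    "WRITES_ATOM": [("a.py", "atomX")],
    "READS_ENV": [("a.py", "HOME")],
    "HANDLES_EVENT": [],
    "EMITS_EVENT": [("a.py", "saved")],
}
stats = build_edge_stats(batch)
```

Routing every edge type through one helper means the convention cannot drift per type. If DB-wide totals are still wanted, they could be reported under separate, clearly named keys rather than sharing the per-run ones.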

Labels: tech-debt (Technical debt to address)
