Bug: repeated parent-directory semantic recomputation on each new memory write
Summary
When a new memory is written under a memory directory (for example viking://user/default/memories/entities), OpenViking enqueues the parent directory for semantic processing.
In practice, this causes repeated regeneration of directory-level semantic artifacts (such as .overview.md / .abstract.md) for the entire directory, even when only one new memory was added.
For large memory directories, this leads to a poor cost model:
- cost scales with directory size, not change size
- the same “fat” directory may be reprocessed repeatedly in a short time window
- directory-level semantic maintenance becomes a major cost center in the memory write path
This behavior may be acceptable for relatively static resource/document trees, but it becomes problematic for dynamic, high-frequency memory trees.
What this issue is about
This issue is not a generic complaint that “there are many VLM calls”.
It is about a more specific structural problem:
- writing one new memory can trigger full parent-directory semantic recomputation
- repeated writes to the same parent can trigger repeated full-directory recomputation
- the cost is tied to total accumulated directory size, not the size of the current change
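To make the cost mismatch concrete, here is a toy back-of-envelope model (all numbers hypothetical, not measured from OpenViking). It assumes each write re-summarizes every file in the parent directory:

```python
# Back-of-envelope cost model (illustrative numbers only).
# Assumption: each write triggers a full rescan of the parent directory.

def eager_recompute_cost(initial_size: int, writes: int) -> int:
    """Files summarized when every write rescans the whole directory."""
    return sum(initial_size + i + 1 for i in range(writes))

def coalesced_cost(initial_size: int, writes: int) -> int:
    """Files summarized if a burst of writes coalesces into one rescan."""
    return initial_size + writes

# A 100-file directory receiving a burst of 10 writes:
print(eager_recompute_cost(100, 10))  # 1055 file-summarizations
print(coalesced_cost(100, 10))        # 110
```

The eager strategy pays for the accumulated directory size on every write, so cost grows roughly quadratically with sustained write volume.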
What this issue is not about
To avoid conflating separate problems, this issue is not about the previously identified recursive semantic propagation problem.
We already observed and locally mitigated a separate issue where memory-context semantic generation could recursively expand work. That mitigation was to force recursive=False for memory-context semantic messages.
This issue remains even after recursive propagation is cut off.
So the problem here is different:
even without recursive expansion, the current parent-directory recomputation strategy is still too expensive for long-term memory trees.
Observed behavior
After adding visibility to VLM call paths and inspecting logs, we observed that a large portion of VLM activity was coming from directory-level semantic processing, especially:
- storage/queuefs/semantic_processor.py:_generate_text_summary
- storage/queuefs/semantic_processor.py:_generate_overview
rather than only from the normal extraction / dedup path.
In our environment, hotspots were concentrated in memory directories such as:
- viking://user/default/memories/entities
- viking://agent/.../memories/patterns
We also observed repeated batch waves where the same or similar parent directories were processed multiple times in a short period.
This strongly suggests that the main cost is not single-memory extraction itself, but parent-directory semantic maintenance.
Root cause analysis
From code inspection, the relevant path appears to be:
- a memory is created or updated
- memory_extractor._enqueue_semantic_for_parent(file_uri, ctx) is called
- the parent directory URI is pushed into the semantic queue
- semantic_processor processes the parent directory
- directory-level semantic artifacts are regenerated for the whole directory
Relevant code areas include:
- openviking/session/memory_extractor.py: _enqueue_semantic_for_parent(...)
- openviking/storage/queuefs/semantic_processor.py
- queue implementation layers such as named_queue.py and semantic_queue.py
In our investigation, the queue layer appeared to have no effective deduplication / coalescing for repeated jobs targeting the same parent directory URI.
As a result, repeated memory writes can repeatedly enqueue the same large directory.
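One way this could be addressed at the queue layer is a coalescing queue that tracks pending URIs in a set, so re-enqueuing a directory that is already waiting becomes a no-op. A minimal thread-safe sketch (class and method names are illustrative, not the actual named_queue.py / semantic_queue.py API):

```python
import threading
from collections import deque

class CoalescingQueue:
    """Queue that drops duplicate enqueues while an item is still pending.

    Re-enqueuing a parent_uri that is already waiting is a no-op, so a
    burst of memory writes yields at most one pending job per directory.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending = set()   # URIs currently waiting in the queue
        self._items = deque()

    def enqueue(self, uri: str) -> bool:
        with self._lock:
            if uri in self._pending:
                return False    # coalesced: job already scheduled
            self._pending.add(uri)
            self._items.append(uri)
            return True

    def dequeue(self):
        with self._lock:
            if not self._items:
                return None
            uri = self._items.popleft()
            self._pending.discard(uri)  # allow future re-enqueue
            return uri

q = CoalescingQueue()
q.enqueue("viking://user/default/memories/entities")
q.enqueue("viking://user/default/memories/entities")  # dropped as duplicate
```

Clearing the pending flag only at dequeue time means writes that arrive while a job is in flight still schedule exactly one follow-up pass.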
Why this is a problem
For memory trees, the current strategy means:
- change size can be 1 new memory
- recomputation size can be the entire parent directory
That is a poor fit for long-term memory systems, because memory directories are:
- continuously growing
- frequently written
- cost-sensitive
- mostly updated incrementally, not rebuilt wholesale
In other words, a mechanism intended as an auxiliary semantic organization layer can become the dominant cost center.
Expected behavior
At minimum, for memory directories:
- repeated semantic work for the same parent directory should be deduplicated or coalesced
- one memory write should not eagerly trigger repeated full-directory semantic recomputation
A more appropriate model for memory trees would be one of the following:
- dedup / debounce / coalescing for the same parent_uri
- eventual consistency for directory-level semantic artifacts, instead of strong write-time synchronization
- longer term, incremental directory semantic maintenance instead of full rescans on every new memory
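As one illustration of the debounce option, regeneration for a parent could be deferred until writes quiet down, so a burst collapses into a single job. A sketch with a hypothetical Debouncer helper (not existing OpenViking API):

```python
import threading

class Debouncer:
    """Run fn(key) only after no new request for that key arrives within
    `delay` seconds, collapsing a burst of requests into one invocation."""

    def __init__(self, delay: float, fn) -> None:
        self._delay = delay
        self._fn = fn
        self._timers = {}
        self._lock = threading.Lock()

    def request(self, key: str) -> None:
        with self._lock:
            timer = self._timers.get(key)
            if timer is not None:
                timer.cancel()     # restart the quiet-period window
            timer = threading.Timer(self._delay, self._fire, args=(key,))
            self._timers[key] = timer
            timer.start()

    def _fire(self, key: str) -> None:
        with self._lock:
            self._timers.pop(key, None)
        self._fn(key)

calls = []
d = Debouncer(0.05, calls.append)
for _ in range(10):                # burst of 10 memory writes...
    d.request("viking://user/default/memories/entities")
import time; time.sleep(0.2)
print(len(calls))                  # ...one directory regeneration
```

The trade-off is a bounded staleness window for directory artifacts, which seems acceptable for an auxiliary semantic layer.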
Actual behavior
Current behavior appears to be:
- each memory write can enqueue parent semantic regeneration
- the same parent may be re-enqueued multiple times
- semantic processing then performs expensive directory-wide VLM work
- large memory directories become hot spots
Minimal reproduction idea
A simplified reproduction path could be:
- prepare a memory directory with many existing memory files (for example 50–100 files)
- add one new memory file to that directory
- observe that parent-directory semantic processing is triggered
- add several more memory files in a short interval
- observe repeated parent-directory semantic recomputation and elevated VLM usage
Expected:
- same-parent semantic work should be deduplicated / coalesced at least within a short time window
Actual:
- repeated writes can trigger repeated full-directory recomputation
Suggested discussion points
The main questions we hope maintainers can clarify are:
- Is it the intended design that memory writes eagerly trigger parent-directory full semantic recomputation?
- If yes, is that design considered suitable only for relatively small or static directories, rather than long-lived memory trees?
- Would maintainers accept at least a short-term fix such as same-parent dedup / debounce / coalescing?
- Longer term, should memory-directory semantic maintenance move toward eventual consistency or incremental aggregation rather than full-directory rescans?
Local mitigation used in our environment
As a temporary stopgap in our own environment, we added a small guard in _enqueue_semantic_for_parent():
- deduplicate by parent_uri
- skip repeated enqueue requests for the same parent within a short time window
This is only a local mitigation to reduce immediate cost. It is not presented as the final architectural answer.
The core issue we want to surface here is the current cost model mismatch between parent-directory full recomputation and long-term memory workloads.
Impact
This issue can lead to:
- unexpectedly high VLM usage
- token consumption concentrated in large memory directories
- degraded efficiency of the memory write path
- directory-level semantic maintenance overshadowing the actual memory extraction workload
One-sentence problem statement
OpenViking currently treats memory directories too much like static content directories for semantic maintenance, so adding one new memory can repeatedly trigger expensive full parent-directory recomputation; this cost model does not scale well for long-lived memory trees.