
Directory overview generation can build oversized prompts and destabilize VLM calls #529

@dddgogogo

Description


Summary

When OpenViking generates directory .overview.md content for large folders, the overview prompt can grow without a clear input budget. In practice this can produce extremely large prompts, trigger VLM timeouts, and even leave the upstream VLM service in a degraded or semi-broken state for subsequent requests.

This is not just a performance issue. It can break the memory / semantic generation pipeline for otherwise valid data.

Environment

  • Repository: OpenViking
  • Branch base used for testing: local main rebased from upstream main
  • Deployment: local source checkout + systemd service
  • Storage backend: local AGFS + local vectordb
  • VLM provider: OpenAI-compatible endpoint behind a local router
  • Model route: ov-llm
  • Relevant config during mitigation:
    • auto_generate_l1=false (workaround currently in use)
    • vlm.max_concurrent=1

Actual behavior

For large directories, _generate_overview() assembles the overview prompt by concatenating:

  • all file summaries in the directory
  • all child directory abstracts

without a strong budget guard.

That can create very large prompts. In our case, the prompt reached approximately 117,954 tokens before the VLM call. The result was:

  • VLM timeout / failure on overview generation
  • the ov-llm path becoming unstable for follow-up requests
  • semantic generation and related memory workflows becoming unreliable until the service recovered

Expected behavior

Directory overview generation should remain bounded and fail-safe:

  • large folders should not generate unbounded prompts
  • one oversized folder should not destabilize later VLM calls
  • OpenViking should degrade gracefully when overview input exceeds a safe threshold

Reproduction

A general reproduction path is:

  1. Prepare a directory with many files and/or many child directories.
  2. Ensure semantic processing attempts to generate L1 overview content for that directory.
  3. Use file summaries / child abstracts large enough that the assembled prompt becomes very large.
  4. Trigger semantic generation.
  5. Observe timeout / failure in the VLM call and degraded behavior afterward.
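As a convenience for step 1, the following hedged helper builds a directory tree with many subdirectories and large files. All names and sizes here are illustrative and should be adjusted to your OpenViking data root; nothing in this snippet comes from the OpenViking codebase.

```python
import os

def make_large_tree(root: str, n_files: int = 500, n_dirs: int = 50,
                    summary_chars: int = 4000) -> None:
    """Create n_dirs subdirectories, each holding n_files // n_dirs text
    files whose content is large enough that per-file summaries add up
    when concatenated into one overview prompt."""
    filler = "lorem ipsum " * (summary_chars // 12)
    for d in range(n_dirs):
        sub = os.path.join(root, f"dir_{d:03d}")
        os.makedirs(sub, exist_ok=True)
        for f in range(n_files // n_dirs):
            with open(os.path.join(sub, f"file_{f:03d}.txt"), "w") as fh:
                fh.write(filler)
```

Pointing semantic processing at the resulting tree should be enough to make the assembled overview input very large.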

Suspected root cause

The issue appears to come from the overview assembly logic in:

  • openviking/storage/queuefs/semantic_processor.py
  • method: _generate_overview()

The current logic concatenates the full list of file summaries and child directory abstracts into one prompt. For sufficiently large directories, this creates an oversized request.
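To make the failure mode concrete, here is an illustrative reconstruction of the unbounded assembly pattern. This is not the actual `_generate_overview()` source; the function and parameter names are hypothetical, but the shape matches the behavior described above: everything is joined into one string with no size check before the VLM call.

```python
def build_overview_prompt(file_summaries: list[str],
                          child_abstracts: list[str]) -> str:
    """Hypothetical sketch of the problematic pattern."""
    parts = ["Summarize this directory.", "Files:"]
    parts.extend(file_summaries)      # unbounded: every file summary
    parts.append("Subdirectories:")
    parts.extend(child_abstracts)     # unbounded: every child abstract
    return "\n".join(parts)           # size grows linearly with dir size
```

Prompt size grows linearly with the number (and size) of summaries, so a sufficiently large directory produces an oversized request.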

Why this matters

This is especially problematic because overview generation is an internal maintenance / semantic task. Users may hit it indirectly while doing normal memory or resource workflows, but the failure mode is severe enough to affect unrelated follow-up requests.

Current workaround

We mitigated the problem locally by disabling automatic directory overview generation:

  • auto_generate_l1=false

This avoids the problematic path, but it is a workaround rather than a product-level fix.

Suggested fixes

I think OpenViking should consider one or more of the following:

  1. Input budget guard before VLM call

    • estimate prompt size before calling the model
    • if over budget, truncate or summarize inputs first
  2. Hierarchical / staged overview generation

    • summarize file summaries in batches
    • summarize child abstracts in batches
    • then combine the batch summaries into the final overview
  3. Configurable hard limits

    • max number of file summaries included
    • max number of child abstracts included
    • max characters or tokens per overview input
  4. Fail-safe degradation

    • if the budget is exceeded, skip L1 generation and keep L0 only
    • or generate a short fallback overview instead of failing the whole path
  5. Observability

    • log prompt size / estimated tokens for overview generation
    • log when budget guards trigger
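A minimal sketch combining fixes 1, 4, and 5 could look like the following. The 4-characters-per-token heuristic and the budget value are assumptions, not OpenViking settings, and the function names are hypothetical; a real implementation would use the tokenizer matching the deployed model.

```python
import logging

logger = logging.getLogger("openviking.semantic")

# Assumed budget; a real value would come from configuration (fix 3).
MAX_PROMPT_TOKENS = 24_000

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def guarded_overview_input(file_summaries, child_abstracts,
                           budget=MAX_PROMPT_TOKENS):
    """Greedily keep inputs until the estimated budget is reached.
    Returns (kept_items, truncated) so the caller can fall back to a
    short overview, or skip L1 generation, when truncation occurs."""
    kept, used, truncated = [], 0, False
    for item in list(file_summaries) + list(child_abstracts):
        cost = estimate_tokens(item)
        if used + cost > budget:
            truncated = True
            break
        kept.append(item)
        used += cost
    if truncated:
        logger.warning("overview budget hit: kept %d items, ~%d tokens",
                       len(kept), used)
    return kept, truncated
```

The staged approach (fix 2) composes naturally with this guard: batch summaries produced upstream would simply arrive here as smaller items that fit the budget.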

Additional note

Even if a bounded assembly strategy is added, I still think the system should protect the VLM path from a single oversized semantic task causing downstream instability.
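One way to provide that protection is a per-route circuit breaker in front of the VLM client: after repeated failures, stop sending traffic to the route for a cooldown period so a single bad semantic task cannot keep destabilizing follow-up requests. The sketch below is a generic pattern, not an OpenViking API, and the thresholds are illustrative.

```python
import time

class VLMCircuitBreaker:
    """Open after max_failures consecutive errors; allow a probe request
    again after cooldown_s seconds (a simple half-open state)."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        if self.failures < self.max_failures:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.failures = 0   # half-open: let one probe request through
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
```

Combined with `vlm.max_concurrent=1`, this would bound both the blast radius and the retry pressure of an oversized overview task.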

Metadata

Labels: bug (Something isn't working), enhancement (New feature or request)
Status: Done