
Is this how the agent's document processing and caching works? #5

@dev3py


Hi! I've been using the project and noticed an interesting behavior — when I select a folder with 22 documents and ask my first question, it takes ~18 minutes to respond. But after that, all follow-up questions are answered within ~1 minute.

I dug into the code to understand why, and I'd like to confirm whether my understanding is correct:

My Understanding

First query is slow because the agent does a full multi-step exploration:

  1. scan_folder runs on all 22 documents in parallel to get previews
  2. parse_file is called individually on each document marked as RELEVANT for a full deep-read
  3. If cross-references are found between documents, more parse_file calls happen (backtracking)
  4. Each of these steps is a separate API round-trip to Gemini, and each call sends the entire accumulated conversation history — so by the end, we're sending hundreds of thousands of tokens per call
  5. This is what triggers the Gemini 429 rate limit (1M input tokens/minute), causing forced ~60s pauses between retries
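To illustrate step 4, here is a small back-of-the-envelope sketch of why re-sending the full history makes token usage blow up. The constants (`DOC_TOKENS`, `BASE_PROMPT`) are assumptions for illustration, not values from the project:

```python
# Hypothetical model of the exploration loop: each agent step is a
# separate API call that re-sends the whole accumulated history, so
# total input tokens grow roughly quadratically with document count.

DOC_TOKENS = 10_000   # assumed average tokens per fully parsed document
BASE_PROMPT = 2_000   # assumed system prompt + question tokens

def tokens_sent_per_call(num_docs: int) -> list[int]:
    """Tokens sent on each successive API call as parsed docs accumulate."""
    history = BASE_PROMPT
    sent = []
    for _ in range(num_docs):
        sent.append(history)   # this call re-sends everything so far
        history += DOC_TOKENS  # the newly parsed document joins the history
    return sent

calls = tokens_sent_per_call(22)
total = sum(calls)
# Under these assumptions the final call alone carries >200k input
# tokens, and the run's cumulative input crosses 1M tokens well before
# a minute is up, which would trip the rate limit described above.
```

Even with much smaller per-document sizes the quadratic shape holds, which is consistent with the 429s showing up late in the first query rather than at the start.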

Follow-up queries are fast because of the singleton agent pattern:

  1. FsExplorerAgent is created once as a module-level singleton via get_agent() in workflow.py
  2. _chat_history (a plain Python list in memory) is never cleared between queries
  3. So all 22 documents' content from the first scan/parse is already sitting in the history
  4. Gemini already "knows" the documents and can answer in just 1-2 API calls instead of 10+

Things I noticed that could be potential concerns:

  • No persistence — if the server restarts, all chat history is lost and the next query does the full 18-minute scan again
  • History only grows — every follow-up question adds more to _chat_history, meaning we're re-sending all previous documents + Q&A pairs every time, which increases cost and latency over time
  • Skipped documents — if the first query caused the agent to SKIP certain documents as irrelevant, a follow-up question about those skipped documents would still require new parse_file calls
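On the "history only grows" concern: a common mitigation (not something I saw in the project, just a sketch of the idea) is to trim the oldest turns once an assumed token budget is exceeded, keeping only the most recent context:

```python
# Sketch only: cap history at a token budget by dropping the oldest
# turns first. The 4-chars-per-token estimate is a rough heuristic,
# not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude approximation

def prune_history(history: list[dict], budget: int) -> list[dict]:
    """Keep the most recent turns that fit within `budget` tokens."""
    kept: list[dict] = []
    used = 0
    for turn in reversed(history):       # newest first
        cost = estimate_tokens(turn["content"])
        if used + cost > budget:
            break                        # everything older is dropped
        kept.append(turn)
        used += cost
    return list(reversed(kept))          # restore chronological order
```

The trade-off, of course, is that pruned document content would have to be re-parsed if a later question needs it, which is the same cold-start cost the singleton currently avoids; summarizing dropped turns instead of discarding them is the usual middle ground.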

Questions

  1. Is my understanding above correct?
  2. Is the growing _chat_history by design, or is there a plan to implement some form of summarization/pruning to keep token usage in check?
  3. Has there been any consideration for persisting the chat history (e.g., in a database or cache) so that a server restart doesn't require a full re-scan?
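For question 3, even something minimal would avoid the full re-scan after a restart. This is purely a sketch of the idea, not project code, and `HISTORY_FILE` is an assumed location; a real deployment might use a database or cache instead:

```python
# Sketch: persist _chat_history across restarts as JSON on disk.
import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")  # assumed location

def save_history(history: list[dict]) -> None:
    """Write the full history after each turn (cheap at this scale)."""
    HISTORY_FILE.write_text(json.dumps(history))

def load_history() -> list[dict]:
    """Restore history on startup; empty list means a cold start."""
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    return []  # cold start: the agent would need a full scan
```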

Thanks for the great project!
