Hi! I've been using the project and noticed an interesting behavior: when I select a folder with 22 documents and ask my first question, it takes ~18 minutes to respond, but after that, all follow-up questions are answered within ~1 minute.

I dug into the code to understand why, and I wanted to confirm whether my understanding is correct.
My Understanding
First query is slow because the agent does a full multi-step exploration:

- `scan_folder` runs on all 22 documents in parallel to get previews
- `parse_file` is called individually on each document marked as RELEVANT for a full deep read
- If cross-references are found between documents, more `parse_file` calls happen (backtracking)
- Each of these steps is a separate API round-trip to Gemini, and each call sends the entire accumulated conversation history, so by the end we're sending hundreds of thousands of tokens per call
- This is what triggers the Gemini 429 rate limit (1M input tokens/minute), causing forced ~60s pauses between retries
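To illustrate the token-growth pattern described above, here is a minimal sketch (not the project's actual code; the message contents and the payload-size stand-in are assumptions) of why each round-trip gets more expensive than the last:

```python
# Hypothetical sketch: the history list only ever grows, and every
# round-trip serializes and resends all of it.

def make_call(history, new_message):
    """Simulate one API round-trip that sends the full history."""
    history.append(new_message)
    payload = "\n".join(history)  # everything accumulated so far goes out
    return len(payload)           # stand-in for the input-token count

history = []
sizes = [
    make_call(history, f"parse_file result for doc {i}: " + "x" * 1000)
    for i in range(22)
]

# The payload grows with every step, so the 22nd call is far larger
# than the 1st, which is what eventually trips a per-minute token limit.
print(sizes[0], sizes[-1])
```

Total tokens sent across all calls grow roughly quadratically in the number of steps, which matches the 429s showing up late in the first query.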
Follow-up queries are fast because of the singleton agent pattern:

- `FsExplorerAgent` is created once as a module-level singleton via `get_agent()` in `workflow.py`
- `_chat_history` (a plain Python list in memory) is never cleared between queries
- So all 22 documents' content from the first scan/parse is already sitting in the history
- Gemini already "knows" the documents and can answer in just 1-2 API calls instead of 10+
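The singleton pattern above can be sketched roughly like this (names such as `FsExplorerAgent`, `get_agent`, and `_chat_history` mirror the issue; the real `workflow.py` may differ):

```python
# Minimal sketch of a module-level singleton agent whose in-memory
# history survives across queries. This is an illustration, not the
# project's actual implementation.

class FsExplorerAgent:
    def __init__(self):
        self._chat_history = []  # plain in-memory list, never cleared

    def ask(self, question):
        self._chat_history.append(("user", question))
        # ... the model would be called here with the FULL history ...
        self._chat_history.append(("model", "answer"))

_agent = None

def get_agent():
    """Return the one shared agent instance, creating it on first use."""
    global _agent
    if _agent is None:
        _agent = FsExplorerAgent()
    return _agent

# Two separate queries hit the same instance, so the second query
# already carries the first query's scanned-document context.
get_agent().ask("first question")
get_agent().ask("follow-up")
print(len(get_agent()._chat_history))  # → 4
```

This explains both effects at once: fast follow-ups (the context is already there) and the concerns below (the context lives only in that one process's memory).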
Things I noticed that could be potential concerns:

- No persistence: if the server restarts, all chat history is lost and the next query repeats the full 18-minute scan
- History only grows: every follow-up question appends to `_chat_history`, so we re-send all previous documents plus Q&A pairs every time, which increases cost and latency over time
- Skipped documents: if the first query caused the agent to SKIP certain documents as irrelevant, a follow-up question about those skipped documents would still require new `parse_file` calls
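For the "history only grows" concern, one possible mitigation is naive pruning that keeps only the most recent turns. This is a hedged sketch under my own assumptions (the turn budget and function are hypothetical); real summarization would replace dropped turns with a model-written summary rather than discarding them outright:

```python
# Hypothetical pruning helper: cap the history at a fixed turn budget.
# Dropping old turns loses document context, so this trades recall for
# bounded token usage per call.

MAX_TURNS = 20  # assumed budget, not a project setting

def prune_history(chat_history, max_turns=MAX_TURNS):
    """Return the history trimmed to the most recent max_turns entries."""
    if len(chat_history) <= max_turns:
        return chat_history
    return chat_history[-max_turns:]

history = [f"turn {i}" for i in range(50)]
history = prune_history(history)
print(len(history), history[0])  # → 20 turn 30
```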
Questions
- Is my understanding above correct?
- Is the growing `_chat_history` by design, or is there a plan to implement some form of summarization/pruning to keep token usage in check?
- Has there been any consideration for persisting the chat history (e.g., in a database or cache) so that a server restart doesn't require a full re-scan?
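On the persistence question, even a simple file-backed cache would avoid the cold-start re-scan. A minimal sketch, assuming a JSON file in the working directory (the file name and structure are my invention, not the project's design):

```python
# Hypothetical persistence layer: dump _chat_history to disk on each
# update, reload it on startup. A database or cache would work the same
# way in principle.

import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")  # assumed location

def save_history(chat_history):
    """Write the full history out as JSON."""
    HISTORY_FILE.write_text(json.dumps(chat_history))

def load_history():
    """Reload saved history, or start empty on a true cold start."""
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    return []  # only this path would require the full folder scan

save_history([["user", "question about doc 3"], ["model", "answer"]])
restored = load_history()
print(restored[0][0])  # → user
```

Note that JSON round-trips tuples as lists, so a real implementation would need a stable on-disk schema for whatever message objects the agent stores.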
Thanks for the great project!