feat(rag): implement RAG-style attachment handling for enhanced document indexing #687

Open
ba0f3 wants to merge 3 commits into nextlevelbuilder:dev from ba0f3:rag

@ba0f3 (Contributor) commented Apr 4, 2026

Summary

Optional RAG attachment indexing hooks into the agent loop: it extracts text from uploaded documents, injects it into the current user turn when possible, and asynchronously indexes it into memory under scoped rag/... paths for later memory_search.

  • Leverages GoClaw’s memory stack (pgvector-backed retrieval) so indexed knowledge plugs into the same recall path agents already use.
  • Injecting the extracted text often avoids an extra LLM round-trip through read_document, because the model sees the text immediately.
  • If extraction fails, document refs and the read hint remain, so the run can still proceed normally.
  • Adds a RAG Indexing item under System in the sidebar for managing document indexing.

See docs/rag-implementation.md for architecture and usage.

Type

  • Feature
  • Bug fix
  • Hotfix (targeting main)
  • Refactor
  • Docs
  • CI/CD

Target Branch

Checklist

  • go build ./... passes
  • go build -tags sqliteonly ./... passes (if Go changes)
  • go vet ./... passes
  • Tests pass: go test -race ./...
  • Web UI builds: cd ui/web && pnpm build (if UI changes)
  • No hardcoded secrets or credentials
  • SQL queries use parameterized $1, $2 (no string concat)
  • New user-facing strings added to all 3 locales (en/vi/zh)
  • Migration version bumped in internal/upgrade/version.go (if new migration)

Test Plan

ba0f3 and others added 3 commits April 4, 2026 19:47
…ent indexing

- Introduced RAG (Retrieval-Augmented Generation) functionality to handle document attachments, enabling text extraction during LLM turns and asynchronous indexing into memory.
- Updated `registerAllMethods` to include a new `MemoryStore` parameter for session management.
- Enhanced `SessionsMethods` to purge RAG-related documents upon session deletion and reset.
- Added new methods for document extraction and indexing, supporting various file types including .pdf, .docx, and .xlsx.
- Created a comprehensive implementation guide in `docs/rag-implementation.md` detailing RAG functionality and configuration.
- Implemented tests for RAG attachment handling and scope parsing to ensure correct behavior across different contexts.

This commit lays the groundwork for improved document handling and retrieval in the system, enhancing user experience and operational efficiency.
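
As a rough illustration of the session purge and scope parsing mentioned in this commit, the sketch below deletes every indexed document under one session's prefix. The `rag/sessions/<id>/<doc>` layout and the names `ragPrefix`, `parseScope`, and `purgeSession` are assumptions made for the example; the description does not show the real path scheme or the `MemoryStore` API, so a plain map stands in for the store.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ragPrefix builds the path prefix for one session. The layout
// rag/sessions/<id>/ is an assumption made for this sketch.
func ragPrefix(sessionID string) string {
	return "rag/sessions/" + sessionID + "/"
}

// parseScope splits a path of the assumed form rag/sessions/<id>/<doc>
// into its session ID and document name.
func parseScope(path string) (sessionID, doc string, ok bool) {
	const root = "rag/sessions/"
	if !strings.HasPrefix(path, root) {
		return "", "", false
	}
	rest := strings.TrimPrefix(path, root)
	i := strings.Index(rest, "/")
	if i <= 0 || i == len(rest)-1 {
		return "", "", false // missing session ID or document name
	}
	return rest[:i], rest[i+1:], true
}

// purgeSession deletes every document under the session's prefix and
// returns the removed paths in sorted order. The map is a hypothetical
// stand-in for the memory store.
func purgeSession(store map[string]string, sessionID string) []string {
	prefix := ragPrefix(sessionID)
	var removed []string
	for path := range store {
		if strings.HasPrefix(path, prefix) {
			delete(store, path) // deleting during range is safe in Go
			removed = append(removed, path)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	store := map[string]string{
		"rag/sessions/s1/report.pdf": "...",
		"rag/sessions/s10/plan.docx": "...",
		"notes/todo.md":              "...",
	}
	fmt.Println(purgeSession(store, "s1"))
	fmt.Println(len(store))
}
```

The trailing slash in the prefix matters: without it, purging session `s1` would also match paths that belong to `s10`.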

- Added `AgentUpdateResponse` type to the imports in `agent-advanced-dialog.tsx` and `agent-overview-tab.tsx` to enhance type safety and clarity in agent data updates.
- Improved the handling of RAG dependencies in `rag-page.tsx` by ensuring the last RAG dependencies are correctly retrieved from a filtered list, enhancing robustness in data management.

This commit refines type definitions and improves data handling logic across the agent detail components.

- Introduced `RAG_SELECTED_AGENT_IDS` constant to manage selected agent IDs in localStorage, allowing persistence across sessions.
- Updated `rag-page.tsx` to restore selected agent IDs from localStorage when the agent list is available and to save changes back to localStorage.
- Enhanced comments for clarity regarding the behavior of selected agent IDs.

This commit improves user experience by maintaining agent selection state across page reloads.

@reski-rukmantiyo commented

Very interesting
