feat: add configurable RAG context retrieval settings for assistants #231
Open
gdmrino wants to merge 2 commits into eneo-ai:develop
Changes
Add per-assistant settings to control how much knowledge is retrieved during RAG.
Users can choose between four modes:
- Default (50% of model context)
- Automatic relevance-based filtering (using autocut)
- Custom percentage of context window
- Fixed number of chunks

Backend:
- New RagContextType enum and rag_context_type/rag_context_value fields on the Assistant domain entity, API models, DB table, and repository
- _get_retrieval_params() method replaces hardcoded chunk calculation
- Thread autocut_cutoff through ReferencesService for relevance mode
- Alembic migration adding columns to assistants table

Frontend:
- New RagContextSettings Svelte component with mode selector and value inputs (percentage slider, chunk count, or auto-relevance)
- Wired into the assistant edit page with change tracking and revert
- i18n strings for English and Swedish
- TypeScript schema types updated for the new fields
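
For reviewers, the four retrieval modes could be mapped to retrieval parameters roughly as follows. This is a minimal sketch, not the code in this PR: the enum member names, the `RetrievalParams` dataclass, the standalone `get_retrieval_params` function, the tokens-per-chunk constant, and the autocut value of 1 are all assumptions made for illustration. Only the `RagContextType` name, the four modes, the 50% default, and the `autocut_cutoff` parameter come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RagContextType(str, Enum):
    # Member names and values are assumptions; the PR only names the enum.
    DEFAULT = "default"
    AUTO_RELEVANCE = "auto_relevance"
    PERCENTAGE = "percentage"
    FIXED_CHUNKS = "fixed_chunks"


@dataclass
class RetrievalParams:
    # Hypothetical container for what the retrieval layer needs.
    num_chunks: int
    autocut_cutoff: Optional[int] = None


ASSUMED_TOKENS_PER_CHUNK = 200   # hypothetical average chunk size
DEFAULT_CONTEXT_FRACTION = 0.5   # the previous hardcoded 50% behavior


def get_retrieval_params(
    context_type: RagContextType,
    context_value: Optional[int],
    model_context_tokens: int,
) -> RetrievalParams:
    """Translate an assistant's RAG settings into retrieval parameters."""
    if context_type is RagContextType.FIXED_CHUNKS:
        if context_value is None or context_value < 1:
            raise ValueError("fixed chunk mode needs a positive chunk count")
        return RetrievalParams(num_chunks=context_value)

    if context_type is RagContextType.PERCENTAGE:
        if context_value is None or not 0 < context_value <= 100:
            raise ValueError("percentage mode needs a value in (0, 100]")
        fraction = context_value / 100
    else:
        # DEFAULT and AUTO_RELEVANCE both start from the 50% budget.
        fraction = DEFAULT_CONTEXT_FRACTION

    budget_tokens = int(model_context_tokens * fraction)
    num_chunks = max(1, budget_tokens // ASSUMED_TOKENS_PER_CHUNK)

    # In relevance mode, pass a cutoff so the search layer can trim
    # results at the first large drop in relevance (assumed value: 1).
    cutoff = 1 if context_type is RagContextType.AUTO_RELEVANCE else None
    return RetrievalParams(num_chunks=num_chunks, autocut_cutoff=cutoff)
```

Under these assumptions, an assistant on a model with an 8,000-token context would retrieve 20 chunks in default mode and 10 chunks with a custom percentage of 25, which is the kind of reduction that frees space for the response on smaller-context models.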
Why
Previously, retrieval during RAG was fixed to use 50% of the model context window, which did not provide enough flexibility for different use cases. This was especially limiting when working with open or smaller-context models, where consuming half the context for retrieval can significantly reduce space available for the actual response.
Testing
Tested both backend and frontend behavior:
Backend:
Frontend:
End-to-end:
Screenshots