feat: add configurable RAG context retrieval settings for assistants #231

Open
gdmrino wants to merge 2 commits into eneo-ai:develop from gdmrino:feat/rag-ctx-settings-pr

@gdmrino gdmrino commented Feb 17, 2026

Changes

Add per-assistant settings to control how much knowledge is retrieved during RAG.
Users can choose between four modes:

  • Default (50% of model context)
  • Automatic relevance-based filtering (using autocut from v1)
  • Custom percentage of context window
  • Fixed number of chunks
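
To illustrate the relevance-based mode, here is a simplified, gap-based sketch of the autocut idea: keep results until the ranked scores drop sharply. This is not the project's implementation, which this description does not show and which may differ. The function name `autocut_count`, the `jump_factor` parameter, and the "gap larger than a multiple of the mean gap" rule are all assumptions made for illustration.

```python
from typing import Sequence


def autocut_count(
    scores: Sequence[float],
    cutoff: int = 1,
    jump_factor: float = 2.0,  # hypothetical sensitivity knob, not from the PR
) -> int:
    """Return how many leading results to keep.

    `scores` must be sorted from most to least relevant. A "jump" is a drop
    between neighbours that exceeds `jump_factor` times the mean drop.
    Results are kept up to (not including) the `cutoff`-th jump.
    """
    if cutoff < 1:
        raise ValueError("cutoff must be >= 1")
    if len(scores) < 2:
        return len(scores)

    # Drop in score between each result and the next one.
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap <= 0:
        # Flat (or unsorted) scores: no meaningful jump, keep everything.
        return len(scores)

    jumps_seen = 0
    for i, gap in enumerate(gaps):
        if gap > jump_factor * mean_gap:
            jumps_seen += 1
            if jumps_seen == cutoff:
                return i + 1  # keep results 0..i
    return len(scores)
```

With scores `[0.92, 0.90, 0.89, 0.55, 0.53, 0.20]`, a cutoff of 1 keeps the first three results and a cutoff of 2 keeps the first five.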

Backend:

  • New RagContextType enum and rag_context_type/rag_context_value fields
    on the Assistant domain entity, API models, DB table, and repository
  • _get_retrieval_params() method replaces hardcoded chunk calculation
  • Thread autocut_cutoff through ReferencesService for relevance mode
  • Alembic migration adding columns to assistants table

Frontend:

  • New RagContextSettings Svelte component with mode selector and
    value inputs (percentage slider, chunk count, or auto-relevance)
  • Wired into the assistant edit page with change tracking and revert
  • i18n strings for English and Swedish
  • TypeScript schema types updated for the new fields

Why

Previously, RAG retrieval always used 50% of the model context window, which was too rigid for some use cases. This was especially limiting with open or smaller-context models, where spending half the context on retrieved knowledge leaves much less room for the actual response.

Testing

Tested both backend and frontend behavior:

Backend:

  • Verified new database fields are created correctly via migration
  • Confirmed each RagContextType produces the expected retrieval parameters
  • Ensured default behavior remains unchanged for existing assistants
  • Validated relevance mode correctly threads the autocut cutoff through retrieval

Frontend:

  • Confirmed all four modes render and switch correctly in the assistant editor
  • Verified percentage slider, chunk count input, and relevance mode settings persist after save
  • Checked revert/change-tracking behavior works as expected

End-to-end:

  • Created assistants using each mode and confirmed retrieval size matches configuration during queries

Screenshots

Screenshot 2026-02-18 at 00 14 03


2 participants