Feature: Response Strictness Setting for Assistants (RAG Grounding Control) #112

@DerrickF

Summary

Add a response strictness setting to the assistant create/edit page that controls how strictly the agent grounds its responses in the retrieved knowledge base documents. The setting is presented as a slider with three modes.

Motivation

Different assistants have different trust requirements. A policy FAQ bot should only answer from its documents and say "I don't know" otherwise. A research assistant should use documents as a starting point but supplement with general knowledge. Giving the assistant creator control over this builds user trust and makes assistants more fit-for-purpose.

Proposed Behavior

Three Strictness Modes

| Mode | Label | Behavior |
| --- | --- | --- |
| strict | Only use documents | Agent must answer exclusively from retrieved knowledge base content. If the answer is not in the documents, it should say so explicitly rather than speculate. |
| balanced | Prefer documents | Agent uses retrieved documents as the primary source but may supplement with general knowledge when documents are insufficient. Should clearly distinguish between sourced and general information. |
| flexible | Use documents as context | Agent treats retrieved documents as helpful context alongside its general knowledge. Documents inform but do not constrain the response. |

Default

balanced. This matches the current behavior most closely: the existing RAG prompt tells the agent to use the context "to answer the user's question accurately and comprehensively" and imposes no strict grounding constraints.

Implementation

Data Model

New field on the assistant:

ragStrictness: STRING (enum: "strict" | "balanced" | "flexible", default: "balanced")

Assistant model (shared/assistants/models.py):

rag_strictness: Literal["strict", "balanced", "flexible"] = Field("balanced", alias="ragStrictness")

Add to CreateAssistantRequest, UpdateAssistantRequest, and AssistantResponse similarly.

RAG Prompt Modification

The strictness mode changes the instruction text in augment_prompt_with_context() (shared/assistants/rag_service.py). Currently the function hardcodes:

The following context is retrieved from the assistant's knowledge base.
Use this information to answer the user's question accurately and comprehensively.

This becomes mode-dependent:

strict:

The following context is retrieved from the assistant's knowledge base.
You MUST answer ONLY using the provided context. Do not use any outside knowledge.
If the answer cannot be found in the context below, respond with:
"I don't have enough information in my knowledge base to answer that question."
Do not speculate or infer beyond what is explicitly stated in the context.

balanced:

The following context is retrieved from the assistant's knowledge base.
Use this information as your primary source to answer the user's question accurately.
You may supplement with general knowledge when the context is insufficient,
but clearly prioritize the provided documents.

flexible:

The following context is retrieved from the assistant's knowledge base.
Use this information as helpful context alongside your general knowledge
to provide the most comprehensive and accurate answer possible.
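
The mode-dependent instruction text could be selected with a simple lookup. The sketch below is a minimal illustration, not the actual contents of shared/assistants/rag_service.py: this issue does not show the real signature or prompt layout of augment_prompt_with_context(), so the list-of-strings type for context_chunks, the "User question:" suffix, and the fallback to balanced for unrecognized values are all assumptions.

```python
from typing import Sequence

# First line shared by all three variants (text taken from this issue).
_HEADER = "The following context is retrieved from the assistant's knowledge base."

# Mode-specific instructions, copied from the three templates above.
_INSTRUCTIONS: dict[str, str] = {
    "strict": (
        "You MUST answer ONLY using the provided context. Do not use any outside knowledge.\n"
        "If the answer cannot be found in the context below, respond with:\n"
        "\"I don't have enough information in my knowledge base to answer that question.\"\n"
        "Do not speculate or infer beyond what is explicitly stated in the context."
    ),
    "balanced": (
        "Use this information as your primary source to answer the user's question accurately.\n"
        "You may supplement with general knowledge when the context is insufficient,\n"
        "but clearly prioritize the provided documents."
    ),
    "flexible": (
        "Use this information as helpful context alongside your general knowledge\n"
        "to provide the most comprehensive and accurate answer possible."
    ),
}


def augment_prompt_with_context(
    user_message: str,
    context_chunks: Sequence[str],  # assumption: chunks are plain strings
    strictness: str = "balanced",
) -> str:
    """Prefix the user message with retrieved context and mode-specific instructions."""
    # Assumption: unrecognized values fall back to the default mode.
    instructions = _INSTRUCTIONS.get(strictness, _INSTRUCTIONS["balanced"])
    context = "\n\n".join(context_chunks)
    # Assumption: the final prompt layout; the real function may differ.
    return f"{_HEADER}\n{instructions}\n\n{context}\n\nUser question: {user_message}"
```

Keeping the three variants in one mapping means a new mode only needs a new entry, and the default argument preserves today's call sites that do not yet pass strictness.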

Where the Strictness is Applied

In inference_api/chat/routes.py, the assistant is already loaded before augment_prompt_with_context is called. Pass the strictness mode through:

augmented_message = augment_prompt_with_context(
    user_message=input_data.message,
    context_chunks=context_chunks,
    strictness=assistant.rag_strictness,  # new parameter
)

Apply the same pattern in app_api/chat/routes.py if RAG augmentation also happens there.

Frontend Changes

Assistant Create/Edit Form

  • Add a 3-position slider (or segmented control) labeled "Response Strictness"
  • Positions: Only use documents | Prefer documents | Use documents as context
  • Default position: center ("Prefer documents")
  • Brief helper text below the slider explaining the selected mode:
    • Strict: "The assistant will only answer from its uploaded documents"
    • Balanced: "The assistant will prioritize documents but may use general knowledge"
    • Flexible: "The assistant will use documents as context alongside general knowledge"

Placement

Below the instructions/system prompt field, near the other assistant behavior settings (alongside the citation toggles from #111).

Migration

  • Existing assistants have no ragStrictness attribute in DynamoDB
  • The Pydantic default of "balanced" handles this, so existing assistants continue to behave as they do today
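
The fallback described above can be illustrated without Pydantic. The helper below is a hypothetical, dependency-free sketch: read_rag_strictness is not a function from the codebase, and it assumes the DynamoDB item has already been deserialized into a plain dict keyed by the camelCase attribute name.

```python
from typing import Any, Mapping

VALID_STRICTNESS = ("strict", "balanced", "flexible")
DEFAULT_STRICTNESS = "balanced"


def read_rag_strictness(item: Mapping[str, Any]) -> str:
    """Return the strictness mode for an assistant item (hypothetical helper).

    Items written before this feature have no ragStrictness attribute,
    so they resolve to the default, mirroring the Pydantic field default.
    """
    value = item.get("ragStrictness", DEFAULT_STRICTNESS)
    if value not in VALID_STRICTNESS:
        # Reject values outside the three defined modes.
        raise ValueError(f"unsupported ragStrictness: {value!r}")
    return value


# A legacy item without the attribute resolves to the default mode.
legacy_item = {"assistantId": "a1", "name": "Policy FAQ"}
print(read_rag_strictness(legacy_item))
```

Because the default is applied at read time, no backfill of existing items is needed under these assumptions.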

Out of Scope

  • Per-conversation strictness override by end users
  • Strictness levels beyond the three defined modes
  • Custom prompt template editing (could be a future "advanced" option)
