Summary
Add a response strictness setting to the assistant create/edit page that controls how strictly the agent should ground its responses in the retrieved knowledge base documents. The setting is presented as a slider with three modes.
Motivation
Different assistants have different trust requirements. A policy FAQ bot should only answer from its documents and say "I don't know" otherwise. A research assistant should use documents as a starting point but supplement with general knowledge. Giving the assistant creator control over this builds user trust and makes assistants more fit-for-purpose.
Proposed Behavior
Three Strictness Modes
| Mode | Label | Behavior |
| --- | --- | --- |
| strict | Only use documents | Agent must answer exclusively from retrieved knowledge base content. If the answer is not in the documents, it should say so explicitly rather than speculate. |
| balanced | Prefer documents | Agent uses retrieved documents as the primary source but may supplement with general knowledge when documents are insufficient. Should clearly distinguish between sourced and general information. |
| flexible | Use documents as context | Agent treats retrieved documents as helpful context alongside its general knowledge. Documents inform but do not constrain the response. |
Default
balanced — matches the current behavior most closely (the existing RAG prompt says "use this information to answer accurately" without strict grounding constraints).
Implementation
Data Model
New field on the assistant:
ragStrictness: STRING (enum: "strict" | "balanced" | "flexible", default: "balanced")
Assistant model (shared/assistants/models.py):
rag_strictness: Literal["strict", "balanced", "flexible"] = Field("balanced", alias="ragStrictness")
Add to CreateAssistantRequest, UpdateAssistantRequest, and AssistantResponse similarly.
RAG Prompt Modification
The strictness mode changes the instruction text in augment_prompt_with_context() (shared/assistants/rag_service.py). Currently the function hardcodes:
The following context is retrieved from the assistant's knowledge base.
Use this information to answer the user's question accurately and comprehensively.
This becomes mode-dependent:
strict:
The following context is retrieved from the assistant's knowledge base.
You MUST answer ONLY using the provided context. Do not use any outside knowledge.
If the answer cannot be found in the context below, respond with:
"I don't have enough information in my knowledge base to answer that question."
Do not speculate or infer beyond what is explicitly stated in the context.
balanced:
The following context is retrieved from the assistant's knowledge base.
Use this information as your primary source to answer the user's question accurately.
You may supplement with general knowledge when the context is insufficient,
but clearly prioritize the provided documents.
flexible:
The following context is retrieved from the assistant's knowledge base.
Use this information as helpful context alongside your general knowledge
to provide the most comprehensive and accurate answer possible.
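The mode-dependent prompts above could be selected with a simple lookup. The following is a minimal sketch, not the actual body of augment_prompt_with_context(), which this proposal does not show. It assumes context_chunks is a list of plain strings, that the context and user message are appended under "Context:" and "User question:" labels, and that an unrecognized mode falls back to balanced; all three are assumptions made for illustration.

```python
from typing import Sequence

_HEADER = "The following context is retrieved from the assistant's knowledge base."

# Instruction text per mode, copied from the prompts proposed above.
_INSTRUCTIONS: dict[str, str] = {
    "strict": (
        "You MUST answer ONLY using the provided context. "
        "Do not use any outside knowledge.\n"
        "If the answer cannot be found in the context below, respond with:\n"
        "\"I don't have enough information in my knowledge base "
        "to answer that question.\"\n"
        "Do not speculate or infer beyond what is explicitly stated in the context."
    ),
    "balanced": (
        "Use this information as your primary source to answer the user's "
        "question accurately.\n"
        "You may supplement with general knowledge when the context is insufficient,\n"
        "but clearly prioritize the provided documents."
    ),
    "flexible": (
        "Use this information as helpful context alongside your general knowledge\n"
        "to provide the most comprehensive and accurate answer possible."
    ),
}


def augment_prompt_with_context(
    user_message: str,
    context_chunks: Sequence[str],  # assumed shape: plain text chunks
    strictness: str = "balanced",   # default keeps existing callers working
) -> str:
    """Prepend mode-specific grounding instructions and context to the message."""
    # Assumption: unknown modes fall back to "balanced" instead of raising.
    instructions = _INSTRUCTIONS.get(strictness, _INSTRUCTIONS["balanced"])
    context = "\n\n".join(context_chunks)
    # Assumption: the "Context:" / "User question:" layout is illustrative only.
    return (
        f"{_HEADER}\n{instructions}\n\n"
        f"Context:\n{context}\n\n"
        f"User question: {user_message}"
    )
```

Giving strictness a default value means call sites that have not yet been updated keep the balanced behavior, which matches the default proposed earlier.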
Where the Strictness is Applied
In inference_api/chat/routes.py, the assistant is already loaded before augment_prompt_with_context is called. Pass the strictness mode through:
augmented_message = augment_prompt_with_context(
    user_message=input_data.message,
    context_chunks=context_chunks,
    strictness=assistant.rag_strictness,  # new parameter
)
Apply the same pattern in app_api/chat/routes.py if RAG augmentation also happens there.
Frontend Changes
Assistant Create/Edit Form
- Add a 3-position slider (or segmented control) labeled "Response Strictness"
- Positions: Only use documents | Prefer documents | Use documents as context
- Default position: center ("Prefer documents")
- Brief helper text below the slider explaining the selected mode:
  - Strict: "The assistant will only answer from its uploaded documents"
  - Balanced: "The assistant will prioritize documents but may use general knowledge"
  - Flexible: "The assistant will use documents as context alongside general knowledge"
Placement
Below the instructions/system prompt field, near the other assistant behavior settings (alongside the citation toggles from #111).
Migration
- Existing assistants have no ragStrictness attribute in DynamoDB
- Pydantic default of "balanced" handles this: existing assistants continue to behave as they do today
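To illustrate the fallback for legacy records, here is a stdlib-only sketch of the same defaulting rule. It is not the Pydantic model itself. It assumes the stored assistant has already been deserialized into a plain dict, and the helper name resolve_rag_strictness and the assistantId key are hypothetical. Raising on an unrecognized value is also an assumption; the proposal does not say how invalid stored values should be handled.

```python
from typing import Any, Mapping

VALID_STRICTNESS = ("strict", "balanced", "flexible")
DEFAULT_STRICTNESS = "balanced"


def resolve_rag_strictness(item: Mapping[str, Any]) -> str:
    """Return the strictness mode for a stored assistant item.

    Hypothetical helper: mirrors what the model default is expected to do
    when the ragStrictness attribute is absent from the stored record.
    """
    value = item.get("ragStrictness")
    if value is None:
        # Legacy record written before the field existed.
        return DEFAULT_STRICTNESS
    if value not in VALID_STRICTNESS:
        # Assumption: invalid stored values are rejected, not silently coerced.
        raise ValueError(f"Unknown ragStrictness: {value!r}")
    return value


legacy_item = {"assistantId": "a1"}  # no ragStrictness attribute
new_item = {"assistantId": "a2", "ragStrictness": "strict"}
print(resolve_rag_strictness(legacy_item))
print(resolve_rag_strictness(new_item))
```

Because the default is applied at read time, no backfill of existing DynamoDB items is needed for them to keep their current behavior.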
Out of Scope
- Per-conversation strictness override by end users
- Strictness levels beyond the three defined modes
- Custom prompt template editing (could be a future "advanced" option)