Skip to content

⚗️ Distill: Optimize MCP response for search_knowledge#1065

Open
fritzprix wants to merge 2 commits intodev/0.7.xfrom
distill-search-knowledge-pagination-3744919151408971230
Open

⚗️ Distill: Optimize MCP response for search_knowledge#1065
fritzprix wants to merge 2 commits intodev/0.7.xfrom
distill-search-knowledge-pagination-3744919151408971230

Conversation

@fritzprix
Copy link
Copy Markdown
Owner

💡 What: Refactored the search_knowledge MCP tool in the Rust backend to return dense Markdown tables instead of bloated JSON arrays. Added offset support to the tool schema to enable limit/offset pagination, capped at 10,000 to prevent OOM errors.

🎯 Why: The original implementation returned unstructured json! arrays directly into the content context, risking Context Overflow on large keyword searches and limiting readability for the LLM.

📉 Token Impact: Drastically reduces the token footprint per knowledge chunk returned. By utilizing Markdown tables, the LLM avoids parsing redundant JSON keys (like content:, id:, score:) for every entry, and long-term context retention is preserved.

🛠️ Error Recovery: Added explicit LLM-coaching guidance (Try using simpler search queries or exploring context around known entities.) when database queries fail, and a helpful redirect (Try calling again with offset: 0) if the agent tries to traverse past the end of the results.


PR created automatically by Jules for task 3744919151408971230 started by @fritzprix

…pagination

Co-authored-by: fritzprix <917899+fritzprix@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review requested due to automatic review settings April 20, 2026 03:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the Rust builtin search_knowledge MCP tool to emit dense Markdown table output (instead of verbose JSON-in-text), while adding offset support for limit/offset pagination with a safety cap to avoid large queries.

Changes:

  • Added offset to the search_knowledge tool input schema.
  • Implemented limit/offset pagination by fetching limit + offset + 1, slicing results, and emitting a Markdown table.
  • Added additional guided error messaging for DB failures and invalid offsets; documented the learning in .jules/distill.md.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src-tauri/src/mcp/builtin/knowledge/tools.rs Adds offset parameter to the MCP tool schema for pagination.
src-tauri/src/mcp/builtin/knowledge/queries.rs Implements pagination + Markdown table rendering, adds guidance messages.
.jules/distill.md Documents the rationale/guardrails around offset capping and has_more calculation.

Comment on lines +132 to +134
None,
0,
Some("Pagination offset (0-based)."),
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offset is silently capped in the backend (to 10,000) but the tool schema exposes no maximum. This mismatch makes it likely the agent will request offsets above the cap and get surprising results; consider adding a max (e.g., 10,000) to the schema and documenting it in the description.

Suggested change
None,
0,
Some("Pagination offset (0-based)."),
Some(10_000),
0,
Some("Pagination offset (0-based, maximum: 10,000)."),

Copilot uses AI. Check for mistakes.
Comment on lines 125 to +129
Some("Maximum number of results to return (default: 5)."),
),
),
(
"offset".to_string(),
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limit constraint in the schema (max 50) doesn’t match the backend clamp (MAX_LIMIT is 100). Either reduce the backend clamp to 50 or update the schema max to 100 so the contract is consistent.

Copilot uses AI. Check for mistakes.
Comment on lines +31 to +35
let offset = args
.get("offset")
.and_then(|v| v.as_u64())
.unwrap_or(0)
.min(10_000);
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clamping offset with .min(10_000) can create an infinite-pagination loop: if the agent follows the “call again with offset: …” hint beyond 10,000, the value will be clamped back to 10,000 and the tool will keep returning the same page (and still report has_more). Prefer rejecting offsets > 10,000 with an InvalidInput error (or otherwise explicitly signal the cap was hit) rather than silently clamping.

Suggested change
let offset = args
.get("offset")
.and_then(|v| v.as_u64())
.unwrap_or(0)
.min(10_000);
let offset = match args.get("offset").and_then(|v| v.as_u64()) {
Some(offset) if offset > 10_000 => {
return Ok(guided_error(
ErrorCategory::InvalidInput,
format!(
"Unsupported offset '{offset}'. Expected offset to be less than or equal to 10,000."
),
ToolGroup::Knowledge,
)
.to_mcp_result());
}
Some(offset) => offset,
None => 0,
};

Copilot uses AI. Check for mistakes.
Comment on lines +129 to +135
"offset".to_string(),
integer_prop_with_default(
Some(0),
None,
0,
Some("Pagination offset (0-based)."),
),
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema change adds offset pagination support, but there’s no test asserting search_knowledge_tool() exposes the new field (and its bounds). There are already schema tests for record_knowledge_tool(); adding a similar test for search_knowledge_tool() would prevent regressions (e.g., missing offset/max).

Copilot uses AI. Check for mistakes.
…pagination

Co-authored-by: fritzprix <917899+fritzprix@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants