⚗️ Distill: Optimize MCP response for search_knowledge#1065
Conversation
…pagination Co-authored-by: fritzprix <917899+fritzprix@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Pull request overview
Refactors the Rust builtin search_knowledge MCP tool to emit dense Markdown table output (instead of verbose JSON-in-text), while adding offset support for limit/offset pagination with a safety cap to avoid large queries.
Changes:
- Added `offset` to the `search_knowledge` tool input schema.
- Implemented limit/offset pagination by fetching `limit + offset + 1`, slicing the results, and emitting a Markdown table.
- Added additional guided error messaging for DB failures and invalid offsets; documented the learning in `.jules/distill.md`.
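The "fetch `limit + offset + 1`" trick described above lets `has_more` be computed without a separate COUNT query: if one row beyond the requested page comes back, another page exists. A minimal standalone sketch (function and variable names are illustrative, not the PR's actual code):

```rust
// Illustrative sketch of limit/offset pagination with a "fetch one
// extra row" has_more check; not the PR's actual implementation.
fn paginate(all_rows: &[&str], limit: usize, offset: usize) -> (Vec<String>, bool) {
    // Fetch up to `limit + offset + 1` rows, then slice out the page.
    let fetched: Vec<&str> = all_rows.iter().copied().take(limit + offset + 1).collect();
    let page: Vec<String> = fetched
        .iter()
        .skip(offset)
        .take(limit)
        .map(|s| s.to_string())
        .collect();
    // If anything was fetched past the current page, there is another page.
    let has_more = fetched.len() > offset + limit;
    (page, has_more)
}

fn main() {
    let rows = ["a", "b", "c", "d", "e"];
    let (page, has_more) = paginate(&rows, 2, 2);
    assert_eq!(page, vec!["c", "d"]);
    assert!(has_more);
    let (page, has_more) = paginate(&rows, 2, 4);
    assert_eq!(page, vec!["e"]);
    assert!(!has_more);
}
```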
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src-tauri/src/mcp/builtin/knowledge/tools.rs` | Adds `offset` parameter to the MCP tool schema for pagination. |
| `src-tauri/src/mcp/builtin/knowledge/queries.rs` | Implements pagination + Markdown table rendering, adds guidance messages. |
| `.jules/distill.md` | Documents the rationale/guardrails around offset capping and `has_more` calculation. |
```rust
None,
0,
Some("Pagination offset (0-based)."),
```
offset is silently capped in the backend (to 10,000) but the tool schema exposes no maximum. This mismatch makes it likely the agent will request offsets above the cap and get surprising results; consider adding a max (e.g., 10,000) to the schema and documenting it in the description.
```diff
-None,
-0,
-Some("Pagination offset (0-based)."),
+Some(10_000),
+0,
+Some("Pagination offset (0-based, maximum: 10,000)."),
```
```rust
        Some("Maximum number of results to return (default: 5)."),
    ),
),
(
    "offset".to_string(),
```
The limit constraint in the schema (max 50) doesn’t match the backend clamp (MAX_LIMIT is 100). Either reduce the backend clamp to 50 or update the schema max to 100 so the contract is consistent.
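One way to keep the schema's advertised maximum and the backend clamp from drifting apart is to derive both from a single shared constant. A hypothetical sketch (the constant name matches the review comment; the helper functions are invented for illustration):

```rust
// A single shared constant keeps the schema maximum and the backend
// clamp in sync; helper names here are hypothetical.
const MAX_LIMIT: u64 = 50;

// Backend side: clamp whatever the caller sent, defaulting to 5.
fn clamp_limit(requested: Option<u64>) -> u64 {
    requested.unwrap_or(5).min(MAX_LIMIT)
}

// Schema side: derive the JSON-schema "maximum" from the same constant.
fn limit_schema_fragment() -> String {
    format!(r#"{{"type":"integer","default":5,"maximum":{MAX_LIMIT}}}"#)
}

fn main() {
    assert_eq!(clamp_limit(Some(100)), 50);
    assert_eq!(clamp_limit(None), 5);
    assert!(limit_schema_fragment().contains("\"maximum\":50"));
}
```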
```rust
let offset = args
    .get("offset")
    .and_then(|v| v.as_u64())
    .unwrap_or(0)
    .min(10_000);
```
Clamping offset with .min(10_000) can create an infinite-pagination loop: if the agent follows the “call again with offset: …” hint beyond 10,000, the value will be clamped back to 10,000 and the tool will keep returning the same page (and still report has_more). Prefer rejecting offsets > 10,000 with an InvalidInput error (or otherwise explicitly signal the cap was hit) rather than silently clamping.
```diff
-let offset = args
-    .get("offset")
-    .and_then(|v| v.as_u64())
-    .unwrap_or(0)
-    .min(10_000);
+let offset = match args.get("offset").and_then(|v| v.as_u64()) {
+    Some(offset) if offset > 10_000 => {
+        return Ok(guided_error(
+            ErrorCategory::InvalidInput,
+            format!(
+                "Unsupported offset '{offset}'. Expected offset to be less than or equal to 10,000."
+            ),
+            ToolGroup::Knowledge,
+        )
+        .to_mcp_result());
+    }
+    Some(offset) => offset,
+    None => 0,
+};
```
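The loop the reviewer describes is easy to demonstrate in isolation: once the requested offset exceeds the cap, every "advance" the hint suggests resolves to the same slice start, so the same page is returned forever. A toy illustration (not the PR's code):

```rust
// Illustrative only: why clamping with .min(10_000) repeats a page.
fn clamped_page_start(requested_offset: u64) -> u64 {
    requested_offset.min(10_000)
}

fn main() {
    // The agent follows the "call again with offset: ..." hint past the cap:
    let first = clamped_page_start(10_000);
    let next = clamped_page_start(10_005);
    // Both requests resolve to the same slice start, so the same page
    // (still reporting has_more) comes back every time.
    assert_eq!(first, next);
}
```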
```rust
"offset".to_string(),
integer_prop_with_default(
    Some(0),
    None,
    0,
    Some("Pagination offset (0-based)."),
),
```
Schema change adds offset pagination support, but there’s no test asserting search_knowledge_tool() exposes the new field (and its bounds). There are already schema tests for record_knowledge_tool(); adding a similar test for search_knowledge_tool() would prevent regressions (e.g., missing offset/max).
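The suggested regression test could look roughly like the following. This is a hedged sketch: the real `search_knowledge_tool()` and its schema type live in the repository, so a stand-in struct is used here purely to show the shape of the assertion (property present, bounds as advertised):

```rust
// Stand-in for the real tool-schema type; the actual test would call
// the repository's search_knowledge_tool() and inspect its JSON schema.
use std::collections::BTreeMap;

struct ToolSchema {
    // property name -> (minimum, maximum), mirroring JSON-schema bounds.
    properties: BTreeMap<String, (Option<u64>, Option<u64>)>,
}

fn search_knowledge_tool() -> ToolSchema {
    let mut properties = BTreeMap::new();
    properties.insert("limit".to_string(), (Some(1), Some(50)));
    properties.insert("offset".to_string(), (Some(0), Some(10_000)));
    ToolSchema { properties }
}

fn main() {
    // The regression test: offset must be exposed with its bounds.
    let schema = search_knowledge_tool();
    let offset = schema.properties.get("offset").expect("offset missing");
    assert_eq!(*offset, (Some(0), Some(10_000)));
}
```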
💡 **What:** Refactored the `search_knowledge` MCP tool in the Rust backend to return dense Markdown tables instead of bloated JSON arrays. Added `offset` support to the tool schema to enable limit/offset pagination, capped at 10,000 to prevent OOM errors.

🎯 **Why:** The original implementation returned unstructured `json!` arrays directly into the content context, risking context overflow on large keyword searches and limiting readability for the LLM.

📉 **Token Impact:** Drastically reduces the token footprint per knowledge chunk returned. By using Markdown tables, the LLM avoids parsing redundant JSON keys (like `content:`, `id:`, `score:`) for every entry, and long-term context retention is preserved.

🛠️ **Error Recovery:** Added explicit LLM-coaching guidance ("Try using simpler search queries or exploring context around known entities.") when database queries fail, and a helpful redirect ("Try calling again with offset: 0") if the agent tries to traverse past the end of the results.

PR created automatically by Jules for task 3744919151408971230 started by @fritzprix
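The dense Markdown-table output the description advertises can be sketched as follows. Field names (`id`, `score`, `content`) are taken from the JSON keys the description mentions; the struct and function are hypothetical, not the PR's actual rendering code:

```rust
// Hypothetical sketch of rendering search hits as a dense Markdown
// table instead of a JSON array; field names follow the PR description.
struct Hit {
    id: u32,
    score: f32,
    content: String,
}

fn render_markdown_table(hits: &[Hit]) -> String {
    let mut out = String::from("| id | score | content |\n|---|---|---|\n");
    for h in hits {
        // One compact row per hit: no repeated JSON keys per entry.
        out.push_str(&format!("| {} | {:.2} | {} |\n", h.id, h.score, h.content));
    }
    out
}

fn main() {
    let hits = vec![Hit { id: 1, score: 0.92, content: "alpha".into() }];
    let table = render_markdown_table(&hits);
    assert!(table.contains("| 1 | 0.92 | alpha |"));
}
```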