
fix(mcp): 13 quality improvements across search, save, and pipeline#42

Closed
kckylechen1 wants to merge 1 commit into main from fix/mcp-server-quality-improvements

Conversation

@kckylechen1
Owner

Summary

Audited the MCP server codebase end-to-end and fixed 13 issues across P0/P1/P2 severity levels, covering search quality, data integrity, and operational correctness.

Changes

P0 — Functional Bugs

  • auto_link DB mismatch: auto_link was hardcoded to search global store only; now searches the same DB the memory was saved to
  • importance unvalidated: save_memory now clamps importance to [0.0, 1.0] (was passing raw values)
  • EXTRACTION_PROMPT missing fields: Prompt now requests entities and persons; fact_to_entry parses them (were always empty arrays)
  • Enrichment sequential → concurrent: embed_voyage and generate_summary now run via tokio::join! (~2x faster)
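
The importance clamp from the second P0 fix can be sketched as below. This is an illustrative sketch, not the actual save_memory code: the function name and the neutral 0.5 fallback for non-finite input are assumptions.

```rust
/// Clamp a raw importance value into the valid [0.0, 1.0] range before
/// persisting. Non-finite inputs (NaN, infinities) fall back to a neutral
/// 0.5 here — that fallback is an assumption for this sketch.
fn clamp_importance(raw: f64) -> f64 {
    if raw.is_finite() {
        raw.clamp(0.0, 1.0)
    } else {
        0.5
    }
}
```

Clamping at the write path means downstream ranking code never has to defend against out-of-range importance values.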

P1 — Search Quality / UX

  • Path prefix filter pushed to SQL: search_vec and search_fts accept path_prefix parameter; filtering at DB level instead of Rust post-filter (fixes effective top_k being reduced)
  • Cross-DB score normalization: Global and project results re-normalized by each store's own max score before merge sort
  • Search skip returns reason: should_skip_query now returns {status, reason, results} instead of []
  • hub_feedback false success: Now returns recorded: false for non-existent capability IDs; also clamps rating to [0.0, 5.0]

P2 — Robustness

  • strip_code_fence nested fences: Uses rfind("```") for last closing fence
  • Noise filter AI boilerplate: Added "I apologize", "As an AI", "Let me know if", "Feel free to", "Is there anything else", "I cannot"
  • SKIP_PATTERNS fragile indices: Long-text check uses named variable references instead of [3..5]
  • generate_summary fallback marking: Returns (String, bool) tuple so callers can distinguish LLM summary from truncation fallback
  • sync_memories dual-DB known state: Reads from both global and project DBs; writes to both
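
The rfind fix for strip_code_fence can be sketched as follows. The function shape and edge-case handling are assumptions for illustration; the real helper may differ.

```rust
/// Strip a leading Markdown code fence and its matching closing fence.
/// Using rfind locates the LAST closing fence, so fences nested inside the
/// body survive intact. Sketch only; the real strip_code_fence may differ.
fn strip_code_fence(s: &str) -> String {
    // Build the "```" delimiter without embedding it literally.
    let fence = "`".repeat(3);
    let s = s.trim();
    if let Some(rest) = s.strip_prefix(&fence) {
        // Skip an optional language tag on the opening fence line.
        let body_start = rest.find('\n').map(|i| i + 1).unwrap_or(rest.len());
        let body = &rest[body_start..];
        // rfind: last closing fence, not the first one inside the body.
        let end = body.rfind(&fence).unwrap_or(body.len());
        body[..end].trim().to_string()
    } else {
        s.to_string()
    }
}
```

A first-match find() here would truncate any LLM output that itself contains a fenced snippet; rfind keeps the whole body.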

Test plan

  • cargo check passes with zero warnings
  • cargo test -p memory-core — 37 tests pass, 0 failures

P0 fixes:
- auto_link now searches the same DB the memory was saved to (was global-only)
- importance is clamped to [0.0, 1.0] on save (was unvalidated)
- EXTRACTION_PROMPT now requests entities/persons fields; fact_to_entry parses them
- enrichment runs embedding + summary concurrently via tokio::join! (was sequential)

P1 fixes:
- path prefix filter pushed down to SQL in search_vec/search_fts (was post-filter)
- cross-DB scores re-normalized before merge sort (was comparing raw scores)
- search skip returns {status, reason} instead of empty array
- hub_feedback returns recorded:false for non-existent capability IDs (was always true)
- rating is clamped to [0.0, 5.0] in hub_record_feedback

P2 fixes:
- strip_code_fence uses rfind for last ``` to handle nested fences
- noise filter catches AI boilerplate (I apologize, As an AI, etc.)
- SKIP_PATTERNS long-text check uses named references instead of fragile indices
- generate_summary returns (summary, is_fallback) to distinguish LLM vs truncation
- sync_memories reads known state from both global and project DBs

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements hierarchical path filtering for vector and FTS searches, refines noise detection patterns, and enhances the memory synchronization and feedback systems to better handle project-specific stores. It also introduces concurrent embedding and summary generation and expands the extraction prompt to include entities and persons. Review feedback identifies a logical flaw in cross-DB score normalization that erases quality gaps between stores, potential false positives in path prefix matching for sibling directories, and an unhandled error in the project store's agent state update.

Comment on lines +227 to +258
    // Re-normalize cross-DB scores: each DB's scores are independently normalized,
    // so we re-scale each group by its own max to make them comparable.
    {
        let global_max = combined_results
            .iter()
            .filter(|(_, s)| matches!(s, DbScope::Global))
            .map(|(r, _)| r.score.final_score)
            .fold(0.0_f64, f64::max);
        let project_max = combined_results
            .iter()
            .filter(|(_, s)| matches!(s, DbScope::Project))
            .map(|(r, _)| r.score.final_score)
            .fold(0.0_f64, f64::max);

        if global_max > 0.0 || project_max > 0.0 {
            let scale_global = if global_max > 0.0 { 1.0 / global_max } else { 1.0 };
            let scale_project = if project_max > 0.0 { 1.0 / project_max } else { 1.0 };
            for (result, db_scope) in &mut combined_results {
                let scale = match db_scope {
                    DbScope::Global => scale_global,
                    DbScope::Project => scale_project,
                };
                result.score.final_score *= scale;
            }
        }
        combined_results.sort_by(|a, b| {
            b.0.score
                .final_score
                .partial_cmp(&a.0.score.final_score)
                .unwrap_or(std::cmp::Ordering::Equal)
        });
    }


Severity: high

The cross-DB score normalization logic is logically flawed for hybrid search results. While the intent is to make scores comparable, re-scaling each database's results by its own maximum score erases the quality gap between the two stores. For example, if the Global DB has a highly relevant match (score 0.9) and the Project DB has only a marginal match (score 0.1), this logic will promote the marginal match to 1.0, making it appear as relevant as the global match. Since search_fts already performs internal relative normalization and search_vec provides absolute cosine similarity, the scores are already sufficiently comparable for a merged sort without this destructive re-scaling.
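
The reviewer's concern can be reproduced with a toy per-group normalization (the helper name and score values here are hypothetical, chosen only to demonstrate the effect):

```rust
/// Toy reproduction of the review finding: dividing a group's scores by
/// that group's own max erases the quality gap between groups — a lone
/// marginal hit (0.1) gets promoted to 1.0, on par with a strong hit.
fn normalize_by_group_max(scores: &[f64]) -> Vec<f64> {
    let max = scores.iter().cloned().fold(0.0_f64, f64::max);
    if max > 0.0 {
        scores.iter().map(|s| s / max).collect()
    } else {
        scores.to_vec()
    }
}
```

With a global group of [0.8, 0.4] and a project group of [0.1], per-group scaling ranks the project's 0.1 hit (scaled to 1.0) equal to the global 0.8 hit — exactly the promotion the review describes.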

Comment on lines +734 to +758
    if let Some(prefix) = path_prefix {
        let pattern = format!("{}%", prefix.trim_end_matches('/'));
        let mut stmt = conn.prepare(
            r#"SELECT v.id, v.distance
               FROM memories_vec v
               JOIN memories m ON m.id = v.id
               WHERE v.embedding MATCH ?1
                 AND k = ?3
                 AND (?2 = 1 OR m.archived = 0)
                 AND (m.path LIKE ?4 OR m.path = ?5)
               ORDER BY v.distance"#,
        )?;
        let rows = stmt.query_map(
            params![blob, include_archived as i64, top_k as i64, &pattern, prefix.trim_end_matches('/')],
            |row| {
                let id: String = row.get(0)?;
                let dist: f64 = row.get(1)?;
                Ok((id, dist))
            },
        )?;
        for r in rows {
            let (id, dist) = r?;
            let sim: f64 = (1.0 - dist / 2.0).clamp(0.0, 1.0);
            scores.insert(id, sim);
        }


Severity: medium

The path prefix matching logic is too broad. Using LIKE 'prefix%' will match sibling directories that share the same prefix (e.g., a search for /foo will incorrectly match /foobar). To correctly implement hierarchical path filtering, you should match either the exact path or paths starting with the prefix followed by a slash. This is consistent with the logic already used in list_by_path.

    if let Some(prefix) = path_prefix {
        let normalized = prefix.trim_end_matches('/');
        let (pattern, path_val) = if normalized.is_empty() || normalized == "/" {
            ("/%".to_string(), "/")
        } else {
            (format!("{}/%", normalized), normalized)
        };
        let mut stmt = conn.prepare(
            r#"SELECT v.id, v.distance
               FROM memories_vec v
               JOIN memories m ON m.id = v.id
               WHERE v.embedding MATCH ?1
                 AND k = ?3
                 AND (?2 = 1 OR m.archived = 0)
                 AND (m.path = ?5 OR m.path LIKE ?4)
               ORDER BY v.distance"#,
        )?;
        let rows = stmt.query_map(
            params![blob, include_archived as i64, top_k as i64, &pattern, path_val],
            |row| {
                let id: String = row.get(0)?;
                let dist: f64 = row.get(1)?;
                Ok((id, dist))
            },
        )?;
        for r in rows {
            let (id, dist) = r?;
            let sim: f64 = (1.0 - dist / 2.0).clamp(0.0, 1.0);
            scores.insert(id, sim);
        }
    }

Comment on lines +809 to +829
    let mut raw: Vec<(String, f64)> = if let Some(prefix) = path_prefix {
        let pattern = format!("{}%", prefix.trim_end_matches('/'));
        let mut stmt = conn.prepare(
            r#"SELECT memories_fts.id, -bm25(memories_fts) AS score
               FROM memories_fts
               JOIN memories m ON m.id = memories_fts.id
               WHERE memories_fts MATCH simple_query(?1)
                 AND (?2 = 1 OR m.archived = 0)
                 AND (m.path LIKE ?4 OR m.path = ?5)
               ORDER BY bm25(memories_fts)
               LIMIT ?3"#,
        )?;
        let rows = stmt.query_map(
            params![safe_query, include_archived as i64, limit as i64, &pattern, prefix.trim_end_matches('/')],
            |row| {
                let id: String = row.get(0)?;
                let score: f64 = row.get(1)?;
                Ok((id, score))
            },
        )?;
        rows.filter_map(|r| r.ok()).collect()


Severity: medium

Similar to the vector search, the FTS path prefix matching logic will incorrectly match sibling directories (e.g., /foo matching /foobar). It should be updated to use the same directory-aware logic as list_by_path.

    let mut raw: Vec<(String, f64)> = if let Some(prefix) = path_prefix {
        let normalized = prefix.trim_end_matches('/');
        let (pattern, path_val) = if normalized.is_empty() || normalized == "/" {
            ("/%".to_string(), "/")
        } else {
            (format!("{}/%", normalized), normalized)
        };
        let mut stmt = conn.prepare(
            r#"SELECT memories_fts.id, -bm25(memories_fts) AS score
               FROM memories_fts
               JOIN memories m ON m.id = memories_fts.id
               WHERE memories_fts MATCH simple_query(?1)
                 AND (?2 = 1 OR m.archived = 0)
                 AND (m.path = ?5 OR m.path LIKE ?4)
               ORDER BY bm25(memories_fts)
               LIMIT ?3"#,
        )?;
        let rows = stmt.query_map(
            params![safe_query, include_archived as i64, limit as i64, &pattern, path_val],
            |row| {
                let id: String = row.get(0)?;
                let score: f64 = row.get(1)?;
                Ok((id, score))
            },
        )?;
        rows.filter_map(|r| r.ok()).collect()
    }

Comment on lines +400 to +406
    if server.project_db_path.is_some() {
        let _ = server.with_project_store(|store| {
            store
                .update_agent_known_state(&params.agent_id, &sync_updates)
                .map_err(|e| format!("Failed to update project agent state: {}", e))
        });
    }


Severity: medium

The update to the project store's agent state ignores potential errors. If the project database is locked or the update fails for any other reason, the agent's known state will become out of sync, leading to redundant or missing memories in future sync calls. You should propagate the error using the ? operator, consistent with the global store update above.

Suggested change

Before:

    if server.project_db_path.is_some() {
        let _ = server.with_project_store(|store| {
            store
                .update_agent_known_state(&params.agent_id, &sync_updates)
                .map_err(|e| format!("Failed to update project agent state: {}", e))
        });
    }

After:

    if server.project_db_path.is_some() {
        server.with_project_store(|store| {
            store
                .update_agent_known_state(&params.agent_id, &sync_updates)
                .map_err(|e| format!("Failed to update project agent state: {}", e))
        })?;
    }

kckylechen1 pushed a commit that referenced this pull request Apr 7, 2026
Reapplies fixes from PR #42 onto the restructured codebase (db split, hub_ops split, tool_params dir).

Fixes:
- importance clamped to [0.0, 1.0] on save (was unvalidated)
- EXTRACTION_PROMPT now requests entities/persons; fact_to_entry parses them
- cross-DB search scores re-normalized before merge sort
- hub_feedback returns recorded:false for non-existent IDs; rating clamped to [0.0, 5.0]
- strip_code_fence uses rfind for last ``` to handle nested fences
- noise filter catches AI boilerplate (I apologize, As an AI, etc.)
- SKIP_PATTERNS long-text check uses named references instead of fragile indices
- sync_memories reads known state from both global and project DBs
@kckylechen1
Owner Author

Superseded by #43. Closing the older branch to keep the review surface clean.

@kckylechen1 kckylechen1 closed this Apr 8, 2026