Improve probe/score semantics by wkusnierczyk · Pull Request #170 · wkusnierczyk/aigent

wkusnierczyk · 2026-02-23T16:36:31Z

Summary

Replace hand-rolled stemmer with Snowball (rust-stemmers), add synonym expansion, graduate trigger/name scores, soften structural scoring, unify trigger phrases across modules
Closes [TASK] Improve probe/score matching semantics #168

Changes

Matching improvements (src/tester.rs):

Replace 11-rule suffix stripper with Snowball English stemmer for better inflection handling
Add 15 synonym groups for common skill-domain terms (expand query tokens only)
Graduate trigger score from binary (0/1) to fraction of query tokens matched
Graduate name score from binary (0/1) to fraction of query tokens matched
Use shared TRIGGER_PHRASES from linter with contains (was starts_with on 2 phrases)

Scoring improvements (src/scorer.rs):

Proportional structural scoring: 6 checks × 10 points each (was all-or-nothing 0 or 60)
Add STRUCTURAL_POINTS_PER_CHECK = 10 constant

Shared constants (src/linter.rs, src/cli/upgrade.rs):

Make TRIGGER_PHRASES pub (was private) for cross-module use
Upgrade module now uses shared TRIGGER_PHRASES instead of hardcoded "use when"/"use this when"

Documentation (docs/cli.md):

Add limitations notes for probe and score commands
Update structural scoring description from all-or-nothing to proportional
Update example scores to reflect new scoring

Breaking changes: Probe scores and score totals change for existing skills.

Test plan

cargo test — 527 lib + 144 CLI + 27 plugin + 1 doc = 699 passing
cargo clippy -- -D warnings — clean
cargo fmt --check — clean
cargo test --test anthropics_skills -- --ignored — 64 integration tests pass

🤖 Generated with Claude Code

Replace hand-rolled stemmer with Snowball (rust-stemmers), add synonym expansion for query tokens, graduate trigger/name scores from binary to fractional, soften structural scoring from all-or-nothing to proportional (10 per check), and unify trigger phrase definitions across linter, tester, and upgrade modules. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This pull request improves the semantic matching and scoring capabilities of the probe and score commands, addressing issue #168. The changes replace a simple 11-rule suffix stripper with the Snowball English stemmer, add synonym expansion for common skill-domain terms, graduate binary trigger/name scores to proportional scoring, and make structural scoring proportional rather than all-or-nothing.

Changes:

Replace hand-rolled stemmer with Snowball stemmer and add synonym expansion to improve matching recall
Graduate trigger and name scores from binary (0/1) to proportional (fraction of query tokens matched)
Change structural scoring from all-or-nothing (0 or 60 points) to proportional (10 points per passing check)

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`src/tester.rs`	Replaces suffix stripper with Snowball stemmer, adds 15 synonym groups, implements graduated trigger/name scoring, and uses shared TRIGGER_PHRASES
`src/scorer.rs`	Implements proportional structural scoring (10 points per check instead of all-or-nothing 60)
`src/linter.rs`	Makes TRIGGER_PHRASES public for cross-module sharing
`src/cli/upgrade.rs`	Uses shared TRIGGER_PHRASES instead of hardcoded phrases
`tests/cli.rs`	Updates test to include trigger phrase in description for compatibility with new scoring
`docs/cli.md`	Adds limitations sections for probe and score, updates scoring descriptions, adjusts example scores
`Cargo.toml`	Adds rust-stemmers dependency
`Cargo.lock`	Locks rust-stemmers at version 1.2.0
`CHANGES.md`	Documents breaking changes to probe scores and score totals

Copilot · 2026-02-23T16:45:21Z

src/tester.rs

+    }
+    expanded
+}
+


No tests for the new stem function or expand_synonyms function. These are significant new features that should have unit tests to verify:

That common words stem correctly (e.g., "validating" → "valid", "parsed" → "pars")

That synonym expansion works as expected (e.g., "validate" expands to include "check", "verify", "lint")

That synonym expansion only applies to query tokens, not description tokens

Edge cases like empty input, tokens with no synonyms, multiple synonyms matching

Without tests, it's difficult to verify that the synonym groups contain the correct stemmed forms, or that the synonym expansion logic works correctly.

Suggested change

#[cfg(test)]

mod tests {

use super::*;

#[test]

fn stem_common_words() {

// Verify that common words are stemmed as expected.

assert_eq!(stem("validating"), "valid");

assert_eq!(stem("parsed"), "pars");

}

#[test]

fn expand_synonyms_on_single_token_in_group() {

// Use the first synonym group to avoid hard-coding specific values.

let group = SYNONYM_GROUPS[0];

let token = group[0].to_string();

let expanded = expand_synonyms(&[token]);

// All members of the group should be present in the expanded set.

for &syn in group {

assert!(

expanded.contains(syn),

"expected expanded set to contain synonym '{}'",

syn

);

}

}

#[test]

fn expand_synonyms_empty_input_yields_empty_set() {

let expanded = expand_synonyms(&[]);

assert!(expanded.is_empty());

}

#[test]

fn expand_synonyms_token_with_no_synonyms_is_preserved_only() {

// Choose a token that is very unlikely to appear in any synonym group.

let token = "uniquetokenwithnosynonyms".to_string();

let expanded = expand_synonyms(&[token.clone()]);

assert_eq!(expanded.len(), 1);

assert!(expanded.contains(token.as_str()));

}

#[test]

fn expand_synonyms_multiple_tokens_in_same_group_do_not_duplicate() {

let group = SYNONYM_GROUPS[0];

assert!(

group.len() >= 2,

"expected at least two entries in the first synonym group"

);

let tokens: Vec<String> =

group.iter().take(2).map(|s| s.to_string()).collect();

let expanded = expand_synonyms(&tokens);

// All members of the group are present, but there are no duplicates.

for &syn in group {

assert!(

expanded.contains(syn),

"expected expanded set to contain synonym '{}'",

syn

);

}

}

#[test]

fn tokenize_does_not_apply_synonym_expansion() {

// Tokenization should only stem and filter stopwords, not expand synonyms.

// Using "validate"/"validating" as an example to ensure only the stem appears.

let tokens = tokenize("validating");

// We expect exactly one stemmed token and no additional synonyms.

assert_eq!(tokens.len(), 1);

assert_eq!(tokens[0], stem("validating"));

}

}

Copilot · 2026-02-23T16:45:21Z

docs/cli.md

 - **Quality (40 points)** — Five semantic lint checks worth 8 points each:
  third-person description, trigger phrase (`"Use when..."`), gerund name form
  (`converting-pdfs` not `pdf-converter`), specific (non-generic) name, and
  description length (≥ 20 words).


Documentation inaccuracy: The description says "description length (≥ 20 words)" but the actual lint check in src/linter.rs:144 is description.chars().count() < 20 || word_count < 4, which means the description must have both ≥ 20 characters AND ≥ 4 words.

Update the documentation to accurately describe the check as requiring both a minimum character count (20) and minimum word count (4), not just "≥ 20 words".

Copilot · 2026-02-23T16:45:21Z

src/tester.rs

+const SYNONYM_GROUPS: &[&[&str]] = &[
+    &["valid", "check", "verifi", "lint"],
+    &["pars", "extract", "read"],
+    &["format", "style", "clean"],
+    &["build", "assembl", "compil"],
+    &["test", "probe", "evalu"],
+    &["creat", "generat", "new"],
+    &["fix", "repair", "correct"],
+    &["upgrad", "improv", "enhanc"],
+    &["document", "describ", "explain"],
+    &["delet", "remov", "strip"],
+    &["instal", "setup", "configur"],
+    &["deploy", "publish", "releas"],
+    &["search", "find", "discov", "locat"],
+    &["transform", "convert", "process"],
+    &["analyz", "inspect", "review"],
+];


The synonym groups contain manually created "stemmed" forms that may not match the Snowball English stemmer output. This could cause synonym expansion to fail if a query token's actual Snowball stem doesn't match the manually created forms in SYNONYM_GROUPS.

For example, if "validate" is stemmed by the Snowball algorithm but doesn't match the "valid" in the synonym group, the expansion won't work. To ensure correctness, either:

Pre-stem all words in SYNONYM_GROUPS using the actual Snowball stemmer, or

Add tests to verify that common word forms correctly match their expected synonyms

Without verification, synonym expansion may silently fail to expand queries as intended.

Copilot · 2026-02-23T16:45:22Z

src/tester.rs

+    let desc_overlap = if expanded_query.is_empty() {
        0.0
    } else {
-        intersection as f64 / query_set.len() as f64
+        intersection as f64 / expanded_query.len() as f64
    };


Inconsistency in denominator for description overlap calculation. The description overlap divides by expanded_query.len() (which includes synonyms), while trigger and name scores divide by the original query_tokens.len(). This creates asymmetric scoring where synonym expansion deflates the description overlap score relative to trigger/name scores.

For example, if a 2-token query expands to 6 tokens via synonyms, and all 6 match the description, the description overlap would be 1.0 (6/6). But if only the original 2 tokens match the trigger, the trigger score would be 1.0 (2/2). Both are "perfect" matches, but the weighted formula would work differently.

Consider using query_tokens.len() consistently across all three components for more intuitive weighting, or document why the expanded set should be used for description but not for trigger/name.

Copilot · 2026-02-23T16:45:22Z

src/tester.rs

 fn stem(word: &str) -> String {
-    let w = word.to_lowercase();
-    // Order matters: check longer suffixes first.
-    for suffix in &[
-        "ting", "sing", "zing", "ning", "ring", "ses", "ies", "ing", "ed", "es", "s",
-    ] {
-        if w.len() > suffix.len() + 2 {
-            if let Some(root) = w.strip_suffix(suffix) {
-                return root.to_string();
-            }
-        }
-    }
-    w
+    let stemmer = Stemmer::create(Algorithm::English);
+    stemmer.stem(&word.to_lowercase()).into_owned()
 }


Performance issue: A new Stemmer instance is created for every word that needs stemming. The stem function is called inside tokenize, which processes every token in queries, descriptions, and trigger phrases. For a description with 50 words, this creates 50 separate Stemmer instances.

The Stemmer should be created once and reused. Consider using a LazyLock static variable (similar to how PERSON_RE is defined in src/linter.rs) or passing a reference to a shared Stemmer instance. This would significantly reduce overhead, especially when processing multiple skills or running batch operations.

Copilot AI review requested due to automatic review settings February 23, 2026 16:36

wkusnierczyk added task Task / chore medium Medium priority labels Feb 23, 2026

wkusnierczyk self-assigned this Feb 23, 2026

Copilot started reviewing on behalf of wkusnierczyk February 23, 2026 16:37 View session

wkusnierczyk closed this Feb 23, 2026

wkusnierczyk deleted the improve-probe-score-semantics branch February 23, 2026 16:41

Copilot AI reviewed Feb 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve probe/score semantics#170

Improve probe/score semantics#170
wkusnierczyk wants to merge 1 commit intomainfrom
improve-probe-score-semantics

wkusnierczyk commented Feb 23, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

+#[cfg(test)]
+mod tests {
+    use super::*;
+    #[test]
+    fn stem_common_words() {
+        // Verify that common words are stemmed as expected.
+        assert_eq!(stem("validating"), "valid");
+        assert_eq!(stem("parsed"), "pars");
+    }
+    #[test]
+    fn expand_synonyms_on_single_token_in_group() {
+        // Use the first synonym group to avoid hard-coding specific values.
+        let group = SYNONYM_GROUPS[0];
+        let token = group[0].to_string();
+        let expanded = expand_synonyms(&[token]);
+        // All members of the group should be present in the expanded set.
+        for &syn in group {
+            assert!(
+                expanded.contains(syn),
+                "expected expanded set to contain synonym '{}'",
+                syn
+            );
+        }
+    }
+    #[test]
+    fn expand_synonyms_empty_input_yields_empty_set() {
+        let expanded = expand_synonyms(&[]);
+        assert!(expanded.is_empty());
+    }
+    #[test]
+    fn expand_synonyms_token_with_no_synonyms_is_preserved_only() {
+        // Choose a token that is very unlikely to appear in any synonym group.
+        let token = "uniquetokenwithnosynonyms".to_string();
+        let expanded = expand_synonyms(&[token.clone()]);
+        assert_eq!(expanded.len(), 1);
+        assert!(expanded.contains(token.as_str()));
+    }
+    #[test]
+    fn expand_synonyms_multiple_tokens_in_same_group_do_not_duplicate() {
+        let group = SYNONYM_GROUPS[0];
+        assert!(
+            group.len() >= 2,
+            "expected at least two entries in the first synonym group"
+        );
+        let tokens: Vec<String> =
+            group.iter().take(2).map(|s| s.to_string()).collect();
+        let expanded = expand_synonyms(&tokens);
+        // All members of the group are present, but there are no duplicates.
+        for &syn in group {
+            assert!(
+                expanded.contains(syn),
+                "expected expanded set to contain synonym '{}'",
+                syn
+            );
+        }
+    }
+    #[test]
+    fn tokenize_does_not_apply_synonym_expansion() {
+        // Tokenization should only stem and filter stopwords, not expand synonyms.
+        // Using "validate"/"validating" as an example to ensure only the stem appears.
+        let tokens = tokenize("validating");
+        // We expect exactly one stemmed token and no additional synonyms.
+        assert_eq!(tokens.len(), 1);
+        assert_eq!(tokens[0], stem("validating"));
+    }
+}

Uh oh!

Conversation

wkusnierczyk commented Feb 23, 2026

Summary

Changes

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants