feat(skills): add synonym expansion to gws skills search#514
feat(skills): add synonym expansion to gws skills search#514dumko2001 wants to merge 15 commits intogoogleworkspace:mainfrom
Conversation
…x .gitignore formatting
…in src/registry.rs
Add explicit --help/-h handling to `handle_skills_command` so that `gws skills --help` and `gws skills search --help` print a proper help screen instead of treating the flag as a search query. Keeps the same manual dispatch pattern as all other top-level commands in main.rs rather than introducing a full clap subcommand tree for a single new command.
Per CLI convention, invoking a subcommand with no arguments should display usage/help rather than return an error. Since 'gws skills --help', 'gws skills', and 'gws skills <unknown-subcommand>' all indicate the user wants guidance, route all non-search invocations to print_skills_help() and return Ok(()).
Replace single-string substring match with a token-AND approach: split the query into individual tokens and require all tokens to appear somewhere in the combined fields (name + description + aliases). Previously, `gws skills search send email` built the query string "send email" and required that exact contiguous phrase to appear in a field. This failed for descriptions like "Gmail: Send an email" where the words are present but not adjacent. With token-AND matching, both tokens must match anywhere across the combined text, which is the correct behavior for natural-language queries.
Add a static SYNONYMS table mapping 30+ natural-language terms to their canonical service names. When a query token matches a synonym entry it is expanded to also include the canonical form, so: gws skills search email → finds Gmail (email → gmail) gws skills search spreadsheet → finds Sheets (spreadsheet → sheets) gws skills search schedule → finds Calendar (schedule → calendar) gws skills search document → finds Docs (document → docs) The matching logic uses token-AND with synonym-OR per token: every original query token must be satisfied, but it is satisfied if any of its synonym expansions appears in the skill's fields. An exact match on the canonical name still works as before. Adds expand_tokens() as a testable helper with 8 unit tests covering no-synonym pass-through, multi-word expansion, deduplication, and table integrity checks.
🦋 Changeset detectedLatest commit: e2cec4d The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the discoverability of skills within the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a synonym expansion feature for the new gws skills search command, making it easier for users to discover skills with natural language queries. It also includes a significant refactoring that moves generated API and helper skills into a references/ subdirectory, improving the project's structure. The implementation is generally strong, with good test coverage for the new search logic.
I've identified one high-severity issue related to how search query arguments are parsed. Quoted queries are not correctly tokenized, which can lead to failed searches. I've provided a code suggestion to address this. Otherwise, the changes, including the large-scale refactoring and the addition of a link-checking script, are well-executed.
src/skills_search.rs
Outdated
| // Split into individual tokens so multi-word queries like "send email" match | ||
| // descriptions where the words appear separately (e.g. "Send an email"). | ||
| // Then expand each token via the synonym table so "email" also matches "gmail". | ||
| let raw_tokens: Vec<String> = args[1..].iter().map(|a| a.to_lowercase()).collect(); |
There was a problem hiding this comment.
The current implementation for parsing search tokens doesn't correctly handle quoted multi-word queries. For example, a command like gws skills search "send email" would incorrectly treat "send email" as a single token instead of two separate tokens, send and email. This leads to incorrect search behavior as it would search for the literal phrase rather than individual keywords.
To ensure both quoted and unquoted queries are handled correctly, you should first join all arguments into a single string and then split that string by whitespace.
| let raw_tokens: Vec<String> = args[1..].iter().map(|a| a.to_lowercase()).collect(); | |
| let raw_tokens: Vec<String> = args[1..].join(" ").split_whitespace().map(|a| a.to_lowercase()).collect(); |
|
Fixed the tokenization of quoted queries in |
|
/gemini review |
|
Updated persona and recipe skill generation to use relative Markdown links in the PREREQUISITE section and correct skill identifiers (without the directory prefix) in the metadata |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This is a great enhancement to skill discovery! The synonym expansion will significantly improve the user experience for agents searching for skills. The implementation is solid, with good test coverage for the new logic. The refactoring to centralize registry definitions and improve error handling in the skill generation script is also a welcome improvement. I have one suggestion regarding performance in the new search command.
| for svc in services::SERVICES { | ||
| if let Some(helper) = helpers::get_helper(svc.api_name) { | ||
| let cli = Command::new(svc.api_name); | ||
| let doc = discovery::RestDescription { | ||
| name: svc.api_name.to_string(), | ||
| ..Default::default() | ||
| }; | ||
| let cli_with_helpers = helper.inject_commands(cli, &doc); | ||
| for sub in cli_with_helpers.get_subcommands() { | ||
| let name = sub.get_name(); | ||
| if name.starts_with('+') { | ||
| let short_name = name.trim_start_matches('+'); | ||
| let full_helper_name = format!("gws-{}-{}", svc.aliases[0], short_name); | ||
| let about = sub.get_about().map(|s| s.to_string()).unwrap_or_default(); | ||
| let about_clean = about.strip_prefix("[Helper] ").unwrap_or(&about); | ||
|
|
||
| if matches(&[full_helper_name.as_str(), about_clean]) { | ||
| println!("[Helper] {} - {}", full_helper_name, about_clean); | ||
| println!( | ||
| " Reference: skills/references/{}/SKILL.md\n", | ||
| full_helper_name | ||
| ); | ||
| results += 1; | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The current implementation for searching helpers rebuilds the clap::Command tree for every service on every search invocation. This is inefficient and can slow down the skills search command, as it performs redundant computations on every run.
To improve performance, you can compute the list of all helpers once and cache it for subsequent searches. The once_cell::sync::Lazy pattern is a great fit for this. You would need to add once_cell as a dependency to your Cargo.toml.
Here's an example of how you could implement this optimization:
- Add
use once_cell::sync::Lazy;and define a struct for helper information at the module level. - Create a
Lazystatic to build the helper list once. - Replace the current helper search loop with a simple iteration over this static
HELPERSlist.
use once_cell::sync::Lazy;
// ... other imports
struct HelperInfo {
full_name: String,
description: String,
}
static HELPERS: Lazy<Vec<HelperInfo>> = Lazy::new(|| {
let mut helpers = Vec::new();
for svc in services::SERVICES {
if let Some(helper) = helpers::get_helper(svc.api_name) {
let cli = Command::new(svc.api_name);
let doc = crate::discovery::RestDescription {
name: svc.api_name.to_string(),
..Default::default()
};
let cli_with_helpers = helper.inject_commands(cli, &doc);
for sub in cli_with_helpers.get_subcommands() {
if sub.get_name().starts_with('+') {
let short_name = sub.get_name().trim_start_matches('+');
let full_helper_name = format!("gws-{}-{}", svc.aliases[0], short_name);
let about = sub.get_about().unwrap_or_default();
let about_clean = about.strip_prefix("[Helper] ").unwrap_or(about);
helpers.push(HelperInfo {
full_name: full_helper_name,
description: about_clean.to_string(),
});
}
}
}
}
helpers
});
// In handle_skills_command, replace the loop with:
// for h in HELPERS.iter() { ... }This change will make the search command significantly faster, especially as the number of services and helpers grows.
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable synonym expansion feature for skill search, significantly improving usability for agents. The accompanying refactoring of the skill file structure is substantial but well-executed, creating a more organized references directory. The implementation is generally strong, but I've identified a critical issue in the new link-checking script that would cause it to fail, and a high-severity bug in the search command where an empty query incorrectly matches all skills. My review includes suggestions to fix both issues.
| if not target_path.exists(): | ||
| print(f"Broken link in {filepath}: {link_path} (resolved to {target_path})") | ||
| broken_links += 1 | ||
|
|
| raw_tokens.push(token.to_lowercase()); | ||
| } | ||
| } | ||
| let query_display = raw_tokens.join(" "); |
There was a problem hiding this comment.
An empty or whitespace-only search query will currently match all skills. This is because token_candidates will be empty, and the .all() call on an empty iterator returns true. You should add a check to handle this case, for example by returning an error or showing help.
| let query_display = raw_tokens.join(" "); | |
| if raw_tokens.is_empty() { | |
| return Err(GwsError::Validation( | |
| "Search query cannot be empty or only whitespace.".to_string(), | |
| )); | |
| } | |
| let query_display = raw_tokens.join(" "); |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable synonym expansion feature to the gws skills search command, making it much easier for users to discover skills using natural language terms. The implementation in skills_search.rs is well-structured.
The PR also includes a significant refactoring, moving the generated API and helper skills into a skills/references/ subdirectory. While this improves the project's structure by separating reference skills from composable ones like personas and recipes, it's a large change that is arguably outside the primary scope of adding synonym search. As per the repository's general rules, avoid introducing changes that are outside the primary goal of a pull request to prevent scope creep. In the future, it would be better to separate large refactorings from feature additions into their own pull requests to keep reviews focused and easier to manage.
I've found one high-severity issue where an empty search query incorrectly returns all skills. Please see my detailed comment.
| let matches = |fields: &[&str]| -> bool { | ||
| let combined = fields.join(" ").to_lowercase(); | ||
| token_candidates | ||
| .iter() | ||
| .all(|candidates| candidates.iter().any(|c| combined.contains(c.as_str()))) | ||
| }; |
There was a problem hiding this comment.
An empty or whitespace-only search query (e.g., gws skills search "") incorrectly matches and displays all available skills. This happens because token_candidates becomes an empty vector, and the all() call on an empty iterator returns true, causing the matches closure to always return true.
To fix this, you can add a check at the beginning of the matches closure to handle the case of an empty query.
let matches = |fields: &[&str]| -> bool {
if token_candidates.is_empty() {
return false;
}
let combined = fields.join(" ").to_lowercase();
token_candidates
.iter()
.all(|candidates| candidates.iter().any(|c| combined.contains(c.as_str())))
};|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a synonym expansion feature to the gws skills search command, significantly improving its usability for natural language queries. It also includes a major refactoring of the skill generation logic, moving generated skill files into a skills/references subdirectory to separate them from user-facing persona and recipe skills. This is a great structural improvement. The PR also adds a new Python script to validate markdown links, which will help maintain documentation quality. The implementation is well-structured, with new logic for search and registries cleanly separated into new modules. I've found one high-severity issue where a whitespace-only query would incorrectly match all skills, and I've provided a suggestion to fix it.
| let mut raw_tokens = Vec::new(); | ||
| for arg in &args[1..] { | ||
| for token in arg.split_whitespace() { | ||
| raw_tokens.push(token.to_lowercase()); | ||
| } | ||
| } | ||
| let query_display = raw_tokens.join(" "); |
There was a problem hiding this comment.
A search query consisting only of whitespace characters will result in raw_tokens being empty. This causes the search logic to match all skills because token_candidates.iter().all() on an empty iterator returns true.
To fix this, you should add a check for empty raw_tokens after parsing the arguments.
I've also suggested a small refactoring to use iterators for a more concise way to build raw_tokens.
| let mut raw_tokens = Vec::new(); | |
| for arg in &args[1..] { | |
| for token in arg.split_whitespace() { | |
| raw_tokens.push(token.to_lowercase()); | |
| } | |
| } | |
| let query_display = raw_tokens.join(" "); | |
| let raw_tokens: Vec<String> = args[1..] | |
| .iter() | |
| .flat_map(|arg| arg.split_whitespace()) | |
| .map(|s| s.to_lowercase()) | |
| .collect(); | |
| if raw_tokens.is_empty() { | |
| return Err(GwsError::Validation( | |
| "Search query cannot be empty. Usage: gws skills search <query>".to_string(), | |
| )); | |
| } | |
| let query_display = raw_tokens.join(" "); |
|
Fixed a high-severity bug where whitespace-only or empty search queries would incorrectly match all skills. Now, an error is returned if no valid search tokens are identified. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable gws skills search command with synonym expansion, significantly improving skill discoverability. The accompanying refactoring to a hierarchical skills/references/ directory structure is a good organizational improvement. The new link-checking script is also a welcome addition for repository maintenance. My review identifies one high-severity security issue related to sanitizing user input in the new search command to prevent terminal escape sequence injection.
| raw_tokens.push(token.to_lowercase()); | ||
| } | ||
| } | ||
| let query_display = raw_tokens.join(" "); |
There was a problem hiding this comment.
The search query is constructed from user input and later printed to the terminal on line 139 without sanitization. This could allow for terminal escape sequence injection if a user provides a malicious query, for example by including ANSI escape codes in the search terms. It's important to sanitize any user-provided string before printing it to the terminal. This aligns with the general rule to 'Sanitize error strings printed to the terminal to prevent escape sequence injection.' While this is not an error string, the principle of sanitizing all user-controlled output applies.
| let query_display = raw_tokens.join(" "); | |
| let query_display: String = raw_tokens.join(" ").chars().filter(|c| !c.is_control()).collect(); |
References
- Sanitize error strings printed to the terminal to prevent escape sequence injection.
Description
Extends
gws skills search(introduced in #507) with a static synonym expansion table so agents don't need to know exact API names to find the right skill.Problem:
gws skills search emailreturned no results because "email" does not appear literally in the Gmail service'sapi_name("gmail") oraliases. Agents using natural language to discover skills were forced to guess canonical names.Solution: A static
SYNONYMStable maps 30+ common terms to their canonical service names. Each query token is expanded before matching using token-AND / synonym-OR semantics — every token must match, but it is satisfied if any of its expansions appears in the skill's fields.emailemail→gmail)spreadsheetspreadsheet→sheets)scheduleschedule→calendar)documentdocument→docs)presentationpresentation→slides)send emailgws-gmail-sendhelperupload filegws-drive-uploadhelperDry Run Output:
(This is a local discovery command — no JSON API request is produced, so no
--dry-runJSON applies.)Checklist:
AGENTS.mdguidelines (no generatedgoogle-*crates).cargo fmt --allto format the code perfectly.cargo clippy -- -D warningsand resolved all warnings.pnpx changeset) to document my changes.