feat: add fuzzy matching for TermQuery with Fuzziness enum #93

Merged
poyrazK merged 7 commits into main from feat/fuzzy-matching
May 7, 2026

Conversation

@poyrazK poyrazK commented May 5, 2026

Summary

  • Add Fuzziness enum (Auto or Exact(usize)) to cloudsearch-common
  • Add fuzziness: Option<Fuzziness> field to TermQuery
  • Implement fuzzy_term_match() and levenshtein_distance() in cloudsearch-index
  • When fuzziness is set, matching uses a Levenshtein edit-distance threshold instead of exact equality

Fuzziness behavior

  • Auto — 0 edit distance for 1-2 char terms, 1 for 3-5 chars, 2 for 6+ chars
  • Exact(n) — allow edit distance <= n
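
The thresholds above can be sketched as a small helper. This is a minimal illustration, not the code merged in this PR: the `max_edits` method is a hypothetical name, term length is assumed to be counted in Unicode scalar values, and an empty term is assumed to fall into the 0-edit bucket.

```rust
/// Sketch of the enum described in the summary. The real definition in
/// cloudsearch-common may carry serde attributes and additional derives.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fuzziness {
    Auto,
    Exact(usize),
}

impl Fuzziness {
    /// Hypothetical helper: the largest edit distance accepted for `term`.
    /// Length is counted in chars (an assumption; the PR does not say
    /// whether bytes or chars are used).
    pub fn max_edits(self, term: &str) -> usize {
        match self {
            Fuzziness::Exact(n) => n,
            Fuzziness::Auto => match term.chars().count() {
                0..=2 => 0, // 1-2 chars (and, by assumption, empty terms)
                3..=5 => 1,
                _ => 2, // 6+ chars
            },
        }
    }
}
```

Under these assumptions, `Fuzziness::Auto.max_edits("admin")` is 1, so "admn" (one deletion away) would match while "amn" would not.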

API Examples

JSON API:

{"term": {"field": "name", "value": "admin", "fuzziness": "auto"}}
{"term": {"field": "name", "value": "admn", "fuzziness": 2}}

Query string:

name:admin~auto
name:admn~2
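
One way the `~` suffix could be split off a `field:value` term is sketched below. The function name `parse_term`, its return type, the case-sensitive match on `auto`, and the fallback of keeping an unrecognised suffix as part of the value are all assumptions for illustration; the parser in `query_string.rs` may behave differently, in particular around wildcard detection order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fuzziness {
    Auto,
    Exact(usize),
}

/// Hypothetical parser for `field:value`, `field:value~auto`, `field:value~N`.
/// Returns (field, value, fuzziness), or None if there is no `field:` prefix.
pub fn parse_term(input: &str) -> Option<(String, String, Option<Fuzziness>)> {
    let (field, rest) = input.split_once(':')?;
    if field.is_empty() || rest.is_empty() {
        return None;
    }
    // Only the last '~' is inspected, so a '~' earlier in the value is kept.
    if let Some((value, suffix)) = rest.rsplit_once('~') {
        let fuzziness = if suffix == "auto" {
            Some(Fuzziness::Auto)
        } else {
            suffix.parse::<usize>().ok().map(Fuzziness::Exact)
        };
        if let Some(f) = fuzziness {
            if !value.is_empty() {
                return Some((field.to_string(), value.to_string(), Some(f)));
            }
        }
    }
    // No recognised suffix: the whole remainder is the term value.
    Some((field.to_string(), rest.to_string(), None))
}
```

With this sketch, `name:admn~2` yields the value `admn` with `Exact(2)`, and `name:admin` yields `admin` with no fuzziness.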

Test plan

  • cargo test --workspace — all 387 tests pass
  • cargo clippy --workspace --all-targets — clean

Add fuzziness parameter to TermQuery supporting Auto (0/1/2 edit distance
based on term length) and Exact(n) modes. When fuzziness is set, matching
uses Levenshtein edit distance instead of exact equality.

- Add Fuzziness enum (Auto, Exact(usize)) to cloudsearch-common
- Add fuzziness field to TermQuery with serde skip_serializing_if
- Implement fuzzy_term_match() and levenshtein_distance() in cloudsearch-index
- Update all TermQuery usages across the codebase to include fuzziness: None
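
For readers unfamiliar with the matching step, here is a minimal sketch of how an edit-distance check of this kind can work. It is not the code from cloudsearch-index: the signature of `fuzzy_term_match`, the meaning of `None` (fuzziness not set, so the caller falls back to exact equality), and the use of the query term's length for the `Auto` threshold are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fuzziness {
    Auto,
    Exact(usize),
}

/// Two-row dynamic-programming Levenshtein distance over chars.
/// Case-sensitive: "Admin" and "admin" are one substitution apart.
pub fn levenshtein_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() {
        return b.len();
    }
    if b.is_empty() {
        return a.len();
    }
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1) // deletion from `a`
                .min(curr[j] + 1); // insertion into `a`
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical signature. Returns None when fuzziness is not set,
/// Some(true) / Some(false) when a fuzzy comparison was performed.
pub fn fuzzy_term_match(
    query: &str,
    candidate: &str,
    fuzziness: Option<Fuzziness>,
) -> Option<bool> {
    let max_edits = match fuzziness? {
        Fuzziness::Exact(n) => n,
        // Assumption: the Auto threshold is derived from the query term.
        Fuzziness::Auto => match query.chars().count() {
            0..=2 => 0,
            3..=5 => 1,
            _ => 2,
        },
    };
    Some(levenshtein_distance(query, candidate) <= max_edits)
}
```

The two-row layout keeps memory proportional to the shorter dimension of the table rather than the full matrix, which is usually sufficient for short terms like field values.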

coderabbitai Bot commented May 5, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 23cb888 and ececa5e.

📒 Files selected for processing (6)
  • rust/crates/cloudsearch-api/src/lib.rs
  • rust/crates/cloudsearch-api/src/query_string.rs
  • rust/crates/cloudsearch-common/src/lib.rs
  • rust/crates/cloudsearch-common/tests/round_trip.rs
  • rust/crates/cloudsearch-index/src/lib.rs
  • rust/crates/cloudsearch-index/tests/coverage.rs

poyrazK added 6 commits May 5, 2026 18:32
- Add levenshtein_distance unit tests (empty, identical, one_edit,
  case_sensitive, complex)
- Add fuzzy_term_match unit tests (exact match, Auto mode, numeric values)
- Add three integration tests in coverage.rs covering edit distance
  matching, Auto threshold, and threshold rejection
- Fix score_query to properly reject fuzzy matches that return Some(false)
- Add Fuzziness serde rename_all = "lowercase" for JSON auto/exact
- Parse "fuzziness" key in parse_term_query (accepts "auto" or integer)
- Add ~suffix query string syntax: field:value~auto, field:value~2
- Add query_has_fuzzy_term helper to detect fuzzy in nested Bool
- Validate search_after + fuzzy query combination in validate_search_request
- Add validate_search_request_rejects_fuzzy_with_search_after test
- Add doc comment on fuzzy_term_match return value semantics
- Add comment explaining why unreachable!() is intentional in fuzzy_term_match
  (guarded by is_none() check above; a panic is preferred to wrong answers)
- Add parse_term_query unit tests for fuzziness parsing (auto, AUTO, integer,
  zero, missing, wrong type, unknown string)
- Add note about ~ suffix vs wildcard detection order in query_string parser
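
The nested-Bool detection and the `search_after` check mentioned in the commits above might look roughly like the following. The `Query` enum here is a reduced, hypothetical stand-in for the real query tree, the signature of `validate_search_request` and its error type are assumptions, and treating any `Some(..)` fuzziness (including `Exact(0)`) as fuzzy is also an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fuzziness {
    Auto,
    Exact(usize),
}

#[derive(Debug, Clone)]
pub struct TermQuery {
    pub field: String,
    pub value: String,
    pub fuzziness: Option<Fuzziness>,
}

/// Hypothetical, reduced query tree for illustration only.
#[derive(Debug, Clone)]
pub enum Query {
    MatchAll,
    Term(TermQuery),
    Bool {
        must: Vec<Query>,
        should: Vec<Query>,
        must_not: Vec<Query>,
    },
}

/// Returns true if any term anywhere in the tree has fuzziness set.
pub fn query_has_fuzzy_term(query: &Query) -> bool {
    match query {
        Query::MatchAll => false,
        Query::Term(t) => t.fuzziness.is_some(),
        Query::Bool { must, should, must_not } => must
            .iter()
            .chain(should.iter())
            .chain(must_not.iter())
            .any(|q| query_has_fuzzy_term(q)),
    }
}

/// Hypothetical validation: reject search_after combined with fuzzy terms.
pub fn validate_search_request(
    query: &Query,
    search_after: Option<&[String]>,
) -> Result<(), String> {
    if search_after.is_some() && query_has_fuzzy_term(query) {
        return Err("search_after cannot be combined with a fuzzy term query".to_string());
    }
    Ok(())
}
```

Walking the tree recursively means a fuzzy term buried inside a `must_not` clause of a nested Bool query is caught the same way as a top-level one.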

@poyrazK poyrazK left a comment

It's okay to merge

@poyrazK poyrazK merged commit 189579c into main May 7, 2026
4 checks passed