Plural representations should not change results #46
Open
Description
Problem Statement
Searches for "pattern" and "patterns" (or other singular/plural pairs) can produce significantly different results, leading to inconsistent retrieval quality. Users expect morphological variants to return equivalent results.
Current Behavior
The QueryExpander performs:
- ✅ Lowercasing
- ✅ Punctuation removal
- ✅ Removal of short terms (length ≤ 2)
- ✅ WordNet synonym expansion
- ❌ No lemmatization/stemming
// Current: src/concepts/query_expander.ts, lines 29-34
const originalTerms = queryText
  .toLowerCase()
  .split(/\s+/)
  .filter(term => term.length > 2)
  .map(term => term.replace(/[^\w\s]/g, ''))
  .filter(term => term.length > 0);
Example of Inconsistent Results
| Query | Top Result | Score |
|---|---|---|
| "design pattern" | Clean Architecture Ch.5 | 0.87 |
| "design patterns" | Gang of Four Introduction | 0.82 |
The plural form matches a different document because:
- BM25 does exact keyword matching ("patterns" ≠ "pattern")
- Concept matching fails if concept is stored as singular ("design pattern")
- Vector similarity partially compensates but doesn't fully resolve the mismatch
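To make the exact-match gap concrete, here is a minimal sketch that reruns the term pipeline quoted above on both queries and compares the resulting terms as plain strings. The names `extractTerms` and `missingTerms` are hypothetical helpers introduced for illustration only; they are not part of the codebase.

```typescript
// Mirrors the term pipeline quoted from src/concepts/query_expander.ts.
// `extractTerms` is a hypothetical wrapper name used only in this sketch.
function extractTerms(queryText: string): string[] {
  return queryText
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 2)
    .map((term) => term.replace(/[^\w\s]/g, ''))
    .filter((term) => term.length > 0);
}

// Terms present in `a` but absent from `b` under exact string comparison,
// which is how a keyword matcher would see them.
function missingTerms(a: string[], b: string[]): string[] {
  const seen = new Set(b);
  return a.filter((term) => !seen.has(term));
}

const singular = extractTerms('design pattern');
const plural = extractTerms('design patterns');

// "patterns" has no exact counterpart in the singular query.
console.log(missingTerms(plural, singular)); // [ 'patterns' ]
```

Without a normalization step, nothing in this pipeline maps "patterns" back to "pattern", so any exact-match stage downstream treats them as unrelated terms.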
Test Cases
| Original | Lemma | Notes |
|---|---|---|
| patterns | pattern | Regular plural |
| children | child | Irregular plural |
| running | run | Verb form |
| better | good | Comparative adjective |
| analyses | analysis | Greek plural |
| criteria | criterion | Latin plural |
| matrices | matrix | Latin plural |
Acceptance Criteria
- Query "patterns" returns the same top-5 results as "pattern"
- Query "running tests" produces approximately the same result ordering as "run test"
- Lemmatization adds ≤50ms to query latency
- Common irregular forms are handled correctly
- Original query terms are still included (not replaced by lemmas)
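One possible shape for the "include lemmas, keep originals" criterion is sketched below. It assumes a lemmatizer can be expressed as a plain `(term) => lemma` function; the `Lemmatizer` type, `toyLemmatize`, and `expandWithLemmas` are hypothetical names, and the lookup table is only a stand-in built from the test cases in this issue. A real lemmatizer (for example, one of the options under Dependencies) would be passed in its place.

```typescript
// Hypothetical signature: returns the lemma, or the input unchanged if unknown.
type Lemmatizer = (term: string) => string;

// Toy stand-in built from the issue's test-case table. Not a real lemmatizer.
const TOY_LEMMAS = new Map<string, string>([
  ['patterns', 'pattern'],
  ['children', 'child'],
  ['running', 'run'],
  ['better', 'good'],
  ['analyses', 'analysis'],
  ['criteria', 'criterion'],
  ['matrices', 'matrix'],
]);

const toyLemmatize: Lemmatizer = (term) => TOY_LEMMAS.get(term) ?? term;

// Keeps every original term and adds its lemma right after it,
// skipping any term that is already in the output.
function expandWithLemmas(terms: string[], lemmatize: Lemmatizer): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  const push = (term: string): void => {
    if (!seen.has(term)) {
      seen.add(term);
      result.push(term);
    }
  };
  for (const term of terms) {
    push(term);            // original term is preserved
    push(lemmatize(term)); // lemma is added, not substituted
  }
  return result;
}

console.log(expandWithLemmas(['design', 'patterns'], toyLemmatize));
// [ 'design', 'patterns', 'pattern' ]
```

Adding lemmas rather than replacing terms means "patterns" and "pattern" both end up with the term "pattern" in the expanded query, while an exact match on the user's original wording is still possible.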
Dependencies
- NLTK WordNet Lemmatizer (Python, already installed for WordNet)
- OR wink-lemmatizer (JavaScript alternative):
npm install wink-lemmatizer
References
- src/concepts/query_expander.ts - Current query expansion implementation
- src/wordnet/wordnet_service.ts - WordNet integration (can extend)
- NLTK Lemmatizer
- wink-lemmatizer - JavaScript alternative