Plural representations should not change results #46

@m2ux

Description

Problem Statement

Searches for "pattern" and "patterns" (or other singular/plural pairs) can produce significantly different results, leading to inconsistent retrieval quality. Users expect morphological variants to return equivalent results.

Current Behavior

The QueryExpander performs:

  1. ✅ Lowercasing
  2. ✅ Punctuation removal
  3. ✅ Short term filtering (length ≤ 2)
  4. ✅ WordNet synonym expansion
  5. ❌ No lemmatization/stemming

// Current: src/concepts/query_expander.ts, lines 29-34
const originalTerms = queryText
    .toLowerCase()
    .split(/\s+/)
    .filter(term => term.length > 2)
    .map(term => term.replace(/[^\w\s]/g, ''))
    .filter(term => term.length > 0);

Example of Inconsistent Results

| Query | Top Result | Score |
|---|---|---|
| "design pattern" | Clean Architecture Ch.5 | 0.87 |
| "design patterns" | Gang of Four Introduction | 0.82 |

The plural form matches a different document because:

  • BM25 matches keywords exactly, so "patterns" does not match "pattern"
  • Concept matching fails when the concept is stored in singular form ("design pattern")
  • Vector similarity partially compensates but does not fully close the gap
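
The exact-match gap can be illustrated with a minimal sketch. `tokenize` mirrors the tokenization quoted above; `sharedTerms` is a hypothetical helper standing in for exact keyword lookup and is not part of the repository.

```typescript
// Mirrors the tokenization quoted from src/concepts/query_expander.ts.
function tokenize(queryText: string): string[] {
  return queryText
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 2)
    .map((term) => term.replace(/[^\w\s]/g, ""))
    .filter((term) => term.length > 0);
}

// Hypothetical helper: terms of `a` that also appear verbatim in `b`.
function sharedTerms(a: string[], b: string[]): string[] {
  return a.filter((term) => b.indexOf(term) !== -1);
}

const singular = tokenize("design pattern");   // ["design", "pattern"]
const plural = tokenize("design patterns");    // ["design", "patterns"]
const overlap = sharedTerms(singular, plural); // ["design"]
```

Only "design" survives an exact comparison, so any scoring step that relies on literal term equality treats the two queries as half different.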

Test Cases

| Original | Lemma | Notes |
|---|---|---|
| patterns | pattern | Regular plural |
| children | child | Irregular plural |
| running | run | Verb form |
| better | good | Comparative adjective |
| analyses | analysis | Greek plural |
| criteria | criterion | Latin plural |
| matrices | matrix | Latin plural |
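
The cases above could be encoded as a table-driven check. The sketch below uses a toy lemmatizer (`toyLemma`, with a hypothetical `IRREGULAR_LEMMAS` table) purely so the example is self-contained; it is not the NLTK or wink-lemmatizer implementation, and a real lemmatizer would replace it.

```typescript
// Hypothetical irregular-form table; a real lemmatizer ships a far larger one.
const IRREGULAR_LEMMAS: Record<string, string> = {
  children: "child",
  running: "run",
  better: "good",
  analyses: "analysis",
  criteria: "criterion",
  matrices: "matrix",
};

// Toy rule: irregular lookup first, then strip a regular plural "s".
// Endings in "ss", "is", and "us" are left alone (e.g. "analysis", "class").
function toyLemma(term: string): string {
  if (Object.prototype.hasOwnProperty.call(IRREGULAR_LEMMAS, term)) {
    return IRREGULAR_LEMMAS[term];
  }
  if (term.length > 3 && /[^siu]s$/.test(term)) {
    return term.slice(0, -1);
  }
  return term;
}

// Test cases from the table: [original, expected lemma].
const LEMMA_CASES: Array<[string, string]> = [
  ["patterns", "pattern"],
  ["children", "child"],
  ["running", "run"],
  ["better", "good"],
  ["analyses", "analysis"],
  ["criteria", "criterion"],
  ["matrices", "matrix"],
];

// Collect every case where the lemmatizer disagrees with the table.
const lemmaFailures = LEMMA_CASES.filter(
  ([original, expected]) => toyLemma(original) !== expected
);
```

Keeping the cases as data means the same list can be run against whichever lemmatizer is eventually chosen.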

Acceptance Criteria

  • Query "patterns" returns same top-5 results as "pattern"
  • Query "running tests" ≈ "run test" in result ordering
  • Lemmatization adds ≤ 50 ms to query latency
  • Common irregular forms are handled correctly
  • Original query terms are still included (not replaced by lemmas)
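
One way to satisfy the last criterion is to append lemmas to the term list instead of replacing the originals. The sketch below assumes a hypothetical `expandWithLemmas` helper and a stand-in `stripPluralS` lemmatizer; neither exists in the repository, and the real lemmatizer would be passed in its place.

```typescript
type Lemmatizer = (term: string) => string;

// Keeps every original term and adds its lemma when the lemma is new.
function expandWithLemmas(terms: string[], lemmatize: Lemmatizer): string[] {
  const expanded: string[] = [];
  const pushUnique = (term: string): void => {
    if (expanded.indexOf(term) === -1) expanded.push(term);
  };
  for (const term of terms) {
    pushUnique(term);            // original form is always retained
    pushUnique(lemmatize(term)); // lemma is added only if it differs
  }
  return expanded;
}

// Stand-in lemmatizer for illustration only: strips a regular plural "s".
const stripPluralS: Lemmatizer = (term) => term.replace(/([^siu])s$/, "$1");

const fromPlural = expandWithLemmas(["design", "patterns"], stripPluralS);
// ["design", "patterns", "pattern"]
const fromSingular = expandWithLemmas(["design", "pattern"], stripPluralS);
// ["design", "pattern"]
```

With this shape, both queries contain "pattern", so exact keyword matching has a common term to work with while the user's original wording is preserved.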

Dependencies

  • NLTK WordNet Lemmatizer (Python, already installed for WordNet)
  • OR wink-lemmatizer (JavaScript alternative): npm install wink-lemmatizer

References

  • src/concepts/query_expander.ts - Current query expansion implementation
  • src/wordnet/wordnet_service.ts - WordNet integration (can extend)
  • NLTK Lemmatizer
  • wink-lemmatizer - JavaScript alternative
