@ebembi-crdb
Contributor

Problem

Searches for version numbers with different formats returned inconsistent results:

  • Search for "26.1" returned 14 results
  • Search for "v26.1" returned only 1 result

This discrepancy occurred because the bloat removal system was filtering standalone version
numbers like "v26.1" as "version spam," even when they appeared in legitimate documentation
contexts.

Root Cause

The existing version spam filter took a blanket approach: it dropped every standalone version
number matching ^v\d+\.\d+(\.\d+)?(-beta\.\d+)?\s*$, regardless of context. Because the pattern
requires the "v" prefix, "v26.1" entries were removed while plain "26.1" entries were untouched.
As a result, legitimate version references on release pages were filtered along with navigation
spam.
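
The effect of the old pattern can be checked directly. The following is a minimal sketch using
Python's re module with the regex quoted above; the sample strings and the names
OLD_VERSION_SPAM, samples, and old_filter_hits are illustrative assumptions, not taken from
the indexing script.

```python
import re

# Pattern quoted in the PR description (the pre-fix blanket filter).
OLD_VERSION_SPAM = re.compile(r"^v\d+\.\d+(\.\d+)?(-beta\.\d+)?\s*$")

# Illustrative inputs (assumed, not real index records).
samples = [
    "v26.1",              # standalone, v-prefixed
    "v26.1.0-beta.1",     # standalone, complex version string
    "26.1",               # plain number, no "v" prefix
    "New in v26.1: ...",  # version inside a sentence
]

# True means the old filter would have dropped the text as "version spam".
old_filter_hits = {s: bool(OLD_VERSION_SPAM.match(s)) for s in samples}

for text, hit in old_filter_hits.items():
    print(f"{text!r:22} filtered={hit}")
```

Only the standalone v-prefixed strings match, which is consistent with "v26.1" losing results
while "26.1" did not.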

Solution

Implemented context-aware version filtering that distinguishes between legitimate version
references and actual spam:

What Gets Preserved:

  • ✅ Version numbers in release pages (/releases/, release notes, changelogs)
  • ✅ Complex version strings (v26.1.0-beta.1)
  • ✅ Version numbers in descriptive sentences
  • ✅ Plain number searches (26.1) continue to work unchanged

What Gets Filtered:

  • ❌ Standalone short version numbers in navigation/UI contexts
  • ❌ Beta version identifiers outside release context
  • ❌ Version selector spam
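
The preserve/filter rules above could be sketched roughly as follows. This is not the code from
the PR: the function signature, the url parameter, the RELEASE_URL_HINTS markers, and the reading
of "short" as major.minor with an optional beta suffix are all assumptions made for illustration.

```python
import re

# Pattern quoted in the PR description: text that is only a v-prefixed version.
STANDALONE_VERSION = re.compile(r"^v\d+\.\d+(\.\d+)?(-beta\.\d+)?\s*$")

# Assumption: "short" means major.minor, optionally with a beta suffix
# (e.g. "v26.1", "v26.1-beta.1"). Versions with a patch component count as complex.
SHORT_VERSION = re.compile(r"^v\d+\.\d+(-beta\.\d+)?$")

# Assumption: URL fragments that indicate a release context.
RELEASE_URL_HINTS = ("/releases/", "release-notes", "changelog")


def is_version_spam(text: str, url: str = "") -> bool:
    """Hypothetical context-aware check; returns True if text should be filtered."""
    stripped = text.strip()

    # Sentences and plain numbers ("26.1") never match, so they are preserved.
    if not STANDALONE_VERSION.match(stripped):
        return False

    # Standalone versions on release pages and changelogs are preserved.
    if any(hint in url.lower() for hint in RELEASE_URL_HINTS):
        return False

    # Outside a release context, only short versions and beta identifiers
    # are treated as spam; complex strings such as "v26.1.0-beta.1" are kept.
    return bool(SHORT_VERSION.match(stripped))


print(is_version_spam("v26.1", "/docs/stable/sidebar"))        # navigation context
print(is_version_spam("v26.1", "/docs/releases/v26.1"))        # release context
print(is_version_spam("v26.1.0-beta.1", "/docs/stable/"))      # complex version string
print(is_version_spam("Upgrade to v26.1 before continuing."))  # descriptive sentence
```

Ordering the checks from "definitely preserve" to "possibly spam" keeps the default outcome
conservative: anything the sketch does not recognize as short navigation text stays in the index.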

Changes

  • Added _is_version_spam() method with context-aware logic
  • Modified is_bloat_content() to use contextual version filtering
  • Preserved existing bloat removal for non-version content

Impact

  • Both "26.1" and "v26.1" searches now return equivalent, relevant results
  • Version-specific documentation becomes discoverable regardless of search format
  • Navigation spam continues to be filtered appropriately
  • No impact on other search functionality

Testing

  • All contextual filtering scenarios tested and validated
  • Verified preservation of legitimate version content
  • Confirmed filtering of actual version spam
  • No regressions in existing search behavior

Improve version number search by implementing context-aware filtering
that distinguishes between legitimate version references and navigation spam.
Previously, searches for 'v26.1' returned significantly fewer results
than '26.1' due to overly aggressive version spam filtering.

The fix preserves version numbers in release pages, changelogs, and
complex version strings while still filtering UI navigation spam.
This ensures both search formats return equivalent, relevant results.

Resolves issue where version searches with 'v' prefix were less
discoverable than plain number searches.
@netlify

netlify bot commented Oct 24, 2025

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name Link
🔨 Latest commit 42f81bc
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/68fbc682e25cb800088fd394

@netlify

netlify bot commented Oct 24, 2025

Deploy Preview for cockroachdb-api-docs canceled.

Name Link
🔨 Latest commit 42f81bc
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-api-docs/deploys/68fbc682f9f2cc0008d1b36d

@github-actions

Files changed:

  • src/current/algolia_index_intelligent_bloat_removal.py

@netlify

netlify bot commented Oct 24, 2025

Netlify Preview

Name Link
🔨 Latest commit 42f81bc
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/68fbc682d7f3ed0008c84c03
😎 Deploy Preview https://deploy-preview-20776--cockroachdb-docs.netlify.app