Description
Context
This issue tracks recommendations for consolidating CrateDB search documentation following the addition of new getting-started content in PR #264.
The idea, layout, and structure of this patch appear to be derived, in one way or another, from the existing introductory section about CrateDB's search features, and perhaps its content as well.
Improving the existing canonical pages may be a better alternative than covering the same topic a second time, especially since the new content was apparently created with GenAI / LLMs (large language models) and may therefore be vague or incorrect.
Could the most important ideas from the new content be curated and merged into the existing pages, together with suggestions for improving their guidance?
Please clarify.
Related:
- PR: Getting started / Search: Add new section (GenAI, edited) #264
- Discussion: Getting started / Search: Add new section (GenAI, edited) #264 (comment)
Current Situation
The repository now has two sets of search documentation:
- Existing comprehensive documentation: docs/feature/search/ (~1,185 lines)
  - Well-structured with subdirectories (fts/, geo/, vector/, hybrid/)
  - Detailed technical content with advanced topics
  - Includes analyzer configuration, tutorials, best practices
- New getting-started content: docs/start/query/search/ (~555 lines)
  - Simplified introductory content (47% size of existing)
  - Flat file structure
  - Cleaned up from initial GenAI generation but still has gaps
Quality Assessment
Improvements made:
- Most GenAI slop removed (only 2 "AI-powered" phrases remain)
- Content is cleaner and more focused
Remaining concerns:
- Technical accuracy gaps: BM25 mentions: 6 in new content vs. 22 in existing
  - fulltext.md has 0 BM25 references (vs. 16 in fts/index.md)
- Formatting artifacts: Stray "sqlCopierModifier--" found in geo.md
- Topic duplication: All four main topics covered in both locations
- Maintenance burden: Two documentation sets to maintain
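
Mention counts like the ones above can be re-checked with a small script. The following is only a minimal sketch, assuming the docs are Markdown files under the two directories named in this issue. The helper name `count_mentions` and the case-insensitive substring matching are illustrative assumptions, not necessarily how the numbers in this issue were produced.

```python
import re
from pathlib import Path


def count_mentions(root, term):
    """Count case-insensitive occurrences of `term` in each *.md file under `root`.

    Returns a dict mapping the file path relative to `root` (POSIX style)
    to the number of matches. Hypothetical helper, for illustration only.
    """
    root = Path(root)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    counts = {}
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts[path.relative_to(root).as_posix()] = len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    # Directory names are taken from this issue; run from the repository root.
    for docs_dir in ("docs/feature/search", "docs/start/query/search"):
        per_file = count_mentions(docs_dir, "BM25")
        print(docs_dir, "total:", sum(per_file.values()))
        for name, n in per_file.items():
            print(f"  {name}: {n}")
```

A raw mention count is only a rough proxy for technical coverage, so the per-file breakdown is probably more useful than the totals when deciding what to merge.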
Recommendations
1. Consolidate, Don't Duplicate
- Keep docs/feature/search/ as the canonical, comprehensive documentation
- Extract genuinely useful quick-start examples from docs/start/query/search/
- Integrate them into existing documentation as introductory sections
- Remove duplicate content to maintain single source of truth
2. Improve Existing Documentation
- Add "Getting Started" or "Quick Start" sections to each topic in docs/feature/search/
- Include simple, practical examples before diving into advanced features
- Improve navigation and cross-linking between getting-started and in-depth content
- Consider restructuring to have beginner-friendly entry points
3. Maintain Single Source of Truth
- Revert the toctree change from search/index back to ../../feature/search/index in docs/start/query/index.md
- Or create a simple landing page in docs/start/query/search/ that links to sections within docs/feature/search/
- Focus all improvement efforts on the existing, technically superior documentation
- Avoid parallel content that can diverge over time
4. Quality Control
- Address technical accuracy gaps (e.g., proper BM25 coverage in fulltext search)
- Fix formatting artifacts (e.g., "sqlCopierModifier--" in geo.md)
- Ensure all examples are tested and accurate
- Review for completeness against the comprehensive existing documentation
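
Copy-paste residue such as the string mentioned above could be caught automatically. Below is a minimal sketch, assuming Markdown sources and a hand-maintained list of known residue strings. The helper `find_artifacts` and the `ARTIFACTS` tuple are hypothetical names introduced here for illustration; only the one string listed comes from this issue.

```python
from pathlib import Path

# Known copy-paste residue. The single entry is taken from this issue;
# extend the tuple as further artifacts are discovered.
ARTIFACTS = ("sqlCopierModifier--",)


def find_artifacts(root, needles=ARTIFACTS):
    """Return (relative_path, line_number, stripped_line) for every line in
    the *.md files under `root` that contains one of `needles`.

    Hypothetical helper, for illustration only.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if any(needle in line for needle in needles):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

A check like this could run as part of the docs build and fail when the returned list is non-empty, though whether that fits the project's existing tooling is an open question.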
Benefits of Consolidation
- Reduced maintenance burden: Single documentation set to update
- Improved accuracy: Focus quality efforts on one authoritative source
- Better user experience: Clear path from beginner to advanced topics
- Avoid confusion: No conflicting or duplicate information
Implementation Approach
- Audit both documentation sets for unique, valuable content
- Identify gaps in existing documentation that new content addresses
- Create enhancement plan for docs/feature/search/ with beginner sections
- Migrate valuable examples and patterns
- Update navigation and cross-references
- Remove or redirect duplicate content