Add basic search to MediaWiki static site generator #4

philpax · 2025-11-18T08:53:52Z

This commit implements a MediaWiki-style search feature for the static site generator.

Features:

- Multi-word search support
- Relevance scoring (title > headings > content)
- Result snippets showing query context
- Yellow highlighting of matching terms
- Up to 20 results per query

Backend (Rust):

- Added `SearchEntry` struct to represent indexed pages
- Implemented `extract_search_data()` to extract text content and headings from the wikitext AST
- Modified `generate_wiki()` to build and write `search-index.json` during site generation
- Added `serde` dependency for JSON serialization

Frontend (JavaScript):

- Created `WikiSearch` class with a relevance-ranking algorithm
- Implemented prefix and substring matching for queries
- Added result highlighting with context snippets
- Lazy-loaded the search index on first interaction
- Debounced search input for performance

UI:

- Added search bar to navigation with dropdown results
- Styled with Tailwind CSS for consistency
- Shows highlighted matches in titles and content snippets
- Displays relevant headings for each result
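The first version ranked pages by scanning every entry. A minimal Rust sketch of that kind of linear, multi-word scoring follows. The `SearchEntry` fields, the 10/5/1 weights, and the bonus for prefix over substring matches are assumptions for illustration; the PR's actual ranking lived in the JavaScript `WikiSearch` class and is not shown here.

```rust
/// Hypothetical shape of an indexed page (the PR's real fields are not shown).
struct SearchEntry {
    title: String,
    url: String,
    headings: Vec<String>,
    content: String,
}

/// 2 if some token of `text` starts with `word` (prefix match),
/// 1 for a plain substring match, 0 otherwise. `word` must be lowercase.
fn match_score(text: &str, word: &str) -> u32 {
    let lower = text.to_lowercase();
    if lower
        .split(|c: char| !c.is_alphanumeric())
        .any(|tok| tok.starts_with(word))
    {
        2
    } else if lower.contains(word) {
        1
    } else {
        0
    }
}

/// Linear scan: per query word, title counts 10x, best heading 5x,
/// content 1x (assumed weights). Non-matches are dropped and the top
/// `limit` entries are returned.
fn search<'a>(entries: &'a [SearchEntry], query: &str, limit: usize) -> Vec<&'a SearchEntry> {
    let words: Vec<String> = query.split_whitespace().map(str::to_lowercase).collect();
    let mut scored: Vec<(u32, &SearchEntry)> = entries
        .iter()
        .map(|e| {
            let score = words
                .iter()
                .map(|w| {
                    let best_heading = e
                        .headings
                        .iter()
                        .map(|h| match_score(h, w))
                        .max()
                        .unwrap_or(0);
                    10 * match_score(&e.title, w) + 5 * best_heading + match_score(&e.content, w)
                })
                .sum::<u32>();
            (score, e)
        })
        .filter(|&(score, _)| score > 0)
        .collect();
    // Highest score first; ties broken alphabetically by title.
    scored.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.title.cmp(&b.1.title)));
    scored.into_iter().take(limit).map(|(_, e)| e).collect()
}
```

Calling `search(&entries, "window", 20)` mirrors the "up to 20 results" limit. Because every page is visited on every query, cost grows with the number of pages, which is what the later inverted-index commit addresses.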

- Remove redundant `if`/`else` with identical branches in `extract_search_data`
- Apply `rustfmt` formatting throughout the codebase

Performance improvements:

- Changed from linear full-text search to inverted index lookup
- Search index now maps words to page indices instead of storing full content
- Title words weighted 3x higher for better ranking
- Multi-word queries require all words to be present

Benefits:

- O(k) index lookups, where k = number of query words, instead of an O(n) scan of all pages (intersecting the matching page lists adds work proportional to their length)
- Smaller memory footprint during search
- Faster query responses
- Better relevance ranking

Index structure:

- `pages`: array of `{title, url, headings}`
- `words`: map of word -> [page indices with occurrence counts]

Stats:

- 1,208 pages indexed
- 5,585 unique words
- 442KB index file
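The inverted index described above can be sketched as follows, assuming simple alphanumeric tokenization and counting each title occurrence as 3 to mirror the "weighted 3x" note. `build_index` and `query` are hypothetical names, not the PR's functions. A query intersects the page lists of every word, so only pages containing all query words survive.

```rust
use std::collections::{BTreeMap, HashMap};

/// Lowercased alphanumeric tokens (a simplifying assumption).
fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_lowercase)
}

/// word -> [(page index, weighted occurrence count)].
/// Each title occurrence counts 3, each content occurrence counts 1.
fn build_index(pages: &[(&str, &str)]) -> BTreeMap<String, Vec<(usize, u32)>> {
    let mut index: BTreeMap<String, Vec<(usize, u32)>> = BTreeMap::new();
    for (id, (title, content)) in pages.iter().enumerate() {
        let mut counts: HashMap<String, u32> = HashMap::new();
        for w in tokenize(title) {
            *counts.entry(w).or_insert(0) += 3;
        }
        for w in tokenize(content) {
            *counts.entry(w).or_insert(0) += 1;
        }
        for (w, c) in counts {
            index.entry(w).or_default().push((id, c));
        }
    }
    index
}

/// AND query: one map lookup per word, then intersect the page lists.
/// Score is the sum of counts; results are sorted best first.
fn query(index: &BTreeMap<String, Vec<(usize, u32)>>, q: &str) -> Vec<(usize, u32)> {
    let mut scores: Option<HashMap<usize, u32>> = None;
    for word in tokenize(q) {
        let Some(postings) = index.get(&word) else {
            return Vec::new(); // a missing word means no page has them all
        };
        let hits: HashMap<usize, u32> = postings.iter().copied().collect();
        scores = Some(match scores {
            None => hits,
            // Keep only pages that matched every earlier word.
            Some(prev) => prev
                .into_iter()
                .filter_map(|(id, s)| hits.get(&id).map(|c| (id, s + c)))
                .collect(),
        });
    }
    let mut ranked: Vec<(usize, u32)> = scores.unwrap_or_default().into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked
}
```

The per-query work is k map lookups plus merging the matched lists, rather than re-reading the text of all 1,208 pages.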

Performance optimization:

- Generate an individual `.txt` file for each page (alongside the `.html`)
- Text files contain extracted searchable content (~376KB total)
- JavaScript loads text on demand for the top 5 results only
- Snippets show highlighted context around query matches
- Text files are cached in memory after first load

Benefits:

- Initial search remains fast (426KB index)
- Average text file is only ~0.3KB (~300 bytes)
- Only ~1.5KB of additional data loaded per search (5 × 0.3KB)
- Better UX with context snippets for top results
- Headings still shown for results 6-20

File structure:

- `/wiki/Lua/Client/Window.html` (36KB HTML)
- `/wiki/Lua/Client/Window.txt` (465B text content)
- `/wiki/Lua/Client/Window.json` (debug AST)

Stats:

- 1,208 `.txt` files generated
- Search index: 426KB (unchanged from pure metadata)
- Text files: 376KB total (loaded on demand)
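Snippet generation over the on-demand `.txt` content might look like the following sketch. `snippet`, its `radius` parameter, and the `<mark>` wrapper are assumptions (the PR only says matches are highlighted in yellow). It works on `char`s so multi-byte text is never split mid-character; a real client would also HTML-escape the surrounding text before inserting it into the page.

```rust
/// Case-insensitive comparison of two chars (full Unicode lowercase mapping).
fn chars_eq_ci(a: char, b: char) -> bool {
    a.to_lowercase().eq(b.to_lowercase())
}

/// Up to `radius` chars of context on each side of the first
/// case-insensitive match of `term`, with the match wrapped in <mark> tags
/// and "..." marking truncated ends. Returns None if `term` is empty or
/// absent. Hypothetical helper; no HTML escaping is done here.
fn snippet(text: &str, term: &str, radius: usize) -> Option<String> {
    let hay: Vec<char> = text.chars().collect();
    let needle: Vec<char> = term.chars().collect();
    if needle.is_empty() || needle.len() > hay.len() {
        return None;
    }
    let start = (0..=hay.len() - needle.len()).find(|&i| {
        hay[i..i + needle.len()]
            .iter()
            .zip(&needle)
            .all(|(&a, &b)| chars_eq_ci(a, b))
    })?;
    let end = start + needle.len();
    let from = start.saturating_sub(radius);
    let to = (end + radius).min(hay.len());
    let piece = |lo: usize, hi: usize| hay[lo..hi].iter().collect::<String>();

    let mut out = String::new();
    if from > 0 {
        out.push_str("...");
    }
    out.push_str(&piece(from, start));
    out.push_str("<mark>");
    out.push_str(&piece(start, end)); // keeps the original casing
    out.push_str("</mark>");
    out.push_str(&piece(end, to));
    if to < hay.len() {
        out.push_str("...");
    }
    Some(out)
}
```

For example, `snippet("Create a Window and resize it", "window", 5)` yields `...te a <mark>Window</mark> and ...`.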

Index structure optimizations:

- Pages now stored as a simple title array (not objects)
- URLs derived client-side from titles (`title.replace(' ', '_')`)
- Removed headings from index (loaded from `.txt` if needed)
- Use `(page_id, weight)` tuples instead of duplicate entries

Weighting system:

- Title words: weight 5 (highest priority)
- Heading words: weight 3 (medium priority)
- Content words: weight 1 (base priority)
- Uses `max()` when a word appears in multiple contexts

Results:

- Index size: 426KB → 298KB (30% reduction, 128KB saved)
- 1,208 pages indexed
- 5,585 unique words
- Smarter relevance ranking with explicit weights

Example index format: `{ "pages": ["Lua/Client/Window", "Feature Overview", ...], "words": { "window": [[173, 5], [174, 5], [9, 1], ...], ... } }`

Here `[173, 5]` means page 173 with weight 5 (title match).
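A sketch of the compact format with `max()`-merged weights, plus URL derivation from a title, might look like this. `Page`, `build_compact_index`, and `page_url` are hypothetical names; the `/wiki/` prefix follows the "Fix search index URLs to include /wiki/ prefix" commit. Rust's `str::replace` replaces every space. If the client literally uses JavaScript's `title.replace(' ', '_')`, note that a string pattern there replaces only the first space, so titles with several spaces would need `replaceAll`.

```rust
use std::collections::BTreeMap;

// Weights from the PR description.
const TITLE_WEIGHT: u8 = 5;
const HEADING_WEIGHT: u8 = 3;
const CONTENT_WEIGHT: u8 = 1;

/// Hypothetical input page: title, section headings, body text.
struct Page<'a> {
    title: &'a str,
    headings: &'a [&'a str],
    content: &'a str,
}

/// Lowercased alphanumeric tokens (a simplifying assumption).
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_lowercase)
        .collect()
}

/// Returns (`pages`: titles only, `words`: word -> [(page_id, weight)]).
/// Each word is stored once per page, with the max weight of its contexts.
fn build_compact_index(pages: &[Page<'_>]) -> (Vec<String>, BTreeMap<String, Vec<(usize, u8)>>) {
    let titles: Vec<String> = pages.iter().map(|p| p.title.to_string()).collect();
    let mut words: BTreeMap<String, Vec<(usize, u8)>> = BTreeMap::new();
    for (id, page) in pages.iter().enumerate() {
        let mut best: BTreeMap<String, u8> = BTreeMap::new();
        let mut note = |text: &str, weight: u8| {
            for tok in tokenize(text) {
                let w = best.entry(tok).or_insert(0);
                *w = (*w).max(weight);
            }
        };
        note(page.title, TITLE_WEIGHT);
        for &h in page.headings {
            note(h, HEADING_WEIGHT);
        }
        note(page.content, CONTENT_WEIGHT);
        for (word, weight) in best {
            words.entry(word).or_default().push((id, weight));
        }
    }
    (titles, words)
}

/// URL for a title: "/wiki/" prefix, every space replaced by an underscore.
fn page_url(title: &str) -> String {
    format!("/wiki/{}", title.replace(' ', "_"))
}
```

Storing titles once and referring to them by index is what lets each posting shrink to a two-number tuple.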

With average text files of ~300 bytes, loading all 20 results costs only ~6KB total. This provides better UX by showing context snippets for all results, not just the top 5.

Benefits:

- All results show highlighted context snippets
- Still efficient (~6KB for 20 results)
- Text files cached after first load
- Parallel loading keeps it fast
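Loading all 20 text files in parallel while reusing a cache can be sketched with scoped threads. `load_texts` and its `fetch` parameter, which stands in for the browser's HTTP request, are hypothetical; in the browser the equivalent would be issuing the requests concurrently rather than spawning threads.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Return the text for every title, in order. Cached texts are reused; the
/// rest are fetched concurrently (one scoped thread each) and then cached.
/// `fetch` is a hypothetical stand-in for the HTTP request.
fn load_texts<F>(titles: &[&str], cache: &Mutex<HashMap<String, String>>, fetch: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    let cached = cache.lock().unwrap();
    let missing: Vec<&str> = titles
        .iter()
        .copied()
        .filter(|t| !cached.contains_key(*t))
        .collect();
    drop(cached); // don't hold the lock while fetching

    let fetched: Vec<(String, String)> = thread::scope(|s| {
        // Start every request before waiting on any of them.
        let handles: Vec<_> = missing
            .iter()
            .map(|&t| {
                let fetch = &fetch;
                s.spawn(move || (t.to_string(), fetch(t)))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    let mut cached = cache.lock().unwrap();
    cached.extend(fetched);
    titles.iter().map(|t| cached[*t].clone()).collect()
}
```

A second search that overlaps the first only fetches the titles it has not seen yet, which is what keeps repeated queries cheap.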

claude added 7 commits November 18, 2025 08:51

Fix clippy warning and apply rustfmt

cf6f660


Fix search index URLs to include /wiki/ prefix

59c528a

philpax marked this pull request as ready for review November 18, 2025 10:07

philpax merged commit f6c7140 into main Nov 18, 2025
3 checks passed

philpax deleted the claude/add-search-feature-014pEUUDjASbdvUPmQVpjrTS branch November 18, 2025 10:08

3 participants

Add basic search to MediaWiki static site generator #4

Add basic search to MediaWiki static site generator #4

Uh oh!

Conversation

philpax commented Nov 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants