Merged
…PT-5.2)

Replace the fragile 6-tier definition pipeline (plaintext parser → REST API → kaikki native → kaikki English → LLM → disk cache) with a clean 2-tier system: disk cache → GPT-5.2 structured JSON output.

Key changes:
- New webapp/definitions.py (~250 lines) replaces webapp/wiktionary.py (~950 lines)
- GPT-5.2 with structured JSON output returns definition_native + definition_en
- Confidence scoring (threshold 0.3) prevents hallucination
- DALL-E image generation now uses definition_en (fixes wrong images)
- Pre-generation script for daily cron (scripts/pregenerate_definitions.py)
- Old parser code archived to webapp/deprecated/ (not deleted)
- Tests moved to tests/deprecated/, new tests in tests/test_definitions.py
- Frontend updated with new definition fields (definitionNative, definitionEn)

Also includes: SEO description improvements for 55+ languages, kaikki definition quality improvements (sense-count selection), and template updates.

Tested: 2025 Python tests pass, 81 frontend tests pass, real LLM definitions verified across 18 words in 12+ languages including edge cases.
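The confidence gate mentioned in the key changes could look roughly like the sketch below. It is an illustration only, not the code in webapp/definitions.py: the helper name `parse_definition`, the `confidence` key in the JSON payload, and the decision to accept a score of exactly 0.3 are assumptions. Only the field names `definition_native` and `definition_en` and the 0.3 threshold come from the description above.

```python
import json
from typing import Optional

CONFIDENCE_THRESHOLD = 0.3  # threshold stated in the commit message


def parse_definition(raw: str, threshold: float = CONFIDENCE_THRESHOLD) -> Optional[dict]:
    """Parse a structured JSON reply; return None when it is unusable.

    Hypothetical helper: the real parser may be shaped differently.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    # Assumed key name; a missing or non-numeric score counts as a rejection.
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        return None

    native = data.get("definition_native")
    english = data.get("definition_en")

    # "not (x >= t)" also rejects NaN, which "x < t" would let through.
    if not (confidence >= threshold) or not native or not english:
        return None

    return {
        "definition_native": native,
        "definition_en": english,
        "confidence": confidence,
    }
```

Returning None for every failure mode (bad JSON, missing fields, low confidence) lets the caller treat all of them the same way and move on to the next source.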
Important: CodeRabbit review skipped. Too many files! This PR contains 225 files, which is 75 over the limit of 150.
Summary
Replaces the fragile 6-tier definition pipeline with a clean 3-tier system: disk cache → GPT-5.2 → kaikki (offline Wiktionary).
Why
What changed
- webapp/definitions.py (~310 lines): GPT-5.2 structured JSON output with confidence scoring, dual definitions (native + English), disk caching, kaikki fallback
- scripts/pregenerate_definitions.py: daily cron pre-generates definitions so runtime never hits the LLM
- DALL-E image generation now uses definition_en (English) instead of native-language definitions
- Frontend updated with new definition fields (definitionNative, definitionEn, confidence)
- Old parser code and tests archived to webapp/deprecated/ and tests/deprecated/ (not deleted, recoverable)
- strip_html/import re, kaikki results now include Wiktionary URLs

Definition flow
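A minimal sketch of the three-tier lookup (disk cache → GPT-5.2 → kaikki), under stated assumptions rather than as the actual implementation: `get_definition`, the `ask_llm` and `ask_kaikki` callables, and the one-JSON-file-per-word cache layout are all hypothetical. Whether kaikki results are also written to the cache is likewise an assumption.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# A source takes (word, lang) and returns a definition dict or None.
Lookup = Callable[[str, str], Optional[dict]]


def get_definition(
    word: str,
    lang: str,
    cache_dir: Path,
    ask_llm: Lookup,
    ask_kaikki: Lookup,
) -> Optional[dict]:
    """Try disk cache, then the LLM, then the offline kaikki data."""
    # Assumed layout: <cache_dir>/<lang>/<word>.json
    # (a real version would need to sanitise `word` for use as a filename).
    cache_file = cache_dir / lang / f"{word}.json"

    # Tier 1: disk cache
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    # Tier 2: LLM, then tier 3: kaikki fallback
    for source in (ask_llm, ask_kaikki):
        result = source(word, lang)
        if result is not None:
            cache_file.parent.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(
                json.dumps(result, ensure_ascii=False), encoding="utf-8"
            )
            return result

    return None
```

Passing the two sources in as callables keeps the sketch testable without network access; a pre-generation cron job could call the same function ahead of time so that runtime requests are served from tier 1.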
Testing
Post-merge TODOs
- Run scripts/pregenerate_definitions.py --backfill 30 to populate cache for past daily words

Test plan
- tests/test_definitions.py covers LLM parsing, cache behavior, kaikki fallback, backward compat
- … (pnpm build)

@coderabbitai full review