fix(ingredient_parser): sanitize NLTK logs before JSON.parse#27
Draft
charliecreates[bot] wants to merge 2 commits intomainfrom
Draft
fix(ingredient_parser): sanitize NLTK logs before JSON.parse#27charliecreates[bot] wants to merge 2 commits intomainfrom
charliecreates[bot] wants to merge 2 commits intomainfrom
Conversation
…nce stdout in Python helper
The `ingredient_parser` Python package lazily downloads NLTK data and
prints progress logs (e.g. "Downloading required NLTK resour…") to
stdout. Because our Ruby `IngredientParser` expected *only* JSON, these
logs triggered `JSON::ParserError` in production (REC-7).
Changes
- Ruby: add `sanitize_output` to strip everything before the first `{` or
`[` token, rescue and log `JSON::ParserError` with context.
- Python: wrap `parse_ingredient` call in `contextlib.redirect_stdout` so
any library chatter stays hidden. Added file-level docstring for
rationale.
Downstream extractors (`SchemaExtractor::Ingredients`,
`RecipeExtractor`) receive fully parsed data again.
Closes https://linear.app/recipin/issue/REC-7
Owner
|
@CharlieHelps When I check on my current deployed container, the output is correct. Is this download re-happening every time after each deployment due to the container getting tossed? Should we pre-bake the python lib either in Docker or Kamal? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
The external
ingredient_parserPython package downloads NLTK corpora thefirst time it runs and prints progress messages to stdout. Because our
Ruby
IngredientParserexpects only a JSON document, the stray log linesbreak parsing and raise
JSON::ParserError(see REC-7 / Sentry #RECIPIN-3).Changes
sanitize_outputto strip non-JSON prefixes, rescue & logJSON::ParserErrorwith sample payload.parse_ingredientusingcontextlib.redirect_stdout.Verification
Manual test:
(The command previously printed NLTK download text before the JSON.)
Linked issues
Closes https://linear.app/recipin/issue/REC-7