fix(parser): clean up resources on parse failure #101
Co-Messi wants to merge 2 commits into run-llama:main from
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b5cebae6fd
    if (this.ocrEngine && "terminate" in this.ocrEngine) {
      await cleanupResource(async () => {
        await (this.ocrEngine as TesseractEngine).terminate();
      });
Guard OCR termination to the current parse invocation
LiteParse keeps this.ocrEngine on the parser instance, but this finally block now calls terminate() on every parse exit, including failures that occur before OCR work starts. If callers run parse() concurrently on the same instance, a fast-failing call can tear down Tesseract while another call is still in runOCR, and processPageOcr will swallow those OCR errors and return degraded output. Please only terminate OCR resources that were actually used by this invocation (or otherwise prevent concurrent use) to avoid this regression.
This feels like a sensible suggestion? A little worried that this might destroy concurrent calls to parse()
Adjusted this so OCR termination only runs if that parse invocation actually entered OCR work. A failure before runOCR() now still closes the PDF and cleans up temp files, but it no longer tears down the shared Tesseract engine for another concurrent parse. I also updated the parser tests to cover that pre-OCR failure path while keeping termination on successful OCR parses.
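A minimal sketch of the guard described above, for readers following the thread. It is not the code in this PR: `SketchParser`, `FakeEngine`, `OcrEngine`, and the `enteredOcr` flag are hypothetical names, and passing `null` simply simulates a failure before OCR starts. The point it illustrates is that the flag is local to one `parse()` call, so a call that fails early never terminates the shared engine.

```typescript
// Hypothetical stand-ins for illustration; not LiteParse's real types.
interface OcrEngine {
  recognize(page: string): Promise<string>;
  terminate(): Promise<void>;
}

class FakeEngine implements OcrEngine {
  terminateCalls = 0;

  async recognize(page: string): Promise<string> {
    return `text:${page}`;
  }

  async terminate(): Promise<void> {
    this.terminateCalls += 1;
  }
}

class SketchParser {
  constructor(private readonly ocrEngine: OcrEngine) {}

  async parse(pages: string[] | null): Promise<string[]> {
    // Local to this invocation, so another concurrent parse() cannot flip it.
    let enteredOcr = false;
    try {
      if (pages === null) {
        // Simulates a failure before any OCR work starts.
        throw new Error("load failed");
      }
      enteredOcr = true;
      return await Promise.all(pages.map((p) => this.ocrEngine.recognize(p)));
    } finally {
      // Closing the PDF and removing temp files would run here unconditionally.
      if (enteredOcr) {
        await this.ocrEngine.terminate();
      }
    }
  }
}

async function demo(): Promise<{
  failed: boolean;
  afterFailure: number;
  afterSuccess: number;
  text: string[];
}> {
  const engine = new FakeEngine();
  const parser = new SketchParser(engine);

  let failed = false;
  try {
    await parser.parse(null);
  } catch {
    failed = true;
  }
  const afterFailure = engine.terminateCalls;

  const text = await parser.parse(["p1"]);
  return { failed, afterFailure, afterSuccess: engine.terminateCalls, text };
}
```

Note that this sketch only covers the pre-OCR failure path. Two calls that both reach OCR on the same instance could still interfere with each other, which the flag alone does not prevent.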
whoops, I let this sit a little too long. Seems like some gnarly merge conflicts
Problem
LiteParse.parse() only cleans up the converted PDF, closes the PDF document, and terminates Tesseract on the happy path. If parsing throws after conversion, the temporary conversion directory and OCR worker cleanup are skipped.
What changed
finally path
Why this fixes it
This keeps resource cleanup tied to parser lifetime instead of parser success, so failed parses do not leak temp files or OCR workers.
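As a rough illustration of the idea, assuming a helper shaped like the `cleanupResource` call in the diff: the implementation below is a guess (the real helper's body is not shown in this PR page), and `parseWithCleanup`, `demoCleanup`, and the log strings are hypothetical. Each cleanup step runs in `finally` and swallows its own error, so one failing step neither skips the remaining steps nor masks the original parse error.

```typescript
// Assumed behavior: run one cleanup step and swallow its error so later
// steps still run and the original parse error stays visible.
async function cleanupResource(step: () => Promise<void>): Promise<void> {
  try {
    await step();
  } catch {
    // Ignored in this sketch; real code might log the failure.
  }
}

// Hypothetical wrapper: `work` stands in for the body of parse().
async function parseWithCleanup(
  work: () => Promise<string>,
  log: string[],
): Promise<string> {
  try {
    return await work();
  } finally {
    // Runs on success and on failure alike.
    await cleanupResource(async () => {
      log.push("close-pdf");
    });
    await cleanupResource(async () => {
      // A cleanup step that itself fails must not stop the next one.
      throw new Error("cleanup step failed");
    });
    await cleanupResource(async () => {
      log.push("remove-temp-dir");
    });
  }
}

async function demoCleanup(): Promise<{ log: string[]; message: string }> {
  const log: string[] = [];
  let message = "";
  try {
    await parseWithCleanup(async () => {
      throw new Error("parse failed");
    }, log);
  } catch (err) {
    // The caller still sees the parse error, not the cleanup error.
    message = err instanceof Error ? err.message : String(err);
  }
  return { log, message };
}
```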
Verification
npm test -- src/core/parser.test.ts
npm run format:check
npm run build
Fixes #100