feat(windows): add hidden OCR context capture from frontmost window for post-processing #808
Closed
evrenesat wants to merge 1 commit into cjpais:main from
Owner:
@evrenesat would you mind merging this into the same PR as #770? I'd rather have the feature come in all at once when I decide to merge it. Closing this PR for now; I'm trying to slim down the number of active PRs so I can keep my sanity and manage maintenance.
Summary
This PR adds Windows support for the hidden OCR context template variable (${OCR}/${ocr}) used in post-processing prompts (the Windows counterpart of #770). The captured text is read from the current foreground window and injected only into the prompt payload.
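The PR's actual substitution code in src-tauri/src/actions.rs is not reproduced here. The following is a minimal sketch of how a hidden ${OCR}/${ocr} variable could be expanded, assuming a literal, single-pass text substitution and assuming the (expensive) screen capture plus OCR should only run when the prompt actually references the variable. The names wants_ocr, substitute, expand_ocr, and the capture closure are hypothetical illustrations, not the PR's API.

```rust
/// Hypothetical check: does the prompt template reference the OCR variable?
fn wants_ocr(template: &str) -> bool {
    template.contains("${OCR}") || template.contains("${ocr}")
}

/// Hypothetical single-pass substitution of ${OCR} and ${ocr}.
/// Scanning once means OCR text that itself contains "${ocr}" is not re-expanded.
fn substitute(template: &str, text: &str) -> String {
    let mut out = String::with_capacity(template.len() + text.len());
    let mut rest = template;
    while let Some(pos) = rest.find("${") {
        let after = &rest[pos..];
        if after.starts_with("${OCR}") || after.starts_with("${ocr}") {
            out.push_str(&rest[..pos]);
            out.push_str(text);
            rest = &after["${OCR}".len()..];
        } else {
            // Not our placeholder: copy "${" through unchanged and keep scanning.
            out.push_str(&rest[..pos + 2]);
            rest = &after[2..];
        }
    }
    out.push_str(rest);
    out
}

/// Expand the hidden OCR variable in a post-processing prompt.
/// `capture` stands in for the foreground-window OCR step (assumed here);
/// it is only invoked when the template references ${OCR}/${ocr}.
fn expand_ocr<F>(template: &str, capture: F) -> String
where
    F: FnOnce() -> Option<String>,
{
    if !wants_ocr(template) {
        return template.to_string();
    }
    // Assumption: a failed capture becomes an empty string so the prompt stays well-formed.
    let text = capture().unwrap_or_default();
    substitute(template, &text)
}

fn main() {
    let prompt = "Fix the transcript using this screen context:\n${OCR}\nTranscript: create updater artifacts";
    let expanded = expand_ocr(prompt, || Some("createUpdaterArtifacts: true".to_string()));
    println!("{expanded}");

    // A prompt without the variable never triggers a capture.
    let plain = expand_ocr("Fix the transcript.", || panic!("capture should not run"));
    assert_eq!(plain, "Fix the transcript.");
}
```

Taking the capture step as a closure keeps the sketch testable without Windows APIs and makes the "only capture when needed" behavior explicit; a real implementation would plug in the platform OCR call there.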
Why
This improves post-processing prompt quality while keeping the app local-first and extensible.
What changed
src-tauri/src/windows_ocr.rs
src-tauri/src/actions.rs
src-tauri/src/lib.rs
src-tauri/Cargo.toml
Behavior
${OCR} or ${ocr}.
Testing
Tested on Windows:
${OCR}.
Note: I built and manually tested this on a Windows 11 ARM guest running on macOS. Even with the help of Codex, it took hours to build on arm64. I tested that post-processing works both with and without OCR prompts. In my limited testing with OCR, the model consistently used the correct identifiers in the dictation result (e.g. createUpdaterArtifacts instead of "create-updater-artifacts", "Create Updater Artifacts", or "create updater artifacts").
Scope / Non-goals
Breaking changes
None.
AI Assistance Disclosure
Note: I'm planning to create another PR for Linux as well, using Tesseract.