feat(windows): add hidden OCR context capture from frontmost window for post-processing #808

Closed
evrenesat wants to merge 1 commit into cjpais:main from evrenesat:windows-ocr-pr-main

Conversation

@evrenesat

Summary

This PR adds Windows support for the hidden OCR context template variable (${OCR} / ${ocr}) used in post-processing prompts. It is the Windows counterpart of #770.

The captured text is read from the current foreground window and injected only into the prompt payload.

Why

This improves prompt quality for post-processing while keeping the app local-first and extensible:

  • Better context-aware rewriting/correction prompts
  • No cloud dependency added
  • Clear platform-specific implementation that is easy to fork/extend

What changed

  • Added Windows OCR capture module:
    • src-tauri/src/windows_ocr.rs
  • Wired Windows OCR into prompt-building flow:
    • src-tauri/src/actions.rs
  • Registered Windows OCR module:
    • src-tauri/src/lib.rs
  • Added required Windows API feature flags:
    • src-tauri/Cargo.toml

Behavior

  • OCR context is only used if prompt contains ${OCR} or ${ocr}.
  • If OCR capture fails, processing continues with empty OCR context.
  • On platforms that this implementation path does not support, the OCR context resolves to an empty string.
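
The three rules above can be sketched in Rust roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the names capture_frontmost_window_text, prompt_uses_ocr, and build_prompt are hypothetical, and the actual API of src-tauri/src/windows_ocr.rs is not shown in this description.

```rust
/// Hypothetical capture hook (assumption). On Windows the real code would read
/// the foreground window and run OCR; this sketch only returns an error there.
#[cfg(target_os = "windows")]
fn capture_frontmost_window_text() -> Result<String, String> {
    Err("OCR capture is not implemented in this sketch".to_string())
}

/// On every other platform the OCR context resolves to an empty string.
#[cfg(not(target_os = "windows"))]
fn capture_frontmost_window_text() -> Result<String, String> {
    Ok(String::new())
}

/// OCR is only needed when the prompt references ${OCR} or ${ocr}.
fn prompt_uses_ocr(template: &str) -> bool {
    template.contains("${OCR}") || template.contains("${ocr}")
}

/// Builds the prompt payload. `capture` is injected so the fallback path can
/// be exercised without a real screen capture.
fn build_prompt<F>(template: &str, capture: F) -> String
where
    F: FnOnce() -> Result<String, String>,
{
    // Skip capture entirely when the placeholder is absent.
    if !prompt_uses_ocr(template) {
        return template.to_string();
    }
    // A failed capture degrades to an empty context instead of aborting.
    let ocr_text = capture().unwrap_or_default();
    // Note: sequential replacement; OCR text that itself contains the literal
    // "${ocr}" would be substituted again. Acceptable for a sketch.
    template
        .replace("${OCR}", &ocr_text)
        .replace("${ocr}", &ocr_text)
}
```

Passing the capture step as a closure is a design choice of this sketch: it keeps the "continue with empty context" behavior testable on any platform, and calling build_prompt(template, capture_frontmost_window_text) gives the platform-gated behavior.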

Testing

Tested on Windows:

  1. Configured a post-processing prompt containing ${OCR}.
  2. Started transcription with post-processing enabled.
  3. Verified that the prompt receives OCR text from the frontmost window.
  4. Verified the fallback to an empty OCR context when capture fails.
  5. Verified that the normal transcription flow is unchanged.

Note: I built and manually tested this on a Windows 11 ARM guest running on macOS. Even with help from Codex, the arm64 build took hours. I confirmed that post-processing works both with and without OCR prompts. In my limited testing, with OCR enabled the model consistently used the correct symbols in the dictation result (e.g. createUpdaterArtifacts instead of "create-updater-artifacts", "Create Updater Artifacts", or "create updater artifacts").

Scope / Non-goals

  • This PR is Windows-only OCR.
  • No local machine build-workaround changes are included in this PR. (hopefully)
  • No updater/signing config changes included.

Breaking changes

None.

AI Assistance Disclosure

  • AI used: Yes
  • Tools used: GPT-5 Codex / ChatGPT
  • Extent: Assisted with implementation scaffolding, conflict resolution, and debugging workflow; final code reviewed and validated manually.

Note: I'm planning to try to create another PR for Linux as well, using Tesseract.

@cjpais
Owner

cjpais commented Feb 16, 2026

@evrenesat would you mind merging this into the same PR as #770? I would rather have the feature come in all at once when I decide to merge it. Closing this PR for now; I'm trying to slim down the number of active PRs so I can keep some sanity and manage maintenance.

@cjpais cjpais closed this Feb 16, 2026


2 participants