fix: sanitize terminal app accessibility context to prevent garbled transcription#99
fix: sanitize terminal app accessibility context to prevent garbled transcription#99MundusCaeli wants to merge 4 commits intoamicalhq:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a "terminal" AppType and terminal bundle mappings; strips ANSI escape sequences from accessibility text in transcription paths; maps terminal apps to "default" in cloud payloads; improves cloud fetch error handling and auth-retry; and normalizes pre-selection text for Whisper by removing ANSI codes. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Desktop App
participant Formatter as formatter-prompt
participant Whisper as whisper-provider
participant Cloud as Amical Cloud
App->>Formatter: send bundleId / accessibilityContext
Formatter-->>App: return appType (e.g., "terminal")
App->>Whisper: request initial prompt with accessibilityContext
alt appType == "terminal"
Whisper-->>App: strip ANSI from preSelectionText -> normalized prompt
else
Whisper-->>App: generate prompt (raw text)
end
App->>Cloud: build sharedContext (IIFE: map terminal->"default", strip ANSI where needed)
App->>Cloud: POST transcription request
Cloud-->>App: 200 / transcription or error
alt 401
App->>Cloud: refresh token + retry (once)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@MundusCaeli, I have been using iterm2 (plus oh my zsh) pretty extensively with Amical, and never ran into this. Have also used warp extensively and never ran into this issue. Would you be able to provide a minimal example of what your terminal contents were when you ran into this issue. |
…ranscription Terminal emulators (iTerm2, Ghostty, Terminal.app, Warp, kitty, etc.) expose command output, ANSI escape sequences, and shell prompt strings through the macOS Accessibility API. When this raw text is sent as sharedContext (beforeText/afterText) to the cloud transcription endpoint or used as initial_prompt for local Whisper, it degrades recognition quality -- causing garbled output especially for non-English (e.g. Japanese) dictation. Changes: - Add 'terminal' AppType with 8 common terminal bundle identifiers - Strip selectedText/beforeText/afterText from sharedContext for terminal apps in AmicalCloudProvider - Skip noisy accessibility context in WhisperProvider.generateInitialPrompt() for terminal apps - Add terminal-specific formatting rules and examples Fixes garbled transcription when dictating into terminal applications. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…dation error The cloud API does not yet recognize 'terminal' as a valid appType, returning 400 INVALID_REQUEST. Map terminal appType to 'default' in the request while still stripping noisy accessibility context fields. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
90a7763 to
38d0498
Compare
ReproductionThe issue occurs when the terminal has large amounts of output — specifically from agentic CLI tools (e.g., Factory Droid, Claude Code) that produce thousands of lines of mixed content (ASCII art, escape sequences, code snippets, conversation logs). A normal shell session with short commands and output does not trigger it, which explains why you have not seen it with regular iTerm2/Warp usage. Minimal reproduction:
Root Cause DiscoveryDuring debugging, we found the problem is actually two-fold:
Current FixThe current PR:
Your SuggestionI agree that stripping the entire surrounding context is too aggressive — for agentic CLIs the previous text genuinely helps with context. A better approach would be:
Happy to update the PR with a more selective sanitization approach if you prefer. Would a length-limited + ANSI-stripped version work for you? |
|
@MundusCaeli Thanks for this. I think we can just strip out the ANSI characters across the board for now and leave the app type as the default since no special prompt instructions need be applied. |
|
Thanks for the guidance @haritabh-z01! One concern I want to flag: in our case, the terminal accessibility context was tens of thousands of characters (from an agentic CLI session like Factory Droid). Even after stripping ANSI sequences, the remaining plain text is still very large. If the existing trimming on the cloud/local side handles that volume well, then ANSI stripping alone should be sufficient — but I wanted to mention it in case there are edge cases with extremely long terminal buffers. I will update the PR to:
Will test with the same reproduction scenario (large agentic CLI output in Ghostty/iTerm2, Japanese dictation) and report back. |
…ntext Per reviewer feedback, replace the full context stripping for terminal apps with ANSI escape sequence removal applied across the board. Terminal-specific formatting rules and examples are removed since the default rules apply. The terminal AppType and bundle ID detection are kept so appType can be sent as 'default' to avoid server validation errors until the server recognizes 'terminal'. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
Updated the PR per your suggestion:
Tested with Ghostty as frontmost app running a large agentic CLI session (Factory Droid), Japanese dictation — transcription works correctly now. Your approach was right, ANSI stripping alone is sufficient. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@apps/desktop/src/pipeline/providers/formatting/formatter-prompt.ts`:
- Around line 41-42: APP_TYPE_RULES.terminal and APP_TYPE_EXAMPLES.terminal are
set to empty strings which prevents the nullish-coalescing (??) in the template
from falling back to defaults; either change those terminal values to undefined
(so APP_TYPE_RULES[appType] ?? APP_TYPE_RULES.default works as intended) or
update the template to use logical OR (||) instead of ?? to treat empty string
as falsy; if empty string is intentional (meaning no app-specific
rules/examples), add a clarifying comment next to
APP_TYPE_RULES.terminal/APP_TYPE_EXAMPLES.terminal indicating that behavior.
In `@apps/desktop/src/pipeline/providers/transcription/amical-cloud-provider.ts`:
- Around line 297-325: The selectedText in the sharedContext closure is not
sanitized like beforeText/afterText; update the closure where sharedContext is
built (inside the currentAccessibilityContext check that calls
detectApplicationType) to pass the selectedText through stripAnsiSequences as
well (same way preSelectionText/postSelectionText are handled), i.e. replace the
direct use of currentAccessibilityContext!.context?.textSelection?.selectedText
with a stripped version using stripAnsiSequences so selectedText, beforeText and
afterText are consistently sanitized.
🧹 Nitpick comments (3)
apps/desktop/src/pipeline/providers/transcription/whisper-provider.ts (1)
281-287: Duplicated ANSI-stripping regex — extract a shared utility.The same regex and stripping logic is duplicated here and in
amical-cloud-provider.ts(lines 21-28). Consider extracting a sharedstripAnsiSequenceshelper (e.g., in atext-utils.tsmodule) to keep a single source of truth and make future regex improvements apply everywhere.Also, while the
eslint-disablecomment silences ESLint, Biome still flags the control characters (\u001b,\u009b). You may want to add a Biome-specific suppression comment (// biome-ignore lint/suspicious/noControlCharactersInRegex: intentional ANSI matching) or configure this rule inbiome.jsonfor the shared module.Proposed shared utility
Create a new file (e.g.,
apps/desktop/src/pipeline/utils/text-sanitize.ts):// Strip ANSI escape sequences and control characters commonly found in // terminal accessibility output. const ANSI_REGEX = // biome-ignore lint/suspicious/noControlCharactersInRegex: intentional – matching terminal escape codes // eslint-disable-next-line no-control-regex /[\u001b\u009b][[()#;?]*(?:[0-9]{1,4}(?:;[0-9]{0,4})*)?[0-9A-ORZcf-nqry=><~]/g; export function stripAnsiSequences( text: string | undefined | null, ): string | undefined { if (!text) return undefined; return text.replace(ANSI_REGEX, ""); }Then import and use it in both providers.
apps/desktop/src/pipeline/providers/transcription/amical-cloud-provider.ts (2)
298-325: Consider extracting the IIFE into a named method for readability.The inline IIFE that builds
sharedContextmakes theJSON.stringifypayload harder to scan. A small private method likebuildSharedContext()would improve readability and make the non-null assertions unnecessary (the method can accept the context as a parameter with a non-nullable type).Sketch
+ private buildSharedContext( + accessibilityContext: GetAccessibilityContextResult, + ) { + const appType = detectApplicationType(accessibilityContext); + const sel = accessibilityContext.context?.textSelection; + const app = accessibilityContext.context?.application; + return { + selectedText: stripAnsiSequences(sel?.selectedText), + beforeText: stripAnsiSequences(sel?.preSelectionText), + afterText: stripAnsiSequences(sel?.postSelectionText), + appType: appType === "terminal" ? "default" : appType, + appBundleId: app?.bundleIdentifier, + appName: app?.name, + appUrl: accessibilityContext.context?.windowInfo?.url, + surroundingContext: "", + }; + }Then in the payload:
- sharedContext: this.currentAccessibilityContext - ? (() => { ... })() - : undefined, + sharedContext: this.currentAccessibilityContext + ? this.buildSharedContext(this.currentAccessibilityContext) + : undefined,
18-28: Biome lint suppression needed for intentional control characters.Same as noted in
whisper-provider.ts— theeslint-disablecomment doesn't suppress Biome'snoControlCharactersInRegexrule. Once the regex is extracted to a shared utility (as suggested in the other file), a single Biome suppression comment there will cover both call sites.
| terminal: "", | ||
| default: "", |
There was a problem hiding this comment.
Empty-string entries silently suppress default rules/examples for terminal apps.
APP_TYPE_RULES.terminal and APP_TYPE_EXAMPLES.terminal are "". Because ?? only falls through on null/undefined (not empty string), the template on lines 226 and 231 will inject an empty string rather than the default rules/examples:
${APP_TYPE_RULES[appType] ?? APP_TYPE_RULES.default ?? ""}
// "" → "" (not null, so ?? doesn't trigger)If the intent is for terminal apps to use no app-specific rules (relying on universal rules only), this is fine but worth a brief comment. If the intent is to fall back to default rules, change the values to undefined or use || instead of ??.
Also applies to: 110-111
🤖 Prompt for AI Agents
In `@apps/desktop/src/pipeline/providers/formatting/formatter-prompt.ts` around
lines 41 - 42, APP_TYPE_RULES.terminal and APP_TYPE_EXAMPLES.terminal are set to
empty strings which prevents the nullish-coalescing (??) in the template from
falling back to defaults; either change those terminal values to undefined (so
APP_TYPE_RULES[appType] ?? APP_TYPE_RULES.default works as intended) or update
the template to use logical OR (||) instead of ?? to treat empty string as
falsy; if empty string is intentional (meaning no app-specific rules/examples),
add a clarifying comment next to
APP_TYPE_RULES.terminal/APP_TYPE_EXAMPLES.terminal indicating that behavior.
apps/desktop/src/pipeline/providers/transcription/amical-cloud-provider.ts
Show resolved
Hide resolved
Update: Additional findings from further testingANSI stripping alone is not sufficientAfter more extensive testing, I found that the root cause is not ANSI escape sequences specifically, but the sheer volume of accessibility context text from terminal sessions. Reproduction:
This confirms the issue is the size of the terminal buffer being sent as The existing server-side trimming may not be handling these extreme cases (50,000+ characters of mixed multilingual content). SuggestionIn addition to ANSI stripping (which is still a good improvement), a client-side length limit on The ANSI stripping changes in this PR are still validEven though they don't fully solve the large-context problem, stripping ANSI sequences is a genuine improvement for any terminal accessibility context. I'll keep the PR as-is for the ANSI fix, and the length limiting can be addressed separately if preferred. |
…ents - Apply stripAnsiSequences to selectedText for consistency - Add clarifying comments on intentionally empty terminal entries - Remove extra blank line in whisper-provider.ts Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/desktop/src/pipeline/providers/formatting/formatter-prompt.ts (1)
262-265:⚠️ Potential issue | 🟠 MajorANSI sequences are not stripped from
beforeText/afterTextin the formatting prompt path for terminal applications.
constructFormatterPromptreadspreSelectionText/postSelectionTextdirectly without ANSI stripping. Theamical-cloud-provider.tstranscription provider strips these sequences (lines 304, 308, 312) with the note that "Terminal apps expose raw terminal output via the accessibility API which contains escape codes." Sinceformatter-prompt.tsexplicitly supports terminal app types (lines 291–298), ANSI codes could leak into the LLM formatting prompt if terminal apps provide accessibility context.Extract
stripAnsiSequencesto a shared utility and apply it here for consistency, or document that ANSI stripping is guaranteed upstream.
🤖 Fix all issues with AI agents
In `@apps/desktop/src/pipeline/providers/transcription/amical-cloud-provider.ts`:
- Around line 25-28: The current stripAnsiSequences function treats empty string
as absent because it uses a truthy check; change the guard to only return
undefined for null/undefined (e.g., check text == null or text === undefined ||
text === null) so that stripAnsiSequences("") returns "" and non-null empty
strings are preserved, then continue to call text.replace(ANSI_REGEX, "") for
other inputs; reference the stripAnsiSequences function and ANSI_REGEX when
making this change.
🧹 Nitpick comments (1)
apps/desktop/src/pipeline/providers/transcription/whisper-provider.ts (1)
280-286: Extract the ANSI-stripping regex into a shared utility to avoid duplication.The same regex pattern and stripping logic is duplicated here (inline) and in
amical-cloud-provider.ts(asANSI_REGEX+stripAnsiSequences). Consider extracting a sharedstripAnsiSequenceshelper (e.g., in atext-utils.tsmodule) and reusing it in both providers.♻️ Suggested approach
Create a shared utility, e.g.
apps/desktop/src/pipeline/utils/text-sanitize.ts:// eslint-disable-next-line no-control-regex const ANSI_REGEX = /[\u001b\u009b][[()#;?]*(?:[0-9]{1,4}(?:;[0-9]{0,4})*)?[0-9A-ORZcf-nqry=><~]/g; export function stripAnsiSequences(text: string | undefined | null): string | undefined { if (!text) return undefined; return text.replace(ANSI_REGEX, ""); }Then in this file:
- const rawBeforeText = - accessibilityContext?.context?.textSelection?.preSelectionText; - // Strip ANSI escape sequences that terminal apps expose via accessibility - // context — they degrade Whisper recognition quality. - const beforeText = rawBeforeText - // eslint-disable-next-line no-control-regex - ?.replace(/[\u001b\u009b][[()#;?]*(?:[0-9]{1,4}(?:;[0-9]{0,4})*)?[0-9A-ORZcf-nqry=><~]/g, ""); + // Strip ANSI escape sequences that terminal apps expose via accessibility + // context — they degrade Whisper recognition quality. + const beforeText = stripAnsiSequences( + accessibilityContext?.context?.textSelection?.preSelectionText, + );
| function stripAnsiSequences(text: string | undefined | null): string | undefined { | ||
| if (!text) return undefined; | ||
| return text.replace(ANSI_REGEX, ""); | ||
| } |
There was a problem hiding this comment.
stripAnsiSequences("") returns undefined instead of "".
The !text guard is true for empty strings, so an input of "" is coerced to undefined. If the server or any downstream consumer distinguishes between an absent field and an empty string, this could cause subtle differences in behavior.
If this is intentional (empty and absent are equivalent), this is fine. Otherwise, a narrower null-check would preserve empty strings:
♻️ Suggested fix
function stripAnsiSequences(text: string | undefined | null): string | undefined {
- if (!text) return undefined;
+ if (text == null) return undefined;
return text.replace(ANSI_REGEX, "");
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function stripAnsiSequences(text: string | undefined | null): string | undefined { | |
| if (!text) return undefined; | |
| return text.replace(ANSI_REGEX, ""); | |
| } | |
| function stripAnsiSequences(text: string | undefined | null): string | undefined { | |
| if (text == null) return undefined; | |
| return text.replace(ANSI_REGEX, ""); | |
| } |
🤖 Prompt for AI Agents
In `@apps/desktop/src/pipeline/providers/transcription/amical-cloud-provider.ts`
around lines 25 - 28, The current stripAnsiSequences function treats empty
string as absent because it uses a truthy check; change the guard to only return
undefined for null/undefined (e.g., check text == null or text === undefined ||
text === null) so that stripAnsiSequences("") returns "" and non-null empty
strings are preserved, then continue to call text.replace(ANSI_REGEX, "") for
other inputs; reference the stripAnsiSequences function and ANSI_REGEX when
making this change.
Problem
When dictating into terminal emulators (iTerm2, Ghostty, Terminal.app, Warp, etc.), the transcription output is garbled — the text in Amical's own history is also corrupted, not just the pasted result. However, when Amical is the frontmost app, transcription works correctly.
Root cause: macOS Accessibility API exposes terminal content (command output, ANSI escape sequences, shell prompts) as
preSelectionText/postSelectionText. This raw text is sent to the cloud transcription endpoint assharedContext.beforeText/afterText, and used asinitial_promptfor local Whisper. The noisy context degrades recognition quality, especially for non-English languages like Japanese.Solution
Add
"terminal"AppType with bundle identifiers for 8 common terminal apps: Terminal.app, iTerm2, Ghostty, Warp, kitty, WezTerm, Alacritty, HyperSanitize cloud API context (
amical-cloud-provider.ts): When the active app is a terminal, stripselectedText,beforeText, andafterTextfromsharedContextwhile still sendingappType,appBundleId, andappNameSkip noisy initial_prompt (
whisper-provider.ts): For terminal apps, skip using accessibilitypreSelectionTextas Whisper'sinitial_promptto prevent local transcription degradationTerminal-specific formatting rules (
formatter-prompt.ts): Add formatting rules and examples appropriate for terminal context (preserve technical terminology, plain text output)Testing
Related
Summary by CodeRabbit
New Features
Bug Fixes / Reliability