Add Gemini provider and Windows port (M1 foundation)#71
Open
PsychoSatsujin wants to merge 8 commits intofarzaa:mainfrom
Open
Add Gemini provider and Windows port (M1 foundation)#71PsychoSatsujin wants to merge 8 commits intofarzaa:mainfrom
PsychoSatsujin wants to merge 8 commits intofarzaa:mainfrom
Conversation
Two steps toward a multi-provider, cross-platform Clicky. Gemini alongside Claude - Worker gains POST /chat-gemini. Model ID travels in the request body; the Worker plugs it into the upstream Gemini URL path. - GeminiAPI.swift mirrors ClaudeAPI's streaming signature so the call sites don't care which provider is active. CompanionManager gets a runStreamingVisionRequest dispatcher and an isGeminiModelID helper; setSelectedModel updates whichever provider owns the new ID. - Panel picker gains a Gemini row (Flash default, Pro option). Flash is the default because the motivation here is reducing credit spend. Windows port (Milestone 1 of 6) - New windows/ folder with a C# + WPF solution on .NET 8. Single-instance guarded, tray icon via H.NotifyIcon.Wpf, borderless non-activating popover panel matching the macOS design system 1:1 (colors and radii ported verbatim from DesignSystem.swift), global Ctrl+Alt push-to-talk via low-level keyboard hook, settings persisted to %APPDATA%\Clicky. - No voice pipeline yet — pressing Ctrl+Alt flips AppState so the hook is verifiable. M2 (mic + AssemblyAI + AI + TTS), M3 (screen capture), M4 (cursor overlay), M5 (element pointing), and M6 (onboarding) land in follow-up PRs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end push-to-talk flow on Windows: mic capture, AssemblyAI v3 streaming transcription, Claude/Gemini SSE chat, ElevenLabs TTS playback. Text-only (no screenshots yet — that's M3). Services (windows/Clicky/Services/): - WorkerConfig.cs Cloudflare Worker base URL + route constants - IChatClient.cs provider-agnostic streaming chat interface - ClaudeClient.cs /chat SSE (port of ClaudeAPI.swift) - GeminiClient.cs /chat-gemini SSE (port of GeminiAPI.swift) - AssemblyAIStreamingClient v3 realtime WebSocket transcription - MicrophoneCaptureService NAudio WaveInEvent @ 16 kHz PCM16 mono - ElevenLabsTtsClient /tts MP3 fetch + NAudio Mp3FileReader playback - DictationSession mic ↔ AssemblyAI bridge, 2.8s finalize fallback - VoicePipelineOrchestrator end-to-end press→listen→process→respond→idle Wire-up: - AppState gains LiveTranscript, StreamedResponseText, LastStatusMessage - TrayPanelViewModel exposes AppState so the panel can bind directly - TrayPanelWindow.xaml renders live transcript + streaming response rows, collapsing via a new StringToVisibilityConverter - App.xaml.cs hands Ctrl+Alt press/release to the orchestrator and tears it down cleanly on exit - Clicky.csproj adds NAudio 2.2.1 Housekeeping: - .gitignore excludes windows/**/bin|obj/ and *.user - windows/README.md marks M2 complete and documents WorkerConfig edit step Build verified: dotnet build → 0 errors (6 pre-existing M1 warnings). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds GDI BitBlt-based ScreenCaptureService that enumerates every attached monitor, captures PerMonitorV2-aware JPEGs (quality 80, downscaled to 1280px longest side), and orders them cursor-first to match the macOS CompanionScreenCaptureUtility contract. InlineImage gains an optional label so ClaudeClient and GeminiClient can emit a text part before each image, giving the model the "screen N of M — cursor is on this screen (primary focus) (image dimensions: WxH pixels)" context macOS already uses. VoicePipelineOrchestrator captures every push-to-talk release, feeds the labeled JPEGs to the selected provider, and strips the trailing [POINT:…] tag before TTS speaks the reply (M4/M5 will start consuming it). System prompt is now the verbatim macOS companionVoiceResponseSystemPrompt so prompt-engineering tweaks there translate directly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds one click-through topmost OverlayWindow per connected display (WS_EX_TRANSPARENT | WS_EX_LAYERED | WS_EX_NOACTIVATE | WS_EX_TOOLWINDOW), with a 16-DIP equilateral blue triangle (#3380FF, rotated -35°, soft blue glow) that follows the system cursor at 60 fps. Only the overlay on the cursor's monitor renders the triangle; the rest stay hidden so nothing flickers between displays. OverlayWindowManager owns the per-monitor lifecycle and the DispatcherTimer that polls GetCursorPos every 16 ms. Triangle visibility follows AppState.CurrentVoiceState so the triangle only shows during Idle / Responding — Listening and Processing hide it, leaving room for the waveform and spinner that will land in M5/M6. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Parses [POINT:x,y:label:screenN] tags out of each reply, rescales the screenshot coords to the monitor's native device pixels, and flies the overlay triangle along a quadratic bezier arc (smoothstep easing, tangent-based rotation, scale pulse) to the target. On arrival a blue speech bubble spring-bounces in with a streamed random phrase, holds 3 s, fades, then the triangle flies back to the cursor. Port of the macOS OverlayWindow.animateBezierFlightArc + streamBubbleText flow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the three remaining parity surfaces: - MicrophonePermissionHelper probes for an active capture endpoint at startup and offers a one-click shortcut to ms-settings:privacy-microphone when Windows privacy settings are blocking the mic. The tray panel shows an inline callout whenever AppState.IsMicrophonePermissionIssue is set. - First-run onboarding: if HasCompletedOnboarding is false the panel auto-opens centered on the primary monitor with a welcome block and a "Get started" button that flips the flag. A "Watch welcome again" footer link replays it. - ClickyAnalytics POSTs directly to PostHog /capture/ with the same event surface as the macOS ClickyAnalytics.swift (app_opened, onboarding_*, permission_*, push_to_talk_*, user_message_sent, ai_response_received, element_pointed, response_error, tts_error), keyed by a stable anonymous distinct_id persisted in settings.json. Write key placeholder in WorkerConfig.cs — every event is silently dropped until it's swapped for a real key. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Install-Clicky.bat double-clicks into Install-Clicky.ps1 (runs per-user with ExecutionPolicy Bypass, no admin needed). The PowerShell installer publishes a self-contained single-file Clicky.exe to %LOCALAPPDATA%\Programs\Clicky, creates Start Menu + Desktop shortcuts (minimised since Clicky is a tray app), registers Clicky in Apps & Features via the HKCU Uninstall key, optionally adds the HKCU Run entry so it launches on login, and drops a self-contained Uninstall-Clicky.ps1 next to the exe. -FrameworkDependent, -NoAutoStart, and -NoLaunch switches let power users opt out of the defaults. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace em-dashes and Unicode arrows in Install-Clicky.ps1 with ASCII
equivalents. PS 5.1 reads .ps1 files as ANSI by default, so the UTF-8
multi-byte sequences were mis-decoded and broke string parsing ("Missing
closing '}'", "Unexpected token ')'", "The string is missing the
terminator"). Verified with [Parser]::ParseFile -- no parse errors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
/chat-geminiroute on the Cloudflare Worker,GeminiAPI.swiftmirroringClaudeAPI's streaming interface, and provider dispatch inCompanionManager. Model picker now shows Claude (Sonnet/Opus) and Gemini (Flash/Pro).windows/). Tray icon, borderless non-activating popover, global Ctrl+Alt push-to-talk via low-level keyboard hook, settings persistence, design-system parity (DesignSystem.xaml). Zero embedded secrets — shares the same Worker proxy.AGENTS.mdupdated to cross-platform framing;windows/README.mdcovers build steps and milestone roadmap (M2 voice → M3 capture → M4 overlay → M5 pointing → M6 polish).Test plan
dotnet run --project windows/Clicky, confirm tray icon, click opens popover near tray, Ctrl+Alt fires press/release events, settings persist to%APPDATA%\Clicky\settings.json.🤖 Generated with Claude Code