Skip to content

Add Gemini provider and Windows port (M1 foundation)#71

Open
PsychoSatsujin wants to merge 8 commits intofarzaa:mainfrom
PsychoSatsujin:claude/hungry-elbakyan-59854c
Open

Add Gemini provider and Windows port (M1 foundation)#71
PsychoSatsujin wants to merge 8 commits intofarzaa:mainfrom
PsychoSatsujin:claude/hungry-elbakyan-59854c

Conversation

@PsychoSatsujin
Copy link
Copy Markdown

Summary

  • Gemini provider: new /chat-gemini route on the Cloudflare Worker, GeminiAPI.swift mirroring ClaudeAPI's streaming interface, and provider dispatch in CompanionManager. Model picker now shows Claude (Sonnet/Opus) and Gemini (Flash/Pro).
  • Windows port — M1: C# + WPF on .NET 8 (windows/). Tray icon, borderless non-activating popover, global Ctrl+Alt push-to-talk via low-level keyboard hook, settings persistence, design-system parity (DesignSystem.xaml). Zero embedded secrets — shares the same Worker proxy.
  • Docs: AGENTS.md updated to cross-platform framing; windows/README.md covers build steps and milestone roadmap (M2 voice → M3 capture → M4 overlay → M5 pointing → M6 polish).

Test plan

  • macOS: build in Xcode 16, switch between Claude/Gemini models, confirm streaming + vision both work against the Worker.
  • Windows: dotnet run --project windows/Clicky, confirm tray icon, click opens popover near tray, Ctrl+Alt fires press/release events, settings persist to %APPDATA%\Clicky\settings.json.

🤖 Generated with Claude Code

PsychoSatsujin and others added 8 commits April 21, 2026 11:02
Two steps toward a multi-provider, cross-platform Clicky.

Gemini alongside Claude
- Worker gains POST /chat-gemini. Model ID travels in the request body;
  the Worker plugs it into the upstream Gemini URL path.
- GeminiAPI.swift mirrors ClaudeAPI's streaming signature so the call
  sites don't care which provider is active. CompanionManager gets a
  runStreamingVisionRequest dispatcher and an isGeminiModelID helper;
  setSelectedModel updates whichever provider owns the new ID.
- Panel picker gains a Gemini row (Flash default, Pro option). Flash is
  the default because the motivation here is reducing credit spend.

Windows port (Milestone 1 of 6)
- New windows/ folder with a C# + WPF solution on .NET 8. Single-instance
  guarded, tray icon via H.NotifyIcon.Wpf, borderless non-activating
  popover panel matching the macOS design system 1:1 (colors and radii
  ported verbatim from DesignSystem.swift), global Ctrl+Alt push-to-talk
  via low-level keyboard hook, settings persisted to %APPDATA%\Clicky.
- No voice pipeline yet — pressing Ctrl+Alt flips AppState so the hook
  is verifiable. M2 (mic + AssemblyAI + AI + TTS), M3 (screen capture),
  M4 (cursor overlay), M5 (element pointing), and M6 (onboarding) land
  in follow-up PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end push-to-talk flow on Windows: mic capture, AssemblyAI v3
streaming transcription, Claude/Gemini SSE chat, ElevenLabs TTS playback.
Text-only (no screenshots yet — that's M3).

Services (windows/Clicky/Services/):
 - WorkerConfig.cs             Cloudflare Worker base URL + route constants
 - IChatClient.cs              provider-agnostic streaming chat interface
 - ClaudeClient.cs             /chat SSE (port of ClaudeAPI.swift)
 - GeminiClient.cs             /chat-gemini SSE (port of GeminiAPI.swift)
 - AssemblyAIStreamingClient   v3 realtime WebSocket transcription
 - MicrophoneCaptureService    NAudio WaveInEvent @ 16 kHz PCM16 mono
 - ElevenLabsTtsClient         /tts MP3 fetch + NAudio Mp3FileReader playback
 - DictationSession            mic ↔ AssemblyAI bridge, 2.8s finalize fallback
 - VoicePipelineOrchestrator   end-to-end press→listen→process→respond→idle

Wire-up:
 - AppState gains LiveTranscript, StreamedResponseText, LastStatusMessage
 - TrayPanelViewModel exposes AppState so the panel can bind directly
 - TrayPanelWindow.xaml renders live transcript + streaming response rows,
   collapsing via a new StringToVisibilityConverter
 - App.xaml.cs hands Ctrl+Alt press/release to the orchestrator and
   tears it down cleanly on exit
 - Clicky.csproj adds NAudio 2.2.1

Housekeeping:
 - .gitignore excludes windows/**/bin|obj/ and *.user
 - windows/README.md marks M2 complete and documents WorkerConfig edit step

Build verified: dotnet build → 0 errors (6 pre-existing M1 warnings).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds GDI BitBlt-based ScreenCaptureService that enumerates every attached
monitor, captures PerMonitorV2-aware JPEGs (quality 80, downscaled to
1280px longest side), and orders them cursor-first to match the macOS
CompanionScreenCaptureUtility contract.

InlineImage gains an optional label so ClaudeClient and GeminiClient can
emit a text part before each image, giving the model the "screen N of M
— cursor is on this screen (primary focus) (image dimensions: WxH
pixels)" context macOS already uses. VoicePipelineOrchestrator captures
every push-to-talk release, feeds the labeled JPEGs to the selected
provider, and strips the trailing [POINT:…] tag before TTS speaks the
reply (M4/M5 will start consuming it).

System prompt is now the verbatim macOS companionVoiceResponseSystemPrompt
so prompt-engineering tweaks there translate directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds one click-through topmost OverlayWindow per connected display
(WS_EX_TRANSPARENT | WS_EX_LAYERED | WS_EX_NOACTIVATE | WS_EX_TOOLWINDOW),
with a 16-DIP equilateral blue triangle (#3380FF, rotated -35°, soft
blue glow) that follows the system cursor at 60 fps. Only the overlay
on the cursor's monitor renders the triangle; the rest stay hidden so
nothing flickers between displays.

OverlayWindowManager owns the per-monitor lifecycle and the
DispatcherTimer that polls GetCursorPos every 16 ms. Triangle visibility
follows AppState.CurrentVoiceState so the triangle only shows during
Idle / Responding — Listening and Processing hide it, leaving room for
the waveform and spinner that will land in M5/M6.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Parses [POINT:x,y:label:screenN] tags out of each reply, rescales the
screenshot coords to the monitor's native device pixels, and flies the
overlay triangle along a quadratic bezier arc (smoothstep easing,
tangent-based rotation, scale pulse) to the target. On arrival a blue
speech bubble spring-bounces in with a streamed random phrase, holds
3 s, fades, then the triangle flies back to the cursor. Port of the
macOS OverlayWindow.animateBezierFlightArc + streamBubbleText flow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the three remaining parity surfaces:
- MicrophonePermissionHelper probes for an active capture endpoint at
  startup and offers a one-click shortcut to
  ms-settings:privacy-microphone when Windows privacy settings are
  blocking the mic. The tray panel shows an inline callout whenever
  AppState.IsMicrophonePermissionIssue is set.
- First-run onboarding: if HasCompletedOnboarding is false the panel
  auto-opens centered on the primary monitor with a welcome block and
  a "Get started" button that flips the flag. A "Watch welcome again"
  footer link replays it.
- ClickyAnalytics POSTs directly to PostHog /capture/ with the same
  event surface as the macOS ClickyAnalytics.swift (app_opened,
  onboarding_*, permission_*, push_to_talk_*, user_message_sent,
  ai_response_received, element_pointed, response_error, tts_error),
  keyed by a stable anonymous distinct_id persisted in settings.json.
  Write key placeholder in WorkerConfig.cs — every event is silently
  dropped until it's swapped for a real key.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Install-Clicky.bat double-clicks into Install-Clicky.ps1 (runs per-user with
ExecutionPolicy Bypass, no admin needed). The PowerShell installer publishes
a self-contained single-file Clicky.exe to %LOCALAPPDATA%\Programs\Clicky,
creates Start Menu + Desktop shortcuts (minimised since Clicky is a tray
app), registers Clicky in Apps & Features via the HKCU Uninstall key,
optionally adds the HKCU Run entry so it launches on login, and drops a
self-contained Uninstall-Clicky.ps1 next to the exe. -FrameworkDependent,
-NoAutoStart, and -NoLaunch switches let power users opt out of the defaults.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace em-dashes and Unicode arrows in Install-Clicky.ps1 with ASCII
equivalents. PS 5.1 reads .ps1 files as ANSI by default, so the UTF-8
multi-byte sequences were mis-decoded and broke string parsing ("Missing
closing '}'", "Unexpected token ')'", "The string is missing the
terminator"). Verified with [Parser]::ParseFile -- no parse errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant