20 changes: 16 additions & 4 deletions AGENTS.md
@@ -15,8 +15,8 @@ All API keys live on a Cloudflare Worker proxy — nothing sensitive ships in th
- **Framework**: SwiftUI (macOS native) with AppKit bridging for menu bar panel and cursor overlay
- **Pattern**: MVVM with `@StateObject` / `@Published` state management
- **AI Chat**: Claude (Sonnet 4.6 default, Opus 4.6 optional) via Cloudflare Worker proxy with SSE streaming
- **Speech-to-Text**: AssemblyAI real-time streaming (`u3-rt-pro` model) via websocket, with OpenAI and Apple Speech as fallbacks
- **Text-to-Speech**: ElevenLabs (`eleven_flash_v2_5` model) via Cloudflare Worker proxy
- **Speech-to-Text**: AssemblyAI real-time streaming (`u3-rt-pro` model) via websocket, with OpenAI, Apple Speech, and Parakeet (on-device, via FluidAudio/CoreML) as options. Selectable at runtime via the panel UI.
- **Text-to-Speech**: ElevenLabs (`eleven_flash_v2_5` model) via Cloudflare Worker proxy, or Supertonic (on-device ONNX, 66M params, ~167× realtime on Apple Silicon). Selectable at runtime via the panel UI.
- **Screen Capture**: ScreenCaptureKit (macOS 14.2+), multi-monitor support
- **Voice Input**: Push-to-talk via `AVAudioEngine` + pluggable transcription-provider layer. System-wide keyboard shortcut via listen-only CGEvent tap.
- **Element Pointing**: Claude embeds `[POINT:x,y:label:screenN]` tags in responses. The overlay parses these, maps coordinates to the correct monitor, and animates the blue cursor along a bezier arc to the target.
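Because the pointing protocol is a plain-text tag embedded in Claude's reply, it can be parsed and tested in isolation. The following is a minimal Swift sketch of extracting `[POINT:x,y:label:screenN]` tags from a response. It assumes numeric coordinates, a label that contains no `:` or `]`, and a literal `screen` prefix followed by an integer. `PointTag`, `extractPointTags(from:)`, and `parseTagBody(_:)` are hypothetical names, not the overlay's actual parser, and this diff does not show how the real code handles malformed tags.

```swift
import Foundation

/// One parsed pointing tag. Field names are illustrative.
struct PointTag: Equatable {
    let x: Double
    let y: Double
    let label: String
    let screenIndex: Int
}

/// Pulls every well-formed `[POINT:x,y:label:screenN]` tag out of `text`.
/// Returns the tags in order plus the text with those tags removed.
/// Malformed tags are left in the text unchanged.
func extractPointTags(from text: String) -> (tags: [PointTag], cleanedText: String) {
    var tags: [PointTag] = []
    var cleaned = ""
    var rest = Substring(text)

    while let open = rest.range(of: "[POINT:") {
        cleaned.append(contentsOf: rest[..<open.lowerBound])
        guard let close = rest[open.upperBound...].firstIndex(of: "]") else {
            // Unterminated tag: keep the remainder verbatim and stop scanning.
            cleaned.append(contentsOf: rest[open.lowerBound...])
            rest = ""
            break
        }
        if let tag = parseTagBody(rest[open.upperBound..<close]) {
            tags.append(tag)
        } else {
            cleaned.append(contentsOf: rest[open.lowerBound...close])
        }
        rest = rest[rest.index(after: close)...]
    }
    cleaned.append(contentsOf: rest)
    return (tags, cleaned)
}

/// Parses the "x,y:label:screenN" body. Assumes the label contains no ':'.
private func parseTagBody(_ body: Substring) -> PointTag? {
    let parts = body.split(separator: ":", omittingEmptySubsequences: false)
    guard parts.count == 3 else { return nil }
    let coords = parts[0].split(separator: ",")
    guard coords.count == 2,
          let x = Double(coords[0].trimmingCharacters(in: .whitespaces)),
          let y = Double(coords[1].trimmingCharacters(in: .whitespaces)),
          parts[2].hasPrefix("screen"),
          let screen = Int(parts[2].dropFirst("screen".count))
    else { return nil }
    return PointTag(x: x, y: y, label: String(parts[1]), screenIndex: screen)
}
```

For `"Click here [POINT:120,340:Save button:screen1] to save."` this returns one tag (x 120, y 340, label "Save button", screen 1) along with the surrounding text minus the tag. Leaving malformed tags in place, instead of silently dropping them, keeps parser mismatches visible in the response bubble.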
@@ -53,9 +53,9 @@ Worker vars: `ELEVENLABS_VOICE_ID`
| File | Lines | Purpose |
|------|-------|---------|
| `leanring_buddyApp.swift` | ~89 | Menu bar app entry point. Uses `@NSApplicationDelegateAdaptor` with `CompanionAppDelegate` which creates `MenuBarPanelManager` and starts `CompanionManager`. No main window — the app lives entirely in the status bar. |
| `CompanionManager.swift` | ~1026 | Central state machine. Owns dictation, shortcut monitoring, screen capture, Claude API, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates the full push-to-talk → screenshot → Claude → TTS → pointing pipeline. |
| `CompanionManager.swift` | ~1100 | Central state machine. Owns dictation, shortcut monitoring, screen capture, Claude API, ElevenLabs TTS, Supertonic TTS, and overlay management. Tracks voice state, conversation history, model selection, TTS provider selection, and STT provider selection. Coordinates the full push-to-talk → screenshot → Claude → TTS → pointing pipeline. |
| `MenuBarPanelManager.swift` | ~243 | NSStatusItem + custom NSPanel lifecycle. Creates the menu bar icon, manages the floating companion panel (show/hide/position), installs click-outside-to-dismiss monitor. |
| `CompanionPanelView.swift` | ~761 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. |
| `CompanionPanelView.swift` | ~870 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), voice picker (ElevenLabs/Supertonic), speech picker (AssemblyAI/Parakeet), permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. |
| `OverlayWindow.swift` | ~881 | Full-screen transparent overlay hosting the blue cursor, response text, waveform, and spinner. Handles cursor animation, element pointing with bezier arcs, multi-monitor coordinate mapping, and fade-out transitions. |
| `CompanionResponseOverlay.swift` | ~217 | SwiftUI view for the response text bubble and waveform displayed next to the cursor in the overlay. |
| `CompanionScreenCaptureUtility.swift` | ~132 | Multi-monitor screenshot capture using ScreenCaptureKit. Returns labeled image data for each connected display. |
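`OverlayWindow.swift` is described as moving the cursor along a Bezier arc to the target, but the curve itself is not part of this diff. A quadratic Bezier, B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2, is one common way to get that effect. In the sketch below, the control point is the segment midpoint pushed sideways by a hypothetical `arcHeight`. `Point`, `arcControlPoint`, and `sampleArc` are illustrative names, and the real implementation may use a cubic curve or choose its control point differently.

```swift
import Foundation

struct Point: Equatable {
    var x: Double
    var y: Double
}

/// Quadratic Bezier: B(t) = (1-t)^2 * p0 + 2(1-t)t * p1 + t^2 * p2, t in [0, 1].
func quadraticBezier(_ p0: Point, _ p1: Point, _ p2: Point, t: Double) -> Point {
    let u = 1 - t
    let a = u * u, b = 2 * u * t, c = t * t
    return Point(x: a * p0.x + b * p1.x + c * p2.x,
                 y: a * p0.y + b * p1.y + c * p2.y)
}

/// Hypothetical control point: the segment midpoint pushed `arcHeight` along
/// the left-hand normal (-dy, dx), a 90 degree counter-clockwise turn in a y-up space.
func arcControlPoint(from start: Point, to end: Point, arcHeight: Double) -> Point {
    let mid = Point(x: (start.x + end.x) / 2, y: (start.y + end.y) / 2)
    let dx = end.x - start.x
    let dy = end.y - start.y
    let length = (dx * dx + dy * dy).squareRoot()
    guard length > 0 else { return mid }  // start == end: no direction to bow toward
    return Point(x: mid.x - dy / length * arcHeight,
                 y: mid.y + dx / length * arcHeight)
}

/// Evenly samples `steps + 1` points from start to end, e.g. one per animation frame.
func sampleArc(from start: Point, to end: Point, arcHeight: Double, steps: Int) -> [Point] {
    precondition(steps > 0, "steps must be positive")
    let control = arcControlPoint(from: start, to: end, arcHeight: arcHeight)
    return (0...steps).map { step in
        quadraticBezier(start, control, end, t: Double(step) / Double(steps))
    }
}
```

Sampling from (0, 0) to (100, 0) with `arcHeight: 40` and `steps: 4` gives five points whose middle one is (50, 20): a quadratic curve only reaches half of the control point's offset at t = 0.5. An animation could also pass t through an easing function before evaluating the curve.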
@@ -69,6 +69,9 @@ Worker vars: `ELEVENLABS_VOICE_ID`
| `ClaudeAPI.swift` | ~291 | Claude vision API client with streaming (SSE) and non-streaming modes. TLS warmup optimization, image MIME detection, conversation history support. |
| `OpenAIAPI.swift` | ~142 | OpenAI GPT vision API client. |
| `ElevenLabsTTSClient.swift` | ~81 | ElevenLabs TTS client. Sends text to the Worker proxy, plays back audio via `AVAudioPlayer`. Exposes `isPlaying` for transient cursor scheduling. |
| `SupertonicTTSClient.swift` | ~160 | On-device TTS client backed by Supertonic ONNX (66M params, ~167× realtime). Auto-downloads models from HuggingFace on first use. Mirrors `ElevenLabsTTSClient` interface. |
| `SupertonicEngine.swift` | ~600 | ONNX inference engine for Supertonic. Vendored from supertone-inc/supertonic. Handles text preprocessing, chunking, duration prediction, latent diffusion denoising, and vocoder synthesis via ONNX Runtime. |
| `ParakeetTranscriptionProvider.swift` | ~160 | On-device ASR provider using NVIDIA Parakeet via FluidAudio (CoreML/ANE). Implements `BuddyTranscriptionProvider` with the same buffer-then-transcribe pattern as the OpenAI provider. No API key required. |
| `ElementLocationDetector.swift` | ~335 | Detects UI element locations in screenshots for cursor pointing. |
| `DesignSystem.swift` | ~880 | Design system tokens — colors, corner radii, shared styles. All UI references `DS.Colors`, `DS.CornerRadius`, etc. |
| `ClickyAnalytics.swift` | ~121 | PostHog analytics integration for usage tracking. |
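`ParakeetTranscriptionProvider` is described as using the same buffer-then-transcribe pattern as the OpenAI provider, not streaming partial results. The sketch below shows that pattern in isolation. It assumes 16 kHz mono `Float` samples and uses a hypothetical `transcribe` closure in place of the FluidAudio call. `BufferedTranscriptionSession` and its `minimumDuration` guard are illustrative and not taken from the provider.

```swift
import Foundation

/// Buffer-then-transcribe sketch: push-to-talk audio is accumulated in memory
/// and a single batch transcription runs when the key is released.
final class BufferedTranscriptionSession {
    private var samples: [Float] = []
    private let sampleRate: Double
    private let minimumDuration: TimeInterval
    // Stand-in for the on-device model call (hypothetical, not FluidAudio's API).
    private let transcribe: ([Float]) throws -> String

    init(sampleRate: Double = 16_000,
         minimumDuration: TimeInterval = 0.3,
         transcribe: @escaping ([Float]) throws -> String) {
        self.sampleRate = sampleRate
        self.minimumDuration = minimumDuration
        self.transcribe = transcribe
    }

    /// Called for each captured audio chunk while the shortcut is held.
    func append(_ chunk: [Float]) {
        samples.append(contentsOf: chunk)
    }

    /// Seconds of audio currently buffered.
    var bufferedDuration: TimeInterval {
        Double(samples.count) / sampleRate
    }

    /// Transcribes everything buffered in one pass, then clears the buffer
    /// (also when `transcribe` throws). Returns nil for clips shorter than
    /// `minimumDuration`, such as accidental taps.
    func finish() throws -> String? {
        defer { samples.removeAll(keepingCapacity: true) }
        guard bufferedDuration >= minimumDuration else { return nil }
        return try transcribe(samples)
    }
}
```

Compared with AssemblyAI's streaming session, the cost is latency at release: nothing is transcribed while the key is held, so the whole clip's inference time lands after the key is let go.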
@@ -88,6 +91,15 @@ open leanring-buddy.xcodeproj
# deprecated onChange warning in OverlayWindow.swift. Do NOT attempt to fix these.
```

### Required Swift Packages (add via Xcode → File → Add Package Dependencies)

| Package | URL | Purpose |
|---------|-----|---------|
| onnxruntime-swift-package-manager | `https://github.com/microsoft/onnxruntime-swift-package-manager.git` | ONNX Runtime for Supertonic on-device TTS |
| FluidAudio | `https://github.com/FluidInference/FluidAudio.git` | Parakeet CoreML models for on-device ASR |

After adding, link the packages' products (`onnxruntime`, `onnxruntime_extensions`, `FluidAudio`, and `fluidaudiocli`) to the `leanring-buddy` target. On first use, Supertonic downloads ~200 MB of ONNX model files from HuggingFace and Parakeet downloads ~600 MB of CoreML models; both are cached in `~/Library/Application Support/Clicky/models/`.

**Do NOT run `xcodebuild` from the terminal** — it invalidates TCC (Transparency, Consent, and Control) permissions and the app will need to re-request screen recording, accessibility, etc.

## Cloudflare Worker
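Both on-device engines download their models on first use and cache them under `~/Library/Application Support/Clicky/models/`. A minimal sketch of resolving that directory and checking for already-downloaded files with `FileManager` follows. The per-engine subdirectory (`supertonic`) and the file name `model.onnx` used in testing are assumptions, since the diff does not list the actual layout.

```swift
import Foundation

/// Returns `<base>/Clicky/models/<engine>`, creating the directories if needed.
/// `base` defaults to the user's Application Support directory; passing one
/// explicitly keeps tests out of the real home folder.
func modelCacheDirectory(for engine: String,
                         base: URL? = nil,
                         fileManager: FileManager = .default) throws -> URL {
    let root: URL
    if let base {
        root = base
    } else {
        root = try fileManager.url(for: .applicationSupportDirectory,
                                   in: .userDomainMask,
                                   appropriateFor: nil,
                                   create: true)
    }
    let directory = root
        .appendingPathComponent("Clicky", isDirectory: true)
        .appendingPathComponent("models", isDirectory: true)
        .appendingPathComponent(engine, isDirectory: true)
    try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
    return directory
}

/// True only when every expected file exists, so an interrupted multi-file
/// download (some files still missing) is not mistaken for a complete cache.
func isModelCached(fileNames: [String], in directory: URL,
                   fileManager: FileManager = .default) -> Bool {
    fileNames.allSatisfy { name in
        fileManager.fileExists(atPath: directory.appendingPathComponent(name).path)
    }
}
```

Checking for each expected file instead of only the directory matters most for the ~600 MB Parakeet download, where an interrupted fetch is more likely.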
50 changes: 50 additions & 0 deletions leanring-buddy.xcodeproj/project.pbxproj
@@ -9,6 +9,10 @@
/* Begin PBXBuildFile section */
AA00BB032F6500030039DA55 /* Sparkle in Frameworks */ = {isa = PBXBuildFile; productRef = AA00BB022F6500020039DA55 /* Sparkle */; };
AA00BB062F6500060039DA55 /* PostHog in Frameworks */ = {isa = PBXBuildFile; productRef = AA00BB052F6500050039DA55 /* PostHog */; };
C25D88DC2F8E1B20008ECA05 /* onnxruntime in Frameworks */ = {isa = PBXBuildFile; productRef = C25D88DB2F8E1B20008ECA05 /* onnxruntime */; };
C25D88DE2F8E1B20008ECA05 /* onnxruntime_extensions in Frameworks */ = {isa = PBXBuildFile; productRef = C25D88DD2F8E1B20008ECA05 /* onnxruntime_extensions */; };
C25D88E12F8E1B45008ECA05 /* FluidAudio in Frameworks */ = {isa = PBXBuildFile; productRef = C25D88E02F8E1B45008ECA05 /* FluidAudio */; };
C25D88E32F8E1B45008ECA05 /* fluidaudiocli in Frameworks */ = {isa = PBXBuildFile; productRef = C25D88E22F8E1B45008ECA05 /* fluidaudiocli */; };
/* End PBXBuildFile section */

/* Begin PBXContainerItemProxy section */
@@ -57,6 +61,10 @@
isa = PBXFrameworksBuildPhase;
buildActionMask = 2147483647;
files = (
C25D88DE2F8E1B20008ECA05 /* onnxruntime_extensions in Frameworks */,
C25D88E32F8E1B45008ECA05 /* fluidaudiocli in Frameworks */,
C25D88E12F8E1B45008ECA05 /* FluidAudio in Frameworks */,
C25D88DC2F8E1B20008ECA05 /* onnxruntime in Frameworks */,
AA00BB032F6500030039DA55 /* Sparkle in Frameworks */,
AA00BB062F6500060039DA55 /* PostHog in Frameworks */,
);
@@ -121,6 +129,10 @@
packageProductDependencies = (
AA00BB022F6500020039DA55 /* Sparkle */,
AA00BB052F6500050039DA55 /* PostHog */,
C25D88DB2F8E1B20008ECA05 /* onnxruntime */,
C25D88DD2F8E1B20008ECA05 /* onnxruntime_extensions */,
C25D88E02F8E1B45008ECA05 /* FluidAudio */,
C25D88E22F8E1B45008ECA05 /* fluidaudiocli */,
);
productName = "leanring-buddy";
productReference = 28F22CBF2F56440300A0FC59 /* Clicky.app */;
@@ -207,6 +219,8 @@
packageReferences = (
AA00BB012F6500010039DA55 /* XCRemoteSwiftPackageReference "Sparkle" */,
AA00BB042F6500040039DA55 /* XCRemoteSwiftPackageReference "posthog-ios" */,
C25D88DA2F8E1B20008ECA05 /* XCRemoteSwiftPackageReference "onnxruntime-swift-package-manager" */,
C25D88DF2F8E1B45008ECA05 /* XCRemoteSwiftPackageReference "FluidAudio" */,
);
preferredProjectObjectVersion = 77;
productRefGroup = 28F22CC02F56440300A0FC59 /* Products */;
@@ -616,6 +630,22 @@
minimumVersion = 3.0.0;
};
};
C25D88DA2F8E1B20008ECA05 /* XCRemoteSwiftPackageReference "onnxruntime-swift-package-manager" */ = {
isa = XCRemoteSwiftPackageReference;
repositoryURL = "https://github.com/microsoft/onnxruntime-swift-package-manager.git";
requirement = {
kind = upToNextMajorVersion;
minimumVersion = 1.24.2;
};
};
C25D88DF2F8E1B45008ECA05 /* XCRemoteSwiftPackageReference "FluidAudio" */ = {
isa = XCRemoteSwiftPackageReference;
repositoryURL = "https://github.com/FluidInference/FluidAudio.git";
requirement = {
branch = main;
kind = branch;
};
};
/* End XCRemoteSwiftPackageReference section */

/* Begin XCSwiftPackageProductDependency section */
@@ -629,6 +659,26 @@
package = AA00BB042F6500040039DA55 /* XCRemoteSwiftPackageReference "posthog-ios" */;
productName = PostHog;
};
C25D88DB2F8E1B20008ECA05 /* onnxruntime */ = {
isa = XCSwiftPackageProductDependency;
package = C25D88DA2F8E1B20008ECA05 /* XCRemoteSwiftPackageReference "onnxruntime-swift-package-manager" */;
productName = onnxruntime;
};
C25D88DD2F8E1B20008ECA05 /* onnxruntime_extensions */ = {
isa = XCSwiftPackageProductDependency;
package = C25D88DA2F8E1B20008ECA05 /* XCRemoteSwiftPackageReference "onnxruntime-swift-package-manager" */;
productName = onnxruntime_extensions;
};
C25D88E02F8E1B45008ECA05 /* FluidAudio */ = {
isa = XCSwiftPackageProductDependency;
package = C25D88DF2F8E1B45008ECA05 /* XCRemoteSwiftPackageReference "FluidAudio" */;
productName = FluidAudio;
};
C25D88E22F8E1B45008ECA05 /* fluidaudiocli */ = {
isa = XCSwiftPackageProductDependency;
package = C25D88DF2F8E1B45008ECA05 /* XCRemoteSwiftPackageReference "FluidAudio" */;
productName = fluidaudiocli;
};
/* End XCSwiftPackageProductDependency section */
};
rootObject = 28F22CB72F56440300A0FC59 /* Project object */;

15 changes: 14 additions & 1 deletion leanring-buddy/BuddyDictationManager.swift
@@ -262,7 +262,7 @@ final class BuddyDictationManager: NSObject, ObservableObject {
return AVCaptureDevice.authorizationStatus(for: .audio) == .notDetermined
}

private let transcriptionProvider: any BuddyTranscriptionProvider
private var transcriptionProvider: any BuddyTranscriptionProvider
private let audioEngine = AVAudioEngine()
private var activeTranscriptionSession: (any BuddyStreamingTranscriptionSession)?
private var activeStartSource: BuddyDictationStartSource?
@@ -287,6 +287,19 @@ final class BuddyDictationManager: NSObject, ObservableObject {
super.init()
}

/// Swaps the active transcription provider between push-to-talk sessions.
/// If a dictation session is in progress, the request is ignored (not queued):
/// the current provider stays active and the caller must retry after the
/// session finishes.
func switchTranscriptionProvider(to provider: any BuddyTranscriptionProvider) {
guard !isDictationInProgress else {
print("⚠️ Transcription: provider switch ignored, session in progress")
return
}
transcriptionProvider = provider
transcriptionProviderDisplayName = provider.displayName
print("🎙️ Transcription: switched to \(provider.displayName)")
}

func updateContextualKeyterms(_ contextualKeyterms: [String]) {
self.contextualKeyterms = contextualKeyterms
}
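As written, `switchTranscriptionProvider(to:)` returns early while `isDictationInProgress` is true, so a provider chosen mid-session is dropped instead of being applied later. If deferral is the intended behavior, one option is to remember the latest request and apply it when the session ends. `DeferredProviderSwitch` below is a hypothetical helper, not existing code, and its `sessionDidStart()` / `sessionDidEnd()` hooks are assumptions about where the manager's session lifecycle could call in.

```swift
/// Hypothetical helper: applies a provider switch immediately when idle, or
/// remembers it and applies it when the current session ends.
/// Not thread-safe; call it from a single actor (e.g. the main actor).
final class DeferredProviderSwitch<Provider> {
    private(set) var active: Provider
    private(set) var pending: Provider?
    private(set) var isSessionInProgress = false

    init(initial: Provider) {
        active = initial
    }

    func sessionDidStart() {
        isSessionInProgress = true
    }

    /// Returns true if the switch took effect now, false if it was queued.
    /// A newer request replaces any earlier queued one.
    @discardableResult
    func requestSwitch(to provider: Provider) -> Bool {
        guard isSessionInProgress else {
            active = provider
            pending = nil
            return true
        }
        pending = provider
        return false
    }

    func sessionDidEnd() {
        isSessionInProgress = false
        if let next = pending {
            active = next
            pending = nil
        }
    }
}
```

Because the last request wins, flipping the speech picker several times during one session applies only the final choice once the session ends.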
29 changes: 23 additions & 6 deletions leanring-buddy/BuddyTranscriptionProvider.swift
@@ -30,10 +30,11 @@ protocol BuddyTranscriptionProvider {
}

enum BuddyTranscriptionProviderFactory {
private enum PreferredProvider: String {
enum PreferredProvider: String {
case assemblyAI = "assemblyai"
case openAI = "openai"
case appleSpeech = "apple"
case parakeet = "parakeet"
}

static func makeDefaultProvider() -> any BuddyTranscriptionProvider {
@@ -42,15 +43,31 @@ enum BuddyTranscriptionProviderFactory {
return provider
}

private static func resolveProvider() -> any BuddyTranscriptionProvider {
let preferredProviderRawValue = AppBundleConfiguration
.stringValue(forKey: "VoiceTranscriptionProvider")?
.lowercased()
let preferredProvider = preferredProviderRawValue.flatMap(PreferredProvider.init(rawValue:))
static func makeProvider(for preferredProvider: PreferredProvider) -> any BuddyTranscriptionProvider {
let provider = resolveProvider(preferred: preferredProvider)
print("🎙️ Transcription: switching to \(provider.displayName)")
return provider
}

private static func resolveProvider(preferred: PreferredProvider? = nil) -> any BuddyTranscriptionProvider {
// Use the explicit preferred value if passed, otherwise read from Info.plist
let preferredProvider: PreferredProvider?
if let preferred {
preferredProvider = preferred
} else {
let rawValue = AppBundleConfiguration
.stringValue(forKey: "VoiceTranscriptionProvider")?
.lowercased()
preferredProvider = rawValue.flatMap(PreferredProvider.init(rawValue:))
}

let assemblyAIProvider = AssemblyAIStreamingTranscriptionProvider()
let openAIProvider = OpenAIAudioTranscriptionProvider()

if preferredProvider == .parakeet {
return ParakeetTranscriptionProvider()
}

if preferredProvider == .appleSpeech {
return AppleSpeechTranscriptionProvider()
}
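The refactored `resolveProvider(preferred:)` gives an explicit runtime choice priority over the `VoiceTranscriptionProvider` Info.plist value, which is lowercased and matched against `PreferredProvider`'s raw values. Pulling that precedence into a pure function makes it unit-testable without constructing any provider. The sketch redeclares the enum as `ProviderChoice` so it compiles on its own, and returns `nil` where the real factory falls through to its default (that fallback lies outside this hunk).

```swift
/// Redeclared stand-in for `PreferredProvider` so the sketch compiles alone.
enum ProviderChoice: String {
    case assemblyAI = "assemblyai"
    case openAI = "openai"
    case appleSpeech = "apple"
    case parakeet = "parakeet"
}

/// Precedence: explicit runtime choice, then the Info.plist string (matched
/// case-insensitively against the raw values). Returns nil when neither yields
/// a known provider, leaving the default to the caller.
func resolveChoice(explicit: ProviderChoice?, plistValue: String?) -> ProviderChoice? {
    if let explicit {
        return explicit
    }
    let normalized = plistValue?.lowercased()
    return normalized.flatMap(ProviderChoice.init(rawValue:))
}
```

Keeping the lowercasing and the `flatMap` as two statements, as the original code does, matters: folded into one optional chain (`plistValue?.lowercased().flatMap(...)`), `flatMap` would be looked up on `String` instead of on `Optional`.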
31 changes: 31 additions & 0 deletions leanring-buddy/ClaudeAPI.swift
@@ -6,17 +6,25 @@
import Foundation

/// Claude API helper with streaming for progressive text display.
///
/// Supports two modes:
/// - **Proxy mode** (default): Sends requests to a Cloudflare Worker that injects the API key.
/// - **Direct mode**: When an `apiKey` is provided, sends requests straight to
/// `api.anthropic.com` with the key in the `x-api-key` header. No Worker needed.
class ClaudeAPI {
private static let tlsWarmupLock = NSLock()
private static var hasStartedTLSWarmup = false

private let apiURL: URL
var model: String
private let apiKey: String?
private let session: URLSession

/// Creates a ClaudeAPI in proxy mode (requests go to a Cloudflare Worker).
init(proxyURL: String, model: String = "claude-sonnet-4-6") {
self.apiURL = URL(string: proxyURL)!
self.model = model
self.apiKey = nil

// Use .default instead of .ephemeral so TLS session tickets are cached.
// Ephemeral sessions do a full TLS handshake on every request, which causes
@@ -36,11 +44,34 @@ class ClaudeAPI {
warmUpTLSConnectionIfNeeded()
}

/// Creates a ClaudeAPI in direct mode (requests go straight to Anthropic).
init(apiKey: String, model: String = "claude-sonnet-4-6") {
self.apiURL = URL(string: "https://api.anthropic.com/v1/messages")!
self.model = model
self.apiKey = apiKey

let config = URLSessionConfiguration.default
config.timeoutIntervalForRequest = 120
config.timeoutIntervalForResource = 300
config.waitsForConnectivity = true
config.urlCache = nil
config.httpCookieStorage = nil
self.session = URLSession(configuration: config)

warmUpTLSConnectionIfNeeded()
}

private func makeAPIRequest() -> URLRequest {
var request = URLRequest(url: apiURL)
request.httpMethod = "POST"
request.timeoutInterval = 120
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
// In direct mode, add the Anthropic auth headers that the Worker would
// normally inject. In proxy mode these are omitted — the Worker adds them.
if let apiKey {
request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
}
return request
}
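`ClaudeAPI` streams responses over SSE for progressive text display, but the parsing code is not in this diff. In direct mode the client talks to the Anthropic Messages API, whose stream carries text in `content_block_delta` events with a `text_delta` payload; the sketch assumes the Worker proxy passes that same stream through unchanged. `textDeltas(fromSSELines:)` is a hypothetical, minimal extractor over already-split SSE lines. It assumes each event's JSON fits on one `data:` line and skips every other event type, including errors.

```swift
import Foundation

/// Only the fields needed to pull text out of a stream event are decoded;
/// JSONDecoder ignores unknown keys such as `index` or `message`.
private struct StreamEvent: Decodable {
    struct Delta: Decodable {
        let type: String
        let text: String?
    }
    let type: String
    let delta: Delta?
}

/// Returns the text of each `content_block_delta` / `text_delta` event, in order.
/// Payloads that fail to decode or carry other event types are skipped.
func textDeltas(fromSSELines lines: [String]) -> [String] {
    let decoder = JSONDecoder()
    var output: [String] = []
    for line in lines where line.hasPrefix("data:") {
        // Strip the field name and the optional space that follows it.
        let payload = line.dropFirst("data:".count).trimmingCharacters(in: .whitespaces)
        guard let data = payload.data(using: .utf8),
              let event = try? decoder.decode(StreamEvent.self, from: data),
              event.type == "content_block_delta",
              event.delta?.type == "text_delta",
              let text = event.delta?.text
        else { continue }
        output.append(text)
    }
    return output
}
```

A production parser would also need to buffer partial lines across network chunks and surface `error` events to the UI instead of skipping them.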
