
Integrate MLX for on-device AI assistant and multi-platform streaming#239

Open
arkavo-com wants to merge 14 commits into main from
feature/mlx-local-assistant

Conversation

@arkavo-com
Contributor

No description provided.

arkavo-com and others added 8 commits March 17, 2026 23:31
Integrates mlx-swift-examples v2 into MuseCore for local LLM inference,
with a streaming chat UI in ArkavoCreator that helps creators draft,
rewrite, and adapt content across platforms (Bluesky, YouTube, Twitch,
Reddit, Micro.blog) — no backend required.

MuseCore additions:
- StreamingLLMProvider protocol for async token streams
- MLXBackend wrapping MLXLMCommon generate API
- ModelRegistry catalog (Gemma 270M default, Qwen 3.5 scale-up)
- ModelManager for lifecycle, memory budgeting, GPU coexistence

ArkavoCreator additions:
- PlatformContext protocol with per-platform constraints/actions
- AssistantPromptBuilder for context-aware system prompts
- AssistantChatView (full section) and AssistantPanelView (Cmd+Shift+A)
- AssistantViewModel with streaming generation and auto-unload in Studio
- localAssistant feature flag (ships enabled, independent of aiAgent)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ekick

Transform the text chatbot into a role-based architecture where Muse fills
three distinct roles for creators. Producer monitors streams via a private
Studio overlay panel. Publicist generates platform-native content across
six connected platforms. Sidekick (Phase 1.1b) will be the on-camera avatar.

- Add AvatarRole enum and RolePromptProvider with per-role/locale prompts
- Add MLXResponseProvider adapter (MLXBackend → LLMResponseProvider)
- Add Producer panel (slide-in overlay in Studio, Cmd+P toggle)
- Add Publicist view (content workspace replacing chat UI)
- Add role-aware ConversationManager with context injection
- Fix model auto-load on restart (sandbox-aware cache detection)
- Fix concurrent model load race condition (generation counter)
- Fix <end_of_turn> token leaking into output (stop sequence filtering)
- Remove AssistantChatView, AssistantPanelView, AssistantViewModel
- Remove StreamingLLMProvider protocol (inlined into MLXBackend)
- Rename AssistantAction → PublicistAction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move Publicist from standalone sidebar section to a slide-in panel on
Dashboard (⌘E), matching Producer's panel pattern on Studio (⌘P).
Sidebar is now five items: Dashboard, Profile, Studio, Library, Settings.

- Wire Sidekick: MLXResponseProvider + ConversationManager into
  MuseAvatarViewModel's LLM fallback chain for on-device chat responses
- Add PublicistPanelView (compact panel for Dashboard trailing edge)
- Move Settings to bottom of sidebar (below divider)
- Move Send Feedback to top of Settings page
- Remove feedback toggle and unused appState references
- Dashboard subtitle: "Your Social Command Center"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix Settings sidebar using same NavigationLink highlight as other items
- Change LIVE button from blue/purple gradient to red (broadcast standard)
- Improve timer contrast and enlarge primary screen star badge
- Format recording titles as human-readable dates instead of raw timestamps
- Consolidate recording card metadata into single compact subtitle line
- Demote Settings Reset button from destructive red to secondary gray
- Fix Settings text hierarchy and compact the feedback banner
- Add segmented control container styling to Publicist selectors
- Add placeholder text and border to Publicist source TextEditor
- Clarify character limit label from "max" to "chars"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…io layout

Phase 2 visual overhaul:
- Replace flat controlBackgroundColor with .ultraThinMaterial + specular
  gradient borders (top-left light source) on all cards
- Add ambient gradient background behind NavigationSplitView for glass
  refraction
- Spring animations replace all easeInOut transitions
- Remove page headers (headerless pro layout), move Library metadata
  to native toolbar

Studio restructure:
- Fixed bottom control bar (edge-to-edge .regularMaterial, 68pt)
- Consolidate Chat + Producer into unified Producer command center with
  integrated dense monospaced chat feed at bottom
- Single panel toggle button replaces three separate panel buttons
- Audio controls: mic/speaker toggle + chevron volume popovers
- Scene picker moved next to timer as chevron popover
- Control bar grouped: inputs (left), broadcast (center), toggle (right)
- LIVE button pulses red shadow when streaming

Recording cards:
- Flush thumbnails clipped by outer card shape (no internal cornerRadius)
- Adaptive grid columns (280-400pt) for fluid window resizing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enable YouTube feature flag and implement full broadcast lifecycle:
create broadcast, bind stream, transition ready→testing→live, and
end on stop. Add network.server entitlement for OAuth callback,
silent audio generator for YouTube stream activation, and background
RTMP server message handler for ping/pong and window acknowledgements
to prevent connection drops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Encode video/audio once, fan out to N RTMPPublisher instances in parallel.
Each publisher has independent send tasks, sequence headers, AsyncStream
continuations with .bufferingNewest(30) backpressure, and per-destination
frame rate limiting to prevent YouTube "faster than realtime" errors.

- VideoEncoder: startStreaming(to:) accepts array of destinations
- RecordingSession: multi-destination pass-through, per-platform stop
- StreamViewModel: selectedPlatforms set, per-platform PlatformConfig
- StreamDestinationPicker: multi-select toggle cards, bandwidth estimate
- StreamInfoFormView: universal form with platform-specific sections
- ChatPanelViewModel: concurrent Twitch IRC + YouTube polling into
  unified message feed with per-platform connect/disconnect
- TwitchAuthClient: token refresh, EventSub scopes, keychain storage
- TwitchEventSubClient: WebSocket client for follows, subs, cheers, raids
- YouTubeClient: live chat polling via OAuth, Sendable response types

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@superninja-app

Code Review: PR #239 — Integrate MLX for on-device AI assistant and multi-platform streaming

Summary

This is a large PR (+4900/-998 across 44 files) that introduces three major feature areas:

  1. MLX On-Device AI Integration — Adds Apple MLX framework support for local LLM inference via MuseCore, enabling three AI "Muse" roles: Producer (stream monitoring), Publicist (content creation), and Sidekick (on-camera avatar).
  2. Multi-Platform Simulcast Streaming — Refactors the RTMP streaming stack from single-destination to multi-destination (simulcast) with per-platform stream key management.
  3. YouTube Live API Integration — Full YouTube broadcast lifecycle: create/bind/transition broadcasts, live chat polling, Super Chat events, and OAuth scope upgrades.

Additionally includes: UI refresh with glassmorphism design, Swift 6.3 version bump, Twitch EventSub WebSocket client, token refresh flow, feedback UX refactor, and feature flag updates.


Strengths

Architecture & Design

  • Clean role separation: The Producer/Publicist/Sidekick role pattern is well-designed with clear boundaries. AvatarRole, RolePromptProvider, and PlatformContext provide excellent extensibility.
  • Protocol-oriented platform contexts: PlatformContext protocol with per-platform structs (BlueskyContext, YouTubeContext, etc.) is a clean, maintainable pattern.
  • MLX integration is well-layered: MLXBackend → MLXResponseProvider → ModelManager → PublicistViewModel is a clean dependency chain.
  • Backward compatibility preserved: StreamViewModel maintains computed property shims (selectedPlatform, streamKey) for single-platform code paths while adding multi-platform support.
  • Feature-flagged rollout: FeatureFlags.localAssistant gates the new MLX features, allowing safe incremental rollout.

Code Quality

  • Consistent use of modern Swift patterns: @Observable, AsyncStream, AsyncThrowingStream, structured concurrency, and Mutex for thread safety.
  • Good prompt engineering: System prompts for each role are well-crafted with clear boundaries, safety guidelines, and bilingual support (EN/JA).
  • Accessibility identifiers: UI elements like "Platform_Bluesky", "Btn_Generate", "Source_Face" are properly tagged for UI testing.
  • Good error handling in streaming: The RTMP simulcast fan-out properly handles per-destination errors without killing other streams.

Testing

  • UI tests for both Producer and Publicist features cover core navigation and element existence.
  • Tests use proper waitForExistence patterns for async UI.

Issues / Concerns

🔴 Critical

1. Debug print statements left in production audio encoder
AudioEncoder.swift adds print statements with emoji that fire on every 500th feed() call and on the first 3 AAC frame emissions. These will pollute logs in production builds and slightly impact audio encoding performance on the hot path.

// AudioEncoder.swift
nonisolated(unsafe) private var feedCount = 0
public func feed(_ sampleBuffer: CMSampleBuffer) {
    feedCount += 1
    if feedCount == 1 || feedCount % 500 == 0 {
        print("🔊 AudioEncoder.feed() called #\(feedCount)...")
    }

Recommendation: Remove these prints or gate them behind #if DEBUG. Separately, nonisolated(unsafe) on feedCount is a data-race risk since feed() can be called from multiple threads; use an atomic counter or a Mutex rather than the deprecated OSAtomic APIs.
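A minimal sketch of that fix using Swift 6's Synchronization.Mutex; everything outside feedCount and feed(_:) is illustrative, not taken from the PR:

```swift
import CoreMedia
import Synchronization

final class AudioEncoder {
    // Mutex-guarded counter replaces `nonisolated(unsafe) var feedCount`,
    // making the increment safe under concurrent feed() calls.
    private let feedCount = Mutex<Int>(0)

    func feed(_ sampleBuffer: CMSampleBuffer) {
        #if DEBUG
        let count: Int = feedCount.withLock { value in
            value += 1
            return value
        }
        if count == 1 || count % 500 == 0 {
            print("AudioEncoder.feed() called #\(count)")
        }
        #endif
        // ... hand the buffer to the AAC converter as before ...
    }
}
```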

2. Silent audio generator has if true dead code and memory concern
In VideoEncoder.swift, the silent audio generator contains if true { ... } which is clearly leftover from development. More concerning, it creates a new CMAudioFormatDescription and CMBlockBuffer every ~21ms for the entire duration of the stream, which is wasteful.

// Always generate silent audio as fallback
if true {
    // Create a CMSampleBuffer with silent PCM data
    var formatDesc: CMAudioFormatDescription?

Recommendation: Remove the if true guard. Cache the CMAudioFormatDescription and reuse it. Consider only generating silent audio when no real audio source is active, rather than always.
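A sketch of the caching part of that fix, written against the public CoreMedia Swift overlay (macOS 13+); the class name and ASBD values are illustrative:

```swift
import CoreMedia
import AudioToolbox

final class SilentAudioSource {
    // Built once and reused for every silent buffer, instead of being
    // recreated on each ~21 ms tick for the life of the stream.
    private var cachedFormat: CMAudioFormatDescription?

    private func silentFormat() throws -> CMAudioFormatDescription {
        if let cachedFormat { return cachedFormat }
        let asbd = AudioStreamBasicDescription(
            mSampleRate: 48_000,
            mFormatID: kAudioFormatLinearPCM,
            mFormatFlags: kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
            mBytesPerPacket: 4,
            mFramesPerPacket: 1,
            mBytesPerFrame: 4,
            mChannelsPerFrame: 2,
            mBitsPerChannel: 16,
            mReserved: 0)
        let format = try CMAudioFormatDescription(audioStreamBasicDescription: asbd)
        cachedFormat = format
        return format
    }
}
```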

3. YouTube OAuth scope broadened significantly without justification
The YouTube OAuth scope changed from youtube.readonly + youtube.upload + youtube.force-ssl to just youtube + youtube.force-ssl. The youtube scope grants full read/write/delete access to the user's YouTube account. This is overly permissive.

// Before: youtube.readonly youtube.upload youtube.force-ssl
// After:  youtube youtube.force-ssl

Recommendation: Use the minimum required scopes. For live streaming, you need youtube.force-ssl (which covers live broadcasts) — determine the exact minimal set rather than granting full account access.


🟡 Major

4. PlatformConfig.transitionTask stores a Task in a struct — value semantics issue
StreamViewModel.PlatformConfig is a struct containing a Task<Void, Never>?. Since structs use value semantics, copying a PlatformConfig (e.g., via platformConfigs[.youtube, default: PlatformConfig()]) can silently duplicate or lose the task reference.

struct PlatformConfig {
    var streamKey: String = ""
    var broadcastId: String?
    var transitionTask: Task<Void, Never>?  // ⚠️ Reference type in value type

Recommendation: Either make PlatformConfig a class, or store the transitionTask separately (e.g., var youtubeTransitionTask: Task<Void, Never>? on the view model directly — which you already have as a computed property, so remove it from the struct).
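A sketch of the second option; StreamPlatform here is an assumed Hashable enum standing in for the PR's platform type:

```swift
import Observation

// Assumed platform enum; the PR's actual type may differ.
enum StreamPlatform: String, Hashable { case twitch, youtube }

// PlatformConfig stays a pure value type; copying it can no longer
// duplicate or drop a live Task reference.
struct PlatformConfig {
    var streamKey: String = ""
    var broadcastId: String?
}

@Observable
final class StreamViewModel {
    var platformConfigs: [StreamPlatform: PlatformConfig] = [:]
    // Reference-semantic state lives beside the configs, not inside them.
    var transitionTasks: [StreamPlatform: Task<Void, Never>] = [:]

    func cancelTransition(for platform: StreamPlatform) {
        transitionTasks.removeValue(forKey: platform)?.cancel()
    }
}
```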

5. @unchecked Sendable on MLXBackend and MLXResponseProvider
Both types use @unchecked Sendable to bypass concurrency checking. MLXBackend uses Mutex which is correct, but MLXResponseProvider has mutable properties (activeRole, voiceLocale, contextInjection) with no synchronization.

public final class MLXResponseProvider: LLMResponseProvider, @unchecked Sendable {
    public var activeRole: AvatarRole = .sidekick     // ⚠️ No synchronization
    public var voiceLocale: VoiceLocale = .english     // ⚠️ No synchronization
    public var contextInjection: String?               // ⚠️ No synchronization

Recommendation: Either make MLXResponseProvider an actor, use Mutex for mutable state, or ensure it's only ever accessed from @MainActor.
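A sketch of the Mutex option (Swift 6 Synchronization); the enum stubs stand in for the PR's types, and only activeRole is shown in full:

```swift
import Synchronization

// Stand-ins for the PR's types.
enum AvatarRole { case producer, publicist, sidekick }
enum VoiceLocale { case english, japanese }

final class MLXResponseProvider: @unchecked Sendable {
    // All mutable state lives behind one Mutex, so the Sendable claim
    // is actually upheld.
    private struct State {
        var activeRole: AvatarRole = .sidekick
        var voiceLocale: VoiceLocale = .english
        var contextInjection: String?
    }
    private let state = Mutex(State())

    var activeRole: AvatarRole {
        get { state.withLock { $0.activeRole } }
        set { state.withLock { $0.activeRole = newValue } }
    }
    // voiceLocale and contextInjection get the same treatment.
}
```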

6. No error handling for failed RTMP destinations in simulcast
In VideoEncoder.startStreaming(to destinations:), if one destination fails to connect in the withThrowingTaskGroup, the entire simulcast setup throws. For true simulcast, you likely want partial success (e.g., Twitch connects but YouTube fails — you should still stream to Twitch).

try await withThrowingTaskGroup(of: StreamDestination.self) { group in
    for dest in destinations {
        group.addTask { /* connect */ }
    }
    for try await dest in group {  // ⚠️ One failure kills all
        streamDestinations[dest.id] = dest
    }
}

Recommendation: Use TaskGroup instead of ThrowingTaskGroup and handle per-destination errors, allowing partial success with user notification.
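A sketch of the partial-success shape; StreamDestination and connect(_:) are stand-ins for the PR's types, and failures are reduced to descriptions to keep the child-task results Sendable:

```swift
// Stand-ins for the PR's types; only the group shape is the point.
struct StreamDestination: Sendable { let id: String }
func connect(_ dest: StreamDestination) async throws { /* RTMP handshake */ }

func connectAll(_ destinations: [StreamDestination]) async
    -> (connected: [StreamDestination], failed: [(StreamDestination, String)]) {
    await withTaskGroup(of: (StreamDestination, String?).self) { group in
        for dest in destinations {
            group.addTask {
                do {
                    try await connect(dest)
                    return (dest, nil)        // connected
                } catch {
                    return (dest, String(describing: error))
                }
            }
        }
        var connected: [StreamDestination] = []
        var failed: [(StreamDestination, String)] = []
        for await (dest, errorDescription) in group {
            if let errorDescription {
                failed.append((dest, errorDescription))
            } else {
                connected.append(dest)
            }
        }
        return (connected, failed)
    }
}
// Stream to `connected`; surface `failed` to the user instead of aborting.
```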

7. YouTube broadcast hardcodes "public" privacy status
YouTubeClient.createAndBindBroadcast hardcodes privacyStatus: "public". The StreamInfoFormView has a privacy picker (privacyStatus state), but it's never wired through to the broadcast creation.

"status": ["privacyStatus": "public"]  // ⚠️ Ignores user's selection

Recommendation: Pass the privacy status through StreamViewModel to YouTubeClient.createAndBindBroadcast().
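A sketch of that wiring, with an illustrative signature (the PR's actual YouTubeClient API may differ):

```swift
// Illustrative only: thread the form's picker value through instead of
// hardcoding "public".
func createAndBindBroadcast(title: String, privacyStatus: String) async throws {
    let body: [String: Any] = [
        "snippet": ["title": title],
        "status": ["privacyStatus": privacyStatus]  // the user's selection
    ]
    // ... POST to liveBroadcasts.insert, then bind the stream ...
    _ = body
}
```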

8. Duplicate streaming logic between RecordView and StreamViewModel
Both RecordView.startStreaming() and StreamViewModel.startStreaming() contain nearly identical YouTube broadcast creation, multi-destination setup, and transition logic. This duplication creates a maintenance risk and makes it unclear which code path is actually used.

Recommendation: Consolidate into StreamViewModel as the single source of truth for streaming logic. RecordView should delegate entirely to it.


🟢 Minor

9. BackendState uses ~Copyable unnecessarily

private struct BackendState: ~Copyable {

BackendState is only ever accessed through a Mutex<BackendState>, which already serializes access. The ~Copyable annotation adds complexity without a clear benefit; since the struct only holds optional Task and ModelContainer references, it would work fine as a regular copyable struct inside the Mutex.

10. Magic numbers in video bitrate estimation

private var videoBitrate: Int {
    let cores = ProcessInfo.processInfo.activeProcessorCount
    if cores >= 8 { return 4_500_000 }
    if cores >= 4 { return 3_000_000 }
    return 1_500_000
}

These should be documented constants or reference the VideoEncoder's actual auto-detected bitrate.

11. com.apple.security.network.server entitlement added without documentation
The server network entitlement was added to ArkavoCreator.entitlements. This allows the app to listen for incoming connections. While likely needed for the YouTube OAuth local callback server, it should be documented.

12. Inconsistent platform identification strings
Platform identification uses raw strings ("twitch", "youtube") in ChatPanelViewModel.connectedPlatforms and StreamEvent.platform, but enums elsewhere. This creates fragility.

Recommendation: Use the StreamPlatform enum consistently, or at minimum define string constants.
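A minimal shape for that, assuming rawValues matching the existing strings:

```swift
// One enum whose rawValue matches the strings currently scattered
// through ChatPanelViewModel and StreamEvent ("twitch", "youtube").
enum StreamPlatform: String, CaseIterable, Codable, Sendable {
    case twitch
    case youtube
}

// Call sites compare cases, not strings:
// if connectedPlatforms.contains(.twitch) { ... }
```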

13. Missing scrollContentBackground(.hidden) in PublicistView's TextEditor
PublicistPanelView correctly uses .scrollContentBackground(.hidden) on its TextEditor, but PublicistView.sourceInputSection does not, which will show a default white background on macOS.

14. Hardcoded frame-rate drop threshold in video send task

if elapsed < .milliseconds(Int(targetInterval * 900)) && !frame.isKeyframe {
    continue // Drop frame — too fast
}

The 900 multiplier (90% of target interval) is a magic number that should be documented or made configurable.
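One way to name it (illustrative; the surrounding send-task context is assumed):

```swift
enum StreamPacing {
    /// Non-keyframes arriving before this fraction of the target frame
    /// interval has elapsed are dropped to avoid outpacing realtime.
    static let dropFraction = 0.9
}

// At the call site, 900 becomes a derivation instead of a magic number:
// if elapsed < .milliseconds(Int(targetInterval * 1_000 * StreamPacing.dropFraction)),
//    !frame.isKeyframe { continue }
```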

15. Several TODO/cleanup items

  • The PublicistViewModel.mapContentTypeToAction() maps .thread to .draftPost with a comment "Thread uses draft post with thread-specific prompt" — but no thread-specific prompt exists.
  • sentimentLabel in StreamStateContext handles the boundary value 0.3 correctly (0..<0.3 excludes it, so 0.3 lands in the 0.3..<0.7 bucket), but a unit test on the range boundaries would make that explicit.

Suggestions

  1. Consider splitting this PR: This PR touches three distinct feature areas (MLX AI, simulcast streaming, YouTube Live API). Splitting into 2-3 focused PRs would make review, testing, and potential rollback much safer.

  2. Add unit tests for core logic: The PR adds UI tests but no unit tests for critical logic like PublicistPromptBuilder, StreamStateContext.formattedForPrompt(), ModelRegistry.availableModels(), or StreamViewModel.validateInputs(). These are all easily unit-testable.

  3. Add a model memory warning: ModelManager auto-loads models on init if cached. On low-memory devices, this could cause issues. Consider adding a memory pressure observer (DispatchSource.makeMemoryPressureSource) to auto-unload.

  4. Document the YouTube broadcast lifecycle: The ready → testing → live → complete state machine is non-trivial. A brief architecture doc or code comment explaining the flow and retry logic would help future maintainers.

  5. Consider rate limiting the YouTube chat poller: The minimum polling interval is clamped to 5 seconds, but YouTube's API quota is limited. Consider exponential backoff on errors and tracking quota usage.

  6. Consolidate animation constants: The PR introduces .spring(response: 0.3, dampingFraction: 0.75) in many places. Consider defining a shared animation constant like Animation.panelSpring.
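Suggestion 3 (memory-pressure auto-unload) could be sketched as follows; ModelManager.unload() is a hypothetical hook, while the DispatchSource API is real:

```swift
import Dispatch

final class ModelManager {
    func unload() { /* release the MLX model container */ }
}

let modelManager = ModelManager()

// Auto-unload the MLX model when the system signals memory pressure.
let pressureSource = DispatchSource.makeMemoryPressureSource(
    eventMask: [.warning, .critical],
    queue: .main)
pressureSource.setEventHandler { [weak modelManager] in
    modelManager?.unload()
}
pressureSource.resume()
```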


Questions

  1. Is the youtube feature flag intentionally enabled (true) in this PR? It was previously false. Is this ready for production, or should it remain gated?

  2. What's the expected behavior when MLX model loading fails on a device with insufficient memory? The ModelManager shows an error state, but the Publicist/Producer panels don't provide recovery guidance to the user.

  3. Is the silent audio generator needed for all platforms, or only YouTube? The comment says "required by YouTube" but it runs for all RTMP destinations. Consider gating it to only generate when YouTube is a selected destination.

  4. Are the Twitch EventSub scopes (moderator:read:followers, channel:read:subscriptions, bits:read) already approved for the app? Adding new OAuth scopes may require Twitch app review.

  5. Why was the isCreator gating removed from availableSections? The old code filtered out .patrons for non-creators. The new code relies entirely on feature flags, which means the Patreon section would appear for all users if the flag is enabled.


Overall Recommendation

Request Changes 🔄

The architecture and feature design are strong, but the PR should address the critical issues (debug logging in production audio path, silent audio generator cleanup, overly broad YouTube OAuth scope) and the major issue of @unchecked Sendable without synchronization before merging. The duplicate streaming logic between RecordView and StreamViewModel should also be consolidated to prevent maintenance issues.

I'd also strongly recommend splitting this into smaller PRs if timeline permits — the blast radius of a 4900-line PR across core streaming infrastructure is high.

arkavo-com and others added 6 commits April 4, 2026 21:15
- Replace mlx-swift-examples 2.29.1 with mlx-swift 0.31.3 + mlx-swift-lm
  (arkavo-ai fork with Gemma 4 text model at 73.7 tok/s)
- Add MLXHuggingFace, Tokenizers, HuggingFace dependencies
- Update MLXBackend to use #huggingFaceLoadModelContainer macro
- Add Gemma 4 E4B (8B params, 4B active MoE, 8-bit) to ModelRegistry
- Fix Sendable capture in MLXBackend.generate()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sandboxed apps default to Library/Caches/huggingface/hub inside
the app container. This means models downloaded by the Python CLI
(at ~/.cache/huggingface/hub) aren't found, triggering a 9 GB
re-download.

Fix: explicitly configure HubClient with the shared cache path
so the app finds models cached by any HF client.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Settings UI:
- Preferred model picker (Gemma 4 E4B, Qwen 3.5 0.8B, Qwen 3.5 9B)
- Model state indicator (idle/downloading/loading/ready/error)
- Load/Unload/Retry buttons
- Custom model cache folder with NSOpenPanel folder picker
- Persisted via UserDefaults

Debug logging:
- MLXBackend: logs cache directory resolution, model cache check,
  load start/success
- ModelManager: logs init state, auto-load decisions, load lifecycle,
  errors with full descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sync project-level Package.resolved with workspace-level to fix
Xcode Cloud build failure from stale mlx-swift-examples reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Capture Sendable values (String) before the Task boundary instead
of sending the non-Sendable Profile across isolation domains.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sonarqubecloud

sonarqubecloud bot commented Apr 5, 2026

