
Add Gemma 4 on-device LLM via MLX Swift #240

Closed
arkavo-com wants to merge 6 commits into main from update/mlx-latest

Conversation

@arkavo-com
Contributor

Summary

Integrates Gemma 4 (gemma-4-e4b-it-8bit) as the primary on-device LLM provider in Arkavo Creator, running at 73.7 tok/s on Apple Silicon via MLX Swift.

  • Add Gemma4Provider conforming to LLMResponseProvider (priority 0, tried first)
  • Downloads the model from HuggingFace on first use (~9 GB, cached afterwards)
  • Both constrained generation (JSON tool calls) and streaming text interfaces
  • Apple Intelligence / Edge providers become fallbacks
  • MuseCore gains MLX, MLXLLM, MLXHuggingFace, and Tokenizers dependencies
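
The registration itself isn't shown in this excerpt; a minimal sketch of a priority-ordered fallback chain could look like the following. Everything here is hypothetical except the `LLMResponseProvider` name, and the protocol's real shape in MuseCore may differ:

```swift
import Foundation

// Hypothetical sketch: the PR states only that providers carry a priority
// (0 = tried first) and that lower-priority entries serve as fallbacks.
protocol LLMResponseProvider {
    var priority: Int { get }
    var isAvailable: Bool { get }
    func respond(to prompt: String) async throws -> String
}

struct LLMProviderChain {
    let providers: [any LLMResponseProvider]

    init(_ providers: [any LLMResponseProvider]) {
        // Sort ascending so priority 0 (Gemma 4) is tried first.
        self.providers = providers.sorted { $0.priority < $1.priority }
    }

    func respond(to prompt: String) async -> String? {
        for provider in providers where provider.isAvailable {
            if let reply = try? await provider.respond(to: prompt) {
                return reply  // first available provider that succeeds wins
            }
        }
        return nil  // every provider failed or was unavailable
    }
}
```

With this shape, Apple Intelligence and Edge providers simply register at higher priority values and are only consulted when Gemma 4 is unavailable or throws.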

Key commits

  1. Dependencies — mlx-swift 0.31.3, mlx-swift-lm (arkavo-ai fork with Gemma 4), swift-transformers
  2. Gemma4Provider — model loading, constrained generation, streaming generation
  3. App integration — registered in MuseAvatarViewModel.setupLLMProviders() at priority 0

Performance

Achieved 97.6% of Python MLX speed after discovering and fixing the float32 literal trap: Swift's default float32 scalar literals were injecting 1,046 AsType cast nodes into the bfloat16 computation graph, causing 237 MB of Metal cache churn per token. Full writeup in ml-explore/mlx-swift-lm#188.

Metric             Result
Generation         73.7 tok/s
Metal cache/token  2 MB
Model size         ~9 GB (8-bit quantized)
First use          Downloads from HuggingFace
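
The float32 literal trap can be sketched roughly as below. This is a sketch against mlx-swift's public API (`MLXArray`, `asType`, `DType`) from memory, not code from this PR; the promotion behavior is as described in ml-explore/mlx-swift-lm#188 and is not re-verified here:

```swift
import MLX

// A bfloat16 tensor, as used throughout the Gemma 4 graph.
let h = MLX.ones([8, 8]).asType(.bfloat16)

// Trap: a Swift floating-point scalar is strongly typed as float32, so
// mixing it with a bfloat16 array promotes through float32 and bakes an
// AsType cast node into the graph at every such site.
let promoted = h * Float(0.5)

// Fix: materialize the scalar in the array's own dtype so the multiply
// stays entirely in bfloat16 with no cast node.
let scale = MLXArray(Float(0.5)).asType(.bfloat16)
let inDtype = h * scale
```

At ~1,000 such sites per forward pass, eliminating the casts is what closes the gap to Python MLX, where bare scalars are weakly typed and never force a promotion.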

Dependencies

  • arkavo-ai/mlx-swift-lm branch feature/gemma4-text (until upstream merges #188)
  • ml-explore/mlx-swift 0.31.3
  • huggingface/swift-transformers 1.2.1

Test plan

  • MuseCore builds (swift build)
  • Full app builds (xcodebuild -skipMacroValidation)
  • Gemma 4 model loads and generates correct output
  • Manual: first-launch download flow
  • Manual: chat interaction through avatar interface

🤖 Generated with Claude Code

arkavo-com and others added 6 commits on April 4, 2026 at 20:45
Gemma 4 MLX provider for Arkavo Creator:
- Primary provider (priority 0) with Apple Intelligence fallback
- HuggingFace download on first use (~9 GB)
- Both constrained (tool calls) and streaming (conversation) interfaces
- ~74 tok/s generation on Apple Silicon

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6-task plan to integrate Gemma 4 MLX inference into Arkavo Creator:
1. Add dependencies (MLXHuggingFace, Tokenizers)
2. Create Gemma4Provider with HuggingFace model loading
3. Add LLMResponseProvider conformance (constrained generation)
4. Add streaming generation
5. Register in app's fallback chain
6. Integration test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds mlx-swift, mlx-swift-lm (feature/gemma4-text branch), swift-transformers,
and swift-huggingface as dependencies to support on-device Gemma 4 inference.
Exposes MLX, MLXNN, MLXLMCommon, MLXLLM, MLXHuggingFace, Tokenizers, and
HuggingFace products to the MuseCore target.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
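The dependency bump described above would look roughly like this in MuseCore's Package.swift. This is a sketch: repository URLs are inferred from the org/repo names in the PR, and the platform and product wiring is assumed:

```swift
// swift-tools-version:6.0
// Hypothetical manifest sketch; versions and branches are from the PR text,
// URLs are inferred from org/repo names, target layout is assumed.
import PackageDescription

let package = Package(
    name: "MuseCore",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/ml-explore/mlx-swift", exact: "0.31.3"),
        .package(url: "https://github.com/arkavo-ai/mlx-swift-lm", branch: "feature/gemma4-text"),
        .package(url: "https://github.com/huggingface/swift-transformers", exact: "1.2.1"),
    ],
    targets: [
        .target(
            name: "MuseCore",
            dependencies: [
                .product(name: "MLX", package: "mlx-swift"),
                .product(name: "MLXLLM", package: "mlx-swift-lm"),
                .product(name: "Tokenizers", package: "swift-transformers"),
            ]
        )
    ]
)
```

Pinning the fork by branch (rather than version) is what makes this a temporary arrangement until upstream merges #188.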
Implements LLMResponseProvider backed by mlx-community/gemma-4-e4b-it-8bit.
Includes loadModel/unloadModel lifecycle, constrained JSON generation at
temperature 0, and a streaming generate method at temperature 0.6/topP 0.95.
State is managed by a Swift actor for Swift 6 strict-concurrency compliance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
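The two decoding modes this commit names (temperature 0 for constrained JSON, temperature 0.6 / topP 0.95 for streaming) could be captured as below. The struct name and fields are illustrative, not MLXLLM's real parameter type:

```swift
// Hypothetical sketch of the two decoding modes the commit describes.
struct SamplingParameters: Sendable {
    var temperature: Float
    var topP: Float
}

extension SamplingParameters {
    /// Deterministic decoding for constrained JSON tool calls.
    static let constrained = SamplingParameters(temperature: 0.0, topP: 1.0)
    /// Sampled decoding for conversational streaming.
    static let streaming = SamplingParameters(temperature: 0.6, topP: 0.95)
}
```

Keeping the constrained path at temperature 0 makes tool-call output reproducible, while the streaming path trades some determinism for more natural conversation.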
Gemma 4 is now priority 0 (tried first) in the LLM fallback chain.
Edge provider becomes the fallback when available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sonarqubecloud bot commented Apr 5, 2026

@arkavo-com
Contributor Author

Merged into #239

@arkavo-com arkavo-com closed this Apr 5, 2026