
Add Gemma 4 on-device LLM via MLX Swift #240

Closed
arkavo-com wants to merge 6 commits into main from update/mlx-latest

Conversation

@arkavo-com
Contributor

Summary

Integrates Gemma 4 (gemma-4-e4b-it-8bit) as the primary on-device LLM provider in Arkavo Creator, running at 73.7 tok/s on Apple Silicon via MLX Swift.

  • Add Gemma4Provider conforming to LLMResponseProvider (priority 0, tried first)
  • Downloads the model from HuggingFace on first use (~9 GB, cached afterwards)
  • Both constrained generation (JSON tool calls) and streaming text interfaces
  • Apple Intelligence / Edge providers become fallbacks
  • MuseCore gains MLX, MLXLLM, MLXHuggingFace, and Tokenizers dependencies
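
The registration itself isn't shown in this excerpt; a minimal sketch of a priority-ordered fallback chain could look like the following. Everything here is hypothetical except the `LLMResponseProvider` name, and the protocol's real shape in MuseCore may differ:

```swift
import Foundation

// Hypothetical sketch: the PR states only that providers carry a priority
// (0 = tried first) and that lower-priority entries serve as fallbacks.
protocol LLMResponseProvider {
    var priority: Int { get }
    var isAvailable: Bool { get }
    func respond(to prompt: String) async throws -> String
}

struct LLMProviderChain {
    let providers: [any LLMResponseProvider]

    init(_ providers: [any LLMResponseProvider]) {
        // Sort ascending so priority 0 (Gemma 4) is tried first.
        self.providers = providers.sorted { $0.priority < $1.priority }
    }

    func respond(to prompt: String) async -> String? {
        for provider in providers where provider.isAvailable {
            if let reply = try? await provider.respond(to: prompt) {
                return reply  // first available provider that succeeds wins
            }
        }
        return nil  // every provider failed or was unavailable
    }
}
```

With this shape, Apple Intelligence and Edge providers simply register at higher priority values and are only consulted when Gemma 4 is unavailable or throws.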

Key commits

  1. Dependencies — mlx-swift 0.31.3, mlx-swift-lm (arkavo-ai fork with Gemma 4), swift-transformers
  2. Gemma4Provider — model loading, constrained generation, streaming generation
  3. App integration — registered in MuseAvatarViewModel.setupLLMProviders() at priority 0

Performance

Achieved 97.6% of Python MLX speed after discovering and fixing the float32 literal trap: Swift's default float32 scalar literals were injecting 1,046 AsType cast nodes into the bfloat16 computation graph, causing 237 MB of Metal cache churn per token. Full writeup in ml-explore/mlx-swift-lm#188.

Metric             Result
Generation         73.7 tok/s
Metal cache/token  2 MB
Model size         ~9 GB (8-bit quantized)
First use          Downloads from HuggingFace
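
The float32 literal trap can be sketched roughly as below. This is a sketch against mlx-swift's public API (`MLXArray`, `asType`, `DType`) from memory, not code from this PR; the promotion behavior is as described in ml-explore/mlx-swift-lm#188 and is not re-verified here:

```swift
import MLX

// A bfloat16 tensor, as used throughout the Gemma 4 graph.
let h = MLX.ones([8, 8]).asType(.bfloat16)

// Trap: a Swift floating-point scalar is strongly typed as float32, so
// mixing it with a bfloat16 array promotes through float32 and bakes an
// AsType cast node into the graph at every such site.
let promoted = h * Float(0.5)

// Fix: materialize the scalar in the array's own dtype so the multiply
// stays entirely in bfloat16 with no cast node.
let scale = MLXArray(Float(0.5)).asType(.bfloat16)
let inDtype = h * scale
```

At ~1,000 such sites per forward pass, eliminating the casts is what closes the gap to Python MLX, where bare scalars are weakly typed and never force a promotion.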

Dependencies

  • arkavo-ai/mlx-swift-lm branch feature/gemma4-text (until upstream merges #188)
  • ml-explore/mlx-swift 0.31.3
  • huggingface/swift-transformers 1.2.1

Test plan

  • MuseCore builds (swift build)
  • Full app builds (xcodebuild -skipMacroValidation)
  • Gemma 4 model loads and generates correct output
  • Manual: first-launch download flow
  • Manual: chat interaction through avatar interface

🤖 Generated with Claude Code

arkavo-com and others added 6 commits on April 4, 2026 at 20:45
Gemma 4 MLX provider for Arkavo Creator:
- Primary provider (priority 0) with Apple Intelligence fallback
- HuggingFace download on first use (~9 GB)
- Both constrained (tool calls) and streaming (conversation) interfaces
- ~74 tok/s generation on Apple Silicon

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6-task plan to integrate Gemma 4 MLX inference into Arkavo Creator:
1. Add dependencies (MLXHuggingFace, Tokenizers)
2. Create Gemma4Provider with HuggingFace model loading
3. Add LLMResponseProvider conformance (constrained generation)
4. Add streaming generation
5. Register in app's fallback chain
6. Integration test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds mlx-swift, mlx-swift-lm (feature/gemma4-text branch), swift-transformers,
and swift-huggingface as dependencies to support on-device Gemma 4 inference.
Exposes MLX, MLXNN, MLXLMCommon, MLXLLM, MLXHuggingFace, Tokenizers, and
HuggingFace products to the MuseCore target.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
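The dependency bump described above would look roughly like this in MuseCore's Package.swift. This is a sketch: repository URLs are inferred from the org/repo names in the PR, and the platform and product wiring is assumed:

```swift
// swift-tools-version:6.0
// Hypothetical manifest sketch; versions and branches are from the PR text,
// URLs are inferred from org/repo names, target layout is assumed.
import PackageDescription

let package = Package(
    name: "MuseCore",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/ml-explore/mlx-swift", exact: "0.31.3"),
        .package(url: "https://github.com/arkavo-ai/mlx-swift-lm", branch: "feature/gemma4-text"),
        .package(url: "https://github.com/huggingface/swift-transformers", exact: "1.2.1"),
    ],
    targets: [
        .target(
            name: "MuseCore",
            dependencies: [
                .product(name: "MLX", package: "mlx-swift"),
                .product(name: "MLXLLM", package: "mlx-swift-lm"),
                .product(name: "Tokenizers", package: "swift-transformers"),
            ]
        )
    ]
)
```

Pinning the fork by branch (rather than version) is what makes this a temporary arrangement until upstream merges #188.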
Implements LLMResponseProvider backed by mlx-community/gemma-4-e4b-it-8bit.
Includes loadModel/unloadModel lifecycle, constrained JSON generation at
temperature 0, and a streaming generate method at temperature 0.6/topP 0.95.
State is managed by a Swift actor for Swift 6 strict-concurrency compliance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
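The two decoding modes this commit names (temperature 0 for constrained JSON, temperature 0.6 / topP 0.95 for streaming) could be captured as below. The struct name and fields are illustrative, not MLXLLM's real parameter type:

```swift
// Hypothetical sketch of the two decoding modes the commit describes.
struct SamplingParameters: Sendable {
    var temperature: Float
    var topP: Float
}

extension SamplingParameters {
    /// Deterministic decoding for constrained JSON tool calls.
    static let constrained = SamplingParameters(temperature: 0.0, topP: 1.0)
    /// Sampled decoding for conversational streaming.
    static let streaming = SamplingParameters(temperature: 0.6, topP: 0.95)
}
```

Keeping the constrained path at temperature 0 makes tool-call output reproducible, while the streaming path trades some determinism for more natural conversation.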
Gemma 4 is now priority 0 (tried first) in the LLM fallback chain.
Edge provider becomes the fallback when available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sonarqubecloud bot commented Apr 5, 2026

@arkavo-com
Contributor Author

Merged into #239

@arkavo-com arkavo-com closed this Apr 5, 2026