
Feature: Parakeet-MLX for audio transcription instead of Gemini #82

Draft
Caffa wants to merge 16 commits into remorses:main from Caffa:main

Caffa commented Mar 16, 2026

Makes parakeet-mlx the default voice transcription provider, so no API keys are required. I have tested parakeet-mlx on an old 16 GB MacBook and on a 64 GB M1 Ultra Mac Studio (2022).

I tested it end to end by sending an OGG voice message through Discord: the bot replied with the transcription and continued the chat.

Changes

  • Default ASR_PROVIDER changed from 'gemini' to 'parakeet' (local)
  • Parakeet service auto-starts when voice channels are enabled
  • Cloud providers (OpenAI/Gemini) only used when explicitly configured via ASR_PROVIDER
  • Switched from base64 JSON to FormData file upload for better compatibility
  • Fixed FastAPI Body parameter handling in asr_server.py
  • Added startup logging to show active ASR provider
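To illustrate the switch from base64 JSON to a FormData file upload, here is a minimal sketch of the client side. The endpoint URL, port, form field name `file`, and `{ "text": ... }` response shape are all assumptions, since the PR text does not show the route in asr_server.py. It relies on the global `fetch`, `FormData`, and `Blob` available in Node 18+.

```typescript
// HYPOTHETICAL endpoint: the PR does not show asr_server.py's route or port.
const ASR_URL = "http://127.0.0.1:8765/transcribe";

// Build a multipart body carrying the raw OGG bytes as a file part,
// instead of embedding a base64 string inside a JSON payload.
export function buildTranscriptionForm(audio: ArrayBuffer, filename = "voice.ogg"): FormData {
  const form = new FormData();
  form.append("file", new Blob([audio], { type: "audio/ogg" }), filename);
  return form;
}

export async function transcribeLocally(audio: ArrayBuffer, url: string = ASR_URL): Promise<string> {
  // fetch generates the multipart Content-Type (with boundary) itself
  // when the body is a FormData instance, so no header is set by hand.
  const res = await fetch(url, { method: "POST", body: buildTranscriptionForm(audio) });
  if (!res.ok) throw new Error(`ASR service returned HTTP ${res.status}`);
  // ASSUMED response shape: { "text": "..." }
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

On the FastAPI side, a multipart file part is typically received as an `UploadFile` parameter declared with `File(...)` rather than through `Body`, which is one plausible reason the handler in asr_server.py needed adjusting.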

Why

  • No setup friction: Users don't need API keys for voice transcription
  • Privacy: Audio processed locally
  • Speed: parakeet-mlx is faster than Whisper, and running locally avoids the network round-trip of cloud providers
  • Cost: Zero transcription costs

Backward Compatibility

Cloud providers still work when explicitly set:

ASR_PROVIDER=openai kimaki  # or 'gemini'
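The provider rule described above (parakeet when `ASR_PROVIDER` is unset, a cloud provider only when explicitly named) can be sketched as a small resolver. The function and type names are hypothetical, not the PR's actual code, and rejecting unknown values is a design choice of this sketch.

```typescript
type AsrProvider = "parakeet" | "openai" | "gemini";

const PROVIDERS: readonly AsrProvider[] = ["parakeet", "openai", "gemini"];

// HYPOTHETICAL helper: maps the ASR_PROVIDER env var to a provider name.
export function resolveAsrProvider(env: Record<string, string | undefined>): AsrProvider {
  const raw = env.ASR_PROVIDER?.trim().toLowerCase();
  // Unset or empty: fall back to the local provider, which needs no API key.
  if (!raw) return "parakeet";
  if ((PROVIDERS as readonly string[]).includes(raw)) return raw as AsrProvider;
  // Fail loudly on typos instead of silently picking a default.
  throw new Error(`Unknown ASR_PROVIDER "${raw}"; expected one of ${PROVIDERS.join(", ")}`);
}
```

Usage: `resolveAsrProvider(process.env)` at startup, then log the result, matching the "startup logging to show active ASR provider" change.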

Caffa added 5 commits March 17, 2026 17:15
- Add asr-service/ with FastAPI wrapper for parakeet-mlx
- parakeet-mlx runs locally on Apple Silicon (MLX) - 10x faster than Whisper
- No API keys needed - 100% free and local
- Add 'parakeet' as TranscriptionProvider option
- Add ASR_PROVIDER env var for switching providers
- Works directly with OGG files - no preprocessing needed

To use:
1. cd asr-service && pip install -r requirements.txt
2. python asr_server.py
3. Set ASR_PROVIDER=parakeet when running kimaki

Tested on user's voice message - excellent transcription quality.
- Add asr-service-manager.ts to spawn/manage parakeet-mlx service
- Auto-start ASR service when bot starts (if ASR_PROVIDER=parakeet)
- Graceful shutdown of ASR service on bot exit
- Add ASR log prefix for logging
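A service manager like the one described in this commit can be sketched with Node's `child_process.spawn`. The class name, constructor arguments, and method names are hypothetical, since asr-service-manager.ts itself is not shown; a real call might look like `new AsrServiceManager("python", ["asr_server.py"], "asr-service")`.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";

// HYPOTHETICAL sketch of spawning and stopping the local ASR service.
export class AsrServiceManager {
  private child: ChildProcess | null = null;
  private readonly command: string;
  private readonly args: string[];
  private readonly cwd: string | undefined;

  constructor(command: string, args: string[], cwd?: string) {
    this.command = command;
    this.args = args;
    this.cwd = cwd;
  }

  start(): void {
    if (this.child) return; // already running
    const child = spawn(this.command, this.args, { cwd: this.cwd, stdio: "inherit" });
    // Forget the handle if the service exits on its own (crash, port in use, ...).
    child.on("exit", () => {
      if (this.child === child) this.child = null;
    });
    // Without an "error" listener, a missing executable would crash the bot.
    child.on("error", (err) => {
      console.error(`[ASR] failed to start service: ${err.message}`);
      if (this.child === child) this.child = null;
    });
    this.child = child;
  }

  isRunning(): boolean {
    return this.child !== null;
  }

  stop(): void {
    if (!this.child) return;
    this.child.kill("SIGTERM"); // ask the Python service to shut down
    this.child = null;
  }

  // Stop the service when the bot exits. "exit" handlers must be
  // synchronous, and kill() is, so calling stop() there is safe.
  registerShutdownHooks(): void {
    process.once("exit", () => this.stop());
    for (const signal of ["SIGINT", "SIGTERM"] as const) {
      process.once(signal, () => {
        this.stop();
        process.exit(0);
      });
    }
  }
}
```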
Switch from cloud-based ASR to local parakeet-mlx as the default.
No API keys required - works out of the box with reasonable speed.

Changes:
- Default ASR_PROVIDER changed from 'gemini' to 'parakeet'
- Parakeet service starts automatically when voice channels enabled
- Cloud providers only used when explicitly configured (ASR_PROVIDER=openai/gemini)
- Updated voice.ts to use FormData file upload instead of base64
- Fixed asr_server.py Body parameter handling for FastAPI
- Added startup logging to show active ASR provider
Caffa added 2 commits March 18, 2026 00:45
- Add isAppleSilicon() to detect macOS arm64 (M1/M2/M3 chips)
- Update shouldAutoStartAsr() to only auto-start on Apple Silicon
- Update voice-handler to default parakeet only on Apple Silicon
- Add helpful error message for non-Apple Silicon users

Parakeet MLX (local ASR) only works on Apple Silicon. Users on
other platforms must use cloud providers (OpenAI/Gemini).

Fixes issue where ASR service would default to parakeet but not
auto-start, breaking voice transcription.
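The Apple Silicon check described in this commit comes down to `process.platform === "darwin"` plus `process.arch === "arm64"`. A minimal sketch follows; the parameters with `process` defaults are added here for testability, and the exact signatures of `shouldAutoStartAsr()` and the default-provider logic in the PR are assumptions.

```typescript
// True on macOS running an arm64 Node binary (M1/M2/M3).
export function isAppleSilicon(
  platform: string = process.platform,
  arch: string = process.arch,
): boolean {
  return platform === "darwin" && arch === "arm64";
}

// HYPOTHETICAL: choose the default provider, with a helpful error off Apple Silicon.
export function defaultAsrProvider(
  explicit: string | undefined,
  platform: string = process.platform,
  arch: string = process.arch,
): "parakeet" | "openai" | "gemini" {
  if (explicit === "openai" || explicit === "gemini") return explicit;
  if (isAppleSilicon(platform, arch)) return "parakeet";
  throw new Error(
    "Local transcription (parakeet-mlx) requires Apple Silicon (macOS arm64). " +
      "Set ASR_PROVIDER=openai or ASR_PROVIDER=gemini to use a cloud provider.",
  );
}

// HYPOTHETICAL signature: only auto-start the local service where it can run.
export function shouldAutoStartAsr(
  provider: string,
  platform: string = process.platform,
  arch: string = process.arch,
): boolean {
  return provider === "parakeet" && isAppleSilicon(platform, arch);
}
```

Note that `process.arch` reports the architecture the Node binary was built for, so an x64 Node running under Rosetta 2 on an M1 reports `"x64"` and would be treated as unsupported here.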
Update snapshots to match upstream changes that added z_orchestrator
to model/footer output in Discord messages.

These changes were introduced by upstream commits:
- ede6a6b release: kimaki@0.4.78
- d156b9a fix(plugins): prefix part IDs with prt_ to satisfy OpenCode validation
- 7636bbc Update SKILL.md
remorses marked this pull request as draft March 17, 2026 17:47
Caffa added 9 commits March 25, 2026 11:05
- Add vLLM transcription provider with OpenAI-compatible API
- Implement auto-start capability for vLLM service when needed
- Add fallback chain: Parakeet → vLLM → OpenAI → Gemini
- Support ASR_PROVIDER=vllm for direct vLLM usage
- Add VLLM_AUTO_START=true option for on-demand startup
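The fallback chain above (Parakeet → vLLM → OpenAI → Gemini) can be sketched as trying each configured provider in order and moving on when one throws. The `NamedProvider` shape and the `available` flag are hypothetical; the PR's real implementation may decide availability differently.

```typescript
export type TranscribeFn = (audio: ArrayBuffer) => Promise<string>;

export interface NamedProvider {
  name: string;
  // HYPOTHETICAL: false when not configured (no API key, wrong platform, ...).
  available: boolean;
  transcribe: TranscribeFn;
}

// Try providers in order; the first successful transcription wins.
export async function transcribeWithFallback(
  audio: ArrayBuffer,
  providers: NamedProvider[],
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    if (!p.available) continue;
    try {
      return { provider: p.name, text: await p.transcribe(audio) };
    } catch (err) {
      // Record the failure and fall through to the next provider.
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All transcription providers failed: ${errors.join("; ") || "none available"}`);
}
```

Passing the providers as an ordered array keeps the chain order in one place at the call site, e.g. `[parakeet, vllm, openai, gemini]`.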

New files:
- src/vllm-service-manager.ts - vLLM service lifecycle management
- src/vllm-service-manager.test.ts - 15 tests for service manager
- src/vllm-transcription.test.ts - 10 tests for transcription

Features:
- Auto-detect and start vLLM service when configured
- Health check and status monitoring
- Configurable host/port/model via environment variables
- Proper cleanup on process exit

Environment variables:
- VLLM_AUTO_START=true - Auto-start vLLM on demand
- ASR_PROVIDER=vllm - Use vLLM as primary transcription
- VLLM_HOST, VLLM_PORT, VLLM_MODEL, VLLM_EXTRA_ARGS
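Reading these environment variables might look like the sketch below. The defaults are assumptions: 8000 is vLLM's usual server port, but the host, the Whisper model name, and the whitespace splitting of `VLLM_EXTRA_ARGS` are illustrative choices, not taken from vllm-service-manager.ts.

```typescript
export interface VllmConfig {
  host: string;
  port: number;
  model: string;
  extraArgs: string[];
  autoStart: boolean;
  baseUrl: string; // OpenAI-compatible API root
}

// HYPOTHETICAL helper; the defaults below are assumptions.
export function readVllmConfig(env: Record<string, string | undefined>): VllmConfig {
  const host = env.VLLM_HOST || "127.0.0.1";
  const port = Number.parseInt(env.VLLM_PORT ?? "8000", 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid VLLM_PORT: ${env.VLLM_PORT}`);
  }
  return {
    host,
    port,
    model: env.VLLM_MODEL || "openai/whisper-large-v3",
    // Naive whitespace split: quoted arguments are not supported here.
    extraArgs: (env.VLLM_EXTRA_ARGS ?? "").split(/\s+/).filter(Boolean),
    autoStart: env.VLLM_AUTO_START === "true",
    baseUrl: `http://${host}:${port}/v1`,
  };
}
```

With such a config, transcription requests would go to `${baseUrl}/audio/transcriptions`, and an auto-started server could be launched as `vllm serve <model> --host <host> --port <port>` followed by the extra arguments.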

Tests: 25 tests passing for vLLM functionality
…anner

- Add figlet package for dynamic ASCII art generation
- Replace hardcoded banner with figlet.textSync('LOCAL\nVOICE', { font: 'ANSI Shadow' })
- Retain small text 'LOCAL TRANSCRIPTION VARIANT OF KIMAKI'
- Ensures big ASCII text is always correctly spelled
Key upstream changes merged:
- New /tasks Discord command for task management
- libsqlproxy package for SQLite proxy
- Renamed opencode-plugin -> kimaki-opencode-plugin
- Moved usecomputer to standalone repo
- Removed zeke/zoke folders
- Various performance and bug fixes

Conflicts resolved:
- discord/src/cli.ts: kept LOCAL VOICE transcription banner
- errore: updated submodule to upstream (8 commits ahead)
- pnpm-lock.yaml: merged lockfile
…cts-2026

The getProjectsDir() function now returns ~/Local-Projects-2026 instead
of ~/.kimaki/projects. This removes the previous behavior where projects
were stored under a configurable dataDir.

This is intentional for my personal workflow - I prefer keeping projects
in a dedicated folder rather than hidden in ~/.kimaki.

Note: This also affects the special 'kimaki' default project in
channel-management.ts, which will now be created at
~/Local-Projects-2026/kimaki.
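The new behavior of `getProjectsDir()` amounts to a fixed folder under the home directory. This sketch adds a `homeDir` parameter (not in the original) so it can be tested; the helper for the default 'kimaki' project is hypothetical.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Fixed folder in the home directory, no longer derived from a configurable dataDir.
export function getProjectsDir(homeDir: string = os.homedir()): string {
  return path.join(homeDir, "Local-Projects-2026");
}

// HYPOTHETICAL helper: the special 'kimaki' default project now lives inside it.
export function getDefaultKimakiProjectDir(homeDir: string = os.homedir()): string {
  return path.join(getProjectsDir(homeDir), "kimaki");
}
```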
- Add Array.isArray check before calling findIndex to prevent
  'todos.findIndex is not a function' error when todos is not an array
- Add unit tests for null, undefined, and non-array todos inputs
- Add build and link scripts to package.json for local development
- Add DEVELOP.md documenting local variant workflow
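The `Array.isArray` guard from the todos fix can be sketched as below. `findTodoIndex` and the `Todo` shape are hypothetical, since the PR's actual function and data structure are not shown.

```typescript
interface Todo {
  id: string;
  status?: string;
}

// HYPOTHETICAL helper illustrating the guard.
export function findTodoIndex(todos: unknown, id: string): number {
  // todos may arrive as null, undefined, or a non-array value; calling
  // .findIndex on those throws "todos.findIndex is not a function".
  if (!Array.isArray(todos)) return -1;
  // Optional chaining also tolerates null entries inside the array.
  return (todos as Todo[]).findIndex((todo) => todo?.id === id);
}
```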