Feature: Parakeet-MLX for audio transcription instead of Gemini #82
Draft
Caffa wants to merge 16 commits into remorses:main from …
- Add asr-service/ with FastAPI wrapper for parakeet-mlx
- parakeet-mlx runs locally on Apple Silicon (MLX)
- 10x faster than Whisper
- No API keys needed
- 100% free and local
- Add 'parakeet' as TranscriptionProvider option
- Add ASR_PROVIDER env var for switching providers
- Works directly with OGG files, no preprocessing needed

To use:
1. cd asr-service && pip install -r requirements.txt
2. python asr_server.py
3. Set ASR_PROVIDER=parakeet when running kimaki

Tested on user's voice message: excellent transcription quality.
- Add asr-service-manager.ts to spawn/manage parakeet-mlx service
- Auto-start ASR service when bot starts (if ASR_PROVIDER=parakeet)
- Graceful shutdown of ASR service on bot exit
- Add ASR log prefix for logging
Switch from cloud-based ASR to local parakeet-mlx as the default. No API keys required; it works out of the box with reasonable speed.

Changes:
- Default ASR_PROVIDER changed from 'gemini' to 'parakeet'
- Parakeet service starts automatically when voice channels are enabled
- Cloud providers only used when explicitly configured (ASR_PROVIDER=openai/gemini)
- Updated voice.ts to use FormData file upload instead of base64
- Fixed asr_server.py Body parameter handling for FastAPI
- Added startup logging to show active ASR provider
- Add isAppleSilicon() to detect macOS arm64 (M1/M2/M3 chips)
- Update shouldAutoStartAsr() to only auto-start on Apple Silicon
- Update voice-handler to default to parakeet only on Apple Silicon
- Add helpful error message for non-Apple Silicon users

Parakeet MLX (local ASR) only works on Apple Silicon. Users on other platforms must use cloud providers (OpenAI/Gemini).

Fixes issue where ASR service would default to parakeet but not auto-start, breaking voice transcription.
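The platform check and provider default described in this commit could look roughly like the sketch below. It is not the PR's code: `resolveAsrProvider` and `KNOWN_PROVIDERS` are hypothetical names, the platform and architecture are passed in as plain strings to keep the functions pure, and throwing on non-Apple Silicon machines without an explicit provider is an assumption about how the "helpful error message" is surfaced.

```typescript
// Hypothetical sketch; names and signatures are illustrative, not taken from the PR.
type AsrProvider = "parakeet" | "openai" | "gemini";

const KNOWN_PROVIDERS: readonly AsrProvider[] = ["parakeet", "openai", "gemini"];

// Apple Silicon means macOS ("darwin") running on an arm64 CPU.
function isAppleSilicon(platform: string, arch: string): boolean {
  return platform === "darwin" && arch === "arm64";
}

// An explicit ASR_PROVIDER value always wins. Without one, default to
// parakeet only where it can actually run.
function resolveAsrProvider(
  explicit: string | undefined,
  platform: string,
  arch: string,
): AsrProvider {
  if (explicit !== undefined && explicit !== "") {
    const wanted = explicit.toLowerCase();
    const match = KNOWN_PROVIDERS.find((provider) => provider === wanted);
    if (match === undefined) {
      throw new Error(`Unknown ASR_PROVIDER: ${explicit}`);
    }
    return match;
  }
  if (isAppleSilicon(platform, arch)) {
    return "parakeet";
  }
  // Assumption: fail loudly instead of silently picking a cloud provider.
  throw new Error(
    "Parakeet MLX only runs on Apple Silicon. " +
      "Set ASR_PROVIDER=openai or ASR_PROVIDER=gemini to use a cloud provider.",
  );
}
```

In a Node.js process this would be called as `resolveAsrProvider(process.env.ASR_PROVIDER, process.platform, process.arch)`.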
Update snapshots to match upstream changes that added z_orchestrator to model/footer output in Discord messages.

These changes were introduced by upstream commits:
- ede6a6b release: kimaki@0.4.78
- d156b9a fix(plugins): prefix part IDs with prt_ to satisfy OpenCode validation
- 7636bbc Update SKILL.md
- Add vLLM transcription provider with OpenAI-compatible API
- Implement auto-start capability for vLLM service when needed
- Add fallback chain: Parakeet → vLLM → OpenAI → Gemini
- Support ASR_PROVIDER=vllm for direct vLLM usage
- Add VLLM_AUTO_START=true option for on-demand startup

New files:
- src/vllm-service-manager.ts - vLLM service lifecycle management
- src/vllm-service-manager.test.ts - 15 tests for service manager
- src/vllm-transcription.test.ts - 10 tests for transcription

Features:
- Auto-detect and start vLLM service when configured
- Health check and status monitoring
- Configurable host/port/model via environment variables
- Proper cleanup on process exit

Environment variables:
- VLLM_AUTO_START=true - Auto-start vLLM on demand
- ASR_PROVIDER=vllm - Use vLLM as primary transcription
- VLLM_HOST, VLLM_PORT, VLLM_MODEL, VLLM_EXTRA_ARGS

Tests: 25 tests passing for vLLM functionality
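A fallback chain such as Parakeet → vLLM → OpenAI → Gemini can be sketched as a loop that tries each provider in order and returns the first success. The `ProviderEntry` interface and `transcribeWithFallback` function below are hypothetical and only illustrate the idea; the PR's actual implementation may handle errors, retries, and logging differently.

```typescript
// Hypothetical sketch; these names do not come from the PR.
interface ProviderEntry {
  name: string;
  transcribe: (audio: Uint8Array) => Promise<string>;
}

// Try each provider in order and return the first successful transcription.
// Failures are collected so the final error explains what went wrong.
async function transcribeWithFallback(
  providers: ProviderEntry[],
  audio: Uint8Array,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.transcribe(audio);
      return { provider: provider.name, text };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(
    `All transcription providers failed (${failures.join("; ")})`,
  );
}
```

Passing the providers as an ordered array keeps the chain itself configurable, so setting ASR_PROVIDER=vllm could simply move vLLM to the front of the list.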
…anner
- Add figlet package for dynamic ASCII art generation
- Replace hardcoded banner with figlet.textSync('LOCAL\nVOICE', { font: 'ANSI Shadow' })
- Retain small text 'LOCAL TRANSCRIPTION VARIANT OF KIMAKI'
- Ensures big ASCII text is always correctly spelled
Key upstream changes merged:
- New /tasks Discord command for task management
- libsqlproxy package for SQLite proxy
- Renamed opencode-plugin -> kimaki-opencode-plugin
- Moved usecomputer to standalone repo
- Removed zeke/zoke folders
- Various performance and bug fixes

Conflicts resolved:
- discord/src/cli.ts: kept LOCAL VOICE transcription banner
- errore: updated submodule to upstream (8 commits ahead)
- pnpm-lock.yaml: merged lockfile
…cts-2026

The getProjectsDir() function now returns ~/Local-Projects-2026 instead of ~/.kimaki/projects. This removes the previous behavior where projects were stored under a configurable dataDir.

This is intentional for my personal workflow: I prefer keeping projects in a dedicated folder rather than hidden in ~/.kimaki.

Note: This also affects the special 'kimaki' default project in channel-management.ts, which will now be created at ~/Local-Projects-2026/kimaki.
- Add Array.isArray check before calling findIndex to prevent 'todos.findIndex is not a function' error when todos is not an array
- Add unit tests for null, undefined, and non-array todos inputs
- Add build and link scripts to package.json for local development
- Add DEVELOP.md documenting local variant workflow
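The guard described in the first bullet can be illustrated with a minimal sketch. The `Todo` shape and the `findTodoIndex` helper are hypothetical; only the `Array.isArray` check before `findIndex` reflects what the commit message states.

```typescript
// Hypothetical todo shape; the real type in the codebase may differ.
interface Todo {
  id: string;
  status: string;
}

// Returns the index of the first todo with the given status, or -1 when
// `todos` is null, undefined, or anything else that is not an array.
function findTodoIndex(todos: unknown, status: string): number {
  if (!Array.isArray(todos)) {
    return -1;
  }
  return todos.findIndex((todo: Todo) => todo?.status === status);
}
```

Without the guard, calling `findIndex` on a non-array object fails at runtime with "todos.findIndex is not a function", which is the error this commit addresses.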
Makes parakeet-mlx the default voice transcription provider. No API keys required. I have tested that parakeet-mlx runs on an older 16 GB MacBook and on a 64 GB M1 Ultra (2022) Mac Studio.
I tested it by sending an OGG voice message through Discord; the bot sent back the transcription and continued the chat.
Changes
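- Default ASR_PROVIDER changed from 'gemini' to 'parakeet' (local)
- Cloud providers are used only when explicitly configured via ASR_PROVIDER
- Fixed Body parameter handling in asr_server.py

Why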
Backward Compatibility
Cloud providers still work when explicitly set:
ASR_PROVIDER=openai kimaki # or 'gemini'