Description
The first call to Ollama after starting v2t takes significantly longer than subsequent calls. This appears to be an Ollama cold-start issue.
Logs
2025-12-12 12:08:32.828 | INFO | voice2text:process_audio:136 - Transcribing...
2025-12-12 12:08:35.247 | INFO | voice2text:process_audio:141 - Raw: Hey, how is it going? (2.42s)
2025-12-12 12:08:35.248 | INFO | voice2text:process_audio:147 - Cleaning up...
2025-12-12 12:09:01.533 | INFO | voice2text:process_audio:165 - Clean: How is it going? (26.28s) <-- FIRST CALL: 26s
2025-12-12 12:09:04.926 | SUCCESS | voice2text:process_audio:172 - Pasted!
2025-12-12 12:09:14.448 | INFO | voice2text:process_audio:136 - Transcribing...
2025-12-12 12:09:16.807 | INFO | voice2text:process_audio:141 - Raw: Hey, how's it going? (2.36s)
2025-12-12 12:09:16.808 | INFO | voice2text:process_audio:147 - Cleaning up...
2025-12-12 12:09:17.389 | INFO | voice2text:process_audio:165 - Clean: How's it going? (0.58s) <-- SECOND CALL: 0.6s
Analysis
- First Ollama cleanup: 26.28s
- Second Ollama cleanup: 0.58s
This is a ~45x difference, most likely caused by Ollama loading the model into memory on the first call.
Possible solutions (for future consideration)
- Warm up Ollama on startup (similar to how we warm up Whisper)
- Document this as expected behavior
- Investigate whether Ollama has a keep-alive or preload option (it appears to support a `keep_alive` request parameter and an `OLLAMA_KEEP_ALIVE` environment variable)
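The warm-up idea above could be sketched as follows. This is a minimal, untested sketch assuming the default Ollama endpoint (`http://localhost:11434/api/generate`); the function names (`build_warmup_payload`, `warm_up_ollama`) and the model name are placeholders, not part of v2t. Ollama's API docs describe an empty-prompt request as a way to load a model into memory, and `keep_alive` as a way to keep it resident.

```python
import json
import urllib.request

# Default Ollama endpoint (assumption; adjust if v2t configures a different host/port)
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_warmup_payload(model: str, keep_alive: str = "30m") -> dict:
    """An empty prompt asks Ollama to load the model without generating text.
    keep_alive keeps it resident so later cleanup calls stay fast."""
    return {"model": model, "prompt": "", "keep_alive": keep_alive}


def warm_up_ollama(model: str, timeout: float = 60.0) -> bool:
    """Fire a warm-up request at startup; returns True on HTTP 200."""
    data = json.dumps(build_warmup_payload(model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Ollama not running or unreachable; caller can log and continue
        return False
```

Calling `warm_up_ollama("llama3")` once on startup, alongside the existing Whisper warm-up, should move the one-time ~26s load to program start instead of the first transcription.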