
Refactor Audio Stack and Offline Voice Agent, Implement Comprehensive Audio Testing and Diagnostics Overview#6

Closed
chcavignx wants to merge 6 commits into main from test

Conversation

@chcavignx
Owner

Implement Offline Voice Agent and Refactor Audio Stack
Overview
This update introduces a new offline voice agent, integrating wake word detection, speech recognition, and text-to-speech. It significantly refactors the underlying audio processing stack with new dependencies and improved engine logic. CI/CD pipelines also shifted to support dedicated ARM testing while relaxing overall type-checking rigor.
Introduces a robust suite of integration and unit tests for audio components. It includes new diagnostic tools and updates pre-commit hooks. The focus is on verifying audio hardware interaction and engine functionality.
Adopts uv and ruff, refactoring the CI/CD pipelines and pyproject.toml to support them.

Signed-off-by: chcavignx <ccubi73@gmail.com>
LiveReview Pre-Commit Check: ran (iter:7, coverage:100%)
LiveReview Pre-Commit Check: ran (iter:8, coverage:100%)
This commit introduces a comprehensive suite of integration tests for the audio library. The tests cover hardware detection, stream control, playback, and recording, aiming to validate audio functionality in deployment environments.

A new README file (`AUDIO_TESTS_README.md`) provides detailed descriptions of each test, expected outputs, troubleshooting guidance, and integration strategies for CI/CD pipelines.

Additionally, a quick start script (`QUICK_START_AUDIO_TESTS.sh`) and a runner script (`run_all_audio_tests.py`) are included to facilitate easy execution and management of these tests. This enhances the robustness and testability of the audio components.

Signed-off-by: chcavignx <ccubi73@gmail.com>

LiveReview Pre-Commit Check: vouched (iter:1, coverage:0%)
LiveReview Pre-Commit Check: ran (iter:1, coverage:0%)
LiveReview Pre-Commit Check: ran (iter:4, coverage:98%)
@qodo-code-review

CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: unit / Test "basic"

Failed stage: Install dependencies [❌]

Failed test name: ""

Failure summary:

The action failed during dependency synchronization because uv sync was invoked with both:
- the CLI flag --locked, and
- the environment variable UV_FROZEN=1 (set in the job environment).

uv treats these options as mutually exclusive, producing the error at log line 261 ("the argument --locked cannot be used with UV_FROZEN") and exiting with code 2.

Relevant error logs:
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

246:  UV_FROZEN: 1
247:  UV_PYTHON_INSTALL_DIR: /home/runner/work/_temp/uv-python-dir
248:  UV_PYTHON: 3.11.6
249:  UV_CACHE_DIR: /home/runner/work/_temp/setup-uv-cache
250:  ##[endgroup]
251:  uv download cache hit: false
252:  ##[group]Run uv sync --locked --no-default-groups --group test
253:  uv sync --locked --no-default-groups --group test
254:  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
255:  env:
256:  UV_FROZEN: 1
257:  UV_PYTHON_INSTALL_DIR: /home/runner/work/_temp/uv-python-dir
258:  UV_PYTHON: 3.11.6
259:  UV_CACHE_DIR: /home/runner/work/_temp/setup-uv-cache
260:  ##[endgroup]
261:  error: the argument `--locked` cannot be used with `UV_FROZEN` (environment variable)
262:  ##[error]Process completed with exit code 2.
263:  Post job cleanup.
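A straightforward remediation is to stop combining the two mechanisms. The sketches below are hypothetical workflow excerpts, not the actual contents of .github/workflows/run_tests.yml:

```yaml
# Option A: rely on the flag only (asserts the lockfile is up to date).
- name: Install dependencies
  run: uv sync --locked --no-default-groups --group test

# Option B: rely on the env var only. Note that UV_FROZEN maps to --frozen,
# which skips the lockfile up-to-date check, so its semantics are slightly
# weaker than --locked.
- name: Install dependencies
  env:
    UV_FROZEN: "1"
  run: uv sync --no-default-groups --group test
```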

@qodo-code-review

Review Summary by Qodo

Implement Offline Voice Agent with Comprehensive Audio Stack, Testing, and CI/CD Restructuring

✨ Enhancement 🧪 Tests


Walkthroughs

Description
• Implements comprehensive offline voice agent with wake word detection, automatic speech
  recognition (ASR), and text-to-speech (TTS) synthesis
• Introduces new audio processing stack with ASREngine, TTSEngine, WakeWordDetector, and
  VADEngine classes supporting multiple backends (Faster-Whisper, OpenAI Whisper, Piper TTS,
  openWakeWord, Silero VAD)
• Adds 100+ unit tests covering audio engines, configuration system, utilities, and VAD/wake word
  detection with comprehensive mocking
• Implements 6+ end-to-end integration tests for microphone recording, audio playback, stream
  lifecycle, TTS-to-ASR pipeline, and sample transcription
• Refactors configuration system with new ASRConfig, AudioConfig, WakeConfig, VADConfig, and
  PlatformConfig classes supporting centralized audio settings
• Adds shared audio utilities module (audio_utils.py) with playback, validation, conversion, and
  error suppression helpers
• Restructures CI/CD pipelines to separate ARM testing from general testing with dedicated
  test-raspberry-pi job and coverage reporting
• Updates dependency management with uv and ruff, adds strict type-checking via basedpyright,
  and consolidates optional dependencies
• Expands documentation with audio implementation guides, integration testing documentation, and
  offline voice agent architecture
• Refactors example scripts and model download scripts with improved type safety, logging cleanup,
  and path handling
• Updates pre-commit hooks with basedpyright type-checking and restructured ruff stages
Diagram
flowchart LR
  A["Audio Input<br/>Microphone"] -->|"16kHz PCM"| B["WakeWordDetector<br/>openWakeWord"]
  B -->|"Wake Event"| C["ASREngine<br/>Faster-Whisper/Whisper"]
  C -->|"Transcribed Text"| D["Intent Processing<br/>Response Generation"]
  D -->|"Response Text"| E["TTSEngine<br/>Piper TTS"]
  E -->|"Audio Output"| F["Speaker<br/>Playback"]
  B -->|"VAD Segmentation"| G["VADEngine<br/>Silero VAD"]
  G -->|"Speech Segments"| C
  H["Config System<br/>ASRConfig/AudioConfig"] -.->|"Settings"| B
  H -.->|"Settings"| C
  H -.->|"Settings"| E
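The flow in the diagram can be sketched as a tiny state machine. State and event names below are illustrative; the actual agent in examples/VAD/voice_agent_offline.py structures this differently:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()       # waiting for the wake word
    LISTENING = auto()  # VAD-segmented capture feeding the ASR engine
    SPEAKING = auto()   # TTS playback of the generated response

def step(state: State, event: str) -> State:
    """Advance the agent state machine on a single event."""
    transitions = {
        (State.IDLE, "wake"): State.LISTENING,
        (State.LISTENING, "transcript"): State.SPEAKING,
        (State.SPEAKING, "playback_done"): State.IDLE,
    }
    # Unknown (state, event) pairs leave the state unchanged.
    return transitions.get((state, event), state)
```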


File Changes

1. tests/audio/test_audio_engine_units.py 🧪 Tests +899/-0

Audio Engine Unit Tests and Integration Coverage

• Comprehensive unit test suite for ASR and TTS engines with 100+ test cases
• Tests cover model loading, speech detection, transcription, synthesis, and playback
• Includes mocking of PyAudio, Whisper, Faster-Whisper, and Piper components
• Tests error handling, thread lifecycle, queue management, and edge cases

tests/audio/test_audio_engine_units.py


2. src/audio/asr.py ✨ Enhancement +721/-0

Automatic Speech Recognition Engine Implementation

• New ASREngine class combining speech-to-text with voice activity detection
• Supports multiple backends: OpenAI Whisper and Faster-Whisper with configurable device/compute
• Implements background capture and processing threads with automatic resampling
• Includes Silero VAD integration with energy-based fallback for speech detection

src/audio/asr.py
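The energy-based fallback mentioned above could look like the following sketch; the threshold and framing are assumptions, not the values used in src/audio/asr.py:

```python
import math

def is_speech_energy(chunk: list[int], threshold: float = 500.0) -> bool:
    """Classify an int16 PCM chunk as speech by RMS energy."""
    if not chunk:
        return False
    rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
    return rms > threshold
```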


3. src/utils/config.py ✨ Enhancement +213/-79

Configuration System Refactoring and Expansion

• Refactored configuration structure with new ASRConfig, AudioConfig, WakeConfig, VADConfig, and
 PlatformConfig classes
• Added audio input/output settings, platform detection for Raspberry Pi, and CPU core limiting
• Replaced STTConfig with ASRConfig supporting both Whisper and Faster-Whisper engines
• Enhanced path management with separate audio and vision model directories

src/utils/config.py


4. src/audio/wake_word.py ✨ Enhancement +441/-0

Wake Word Detection Engine with Background Threading

• New WakeWordDetector class using openWakeWord for lightweight wake word detection
• Implements background capture and detection threads with automatic resampling to 16kHz
• Supports configurable cooldown to prevent false positives and threshold-based triggering
• Handles PyAudio stream management with proper error handling and cleanup

src/audio/wake_word.py


5. src/audio/tts.py ✨ Enhancement +375/-0

Text-to-Speech Engine with Queue-Based Playback

• New TTSEngine class for text-to-speech synthesis using Piper TTS
• Supports both CLI subprocess mode and Python API mode for flexibility
• Implements non-blocking queue-based playback with separate synthesis thread
• Includes PCM to WAV conversion, PyAudio playback, and text truncation for responsiveness

src/audio/tts.py
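The non-blocking queue pattern above can be sketched as follows; the real TTSEngine's synthesis thread calls Piper rather than the synth stub shown here:

```python
import queue
import threading

def run_tts(texts: list[str], synth) -> list[bytes]:
    """Feed texts through a worker thread, mirroring queue-based playback."""
    q: "queue.Queue[str | None]" = queue.Queue()
    out: list[bytes] = []

    def worker() -> None:
        # None acts as the shutdown sentinel.
        while (text := q.get()) is not None:
            out.append(synth(text))

    t = threading.Thread(target=worker)
    t.start()
    for text in texts:
        if text.strip():  # mirror the engine's empty-text filtering
            q.put(text)
    q.put(None)
    t.join()
    return out
```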


6. tests/utils/test_config.py 🧪 Tests +257/-0

Configuration System Unit Tests

• Unit tests for all configuration classes and their properties
• Tests path resolution, model path generation, and configuration loading from YAML
• Validates default values and property setters for ASR, TTS, VAD, and platform configs
• Tests security checks for config file path validation

tests/utils/test_config.py


7. tests/audio/test_audio_utils.py 🧪 Tests +239/-0

Audio Utilities Unit Tests

• Tests for audio utility functions including validation, cleaning, and format conversion
• Covers NaN/Inf handling, int16 conversion, and audio stream playback
• Tests stderr suppression for PortAudio and ALSA error handler installation
• Includes tests for sounddevice integration and subprocess-based audio playback

tests/audio/test_audio_utils.py


8. src/audio/vad.py ✨ Enhancement +162/-0

Voice Activity Detection Engine with Platform Tuning

• New VADEngine class wrapping Silero VAD for voice activity detection
• Implements platform-specific tuning for Raspberry Pi with CPU core limiting
• Provides methods to detect speech and extract speech segments with timestamps
• Supports configurable thresholds for speech duration and silence detection

src/audio/vad.py
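Extracting speech segments with timestamps could be sketched as below. Silero VAD returns timestamps directly, so this only illustrates the shape of the output, not the engine's actual logic:

```python
def speech_segments(flags: list[bool], frame_ms: int = 32) -> list[tuple[int, int]]:
    """Collapse per-frame speech flags into (start_ms, end_ms) segments."""
    segments: list[tuple[int, int]] = []
    start: int | None = None
    for i, speech in enumerate(flags):
        if speech and start is None:
            start = i * frame_ms            # segment opens
        elif not speech and start is not None:
            segments.append((start, i * frame_ms))
            start = None                    # segment closes
    if start is not None:                   # speech ran to the end
        segments.append((start, len(flags) * frame_ms))
    return segments
```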


9. examples/test_recording.py 🧪 Tests +264/-0

Audio Recording Integration Test

• New integration test for audio recording from microphone to WAV file
• Tests device detection, sample rate negotiation, and audio capture
• Includes validation of file size and recording duration
• Provides progress indicators and detailed error reporting

examples/test_recording.py


10. scripts/models/audio/whisper_objects.py ✨ Enhancement +25/-36

Whisper Model Download Script Refactoring

• Updated cache directory path from models_path to models_audio_path
• Refactored download_file() to use rsplit() for URL parsing and removed verbose logging
• Simplified error handling by catching exceptions silently
• Changed download_dir parameter to download_root in whisper.load_model() call

scripts/models/audio/whisper_objects.py


11. examples/STT/whisper/test_whisper_w_huggingface.py ✨ Enhancement +66/-46

Whisper Hugging Face Example Type Safety Improvements

• Added type hints and Protocol definitions for model, processor, and pipeline objects
• Removed verbose print statements throughout execution flow
• Added from __future__ import annotations for forward compatibility
• Replaced string-based device selection with proper type casting
• Changed dtype parameter to torch_dtype in pipeline configuration

examples/STT/whisper/test_whisper_w_huggingface.py


12. examples/VAD/voice_agent_offline.py ✨ Enhancement +236/-0

Offline Voice Agent with Wake Word Detection

• New offline voice agent implementation with wake word detection, ASR, and TTS
• Implements threading-based architecture for non-blocking audio processing
• Includes simple intent-based response generation system
• Provides signal handling for graceful shutdown

examples/VAD/voice_agent_offline.py


13. examples/TTS/text2speech_piper.py ✨ Enhancement +67/-27

Piper TTS Example Type Safety and Path Handling

• Added comprehensive type hints and Protocol definitions for Piper voice and audio streams
• Refactored setup_output_filename() with proper path handling using pathlib
• Added type annotations to synthesize_voice_and_save() and synthesize_voice() functions
• Removed print statements from synthesis functions

examples/TTS/text2speech_piper.py


14. examples/test_stream_open_close.py 🧪 Tests +228/-0

Audio Stream Lifecycle Integration Test

• New integration test for audio stream open/close operations
• Tests both input and output stream lifecycle management
• Validates stream state transitions and data reading/writing
• Includes device detection and sample rate negotiation

examples/test_stream_open_close.py


15. src/audio/audio_utils.py ✨ Enhancement +204/-0

Shared Audio Processing Utilities Module

• New utility module for shared audio processing functions
• Provides audio playback, validation, and conversion utilities
• Includes ALSA and PortAudio error suppression helpers
• Implements audio data cleaning and int16 conversion with validation

src/audio/audio_utils.py
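The cleaning and int16 conversion helpers might look like this sketch; the names mirror the description of src/audio/audio_utils.py, but the exact signatures there are not shown in this PR:

```python
import math

def clean_audio(samples: list[float]) -> list[float]:
    """Replace NaN/Inf with silence and clamp to [-1.0, 1.0]."""
    return [0.0 if not math.isfinite(s) else max(-1.0, min(1.0, s)) for s in samples]

def to_int16(samples: list[float]) -> list[int]:
    """Convert cleaned float samples to int16 PCM values."""
    return [int(s * 32767) for s in clean_audio(samples)]
```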


16. examples/test_playback.py 🧪 Tests +197/-0

Audio Playback Integration Test

• New integration test for audio playback functionality
• Tests sine wave generation and playback through output device
• Tests silence playback to verify output stream functionality
• Includes device detection and timing validation

examples/test_playback.py


17. tests/audio/test_vad_wakeword_units.py 🧪 Tests +133/-0

VAD and Wake Word Detection Unit Tests

• New unit tests for VAD (Voice Activity Detection) engine
• Tests wake word detection with mocked models
• Validates speech segment detection and error handling
• Tests wake word model loading and detection loop

tests/audio/test_vad_wakeword_units.py


18. src/utils/sysutils.py ✨ Enhancement +41/-25

System Utilities Logging and Type Safety Refactoring

• Refactored to use logging instead of print statements
• Added type hints and improved documentation
• Simplified detect_raspberry_pi_model() using pathlib.Path.read_text()
• Removed verbose output from CPU detection and limiting functions

src/utils/sysutils.py


19. scripts/models/audio/load_huggingface_objects.py ✨ Enhancement +30/-23

Hugging Face Model Download Script Enhancement

• Added .env file support via python-dotenv for HF_TOKEN configuration
• Updated cache directory path from models_path to models_audio_path
• Added project root to sys.path for proper imports
• Removed verbose logging and simplified error handling

scripts/models/audio/load_huggingface_objects.py


20. tests/utils/test_sysutils.py 🧪 Tests +113/-0

System Utilities Unit Tests

• New comprehensive unit tests for sysutils module functions
• Tests CPU detection, memory reporting, and Raspberry Pi detection
• Includes monkeypatch fixtures for system call mocking
• Validates edge cases like missing CPU count and OSError handling

tests/utils/test_sysutils.py


21. examples/STT/whisper/test_whisper.py ✨ Enhancement +18/-9

Whisper Example Type Safety and Logging Cleanup

• Added type hints and Protocol definitions for Whisper model
• Removed verbose initialization and model selection print statements
• Added from __future__ import annotations for forward compatibility
• Simplified error handling by catching RuntimeError silently

examples/STT/whisper/test_whisper.py


22. examples/STT/vosk/vosk_test_simple.py ✨ Enhancement +4/-11

Vosk Example Logging Cleanup

• Removed verbose print statements for usage, audio format, and transcription output
• Replaced os.path.exists() with pathlib.Path.exists()
• Simplified error handling by removing informational messages

examples/STT/vosk/vosk_test_simple.py


23. tests/audio/test_e2e_mic_asr.py 🧪 Tests +87/-0

Microphone to ASR End-to-End Integration Test

• New end-to-end integration test for microphone to ASR transcription
• Records audio from microphone and transcribes using ASR engine
• Includes device detection and model caching checks
• Uses pytest markers for integration test categorization

tests/audio/test_e2e_mic_asr.py


24. scripts/models/audio/fast_whisper_objects.py ✨ Enhancement +13/-11

Faster Whisper Model Download Script Refactoring

• Updated cache directory path from models_path to asr.download_path
• Added project root to sys.path for proper imports
• Removed verbose logging statements
• Simplified docstring formatting

scripts/models/audio/fast_whisper_objects.py


25. examples/test_hardware_detection.py 🧪 Tests +80/-0

Audio Hardware Detection Integration Test

• New integration test for audio hardware detection
• Lists all available input and output devices with channel counts
• Validates hardware availability for CI/CD health checks
• Provides detailed device information for troubleshooting

examples/test_hardware_detection.py


26. examples/run_all_audio_tests.py 🧪 Tests +81/-0

Audio Integration Test Runner

• New test runner orchestrating all audio integration tests
• Provides summary reporting with pass/fail status
• Includes timeout handling and subprocess management
• Generates comprehensive test results summary

examples/run_all_audio_tests.py


27. scripts/models/audio/vosk_models.py ✨ Enhancement +12/-11

Vosk Model Download Script Refactoring

• Added project root to sys.path for proper imports
• Updated cache directory path from models_path to models_audio_path
• Removed verbose logging for download and extraction progress
• Simplified docstring formatting

scripts/models/audio/vosk_models.py


28. scripts/models/audio/load_all.py ✨ Enhancement +10/-20

Master Model Loading Script Refactoring

• Added project root to sys.path for proper imports
• Removed verbose phase banners and progress logging
• Added wakeword_model to model loading phases
• Simplified error handling with broad exception catching

scripts/models/audio/load_all.py


29. scripts/models/audio/piper_models.py ✨ Enhancement +21/-18

Piper Model Deployment Script Refactoring

• Updated to use config.paths.models_audio_path instead of cache/data paths
• Changed from file moving to symlink creation for model deployment
• Added proper pathlib usage for path operations
• Removed verbose logging statements

scripts/models/audio/piper_models.py


30. tests/audio/_helpers.py 🧪 Tests +67/-0

Audio Test Helper Utilities

• New helper module for audio test utilities
• Provides functions to check TTS/ASR model availability
• Includes device detection helpers for PyAudio
• Supports test skipping when models or hardware unavailable

tests/audio/_helpers.py


31. tests/audio/test_e2e_tts_asr.py 🧪 Tests +43/-0

TTS to ASR End-to-End Integration Test

• New end-to-end integration test for TTS to ASR pipeline
• Synthesizes text to WAV using Piper TTS
• Transcribes generated audio using ASR engine
• Validates round-trip audio processing

tests/audio/test_e2e_tts_asr.py


32. tests/audio/test_e2e_sample_audio_transcription.py 🧪 Tests +28/-0

Sample Audio Transcription Integration Test

• New integration test for sample audio file transcription
• Tests ASR engine with repository sample audio
• Includes model caching checks and pytest markers
• Validates basic transcription functionality

tests/audio/test_e2e_sample_audio_transcription.py


33. tests/audio/test_asr_vad_logic.py 🧪 Tests +22/-0

ASR Voice Activity Detection Unit Tests

• New unit tests for ASR engine VAD (Voice Activity Detection) logic
• Tests energy-based VAD detection on audio chunks
• Tests fallback behavior when VAD model unavailable
• Validates speech detection with int16 audio data

tests/audio/test_asr_vad_logic.py


34. tests/audio/test_tts_queue.py 🧪 Tests +30/-0

TTS Queue Management Unit Tests

• New unit tests for TTS engine queue management
• Tests text enqueueing without model loading
• Tests empty text filtering
• Tests queue clearing on interrupt

tests/audio/test_tts_queue.py


35. scripts/models/audio/models_check.py ✨ Enhancement +4/-3

Model Existence Check Type Safety Enhancement

• Added type hints for target_dir parameter accepting PathLike
• Improved docstring formatting
• Enhanced type safety for path handling

scripts/models/audio/models_check.py


36. scripts/models/audio/wakeword_model.py ✨ Enhancement +29/-0

Wakeword Model Download Script

• New script for downloading and caching wakeword models
• Uses openwakeword library for model management
• Configures cache directory via centralized config
• Supports "hey_jarvis_v0.1" wakeword model

scripts/models/audio/wakeword_model.py


37. tests/test_basic.py 🧪 Tests +8/-8

Basic Test Suite Improvements

• Renamed test functions for clarity and added return type hints
• Updated test descriptions to be more specific
• Added test for main entrypoint importability
• Improved test naming conventions

tests/test_basic.py


38. src/audio/__init__.py ✨ Enhancement +18/-0

Audio Package Initialization with Lazy Loading

• New audio package initialization with lazy loading
• Provides lazy imports for whisper and faster_whisper modules
• Uses lazy_loader for deferred module loading
• Includes TYPE_CHECKING imports for type hints

src/audio/__init__.py
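The package actually uses the lazy_loader library; the stdlib sketch below shows the same idea with importlib, and its API differs from lazy_loader's:

```python
import importlib

def lazy(name: str):
    """Return a proxy that imports `name` on first attribute access."""
    class _Proxy:
        def __getattr__(self, attr: str):
            module = importlib.import_module(name)  # deferred until first use
            return getattr(module, attr)
    return _Proxy()

json_mod = lazy("json")  # nothing is imported yet at this point
```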


39. src/main.py ✨ Enhancement +2/-6

Main Entry Point Simplification

• Removed shebang and verbose docstring
• Simplified main function docstring
• Removed print statement from main function

src/main.py


40. src/utils/__init__.py 📝 Documentation +1/-1

Utils Package Documentation Update

• Updated module docstring to describe utilities package
• Removed shebang comment

src/utils/__init__.py


41. src/__init__.py ✨ Enhancement +1/-0

Source Package Initialization

• New package initialization file
• Provides basic package identifier comment

src/__init__.py


42. examples/QUICK_START_AUDIO_TESTS.sh 📝 Documentation +44/-0

Audio Tests Quick Start Guide

• New bash script providing quick reference for audio integration tests
• Documents individual test locations and usage
• Provides command examples for running tests
• Includes references to documentation and expected outcomes

examples/QUICK_START_AUDIO_TESTS.sh


43. scripts/install/dependencies.sh ⚙️ Configuration changes +12/-1

System Dependencies Installation Script Update

• Reformatted apt-get install command with line breaks for readability
• Added new dependencies: python3-all-dev, pipewire-alsa, libspeexdsp-dev
• Improved script maintainability with clearer package listing

scripts/install/dependencies.sh


44. requirements.txt Dependencies +245/-75

Comprehensive Requirements Update for Audio Stack

• Significantly expanded dependency list with audio and ML packages
• Added faster-whisper, openwakeword, piper-tts, silero-vad
• Added torch, torchaudio, transformers, and accelerate for ML
• Added pyaudio, sounddevice, soundfile for audio I/O
• Updated versions for existing packages

requirements.txt


45. pyproject.toml ⚙️ Configuration changes +218/-60

Refactor dependency management and add strict type-checking configuration

• Consolidated optional dependencies into main dependencies list with new packages (dotenv,
 onnxruntime, psutil, piper-tts>=1.4.1, openwakeword, lazy-loader)
• Restructured dependency groups using [dependency-groups] format with dev, lint, and test
 groups replacing old [project.optional-dependencies] structure
• Added comprehensive [tool.ruff] configuration with strict linting rules, per-file ignores for
 tests, and format settings
• Added [tool.basedpyright] configuration with strict type-checking settings (null safety, type
 completeness, code quality)
• Updated [tool.uv] with workspace configuration and custom git sources for openai-whisper and
 faster-whisper
• Removed mypy configuration and relaxed pytest coverage requirement from 80% to implicit baseline

pyproject.toml


46. docs/TTS_offline.md 📝 Documentation +91/-29

Expand TTS documentation with implementation details and fixes

• Fixed typos (assitant → assistant, scrypt → script)
• Corrected markdown formatting (numbered list items, heading levels)
• Expanded Piper section with detailed configuration settings and supported modes (Python API vs
 CLI)
• Added new section describing the TTS engine implementation in src/audio/tts.py with architecture
 details
• Replaced comparison table with implementation notes and custom voice instructions
• Added example usage pointing to examples/VAD/voice_agent_offline.py

docs/TTS_offline.md


47. docs/audio_usb_test.md 📝 Documentation +72/-124

Streamline USB audio testing guide with clearer structure

• Simplified and condensed tutorial from 159 to 107 lines with clearer focus on audio path
 verification
• Reorganized content into logical sections: hardware setup, useful commands, step-by-step
 instructions
• Removed verbose explanations and replaced with concise command examples
• Added device selection configuration details for audio.input_device_index and
 audio.output_device_index
• Included reference to automation script scripts/tests/audio_test.sh

docs/audio_usb_test.md


48. examples/AUDIO_TESTS_README.md 📝 Documentation +277/-0

Add comprehensive audio integration testing documentation

• New comprehensive guide for audio integration tests with 277 lines of documentation
• Describes four test modules: hardware detection, stream open/close, playback, and recording
• Includes test descriptions, expected outputs, skip conditions, and common issues
• Provides integration with CI/CD pipelines and development guidelines for adding new tests
• Documents hardware tested and architecture philosophy (simple-first, build-progressively)

examples/AUDIO_TESTS_README.md


49. docs/STS_VAD_models.md 📝 Documentation +68/-30

Document implemented audio library architecture and integration

• Replaced generic conclusion with detailed "Implemented Solution" section describing src/audio
 library architecture
• Added subsections for wake word detection (WakeWordDetector), speech recognition (ASREngine),
 and text-to-speech (TTSEngine)
• Documented default configuration values and model locations under .cache/audio/models/
• Described integration flow with examples/VAD/voice_agent_offline.py state machine
• Removed references section and consolidated into implementation-focused content

docs/STS_VAD_models.md


50. docs/DEV_PROCESS.md 📝 Documentation +15/-14

Update development process with audio validation sequence

• Removed time estimates from priority sections (1-3 days, 3-5 days, etc.)
• Expanded "Proposed Sequence" with detailed audio module validation steps referencing specific
 guides
• Updated "Sequenced Guides" section with corrected file paths and added STS_VAD_models.md
 reference
• Changed placeholder formatting from **TO DO** to **!TO DO!** for visibility
• Added reference to examples/VAD/voice_agent_offline.md with clarified description

docs/DEV_PROCESS.md


51. .github/workflows/run_tests.yml ⚙️ Configuration changes +38/-20

Restructure CI/CD with separate ARM testing and coverage reporting

• Changed ruff and test jobs from ubuntu-24.04-arm to ubuntu-latest runner
• Removed mypy job entirely
• Updated python-uv-setup action calls with specific sync-flags for lint and test groups
• Added new test-raspberry-pi job running on ubuntu-24.04-arm with 30-minute timeout and
 --extra raspberry-pi flag
• Updated test commands to include --cov=src and upload coverage to Codecov

.github/workflows/run_tests.yml


52. docs/STT_offline.md 📝 Documentation +88/-0

Document ASR engine implementation and configuration details

• Added new "Final Choice and ASR Library Building" section (91 lines) documenting
 src/audio/asr.py implementation
• Documented configuration settings from src/utils/config.py (engine, model_size, device,
 compute_type, etc.)
• Described ASR engine workflow: microphone capture, VAD segmentation, transcription, callback
 delivery
• Listed supported backends (Faster-Whisper as default, OpenAI Whisper as alternate)
• Added installation instructions, example usage, and practical notes about native teardown
• Referenced audio test hardware guides and device listing scripts

docs/STT_offline.md


53. examples/VAD/voice_agent_offline.md 📝 Documentation +84/-0

Add voice agent example documentation and architecture guide

• New documentation file (84 lines) for the offline voice agent example
• Describes the three-component architecture: WakeWordDetector, ASREngine, TTSEngine
• Explains the state machine flow and main extension point (_generate_response())
• Documents architecture notes, usage, configuration, customization ideas, and runtime behavior
• Provides guidance on how to modify and extend the agent

examples/VAD/voice_agent_offline.md


54. config.yaml ⚙️ Configuration changes +36/-11

Restructure configuration with ASR, wake word, and audio device settings

• Renamed stt section to asr with expanded configuration (added device, compute_type,
 skip_native_teardown)
• Updated TTS configuration with new fields (speed, volume) and changed default model to
 en_US-hfc_female-medium.onnx
• Added new wake section for wake word detection (wake_word, model_name,
 inference_framework, threshold, cooldown_seconds, noise_suppression)
• Added new audio section for device configuration (input_device_index, output_device_index,
 input_sample_rate, output_sample_rate, volume, chunk sizes)
• Updated VAD configuration with threshold field and reduced max_recording_seconds from 15 to 10

config.yaml


55. .github/actions/python-uv-setup/action.yml ⚙️ Configuration changes +12/-8

Update GitHub Actions setup with conditional system packages

• Updated description to reflect new dependency profile approach
• Changed default sync-flags from --all-extras --dev to --locked
• Added new input install-system-packages (boolean, default false) for conditional system
 dependency installation
• Updated setup-uv action from v4 to v7 with pinned version 0.11.2
• Modified system dependency installation to be conditional and removed espeak-ng and
 python3-pyaudio from package list

.github/actions/python-uv-setup/action.yml


56. .pre-commit-config.yaml ⚙️ Configuration changes +21/-4

Update pre-commit hooks with basedpyright type-checking

• Updated ruff-pre-commit from v0.14.5 to v0.15.12 with restructured hooks
• Split ruff checking into pre-commit stage (with --fix) and pre-push stage (without fix)
• Updated pre-commit-hooks from v4.6.0 to v6.0.0
• Updated conventional-pre-commit from v2.3.0 to v4.4.0
• Added new local hook for basedpyright strict type-checking on pre-push stage

.pre-commit-config.yaml


57. examples/TTS/text2speech_espeak.py Additional files +0/-87

...

examples/TTS/text2speech_espeak.py


58. scripts/models/audio/openai_model.py Additional files +0/-52

...

scripts/models/audio/openai_model.py


59. src/audio/test_voice_agent.py Additional files +0/-477

...

src/audio/test_voice_agent.py


60. src/audio/voice_agent_offline.md Additional files +0/-124

...

src/audio/voice_agent_offline.md


61. src/audio/voice_agent_offline.py Additional files +0/-1005

...

src/audio/voice_agent_offline.py


62. tests/audio/__init__.py Additional files +0/-0

...

tests/audio/__init__.py


63. tests/utils/__init__.py Additional files +0/-0

...

tests/utils/__init__.py



@qodo-code-review

qodo-code-review Bot commented Apr 28, 2026

Code Review by Qodo

🐞 Bugs (4) 📘 Rule violations (0)



Action required

1. ASR import-time dep crash 🐞 Bug ☼ Reliability
Description
src/audio/asr.py imports faster_whisper, pyaudio, and whisper at module import time, so missing
optional deps will crash before ASREngine can be constructed. This makes the later ImportError
handling in _load_whisper/_load_faster_whisper unreachable and breaks optional backend behavior.
Code

src/audio/asr.py[R34-38]

+import numpy as np
+from numpy.typing import NDArray
+import faster_whisper
+import pyaudio
+import whisper
Evidence
The module imports heavy/optional dependencies at import time, but later tries to handle missing
dependencies via try/except ImportError inside loader methods; those handlers cannot run if the
module import already failed.

src/audio/asr.py[34-46]
src/audio/asr.py[191-243]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`src/audio/asr.py` imports `faster_whisper`, `pyaudio`, and `whisper` at module import time. This prevents the module from being imported in environments where any of these dependencies are absent, and it makes the later `ImportError` handling in `_load_whisper()` / `_load_faster_whisper()` ineffective.

### Issue Context
The code already has explicit try/except `ImportError` blocks inside the model-loading methods, implying these dependencies are meant to be optional or at least fail gracefully.

### Fix Focus Areas
- src/audio/asr.py[34-46]
- src/audio/asr.py[191-243]

### What to change
- Remove the top-level imports for `faster_whisper`, `pyaudio`, and `whisper`.
- Import `pyaudio` inside the method(s) that open streams and handle `ImportError` with a clear message.
- Keep `whisper` and `faster_whisper` imports inside `_load_whisper()` / `_load_faster_whisper()` only (you already do this) and ensure no other top-level references require them.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
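The lazy-import pattern suggested above can be sketched as follows. `load_optional` is an illustrative helper, not a function from this repository; the commented `_load_whisper` usage shows where it would replace the top-level imports:

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, hint: str = "") -> ModuleType:
    """Import an optional backend only when it is actually needed,
    so a missing dependency fails with a clear message instead of
    crashing the whole module at import time."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        msg = f"Optional backend '{module_name}' is not installed."
        if hint:
            msg += f" {hint}"
        raise RuntimeError(msg) from exc


# Inside e.g. _load_whisper(), instead of a module-level `import whisper`:
# whisper = load_optional("whisper", "Install with: pip install openai-whisper")
```

With this shape, `import src.audio.asr` succeeds even when `faster_whisper`, `pyaudio`, or `whisper` is absent, and the `ImportError` handling in the loader methods becomes reachable again.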


2. Wake model_name None path 🐞 Bug ≡ Correctness
Description
WakeConfig defaults model_name to None but full_model_path always formats it into a filename,
producing a path like '<...>/None.onnx'. WakeWordDetector.load unconditionally passes this path to
openwakeword.Model, so wake-word initialization fails when wake.model_name is omitted.
Code

src/utils/config.py[R121-145]

+    wake_word: str = "hey_jarvis"
+    model_name: str | None = None
+    model_path: str | None = None
+    inference_framework: str = "onnx"  # or "pytorch" if using a PyTorch model
+    threshold: float = 0.4
+    cooldown_seconds: float = 2.0  # minimum seconds between detections
+    download_root: str | None = None
+    noise_suppression: bool = False
+    vad_threshold: float = 0.6
+
+    @property
+    def download_path(self) -> Path:
+        """Returns the download path for the ASR model."""
+        if self.download_root:
+            p = ROOT_DIR / self.download_root
+            # Avoid doubling engine name if already in path
+            if p.suffix in {".onnx", ".tflite"}:
+                return p
+            return p / "wakeword"
+        return self.models_audio_path / "wakeword"
+
+    @property
+    def full_model_path(self) -> Path:
+        """Returns the full path to the wakeword model."""
+        return self.download_path / f"{self.model_name}.{self.inference_framework}"
Evidence
WakeConfig.full_model_path uses model_name even when it is None, and WakeWordDetector.load uses
full_model_path without validation, leading to an invalid model file path.

src/utils/config.py[118-145]
src/audio/wake_word.py[135-161]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`WakeConfig.model_name` defaults to `None`, but `WakeConfig.full_model_path` formats it into a filename anyway, yielding an invalid path like `None.onnx`. `WakeWordDetector.load()` then passes that path to `openwakeword.Model(...)`, which will fail.

### Issue Context
This is triggered whenever the config YAML omits `wake.model_name` or when running with default config (e.g., missing config file).

### Fix Focus Areas
- src/utils/config.py[118-145]
- src/audio/wake_word.py[135-161]

### What to change
- Provide a non-None default for `model_name` (e.g., derive it from `wake_word`, or set a known-good default such as `"hey_jarvis_v0.1"`).
- Or, add validation:
  - In `full_model_path`, if `model_name is None`, raise a clear `ValueError` explaining it must be configured.
- Ensure WakeWordDetector logs/errors are actionable when the model path is missing.




Remediation recommended

3. Platform tuning never runs 🐞 Bug ≡ Correctness
Description
PlatformConfig defines __post_init__ to auto-detect Raspberry Pi and clamp cpu_cores, but
PlatformConfig inherits from pydantic.BaseModel, so __post_init__ is never called. As a result,
platform fields won’t be auto-populated unless callers manually invoke is_raspberry_pi()/cpu_limit().
Code

src/utils/config.py[R174-203]

+class PlatformConfig(PathConfig):
+    """Configuration for platform-specific tuning."""
+
+    cpu_cores: int | None = 2  # Limit CPU cores for multiprocessing on Pi5
+    pi: bool | None = False  # Automatically detect Raspberry Pi and apply tuning
+
+    def is_raspberry_pi(self) -> bool:
+        """Detect whether the system is running on a Raspberry Pi 5.
+
+        Returns:
+            `True` when the host appears to be a Raspberry Pi 5.
+
+        """
+        self.pi = detect_raspberry_pi_model()
+        return self.pi
+
+    def cpu_limit(self) -> int:
+        """Set and return the CPU core limit for multiprocessing.
+
+        Returns:
+            The CPU core limit selected for multiprocessing.
+
+        """
+        self.cpu_cores = limit_cpu_for_multiprocessing(self.cpu_cores)
+        return self.cpu_cores
+
+    def __post_init__(self) -> None:
+        """Apply platform-specific tuning after initialization."""
+        self.is_raspberry_pi()
+        self.cpu_limit()
Evidence
PlatformConfig is a pydantic BaseModel (via PathConfig), and Config constructs it via
model_validate; with pydantic v2, dataclass-style __post_init__ is not executed, so the intended
auto-tuning code won’t run.

src/utils/config.py[20-22]
src/utils/config.py[174-204]
src/utils/config.py[209-218]
pyproject.toml[41-58]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`PlatformConfig.__post_init__()` is intended to run automatically after initialization to detect Raspberry Pi and apply a CPU core limit. But `PlatformConfig` inherits from `pydantic.BaseModel`, so `__post_init__` will not be called, and the tuning never happens.

### Issue Context
The project depends on `pydantic>=2.0.0`, where the correct hook is `model_post_init` or an `@model_validator(mode="after")`.

### Fix Focus Areas
- src/utils/config.py[20-22]
- src/utils/config.py[174-204]
- src/utils/config.py[209-218]

### What to change
- Replace `def __post_init__(...)` with one of:
  - `def model_post_init(self, __context: Any) -> None:` and call `self.is_raspberry_pi()` / `self.cpu_limit()` there, or
  - `@model_validator(mode="after")` to apply tuning.
- Alternatively, invoke `self.platform.is_raspberry_pi()` and `self.platform.cpu_limit()` inside `Config.__init__` after `PlatformConfig.model_validate(...)`.



4. TTS model path rebuilt wrong 🐞 Bug ≡ Correctness
Description
TTSEngine._load_piper_python reconstructs the model file as parent(full_model_path)/model_name,
which breaks when TTSConfig.model_path is already a full file path. This can raise FileNotFoundError
even though the configured model file exists.
Code

src/audio/tts.py[R182-188]

+        # Determine models directory from config
+        models_dir = Path(self._config.tts.full_model_path).parent
+        model_file = models_dir / self._config.tts.model_name
+
+        if not model_file.exists():
+            msg = f"Piper model not found: {model_file}\nDownload from huggingface or run setup script."
+            raise FileNotFoundError(msg)
Evidence
TTSConfig.full_model_path explicitly supports file paths (returns the file directly), but TTSEngine
discards that file path and appends model_name again, potentially pointing to a
different/non-existent location.

src/utils/config.py[106-115]
src/audio/tts.py[174-192]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`TTSConfig.full_model_path` returns the exact file path when `model_path` points to a file (suffix `.onnx/.bin/.pt`). But `TTSEngine._load_piper_python()` rebuilds the path as `parent(full_model_path) / model_name`, which can ignore a valid configured file path and incorrectly raise `FileNotFoundError`.

### Issue Context
This affects configurations that set `tts.model_path` to a specific model file rather than a directory.

### Fix Focus Areas
- src/utils/config.py[106-115]
- src/audio/tts.py[174-192]

### What to change
- In `_load_piper_python`, set `model_file = Path(self._config.tts.full_model_path)` directly (no recomposition).
- If you also want to support directory-based configuration, handle that explicitly (e.g., detect `full_model_path.is_dir()` vs `is_file()` before choosing the final path).




Comment thread src/audio/asr.py
Comment thread src/utils/config.py
@chcavignx (Owner Author)

fix will be done in next commit

@chcavignx (Owner Author)

fix in next commit

@chcavignx closed this Apr 28, 2026