Conversation
Migrates and refactors model download scripts for Whisper, Vosk, and Fast-Whisper from shell and example scripts to Python scripts under scripts/models/audio/. Adds model existence checks, unifies cache/model paths, and centralizes configuration using a new config.yaml and src/utils/config.py. Updates voice_agent_offline.py and related scripts to use the new configuration and model management structure, improving maintainability and consistency.
Renamed and relocated ONNX model files and their JSON metadata from the data/ directory to data/models/piper/ for better organization. Updated .gitignore to remove exclusions for models/*.onnx and data/, and added .env to ignore environment variable files.
Refined license file pattern in pyproject.toml and cleaned up environment markers. requirements.txt now includes more explicit dependency sources. uv.lock was updated to require Python >=3.11, simplify markers, and pin all direct dependencies with explicit versions.
Added a direct link to the Piper model repository on Hugging Face in the TTS offline guide. Updated the STT Whisper RPi5 example to use the Python script instead of the shell script for model setup, reflecting the current recommended usage.
Introduces src/main.py with a main() function that prints a greeting. Sets up the script to run main() when executed directly.
📝 WalkthroughWalkthroughCentralizes runtime configuration, reorganizes and replaces model management scripts with Python utilities under Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
scripts/models/audio/vosk_models.py (1)
25-29: Missing timeout on HTTP request can cause indefinite hangs.
requests.get()without atimeoutparameter can block forever if the server doesn't respond, causing the script to hang indefinitely.Proposed fix
- with requests.get(url, stream=True) as r: + with requests.get(url, stream=True, timeout=30) as r:src/audio/voice_agent_offline.py (2)
43-54: Variable shadowing:configmodule is overwritten byConfiginstance.Line 43 imports the
configmodule, but line 54 reassignsconfigto the return value ofload_config(), shadowing the module. After this point,configrefers to aConfigobject, not the module.This is confusing and prevents any further use of the module's functions (e.g.,
config.setup_python_path()).🔎 Proposed fix
-from utils import config -from utils.sysutils import detect_raspberry_pi_model, limit_cpu_for_multiprocessing +from utils.config import config, setup_python_path +from utils.sysutils import detect_raspberry_pi_model, limit_cpu_for_multiprocessing try: from vosk import KaldiRecognizer from vosk import Model as VoskModel except ImportError: VoskModel = None KaldiRecognizer = None -config.setup_python_path() -config = config.load_config() +setup_python_path() +# config is already loaded as a global in utils.config
567-574: Logic bug: CLI error handling immediately overwritten.When
process.returncode != 0,successis set toFalseon line 570, but line 571 unconditionally setssuccess = True, overwriting the error state.🔎 Proposed fix
_, stderr = process.communicate() if process.returncode != 0: print(f"❌ Piper CLI error: {stderr.decode()}") success = False - success = True + else: + success = True except (subprocess.SubprocessError, IOError) as e: print(f"❌ Piper CLI error: {e}") success = False
🧹 Nitpick comments (15)
scripts/tests/audio_test.sh (1)
1-3: Clarify integration with config-driven refactor.Given the PR's focus on centralizing configuration and refactoring to config-driven initialization, it's unclear how this legacy test script integrates with the new configuration system. Consider whether this script should be updated to use config values (e.g., audio device, sample rate, duration) or replaced with a test that uses the new config infrastructure.
examples/STT/whisper/STT_offline_whisper_rpi5.md (2)
45-47: Consider using a relative or environment-aware path for better portability.The hardcoded absolute path
~/AI-Autonomous-Assistant/scripts/models/audioassumes users have cloned the repository to the home directory with that exact structure. This may cause issues for users with different directory structures or deployments.Consider one of these alternatives:
- Use a relative path from the repository root with clear instructions on where to run the command
- Document the assumption about clone location explicitly
- Suggest using environment variables or shell aliases for flexibility
🔎 Suggested alternative approaches
Option 1: Relative path with clear context
Load models, vocabulary and encoder files to be able to use whisper offline with the following script: ```bash -cd ~/AI-Autonomous-Assistant/scripts/models/audio +cd scripts/models/audio python3 whisper_objects.py
+(Run from the repository root directory)
**Option 2: Environment variable approach** ```bash cd $REPO_ROOT/scripts/models/audio python3 whisper_objects.pywith guidance: "Set
REPO_ROOTto your cloned repository path before running."
43-48: Add brief explanation of whatwhisper_objects.pydoes.The script exists and is functional. However, the markdown should clarify that it downloads and caches Whisper models with hardware-aware optimization (e.g., it excludes larger models on Raspberry Pi 5 to manage resource constraints). This helps users understand the purpose before running it.
scripts/models/audio/piper_models.py (1)
5-8: Variable shadowing:configmodule is overwritten byConfiginstance.The
configmodule imported on line 5 is shadowed by theConfiginstance assigned on line 8. This works but can be confusing and prevents access to the original module after line 8.Proposed fix
-from utils import config +from utils.config import setup_python_path, load_config -config.setup_python_path() -config = config.load_config() +setup_python_path() +config = load_config()scripts/models/audio/fast_whisper_objects.py (2)
16-18: Import-time side effect fromdetect_raspberry_pi_model().This function is called at module import time and prints to stdout. If this module is imported elsewhere (e.g., for testing or as a library), it will unconditionally produce console output.
Consider moving this logic into the
run()function or using a lazy initialization pattern if the model list needs to be determined at runtime.
27-27: Non-standard cache directory location.Using
~/cache/models/huggingfaceinstead of the XDG-compliant~/.cache/models/huggingface. This diverges from the Hugging Face default (~/.cache/huggingface) and other standard cache locations.Consider using
~/.cache/models/huggingfacefor consistency with XDG Base Directory Specification, or leverage the centralized config system introduced in this PR.scripts/models/audio/vosk_models.py (1)
15-16: Side effect at module import: directory creation.
os.makedirs()is called at module import time. This can cause unexpected behavior when the module is imported for testing or inspection.Move directory creation into the
run()function:cache_dir = os.path.join(os.path.expanduser("~"), "cache/models/vosk") -os.makedirs(cache_dir, exist_ok=True)def run(): + os.makedirs(cache_dir, exist_ok=True) for model_name, url in MODELS.items():scripts/models/audio/load_huggingface_objects.py (1)
31-32: Consider adding error handling for network failures.
snapshot_downloadcan fail due to network issues, authentication problems, or rate limiting. Currently, any failure will crash the entire script without processing remaining models.+ from huggingface_hub.utils import RepositoryNotFoundError, GatedRepoError for model_name in model_names: if model_exists(model_name, cache_dir): print(f"Model {model_name} already exists.") continue print(f"Downloading and saving {model_name} to {cache_dir}") - - snapshot_download(repo_id=model_name, repo_type="model", cache_dir=cache_dir) - print(f"Model saved to: {os.path.join(cache_dir, model_name)}") + try: + snapshot_download(repo_id=model_name, repo_type="model", cache_dir=cache_dir) + print(f"Model saved to: {os.path.join(cache_dir, model_name)}") + except Exception as e: + print(f"Failed to download {model_name}: {e}")scripts/models/audio/whisper_objects.py (2)
115-125: Overly broadRuntimeErrorexception handling.Catching
RuntimeErrorbroadly can mask unrelated errors. The fallback to manual download will execute even for errors like CUDA initialization failures or memory issues.Consider catching a more specific exception or logging the error before falling back:
- except RuntimeError: + except RuntimeError as e: + print(f"Whisper library download failed: {e}. Falling back to manual download...") for model_name, model_url in MODELS.items():
25-27: Import-time side effect fromdetect_raspberry_pi_model().Same concern as in
fast_whisper_objects.py—this function prints to stdout at module import time.src/utils/config.py (3)
57-82: Consider composition over inheritance for config classes.
STTConfig,TTSConfig, andVADConfiginherit fromPathConfig, but they don't represent "a kind of path config" — they use paths. This inheritance pattern duplicates path fields in each sub-config and could lead to inconsistencies if different values are passed.Consider injecting a shared
PathConfiginstance or using composition:class STTConfig(BaseModel): engine: str = "whisper" # ... other STT fields # Access paths via Config.paths insteadThis is a design consideration for future maintainability.
Also applies to: 85-106, 109-117
120-129: Consider makingConfiga Pydantic model for consistency.The top-level
Configclass is a plain Python class while sub-configs are Pydantic models. This loses validation benefits at the root level. Consider using Pydantic's nested model pattern:class Config(BaseModel): paths: PathConfig = PathConfig() stt: STTConfig = STTConfig() tts: TTSConfig = TTSConfig() vad: VADConfig = VADConfig()This is optional but would provide consistent validation throughout.
179-180: Module-level config loading causes side effects on import.Loading config at module level means any import triggers file I/O, YAML parsing, and potentially prints warnings. This can complicate testing and cause unexpected behavior when the module is imported for introspection.
Consider a lazy-loading pattern or explicit initialization if testability is a concern.
src/audio/voice_agent_offline.py (2)
153-156: Redundantint()casts on already-integer config values.
VADConfigdefines these fields asintwith Pydantic validation, so the explicitint()casts are unnecessary.🔎 Proposed fix
- min_speech_duration_ms: int = int(config.vad.min_speech_duration_ms) - min_silence_duration_ms: int = int(config.vad.min_silence_duration_ms) - silence_timeout_seconds: int = int(config.vad.silence_timeout_seconds) - max_recording_seconds: int = int(config.vad.max_recording_seconds) + min_speech_duration_ms: int = config.vad.min_speech_duration_ms + min_silence_duration_ms: int = config.vad.min_silence_duration_ms + silence_timeout_seconds: int = config.vad.silence_timeout_seconds + max_recording_seconds: int = config.vad.max_recording_seconds
310-322: VADEngine mutates STT config fields — unexpected side effect.VADEngine modifies
config.stt_model_sizeandconfig.faster_stt_model_sizeduring initialization. This couples VAD initialization with STT configuration and has side effects on the shared config object.Consider moving this logic to
STTEnginewhere it's more contextually appropriate, or computing the model size suffix without mutating the config.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (31)
.gitignoreconfig.yamldata/models/piper/en_GB-alan-low.onnxdata/models/piper/en_GB-alan-low.onnx.jsondata/models/piper/en_GB-alba-medium.onnxdata/models/piper/en_GB-alba-medium.onnx.jsondata/models/piper/fr_FR-gilles-low.onnxdata/models/piper/fr_FR-gilles-low.onnx.jsondata/models/piper/fr_FR-mls-medium.onnxdata/models/piper/fr_FR-mls-medium.onnx.jsondata/models/piper/jarvis-medium.onnxdata/models/piper/jarvis-medium.onnx.jsondocs/TTS_offline.mdexamples/STT/whisper/STT_offline_whisper_rpi5.mdexamples/STT/whisper/load_huggingface_objects.pyexamples/STT/whisper/whisper_objects.pypyproject.tomlrequirements.txtscripts/models/audio/fast_whisper_objects.pyscripts/models/audio/load_huggingface_objects.pyscripts/models/audio/models_check.pyscripts/models/audio/piper_models.pyscripts/models/audio/vosk_models.pyscripts/models/audio/vosk_models.shscripts/models/audio/whisper_objects.pyscripts/models/audio/whisper_objects.shscripts/tests/audio_test.shsrc/audio/fast_whisper_objects.pysrc/audio/voice_agent_offline.pysrc/main.pysrc/utils/config.py
💤 Files with no reviewable changes (5)
- scripts/models/audio/whisper_objects.sh
- src/audio/fast_whisper_objects.py
- scripts/models/audio/vosk_models.sh
- examples/STT/whisper/whisper_objects.py
- examples/STT/whisper/load_huggingface_objects.py
🧰 Additional context used
🧬 Code graph analysis (6)
scripts/models/audio/whisper_objects.py (2)
scripts/models/audio/models_check.py (1)
model_exists(4-19)src/utils/sysutils.py (1)
detect_raspberry_pi_model(38-50)
scripts/models/audio/fast_whisper_objects.py (3)
scripts/models/audio/models_check.py (1)
model_exists(4-19)src/utils/sysutils.py (1)
detect_raspberry_pi_model(38-50)scripts/models/audio/load_huggingface_objects.py (1)
run(21-45)
scripts/models/audio/load_huggingface_objects.py (4)
scripts/models/audio/models_check.py (1)
model_exists(4-19)scripts/models/audio/fast_whisper_objects.py (1)
run(30-41)scripts/models/audio/piper_models.py (1)
run(13-29)scripts/models/audio/whisper_objects.py (1)
run(112-130)
scripts/models/audio/piper_models.py (1)
src/utils/config.py (4)
setup_python_path(184-195)load_config(132-176)cache_path(43-47)data_path(36-40)
src/audio/voice_agent_offline.py (2)
src/utils/sysutils.py (2)
detect_raspberry_pi_model(38-50)limit_cpu_for_multiprocessing(28-35)src/utils/config.py (7)
setup_python_path(184-195)load_config(132-176)data_path(36-40)cache_path(43-47)models_path(50-54)full_model_path(96-106)download_path(70-82)
scripts/models/audio/vosk_models.py (3)
scripts/models/audio/models_check.py (1)
model_exists(4-19)scripts/models/audio/fast_whisper_objects.py (1)
run(30-41)scripts/models/audio/load_huggingface_objects.py (1)
run(21-45)
🔇 Additional comments (19)
data/models/piper/fr_FR-mls-medium.onnx.json (1)
634-634: LGTM: Trailing newline added for consistency.This formatting change aligns with standard conventions for JSON files.
data/models/piper/en_GB-alan-low.onnx.json (1)
420-420: LGTM: Trailing newline added for consistency.data/models/piper/fr_FR-gilles-low.onnx.json (1)
420-420: LGTM: Trailing newline added for consistency.data/models/piper/en_GB-alba-medium.onnx.json (1)
493-493: LGTM: Trailing newline added for consistency..gitignore (3)
70-70: LGTM: Documentation build directory appropriately ignored.
73-73: LGTM: Environment file appropriately ignored.Adding
.envto gitignore is a best practice to prevent accidental exposure of sensitive environment variables.
70-73: The review comment is based on incorrect information. The.gitignorefile still contains themodels/*.onnxpattern at line 43—it was not removed. However, there is a real underlying issue: the.onnxfiles in the repository are located indata/models/piper/(61–74 MB each), which are not matched by the currentmodels/*.onnxpattern. The pattern would only ignore.onnxfiles directly in a root-levelmodels/directory, which doesn't exist. Consider updating the pattern todata/models/**/*.onnxor verifying that large binary model files should be tracked.Likely an incorrect or invalid review comment.
docs/TTS_offline.md (1)
70-70: LGTM: Documentation enhanced with model repository link.Adding the direct link to the Piper voices repository on Hugging Face improves usability by making it easier for users to locate and download the required voice models.
pyproject.toml (2)
117-117: LGTM: Cleanup of commented platform line.Removing the commented x86_64 line simplifies the configuration.
12-12: LICENSE.txt exists and configuration is valid.LICENSE.txt is present in the repository root (1.1K), confirming the pyproject.toml declaration is correct.
requirements.txt (1)
1-183: LGTM: Auto-generated dependency comments updated.The changes to this file are purely cosmetic, updating the formatting of dependency provenance comments. All version pins and functional dependency declarations remain unchanged. This appears to be the result of regenerating the file with
uv pip compile.src/main.py (1)
1-6: LGTM!Simple entry-point placeholder. Consider adding a return type hint
-> Nonefor consistency with otherrun()functions in this PR.config.yaml (1)
23-23: Same template interpolation concern formodel_path.The value
"cache/models/{engine}/{model_name}"won't be automatically expanded. Ensure the consuming code performs string interpolation.src/utils/config.py (2)
18-55: LGTM!The
PathConfigclass is well-structured with clear property definitions for path resolution.
183-195: LGTM!The
setup_python_pathhelper correctly adds necessary paths for import resolution.src/audio/voice_agent_offline.py (4)
70-76: LGTM!Path resolution from config is correctly implemented. The conversion to strings maintains compatibility with existing code using
os.pathfunctions.
364-406: LGTM!
STTEngineinitialization is well-structured with clear backend selection and informative logging.
961-973: Hardcoded values inmain()override centralized config.The
AgentConfiginmain()uses hardcoded values (e.g.,tts_model_path,wake_word="thanos") that override the centralized config loaded fromconfig.yaml. This defeats the purpose of the config refactor.Consider using the config values directly or documenting this as an intentional override for testing:
vadconfig = AgentConfig() # Uses defaults from config
993-995: LGTM!The cleanup pattern using
if "agent" in locals()correctly handles cases where the agent wasn't successfully created.
| @@ -0,0 +1,33 @@ | |||
| # !/usr/bin/env python3 | |||
There was a problem hiding this comment.
Fix the shebang syntax.
There's an extra space after # in the shebang line that makes it invalid.
Proposed fix
-# !/usr/bin/env python3
+#!/usr/bin/env python3📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # !/usr/bin/env python3 | |
| #!/usr/bin/env python3 |
🤖 Prompt for AI Agents
In scripts/models/audio/piper_models.py around line 1, the shebang line contains
an extra space after the leading '#' ("# !/usr/bin/env python3") which makes it
invalid; replace it with a proper shebang by removing the space so the first
line reads "#!/usr/bin/env python3" (ensure it's the very first line with no
preceding blank lines or characters and has executable file permissions if
needed).
Removed the extra space before the shebang in src/utils/config.py to ensure correct script execution.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/utils/config.py (2)
158-162: Security check still fails open — unresolved from previous review.The exception handler continues to pass silently when
resolve()raisesOSErrororRuntimeError, bypassing the security validation and allowing execution to proceed toopen()without confirming the path is safe. This fail-open pattern remains a security risk.The code should fail-closed: if the path cannot be safely resolved, refuse to load it.
🔎 Proposed fix
except (OSError, RuntimeError): - # If path cannot be resolved, but we are trying to open it, that's a risk. - # However, if it doesn't exist, the .exists() check below handles the UI. - # We only block if we CAN resolve it and it's unsafe. - pass + # Fail-closed: if we cannot resolve the path safely, refuse to load + print(f"Warning: Cannot safely resolve config path {config_path}. Using defaults.") + return Config()
169-171: Incorrect security suppression comments remain from previous review.The suppression codes are still incorrect for this
open()call:
# nosec B301applies to pickle deserialization, not file open operations# noqa: S101suppresses assert statement warnings, also not applicable hereYour path validation logic is sound (as noted in learnings), but these specific suppression codes are copy-paste errors and should be removed. If no linter is actually complaining about this line, no suppression is needed.
🔎 Proposed fix
# Safe to open: path has been validated above to be within allowed directories - # nosec B301 - Path validation prevents file inclusion attacks - with open(config_path, "r", encoding="utf-8") as f: # noqa: S101 + with open(config_path, "r", encoding="utf-8") as f: config_dict = yaml.safe_load(f)Based on learnings, your path validation approach is correct.
🧹 Nitpick comments (1)
src/utils/config.py (1)
125-129: Consider adding type hints to Config attributes.The
Configclass attributes (paths,stt,tts,vad) lack type hints, which reduces IDE support and type checking effectiveness.🔎 Suggested enhancement
class Config: """ Configuration for the voice agent. """ def __init__(self, **data): - self.paths = PathConfig(**data.get("paths", {})) - self.stt = STTConfig(**data.get("stt", {})) - self.tts = TTSConfig(**data.get("tts", {})) - self.vad = VADConfig(**data.get("vad", {})) + self.paths: PathConfig = PathConfig(**data.get("paths", {})) + self.stt: STTConfig = STTConfig(**data.get("stt", {})) + self.tts: TTSConfig = TTSConfig(**data.get("tts", {})) + self.vad: VADConfig = VADConfig(**data.get("vad", {}))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
scripts/tests/audio_test.shsrc/utils/config.py
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/tests/audio_test.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-24T10:50:46.479Z
Learnt from: chcavignx
Repo: chcavignx/AI-Autonomous-Assistant PR: 1
File: src/utils/config.py:169-171
Timestamp: 2025-12-24T10:50:46.479Z
Learning: In src/utils/config.py, the user has implemented path validation to ensure config files are only loaded from within ROOT_DIR or USER_DIR before calling open(). This is acceptable security practice for configuration loading.
Applied to files:
src/utils/config.py
🧬 Code graph analysis (1)
src/utils/config.py (1)
src/audio/test_voice_agent.py (1)
open(195-196)
🔇 Additional comments (2)
src/utils/config.py (2)
1-1: Shebang is now correct.The shebang syntax issue from the previous review has been successfully addressed.
179-201: Configuration initialization and path setup look good.The global config instance,
setup_python_pathhelper, and diagnostic main block are well-implemented. The path manipulation correctly prioritizes the configured paths for module imports.
|
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.