-
Notifications
You must be signed in to change notification settings - Fork 22
Fix segments #260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Fix segments #260
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughThis pull request introduces a centralized hierarchical configuration system that merges defaults.yml, config.yml, and environment variables with caching and interpolation support. Docker Compose volumes are updated to mount the entire config directory. Multiple backend modules are refactored to use the centralized config instead of direct file access. Tests are enhanced to validate configuration override behavior and segment structure validation. Changes
Sequence Diagram(s)sequenceDiagram
participant Service as Service/Module
participant CentralConfig as config.get_config()
participant Cache as _CONFIG_CACHE
participant Defaults as defaults.yml
participant UserConfig as config.yml
participant EnvVars as Environment
Service->>CentralConfig: get_config(force_reload=False)
alt Cache exists and force_reload=False
CentralConfig->>Cache: retrieve cached config
Cache-->>CentralConfig: return cached dict
else Cache miss or force_reload=True
CentralConfig->>Defaults: load YAML
Defaults-->>CentralConfig: base config dict
CentralConfig->>UserConfig: load YAML
UserConfig-->>CentralConfig: user overrides dict
CentralConfig->>CentralConfig: _deep_merge(defaults, user_config)
CentralConfig->>EnvVars: resolve env vars in merged config
EnvVars-->>CentralConfig: interpolated values
CentralConfig->>Cache: store merged config
end
CentralConfig-->>Service: return config dict
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
Do tests pass locally / did you try? @thestumonkey |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
tests/integration/websocket_streaming_tests.robot (1)
210-216: Don’t silently disable segment timestamp verification—gate it or document why it’s flaky.Commenting out the assertion reduces coverage and can let regressions slip (especially since this PR is “Fix segments”). Prefer a conditional check (e.g., only for a deterministic provider) or re-enable with a wider tolerance + an explicit TODO linking to the flakiness root cause.
backends/advanced/docker-compose.yml (1)
10-16: Align config layout across services (backend uses/app/config, workers uses/app/config.yml+/app/defaults.yml).Given the PR’s centralized config system, I’d strongly prefer both services expose the same in-container structure (e.g.,
/app/config/config.yml,/app/config/defaults.yml, templates). Otherwise you risk “works in backend, breaks in workers” issues and confusing operator UX.Proposed compose alignment (directory mount for workers too)
services: chronicle-backend: @@ volumes: @@ - ./data:/app/data - ../../config:/app/config # Mount entire config directory (contains config.yml, defaults.yml, templates) @@ workers: @@ volumes: - ./src:/app/src - ./start-workers.sh:/app/start-workers.sh - ./data/audio_chunks:/app/audio_chunks - ./data:/app/data - - ../../config/config.yml:/app/config.yml # Removed :ro for consistency - - ../../config/defaults.yml:/app/defaults.yml:ro # Built-in defaults + - ../../config:/app/config # Match backend config layoutAlso applies to: 58-65
tests/endpoints/conversation_tests.robot (1)
47-67: Bug:Should Not Be Emptyis checking the whole conversation dict, not thetitle/summary/...fields.These assertions currently won’t fail if
title(etc.) is empty, because${conversation}itself is non-empty.Fix field-level assertions + make segments-key failure explicit
# Verify conversation structure Dictionary Should Contain Key ${conversation} conversation_id Dictionary Should Contain Key ${conversation} audio_uuid Dictionary Should Contain Key ${conversation} created_at - Should Not Be Empty ${conversation} title - Should Not Be Empty ${conversation} summary - Should Not Be Empty ${conversation} detailed_summary - Should Not Be Empty ${conversation} transcript + Dictionary Should Contain Key ${conversation} title + Dictionary Should Contain Key ${conversation} summary + Dictionary Should Contain Key ${conversation} detailed_summary + Dictionary Should Contain Key ${conversation} transcript + Should Not Be Empty ${conversation}[title] + Should Not Be Empty ${conversation}[summary] + Should Not Be Empty ${conversation}[detailed_summary] + Should Not Be Empty ${conversation}[transcript] - ${segments}= Set Variable ${conversation}[segments] + Dictionary Should Contain Key ${conversation} segments + ${segments}= Set Variable ${conversation}[segments]backends/advanced/docker-compose-test.yml (1)
153-164: Config mount paths diverge between backend-test and workers-test, causing inconsistent config loading behavior.
chronicle-backend-testmounts a single file at/app/config.yml, whileworkers-testmounts the entire config directory at/app/config. The config loader expects/app/config/config.ymland also loadsdefaults.ymlfrom the same directory as the resolved config. This results in:
- workers-test: correctly finds
/app/config/config.ymland/app/config/defaults.yml- chronicle-backend-test: cannot find
/app/config/config.yml(mounted as/app/config.yml), falls back to cwd, and cannot locatedefaults.ymlat the expected/app/config/defaults.ymlAlign both services to mount the same config directory structure, or update the mounts to ensure both use consistent paths.
backends/advanced/src/advanced_omi_backend/model_registry.py (2)
179-233: Handle non-dict model entries; current code can crash onTypeError.
ModelDef(**m)will raiseTypeErrorifmisn’t a mapping (e.g., a string/number from a malformed YAML list). You currently only catchValidationError, so a single bad entry can still take down the whole registry load.Proposed fix
models: Dict[str, ModelDef] = {} for m in model_list: try: # Pydantic will handle validation automatically - model_def = ModelDef(**m) + if not isinstance(m, dict): + raise TypeError(f"Model entry must be a mapping, got {type(m).__name__}") + model_def = ModelDef(**m) models[model_def.name] = model_def - except ValidationError as e: + except (ValidationError, TypeError) as e: # Log but don't fail the entire registry load logging.warning(f"Failed to load model '{m.get('name', 'unknown')}': {e}") continue
179-196: Docstring no longer matches behavior (no YAML parsing here; no longer “raises ValidationError” in practice).This function now delegates YAML loading/env expansion to
get_config()and intentionally doesn’t fail the whole load onValidationError(it logs and skips). Updating the docstring will prevent future callers/tests from relying on stale semantics.backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (1)
287-334: Make config.yml writes atomic (prevent partial/corrupt YAML on crash).Both backup + direct write can still leave a truncated/invalid
config.ymlif the process is killed mid-write. Use a temp file +os.replace()(atomic on POSIX) and considerPathAPIs for consistency.Proposed fix
- # Write to config.yml - with open(cfg_path, 'w') as f: - yaml.safe_dump(data, f, sort_keys=False) + # Atomic write to config.yml + cfg_path = get_config_path() + tmp_path = str(cfg_path) + ".tmp" + with open(tmp_path, "w", encoding="utf-8") as f: + yaml.safe_dump(data, f, sort_keys=False) + os.replace(tmp_path, cfg_path)
🤖 Fix all issues with AI agents
In @backends/advanced/src/advanced_omi_backend/config.py:
- Around line 269-304: _find_config_path currently ignores the CONFIG_FILE env
var when the file doesn't yet exist, causing writes to fall back to
/app/config/config.yml; change the initial ENV override in _find_config_path to
return Path(cfg_env) whenever CONFIG_FILE is set (i.e., if cfg_env: return
Path(cfg_env)) so callers like get_config_path will honor a user-provided path
even on first-run, leaving the rest of the lookup logic unchanged.
In @config/defaults.yml:
- Line 51: The Authorization header in config/defaults.yml currently uses a
fallback empty value ("Authorization: Token ${DEEPGRAM_API_KEY:-}") which yields
a malformed header when DEEPGRAM_API_KEY is unset; update the config to require
the variable (e.g., "Authorization: Token ${DEEPGRAM_API_KEY}") and/or add
runtime validation where Deepgram is used (search for references to
DEEPGRAM_API_KEY and the Deepgram client initialization) to detect a missing key
and fail early with a clear error message before making API calls.
- Around line 87-89: The prompt value under the prompt key in defaults.yml is
written as a quoted multi-line string with the closing quote on its own line,
which introduces unintended trailing newlines; fix it by collapsing the prompt
into a single-line quoted string or convert it to a YAML block scalar (e.g., |
or >) so the closing delimiter is correct and no extra newlines or whitespace
are included—update the prompt entry so its content ends exactly where intended
without the stray closing-quote-only line.
In @tests/endpoints/system_admin_tests.robot:
- Line 194: Test assertion expects "specialized AI assistant" but the actual
default prompt text differs; update the Robot test to match the real default
prompt. Edit the assertion in tests/endpoints/system_admin_tests.robot (the
Should Contain check) to look for the actual default prompt beginning "You are a
helpful AI assistant with access to the user's personal memories and
conversation history." (matching the default prompt defined in SystemController
/ system_controller.py) so the test checks the real prompt text instead of
"specialized AI assistant".
🧹 Nitpick comments (6)
config/config.yml.template (1)
13-21: Consider adding inline comments to defaults section for consistency.While the concrete default values work well for a template, adding inline comments showing that these reference models from defaults.yml helps users understand the configuration hierarchy at a glance. The current comments at lines 14-16 already do this well - consider extending the pattern to all entries.
💡 Optional enhancement
defaults: llm: openai-llm # LLM for memory extraction (from defaults.yml) embedding: openai-embed # Embedding model (from defaults.yml) stt: stt-deepgram # Speech-to-text service (from defaults.yml) # Transcription provider options: # - stt-deepgram: Cloud-based Deepgram (requires DEEPGRAM_API_KEY) # - stt-parakeet-batch: Local Parakeet ASR (requires Parakeet service running) - tts: tts-http # Text-to-speech service + tts: tts-http # Text-to-speech service (define custom in models section) vector_store: vs-qdrant # Vector database (from defaults.yml)backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)
254-285: Avoid broadexcept Exceptionhere (or ensure fallback can’t throw).If
get_config()fails, the fallback path still callsget_config_path()andyaml.safe_dump(...). If either of those ever throws, you lose the intended “always return something” behavior. Consider narrowing exceptions and/or wrapping the fallback body too.
506-548: Useyaml.safe_dumpand atomic write for chat config; align backup naming.
yaml.dump(...)can emit unsafe tags;safe_dumpis the safer default. This path also directly writes toconfig.yml(same atomicity concern as memory). Also, backups use.backuphere but.bakin memory—worth standardizing for ops/scripts.backends/advanced/src/advanced_omi_backend/config.py (2)
246-267: Tighten_deep_mergetyping; currentValueErrorhandling won’t catch the common failure modes.If
overrideisn’t a dict, you’ll getAttributeErroronoverride.items(), notValueError. Consider explicitisinstance(..., dict)checks and fail fast with a clear message (or coerce to{}if you want permissive behavior).
306-370: Return a copy (or freeze) to prevent accidental mutation of the cached config.
get_config()returns the cached dict by reference. Any caller that mutates it will mutate the global config for everyone (and potentially across requests). If you want the cache to be authoritative, return a deep copy (or provide a read-only wrapper) and keep the cached object private.backends/advanced/src/advanced_omi_backend/model_registry.py (1)
70-85: Guard embedding-dimension auto-defaulting by model provider to avoid silent dimension mismatches.The
validate_modelvalidator applies hardcoded embedding dimensions based only onmodel_name(e.g.,text-embedding-3-small→ 1536,nomic-embed-text-v1.5→ 768) without checkingself.model_providerorself.api_family. If a non-OpenAI provider or custom embedding server uses the same model name, this could silently set incorrect dimensions. Consider adding a provider check before applying defaults:if self.model_name in defaults and self.model_provider in ('openai', 'unknown'): self.embedding_dimensions = defaults[self.model_name]Or maintain a provider-specific defaults mapping instead. The current config template explicitly sets
embedding_dimensionsin all examples, but this validator serves as a safety net and should be provider-aware to prevent silent misconfigurations.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (14)
backends/advanced/docker-compose-test.ymlbackends/advanced/docker-compose.ymlbackends/advanced/src/advanced_omi_backend/config.pybackends/advanced/src/advanced_omi_backend/controllers/system_controller.pybackends/advanced/src/advanced_omi_backend/llm_client.pybackends/advanced/src/advanced_omi_backend/model_registry.pybackends/advanced/src/advanced_omi_backend/services/memory/config.pybackends/advanced/src/advanced_omi_backend/services/transcription/__init__.pyconfig/config.yml.templateconfig/defaults.ymltests/endpoints/conversation_tests.robottests/endpoints/system_admin_tests.robottests/integration/integration_test.robottests/integration/websocket_streaming_tests.robot
💤 Files with no reviewable changes (1)
- backends/advanced/src/advanced_omi_backend/llm_client.py
🧰 Additional context used
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
backends/advanced/src/advanced_omi_backend/config.py (1)
get_config(306-370)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (4)
backends/advanced/src/advanced_omi_backend/config.py (2)
get_config(306-370)get_config_path(402-408)backends/advanced/src/advanced_omi_backend/model_registry.py (1)
load_models_config(179-232)backends/advanced/src/advanced_omi_backend/services/memory/service_factory.py (1)
get_memory_service(73-110)backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)
SpeakerRecognitionClient(25-649)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
275-275: Do not catch blind exception: Exception
(BLE001)
499-499: Consider moving this statement to an else block
(TRY300)
501-501: Do not catch blind exception: Exception
(BLE001)
backends/advanced/src/advanced_omi_backend/config.py
264-264: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
295-296: try-except-pass detected, consider logging the exception
(S110)
295-295: Do not catch blind exception: Exception
(BLE001)
343-343: Do not catch blind exception: Exception
(BLE001)
344-344: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
359-359: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (4)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
66-72: LGTM! Clean migration to centralized configuration.The refactoring to use
get_config()aligns well with the centralized configuration system introduced in this PR. The function now returns the full merged configuration (defaults.yml + config.yml + env vars) as documented.tests/endpoints/system_admin_tests.robot (1)
152-181: Excellent test coverage for configuration override behavior.This test effectively validates that config.yml values correctly override defaults.yml values, which is a critical aspect of the hierarchical configuration system introduced in this PR.
tests/integration/integration_test.robot (1)
198-212: Excellent segment structure validation!The explicit validation of segment structure (required keys, non-empty text, timing consistency) provides robust test coverage for the segment functionality fixes introduced in this PR. This prevents regressions and ensures segments are properly formatted.
config/config.yml.template (1)
1-195: Well-structured template with clear guidance.The template effectively balances being a working example (with concrete defaults section) and providing comprehensive commented examples for custom configurations. The header comments clearly explain the relationship with defaults.yml and how to customize.
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
Show resolved
Hide resolved
Of course I ran the tests, and they pass |
fixed segments not working and improved config so there are defaults that are activated by default
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.