Fix segments #260

thestumonkey · 2026-01-11T18:02:54Z

fixed segments not working and improved config so there are defaults that are activated by default

Summary by CodeRabbit

Release Notes

New Features
- Configuration now supports environment variable interpolation and hierarchical defaults merging (user config overrides built-in defaults).
- Automatic backups created when updating settings via API.
- Configuration reload capability allows changes to take effect without server restart.
Bug Fixes
- Improved error handling with sensible fallback values when configuration is unavailable.
- Enhanced transcription data extraction robustness.
Tests
- Added comprehensive validation for conversation transcript segments.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-11T18:03:01Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This pull request introduces a centralized hierarchical configuration system that merges defaults.yml, config.yml, and environment variables with caching and interpolation support. Docker Compose volumes are updated to mount the entire config directory. Multiple backend modules are refactored to use the centralized config instead of direct file access. Tests are enhanced to validate configuration override behavior and segment structure validation.

Changes

Cohort / File(s)	Summary
Docker Configuration Updates `backends/advanced/docker-compose-test.yml`, `backends/advanced/docker-compose.yml`	Updated volume mounts to expose entire config directory at /app/config instead of single config.yml file; added read-only mount for defaults.yml in workers service.
Centralized Config System `backends/advanced/src/advanced_omi_backend/config.py`	Introduced hierarchical configuration loading with defaults merging, environment variable interpolation (_resolve_env, _deep_resolve_env), deep-merge capability (_deep_merge), cached config mechanism (_CONFIG_CACHE), and public API (get_config, reload_config, get_config_section, get_config_path). Added DEFAULT_SPEECH_DETECTION_SETTINGS, DEFAULT_CONVERSATION_STOP_SETTINGS, DEFAULT_AUDIO_STORAGE_SETTINGS constants.
Config File Defaults `config/defaults.yml`	New file defining fallback configuration for model registry (OpenAI LLM, OpenAI Embeddings, Deepgram STT, Qdrant vector store), memory settings (Chronicle provider), and speaker recognition (disabled by default). Establishes priority: config.yml > environment variables > defaults.yml.
Config Template `config/config.yml.template`	Replaced active model definitions with commented examples and placeholders; added inline guidance for memory providers and speaker recognition; disabled speaker recognition by default.
System Controller Refactoring `backends/advanced/src/advanced_omi_backend/controllers/system_controller.py`	Refactored configuration access to use merged in-memory config instead of direct filesystem reads. Updated get_memory_config_raw, update_memory_config_raw, get_chat_config_yaml, and save_chat_config_yaml to leverage centralized config with backup creation and cache reloading. Added fallback paths and resilience improvements.
Model Registry Refactoring `backends/advanced/src/advanced_omi_backend/model_registry.py`	Removed local environment resolution pipeline and replaced direct YAML parsing with centralized get_config(). Simplified load_models_config and get_models_registry to return non-Optional AppModels; added force_reload parameter. Removed file-system lookup logic.
Memory Config Refactoring `backends/advanced/src/advanced_omi_backend/services/memory/config.py`	Replaced local YAML loading with centralized get_config() call. Updated load_config_yml to return merged configuration from central config system.
Transcription Enhancements `backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py`	Added safer data extraction with fallback for missing channels; introduced utterance validation and logging; added segment normalization to map "transcript" to "text" field for format consistency.
Import Cleanup `backends/advanced/src/advanced_omi_backend/llm_client.py`	Removed unused imports: load_config_yml and resolve_value.
Test: Configuration Override `tests/endpoints/system_admin_tests.robot`	Added Config Override Defaults Test to verify custom memory config (timeout_seconds: 999) persists and overrides defaults (1200). Updated Get Chat Configuration Test expected prompt text.
Test: Conversation Segments `tests/endpoints/conversation_tests.robot`	Enhanced Get Conversation By ID Test with validation of conversation title, summary, detailed_summary, transcript presence; added segment extraction and per-segment validation (start, end, text, speaker keys; non-empty text; end > start).
Test: Integration Segments `tests/integration/integration_test.robot`	Replaced heuristic segment counting (based on transcript type) with explicit segment structure validation. Added comprehensive per-segment checks including timing consistency (end > start) and mandatory fields validation.
Test: WebSocket Streaming `tests/integration/websocket_streaming_tests.robot`	Commented out Verify Segments Match Expected Timestamps verification step in Segment Timestamps Match Cropped Audio test.

Sequence Diagram(s)

sequenceDiagram
    participant Service as Service/Module
    participant CentralConfig as config.get_config()
    participant Cache as _CONFIG_CACHE
    participant Defaults as defaults.yml
    participant UserConfig as config.yml
    participant EnvVars as Environment
    
    Service->>CentralConfig: get_config(force_reload=False)
    alt Cache exists and force_reload=False
        CentralConfig->>Cache: retrieve cached config
        Cache-->>CentralConfig: return cached dict
    else Cache miss or force_reload=True
        CentralConfig->>Defaults: load YAML
        Defaults-->>CentralConfig: base config dict
        CentralConfig->>UserConfig: load YAML
        UserConfig-->>CentralConfig: user overrides dict
        CentralConfig->>CentralConfig: _deep_merge(defaults, user_config)
        CentralConfig->>EnvVars: resolve env vars in merged config
        EnvVars-->>CentralConfig: interpolated values
        CentralConfig->>Cache: store merged config
    end
    CentralConfig-->>Service: return config dict

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Fix segments' is too vague and generic. While segments are indeed addressed in the changeset, the title does not convey the full scope of changes, which includes a major configuration system refactor with defaults merging, environment variable resolution, and Docker compose updates alongside segment fixes.	Consider a more descriptive title that captures both the segment fix and the configuration system improvements, such as 'Refactor config system with defaults merging and fix segment processing' or 'Implement centralized config with defaults and fix segment handling'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

AnkushMalaker · 2026-01-11T19:00:30Z

Do tests pass locally / did you try? @thestumonkey

AnkushMalaker · 2026-01-11T19:06:30Z

@coderabbitai review

coderabbitai · 2026-01-11T19:06:36Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)

tests/integration/websocket_streaming_tests.robot (1)

210-216: Don’t silently disable segment timestamp verification—gate it or document why it’s flaky.

Commenting out the assertion reduces coverage and can let regressions slip (especially since this PR is “Fix segments”). Prefer a conditional check (e.g., only for a deterministic provider) or re-enable with a wider tolerance + an explicit TODO linking to the flakiness root cause.
backends/advanced/docker-compose.yml (1)
10-16: Align config layout across services (backend uses /app/config, workers uses /app/config.yml + /app/defaults.yml).

Given the PR’s centralized config system, I’d strongly prefer both services expose the same in-container structure (e.g., /app/config/config.yml, /app/config/defaults.yml, templates). Otherwise you risk “works in backend, breaks in workers” issues and confusing operator UX.
Proposed compose alignment (directory mount for workers too)
 services:
   chronicle-backend:
@@
     volumes:
@@
       - ./data:/app/data
       - ../../config:/app/config  # Mount entire config directory (contains config.yml, defaults.yml, templates)
@@
   workers:
@@
     volumes:
       - ./src:/app/src
       - ./start-workers.sh:/app/start-workers.sh
       - ./data/audio_chunks:/app/audio_chunks
       - ./data:/app/data
-      - ../../config/config.yml:/app/config.yml  # Removed :ro for consistency
-      - ../../config/defaults.yml:/app/defaults.yml:ro  # Built-in defaults
+      - ../../config:/app/config  # Match backend config layout
Also applies to: 58-65
tests/endpoints/conversation_tests.robot (1)
47-67: Bug: Should Not Be Empty is checking the whole conversation dict, not the title/summary/... fields.

These assertions currently won’t fail if title (etc.) is empty, because ${conversation} itself is non-empty.
Fix field-level assertions + make segments-key failure explicit
     # Verify conversation structure
     Dictionary Should Contain Key    ${conversation}    conversation_id
     Dictionary Should Contain Key    ${conversation}    audio_uuid
     Dictionary Should Contain Key    ${conversation}    created_at
-    Should Not Be Empty    ${conversation}    title
-    Should Not Be Empty    ${conversation}    summary
-    Should Not Be Empty    ${conversation}    detailed_summary
-    Should Not Be Empty    ${conversation}    transcript
+    Dictionary Should Contain Key    ${conversation}    title
+    Dictionary Should Contain Key    ${conversation}    summary
+    Dictionary Should Contain Key    ${conversation}    detailed_summary
+    Dictionary Should Contain Key    ${conversation}    transcript
+    Should Not Be Empty    ${conversation}[title]
+    Should Not Be Empty    ${conversation}[summary]
+    Should Not Be Empty    ${conversation}[detailed_summary]
+    Should Not Be Empty    ${conversation}[transcript]
     
-    ${segments}=    Set Variable    ${conversation}[segments]
+    Dictionary Should Contain Key    ${conversation}    segments
+    ${segments}=    Set Variable    ${conversation}[segments]
backends/advanced/docker-compose-test.yml (1)

153-164: Config mount paths diverge between backend-test and workers-test, causing inconsistent config loading behavior.

chronicle-backend-test mounts a single file at /app/config.yml, while workers-test mounts the entire config directory at /app/config. The config loader expects /app/config/config.yml and also loads defaults.yml from the same directory as the resolved config. This results in:

workers-test: correctly finds /app/config/config.yml and /app/config/defaults.yml

chronicle-backend-test: cannot find /app/config/config.yml (mounted as /app/config.yml), falls back to cwd, and cannot locate defaults.yml at the expected /app/config/defaults.yml

Align both services to mount the same config directory structure, or update the mounts to ensure both use consistent paths.
backends/advanced/src/advanced_omi_backend/model_registry.py (2)
179-233: Handle non-dict model entries; current code can crash on TypeError.

ModelDef(**m) will raise TypeError if m isn’t a mapping (e.g., a string/number from a malformed YAML list). You currently only catch ValidationError, so a single bad entry can still take down the whole registry load.
Proposed fix
     models: Dict[str, ModelDef] = {}
     for m in model_list:
         try:
             # Pydantic will handle validation automatically
-            model_def = ModelDef(**m)
+            if not isinstance(m, dict):
+                raise TypeError(f"Model entry must be a mapping, got {type(m).__name__}")
+            model_def = ModelDef(**m)
             models[model_def.name] = model_def
-        except ValidationError as e:
+        except (ValidationError, TypeError) as e:
             # Log but don't fail the entire registry load
             logging.warning(f"Failed to load model '{m.get('name', 'unknown')}': {e}")
             continue
179-196: Docstring no longer matches behavior (no YAML parsing here; no longer “raises ValidationError” in practice).

This function now delegates YAML loading/env expansion to get_config() and intentionally doesn’t fail the whole load on ValidationError (it logs and skips). Updating the docstring will prevent future callers/tests from relying on stale semantics.
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (1)
287-334: Make config.yml writes atomic (prevent partial/corrupt YAML on crash).

Both backup + direct write can still leave a truncated/invalid config.yml if the process is killed mid-write. Use a temp file + os.replace() (atomic on POSIX) and consider Path APIs for consistency.
Proposed fix
-        # Write to config.yml
-        with open(cfg_path, 'w') as f:
-            yaml.safe_dump(data, f, sort_keys=False)
+        # Atomic write to config.yml
+        cfg_path = get_config_path()
+        tmp_path = str(cfg_path) + ".tmp"
+        with open(tmp_path, "w", encoding="utf-8") as f:
+            yaml.safe_dump(data, f, sort_keys=False)
+        os.replace(tmp_path, cfg_path)

🤖 Fix all issues with AI agents

In @backends/advanced/src/advanced_omi_backend/config.py:
- Around line 269-304: _find_config_path currently ignores the CONFIG_FILE env
var when the file doesn't yet exist, causing writes to fall back to
/app/config/config.yml; change the initial ENV override in _find_config_path to
return Path(cfg_env) whenever CONFIG_FILE is set (i.e., if cfg_env: return
Path(cfg_env)) so callers like get_config_path will honor a user-provided path
even on first-run, leaving the rest of the lookup logic unchanged.

In @config/defaults.yml:
- Line 51: The Authorization header in config/defaults.yml currently uses a
fallback empty value ("Authorization: Token ${DEEPGRAM_API_KEY:-}") which yields
a malformed header when DEEPGRAM_API_KEY is unset; update the config to require
the variable (e.g., "Authorization: Token ${DEEPGRAM_API_KEY}") and/or add
runtime validation where Deepgram is used (search for references to
DEEPGRAM_API_KEY and the Deepgram client initialization) to detect a missing key
and fail early with a clear error message before making API calls.
- Around line 87-89: The prompt value under the prompt key in defaults.yml is
written as a quoted multi-line string with the closing quote on its own line,
which introduces unintended trailing newlines; fix it by collapsing the prompt
into a single-line quoted string or convert it to a YAML block scalar (e.g., |
or >) so the closing delimiter is correct and no extra newlines or whitespace
are included—update the prompt entry so its content ends exactly where intended
without the stray closing-quote-only line.

In @tests/endpoints/system_admin_tests.robot:
- Line 194: Test assertion expects "specialized AI assistant" but the actual
default prompt text differs; update the Robot test to match the real default
prompt. Edit the assertion in tests/endpoints/system_admin_tests.robot (the
Should Contain check) to look for the actual default prompt beginning "You are a
helpful AI assistant with access to the user's personal memories and
conversation history." (matching the default prompt defined in SystemController
/ system_controller.py) so the test checks the real prompt text instead of
"specialized AI assistant".

🧹 Nitpick comments (6)

config/config.yml.template (1)
13-21: Consider adding inline comments to defaults section for consistency.

While the concrete default values work well for a template, adding inline comments showing that these reference models from defaults.yml helps users understand the configuration hierarchy at a glance. The current comments at lines 14-16 already do this well - consider extending the pattern to all entries.
💡 Optional enhancement
 defaults:
   llm: openai-llm              # LLM for memory extraction (from defaults.yml)
   embedding: openai-embed      # Embedding model (from defaults.yml)
   stt: stt-deepgram            # Speech-to-text service (from defaults.yml)
   # Transcription provider options:
   # - stt-deepgram: Cloud-based Deepgram (requires DEEPGRAM_API_KEY)
   # - stt-parakeet-batch: Local Parakeet ASR (requires Parakeet service running)
-  tts: tts-http                # Text-to-speech service
+  tts: tts-http                # Text-to-speech service (define custom in models section)
   vector_store: vs-qdrant      # Vector database (from defaults.yml)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)

254-285: Avoid broad except Exception here (or ensure fallback can’t throw).

If get_config() fails, the fallback path still calls get_config_path() and yaml.safe_dump(...). If either of those ever throws, you lose the intended “always return something” behavior. Consider narrowing exceptions and/or wrapping the fallback body too.

506-548: Use yaml.safe_dump and atomic write for chat config; align backup naming.

yaml.dump(...) can emit unsafe tags; safe_dump is the safer default. This path also directly writes to config.yml (same atomicity concern as memory). Also, backups use .backup here but .bak in memory—worth standardizing for ops/scripts.

backends/advanced/src/advanced_omi_backend/config.py (2)

246-267: Tighten _deep_merge typing; current ValueError handling won’t catch the common failure modes.

If override isn’t a dict, you’ll get AttributeError on override.items(), not ValueError. Consider explicit isinstance(..., dict) checks and fail fast with a clear message (or coerce to {} if you want permissive behavior).

306-370: Return a copy (or freeze) to prevent accidental mutation of the cached config.

get_config() returns the cached dict by reference. Any caller that mutates it will mutate the global config for everyone (and potentially across requests). If you want the cache to be authoritative, return a deep copy (or provide a read-only wrapper) and keep the cached object private.
backends/advanced/src/advanced_omi_backend/model_registry.py (1)
70-85: Guard embedding-dimension auto-defaulting by model provider to avoid silent dimension mismatches.

The validate_model validator applies hardcoded embedding dimensions based only on model_name (e.g., text-embedding-3-small → 1536, nomic-embed-text-v1.5 → 768) without checking self.model_provider or self.api_family. If a non-OpenAI provider or custom embedding server uses the same model name, this could silently set incorrect dimensions. Consider adding a provider check before applying defaults:
if self.model_name in defaults and self.model_provider in ('openai', 'unknown'):
    self.embedding_dimensions = defaults[self.model_name]
Or maintain a provider-specific defaults mapping instead. The current config template explicitly sets embedding_dimensions in all examples, but this validator serves as a safety net and should be provider-aware to prevent silent misconfigurations.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 91dcd89 and 88c836e.

📒 Files selected for processing (14)

backends/advanced/docker-compose-test.yml
backends/advanced/docker-compose.yml
backends/advanced/src/advanced_omi_backend/config.py
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
backends/advanced/src/advanced_omi_backend/llm_client.py
backends/advanced/src/advanced_omi_backend/model_registry.py
backends/advanced/src/advanced_omi_backend/services/memory/config.py
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
config/config.yml.template
config/defaults.yml
tests/endpoints/conversation_tests.robot
tests/endpoints/system_admin_tests.robot
tests/integration/integration_test.robot
tests/integration/websocket_streaming_tests.robot

💤 Files with no reviewable changes (1)

backends/advanced/src/advanced_omi_backend/llm_client.py

🧰 Additional context used

🧬 Code graph analysis (2)

backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

backends/advanced/src/advanced_omi_backend/config.py (1)

get_config (306-370)

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (4)

backends/advanced/src/advanced_omi_backend/config.py (2)

get_config (306-370)

get_config_path (402-408)

backends/advanced/src/advanced_omi_backend/model_registry.py (1)

load_models_config (179-232)

backends/advanced/src/advanced_omi_backend/services/memory/service_factory.py (1)

get_memory_service (73-110)

backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)

SpeakerRecognitionClient (25-649)

🪛 Ruff (0.14.10)

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py

275-275: Do not catch blind exception: Exception

(BLE001)

499-499: Consider moving this statement to an else block

(TRY300)

501-501: Do not catch blind exception: Exception

(BLE001)

backends/advanced/src/advanced_omi_backend/config.py

264-264: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

295-296: try-except-pass detected, consider logging the exception

(S110)

295-295: Do not catch blind exception: Exception

(BLE001)

343-343: Do not catch blind exception: Exception

(BLE001)

344-344: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

359-359: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (4)

backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

66-72: LGTM! Clean migration to centralized configuration.

The refactoring to use get_config() aligns well with the centralized configuration system introduced in this PR. The function now returns the full merged configuration (defaults.yml + config.yml + env vars) as documented.

tests/endpoints/system_admin_tests.robot (1)

152-181: Excellent test coverage for configuration override behavior.

This test effectively validates that config.yml values correctly override defaults.yml values, which is a critical aspect of the hierarchical configuration system introduced in this PR.

tests/integration/integration_test.robot (1)

198-212: Excellent segment structure validation!

The explicit validation of segment structure (required keys, non-empty text, timing consistency) provides robust test coverage for the segment functionality fixes introduced in this PR. This prevents regressions and ensures segments are properly formatted.

config/config.yml.template (1)

1-195: Well-structured template with clear guidance.

The template effectively balances being a working example (with concrete defaults section) and providing comprehensive commented examples for custom configurations. The header comments clearly explain the relationship with defaults.yml and how to customize.

backends/advanced/src/advanced_omi_backend/config.py

backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py

config/defaults.yml

tests/endpoints/system_admin_tests.robot

thestumonkey · 2026-01-12T09:00:24Z

Do tests pass locally / did you try? @thestumonkey

Of course I ran the tests, and they pass

thestumonkey added 7 commits January 11, 2026 17:16

Moved monolith conifg./yml to defaults and overrides

1efb141

added test for config + extra test for convo striucture

e1c6eb3

added mounted defaults file

dee5992

moved to using defaults system

6fa348d

fix segment text not being returned and processed with deepgram

f4cebfb

added segment detect to test

6b74fc7

removed specific segement timing as too much vairance based on provider

88c836e

thestumonkey requested a review from AnkushMalaker January 11, 2026 18:03

coderabbitai bot reviewed Jan 11, 2026

View reviewed changes

AnkushMalaker self-assigned this Jan 12, 2026

changed logging level from info to debug

b59b0c6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix segments #260

Fix segments #260

thestumonkey commented Jan 11, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 11, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

AnkushMalaker commented Jan 11, 2026

Uh oh!

AnkushMalaker commented Jan 11, 2026

Uh oh!

coderabbitai bot commented Jan 11, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thestumonkey commented Jan 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix segments #260

Are you sure you want to change the base?

Fix segments #260

Conversation

thestumonkey commented Jan 11, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Jan 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

AnkushMalaker commented Jan 11, 2026

Uh oh!

AnkushMalaker commented Jan 11, 2026

Uh oh!

coderabbitai bot commented Jan 11, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thestumonkey commented Jan 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

thestumonkey commented Jan 11, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 11, 2026 •

edited

Loading