Conversation


@0xrushi 0xrushi commented Jan 10, 2026

  • Introduced a new TranscriptAnnotation model for managing transcript corrections.

  • Added annotation_routes for creating and retrieving annotations via API.

  • Implemented cron jobs to surface potential errors in transcripts and fine-tune a model based on accepted corrections. (The actual surfacing of errors is not implemented; this is currently only a cron service with a `time.sleep` placeholder in backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py.)

  • Updated Docker Compose to include a cron service for running scheduled tasks.

  • Enhanced the web UI to support displaying and managing annotations, including accepting and rejecting suggestions.

  • Added tests for annotation model and integration flow to ensure functionality and reliability.
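The cron-job piece described above can be sketched as a minimal async scheduler loop. This is an illustrative stdlib-only sketch, not the actual implementation; the interval values, the `tick` helper, and the job-function parameters are assumptions:

```python
import asyncio
from datetime import datetime, timedelta, timezone

# Assumed intervals for illustration only.
SUGGESTION_INTERVAL = timedelta(hours=1)
TRAINING_INTERVAL = timedelta(hours=24)


async def tick(state: dict, surface_job, finetune_job) -> None:
    """Run any job whose interval has elapsed since its last run."""
    now = datetime.now(timezone.utc)
    if now - state["last_suggestion"] >= SUGGESTION_INTERVAL:
        await surface_job()
        state["last_suggestion"] = now
    if now - state["last_training"] >= TRAINING_INTERVAL:
        await finetune_job()
        state["last_training"] = now


async def run_scheduler(surface_job, finetune_job) -> None:
    """Wake up once a minute and dispatch any due jobs."""
    epoch = datetime.min.replace(tzinfo=timezone.utc)
    state = {"last_suggestion": epoch, "last_training": epoch}
    while True:
        await tick(state, surface_job, finetune_job)
        # Non-blocking sleep: other coroutines keep running meanwhile.
        await asyncio.sleep(60)
```

A first call to `tick` fires both jobs, since the "last run" timestamps start at the epoch.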


Summary by CodeRabbit

  • New Features

    • Transcript annotation system for creating, viewing, and managing per-segment corrections
    • AI-suggested corrections with accept/reject workflow and inline editing in conversation transcripts
    • API endpoints to create and fetch annotations
    • UI shows pending suggestions and memory reprocessing progress
  • Background Jobs

    • Scheduled worker that surfaces annotation suggestions and runs periodic fine-tuning
    • Memory reprocessing now enqueues title/summary generation jobs
  • Tests

    • Integration and model tests covering annotation flows and data model behavior
  • Chores

    • Added a background cron service to run scheduled jobs

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 10, 2026

📝 Walkthrough

Adds transcript annotation support: new Beanie model, API routes to create/fetch annotations, UI inline-editing and suggestion controls, background cron + worker jobs to surface suggestions and fine-tune, memory/queue integration (extra title/summary job), docker cron service, and tests.

Changes

Cohort / File(s) Summary
Data Model
backends/advanced/src/advanced_omi_backend/models/annotation.py, backends/advanced/src/advanced_omi_backend/models/__init__.py
New TranscriptAnnotation Beanie document with enums (status, source), UUID id, indexed fields, timestamps; exported from models package.
App Init
backends/advanced/src/advanced_omi_backend/app_factory.py
Registers TranscriptAnnotation in init_beanie document_models.
API & Routers
backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py, .../routers/modules/__init__.py, .../routers/api_router.py
New FastAPI router exposing POST /api/annotations/ (validate ownership, persist annotation, update transcript segment, enqueue memory jobs) and GET /api/annotations/{conversation_id}; router included under /api/annotations.
Background Scheduler & Jobs
backends/advanced/src/advanced_omi_backend/cron.py, backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
Async cron runner and jobs: surface_error_suggestions (generate suggestion annotations) and finetune_hallucination_model (collect accepted corrections and simulate fine-tuning).
Memory / Queue Integration
backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py, .../services/memory/providers/vector_stores.py
Enqueue additional title/summary generation job after memory processing; adapt Qdrant response handling to .points / query_points compatibility.
Docker
backends/advanced/docker-compose.yml
Adds cron service (build, command, env, mounts, depends_on mongo/redis, restart policy).
Web UI
backends/advanced/webui/src/pages/Conversations.tsx, backends/advanced/webui/src/services/api.ts
Adds annotations to Conversation type; inline editable transcript textarea, AI-suggestion accept/deny UI, polling for memory-processing job progress; adds annotationsApi (create, getByConversationId).
Workers / Queue Jobs
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
New worker jobs implementing suggestion surfacing and fine-tune job simulation.
Tests
backends/advanced/tests/integration/test_annotation_flow.py, backends/advanced/tests/test_annotation_models.py
Integration test for annotation creation flow (mocks for DB and enqueue) and model unit tests for TranscriptAnnotation defaults and enum validation.
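The shape of the new document model can be sketched stdlib-only as follows. The enum members, the `user_id` field, and the defaults are assumptions inferred from the accept/reject workflow described above, not confirmed field definitions (the real model is a Beanie `Document`, not a dataclass):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from uuid import uuid4


class AnnotationStatus(str, Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


class AnnotationSource(str, Enum):
    USER = "user"
    AI_SUGGESTION = "ai_suggestion"


@dataclass
class TranscriptAnnotationSketch:
    """Illustrative stand-in for the Beanie TranscriptAnnotation document."""
    conversation_id: str
    segment_index: int
    original_text: str
    corrected_text: str
    user_id: str
    id: str = field(default_factory=lambda: str(uuid4()))
    status: AnnotationStatus = AnnotationStatus.PENDING
    source: AnnotationSource = AnnotationSource.USER
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```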

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API\n(annotation_routes)
    participant Conv as Conversation\n(MongoDB)
    participant DB as TranscriptAnnotation\n(MongoDB)
    participant Queue as RQ\n(memory_jobs)

    Client->>API: POST /api/annotations/ (annotation payload)
    API->>Conv: find_one(conversation_id) (validate ownership)
    Conv-->>API: conversation with active transcript
    API-->>API: build TranscriptAnnotation instance
    API->>DB: insert transcript annotation
    DB-->>API: inserted
    API->>Conv: update transcript segment text and save()
    Conv-->>API: saved
    API->>Queue: enqueue memory_processing job (+ title/summary job)
    Queue-->>API: job IDs
    API-->>Client: 201 Created + AnnotationResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

Suggested reviewers

  • AnkushMalaker

Poem

🐰 Hopping through transcripts to nudge a stray line,
I patch up the carrots where words ceased to shine.
Suggestions I scatter, small fixes I sow,
Background hums learning as corrections grow.
A tiny rabbit cheer for feedback that’s fine!

🚥 Pre-merge checks (2 passed, 1 failed)

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 47.37%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately reflects the main objective: introducing an annotation feature for transcript corrections and implementing cron jobs for scheduled tasks. It is concise, specific, and directly summarizes the key changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@AnkushMalaker AnkushMalaker self-requested a review January 10, 2026 03:44

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🤖 Fix all issues with AI agents
In @backends/advanced/src/advanced_omi_backend/cron.py:
- Around line 1-17: The imports for surface_error_suggestions,
finetune_hallucination_model and init_db are below the logging configuration;
move those three import statements (from
advanced_omi_backend.workers.annotation_jobs and advanced_omi_backend.database)
up to the top of the file with the other imports so all imports appear before
any runtime code (including logging.basicConfig) and keep the existing logger =
logging.getLogger("cron_scheduler") after configuration.
- Around line 37-59: The async finetune_hallucination_model function in
annotation_jobs.py uses blocking time.sleep(2), which freezes the event loop
when awaited by the scheduler; change that call to await asyncio.sleep(2),
ensure asyncio is imported in annotation_jobs.py, and keep the function
signature async def finetune_hallucination_model(...) unchanged so the scheduler
loop in cron.py can await it without blocking.
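A minimal sketch of that fix (the surrounding fine-tuning logic is elided; the 2-second sleep stands in for the simulated work):

```python
import asyncio


async def finetune_hallucination_model_sketch() -> None:
    """Simulated fine-tuning step."""
    # Blocking (bad): time.sleep(2) would freeze the event loop, stalling
    # every other coroutine the scheduler is running.
    # Non-blocking (good): yields control back to the loop while waiting.
    await asyncio.sleep(2)
```

With this change the scheduler loop can `await finetune_hallucination_model_sketch()` without stalling its other jobs.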

In @backends/advanced/src/advanced_omi_backend/models/annotation.py:
- Around line 30-31: The created_at and updated_at Field default_factories use
the deprecated datetime.utcnow(); replace them with timezone-aware calls by
importing timezone from datetime and changing the default_factory to a lambda
that returns datetime.now(timezone.utc) for both created_at and updated_at
(locate the fields in annotation.py and update the Field(default_factory=...)
for created_at and updated_at accordingly).
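One way to sketch the timezone-aware replacement (the `utc_now` helper name is an illustration, not the actual code):

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)

# The model fields would then read (Field comes from pydantic):
#   created_at: datetime = Field(default_factory=utc_now)
#   updated_at: datetime = Field(default_factory=utc_now)
```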

In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py:
- Line 57: The inline comment next to the condition in the annotation job (the
line using "if count >= 6:") is inconsistent: either update the comment to read
"Surface 6 places" to match the existing condition, or change the condition to
"if count > 5:" to preserve the intended "5-6" behavior; locate the check using
the symbol count and make the corresponding edit so the comment and code reflect
the same semantics.
- Around line 95-96: Move the mid-function import and blocking sleep out of the
async flow: remove the in-function "import time" and "time.sleep(2)", add the
required imports at the top of the file (replace the time import with "import
asyncio"), and in the async function replace "time.sleep(2)" with an awaited
non-blocking call "await asyncio.sleep(2)". Ensure the function remains async so
the await is valid and no blocking calls remain.

In @backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py:
- Around line 286-304: The except block around default_queue.enqueue for
generate_title_summary_job should not re-raise (remove "raise e") to avoid
making the whole memory job appear to fail; instead log the full traceback with
logger.exception (replace logger.error with logger.exception) so failures are
recorded without propagating, keeping summary_job_id, generate_title_summary_job
and default_queue.enqueue logic intact.

In @backends/advanced/webui/src/pages/Conversations.tsx:
- Around line 448-459: handleRejectSuggestion currently only mutates local state
so rejected annotations reappear on refresh; update it to persist the rejection
by calling the backend (mirror how handleAcceptSuggestion persists acceptances).
Call the same create/update API used by handleAcceptSuggestion (e.g., invoke the
create annotation endpoint with status: 'rejected' or call a new
updateAnnotationStatus/update endpoint if you add one), await the response, then
update setConversations to remove or mark the annotation only after a successful
API response and handle errors (logging or UI feedback) so state remains
consistent with the backend.
- Around line 130-177: The updater passed to setConversations currently performs
side-effectful API calls (annotationsApi.create and
conversationsApi.reprocessMemory) inside the state mutation which risks
unhandled async errors, memory leaks, and mixing concerns; refactor
handleSaveAnnotation to (1) synchronously compute the new conversations array
(using the same logic that updates the segment text) and call setConversations
with that array, (2) after setConversations perform the async work with await
and try/catch around annotationsApi.create and then
conversationsApi.reprocessMemory, updating setProcessingMemories on success and
reverting the optimistic UI on failure, and (3) guard async callbacks against
unmounted components (use an isMounted flag or AbortController) and clear
setEditingSegment only after the async flow completes or is safely cancelled.
🧹 Nitpick comments (8)
backends/advanced/tests/test_annotation_models.py (1)

8-90: Consider using a pytest fixture for Beanie initialization.

Currently, initialize_beanie() is called at the start of each test method, creating a new AsyncMongoMockClient every time. This approach works but could be optimized.

Consider using a pytest async fixture to initialize Beanie once per test and ensure proper cleanup:

♻️ Proposed refactor using pytest fixture
+@pytest.fixture
+async def beanie_setup():
+    """Setup and teardown Beanie for each test."""
+    client = AsyncMongoMockClient()
+    await init_beanie(database=client.db_name, document_models=[TranscriptAnnotation])
+    yield
+    # Cleanup if needed
+
 class TestAnnotationModel:
     """Test TranscriptAnnotation Pydantic/Beanie model."""
 
     @pytest.mark.asyncio
-    async def test_create_annotation_defaults(self):
+    async def test_create_annotation_defaults(self, beanie_setup):
         """Test creating an annotation with default values."""
-        await initialize_beanie()
         
         annotation = TranscriptAnnotation(

Apply similar changes to the other test methods.

backends/advanced/src/advanced_omi_backend/models/annotation.py (1)

20-20: Consider using Beanie's native ID generation.

While the current UUID-based ID works, Beanie typically uses MongoDB ObjectId by default. Using a string UUID as the primary id field may cause confusion since Beanie documents also have an internal _id field.

Consider either:

  1. Letting Beanie manage the _id field automatically (remove the custom id field)
  2. Renaming to annotation_id if you need a secondary UUID identifier
  3. Using PydanticObjectId from Beanie if you want to keep the custom ID pattern

This helps maintain consistency with other Beanie models in the codebase.

backends/advanced/tests/integration/test_annotation_flow.py (2)

28-31: Unused fixture can be removed.

The client fixture is defined but never used in the test. The test creates its own AsyncClient with ASGITransport on lines 82-83 instead.

🧹 Remove unused fixture
-@pytest.fixture
-async def client(app):
-    async with AsyncClient(app=app, base_url="http://test") as c:
-        yield c
-

56-57: Simplify the AsyncMock pattern for find_one.

The current pattern AsyncMock(return_value=mock_conv)() works but is unnecessarily convoluted. Using a direct coroutine or simpler async return is clearer.

♻️ Simplified mock pattern
-        # Configure find_one to return our mock conversation (awaitable)
-        MockConversation.find_one.return_value = AsyncMock(return_value=mock_conv)() # Calling AsyncMock returns an awaitable coroutine
+        # Configure find_one to return our mock conversation (awaitable)
+        async def mock_find_one(*args, **kwargs):
+            return mock_conv
+        MockConversation.find_one = mock_find_one

Alternatively, if you want to keep using AsyncMock:

MockConversation.find_one = AsyncMock(return_value=mock_conv)
backends/advanced/src/advanced_omi_backend/cron.py (1)

65-72: Graceful shutdown doesn't cancel running async tasks.

When SIGTERM is received, the handler calls sys.exit(0) which doesn't gracefully cancel any in-progress async jobs. Consider using an asyncio.Event for cooperative shutdown.

♻️ Graceful async shutdown pattern
+shutdown_event = asyncio.Event()
+
 async def run_scheduler():
     logger.info("Starting Cron Scheduler...")
     
     # Initialize DB connection
     await init_db()
     
     last_suggestion_run = datetime.min
     last_training_run = datetime.min
     
-    while True:
+    while not shutdown_event.is_set():
         now = datetime.utcnow()
         
         # ... job checks ...
         
-        await asyncio.sleep(60)
+        try:
+            await asyncio.wait_for(shutdown_event.wait(), timeout=60)
+        except asyncio.TimeoutError:
+            pass  # Continue loop after timeout
+    
+    logger.info("Scheduler shutdown complete")

 def handle_shutdown(signum, frame):
     logger.info("Shutting down Cron Scheduler...")
-    sys.exit(0)
+    shutdown_event.set()
backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py (2)

87-95: Potential enum serialization issue in response.

new_annotation.status is of type AnnotationStatus (an Enum), but AnnotationResponse.status is typed as str. While Pydantic may handle this automatically, explicit conversion ensures consistent behavior.

♻️ Explicit enum value conversion
     return AnnotationResponse(
         id=str(new_annotation.id),
         conversation_id=new_annotation.conversation_id,
         segment_index=new_annotation.segment_index,
         original_text=new_annotation.original_text,
         corrected_text=new_annotation.corrected_text,
-        status=new_annotation.status,
+        status=new_annotation.status.value if isinstance(new_annotation.status, TranscriptAnnotation.AnnotationStatus) else new_annotation.status,
         created_at=new_annotation.created_at
     )

Or update the model to use the enum type:

 class AnnotationResponse(BaseModel):
     # ...
-    status: str
+    status: TranscriptAnnotation.AnnotationStatus

60-72: Transcript update is not atomic and may have race conditions.

Concurrent annotation requests on the same conversation could lead to lost updates since the pattern reads the full conversation, modifies it in memory, then saves. Consider using MongoDB's atomic update operators or adding optimistic concurrency control.

For a quick fix, you could use a unique constraint or version field to detect conflicts. For production robustness, consider MongoDB's $set with array filters for atomic segment updates.
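The atomic variant can be sketched by building a positional `$set` update document; the `transcript.segments.<i>.text` field path and the `conversation_id` filter are assumptions about the schema:

```python
# Build a positional update that rewrites a single segment's text in place,
# instead of read-modify-save of the whole conversation document.
def build_segment_update(segment_index: int, corrected_text: str) -> dict:
    return {"$set": {f"transcript.segments.{segment_index}.text": corrected_text}}


update_doc = build_segment_update(4, "corrected words")
print(update_doc)  # {'$set': {'transcript.segments.4.text': 'corrected words'}}

# With Motor/PyMongo this would be applied roughly as:
# await collection.update_one({"conversation_id": conv_id}, update_doc)
```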

backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (1)

185-188: Consider removing the intermediate variable for simplicity.

The results = response.points assignment and subsequent iteration could be simplified by directly iterating over response.points, as the original code likely did:

for result in response.points:
    memory = MemoryEntry(
        ...

This reduces unnecessary variable assignments without sacrificing readability, since response.points is already self-documenting.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f634a22 and 3413eec.

📒 Files selected for processing (15)
  • backends/advanced/docker-compose.yml
  • backends/advanced/src/advanced_omi_backend/app_factory.py
  • backends/advanced/src/advanced_omi_backend/cron.py
  • backends/advanced/src/advanced_omi_backend/models/__init__.py
  • backends/advanced/src/advanced_omi_backend/models/annotation.py
  • backends/advanced/src/advanced_omi_backend/routers/api_router.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
  • backends/advanced/tests/integration/test_annotation_flow.py
  • backends/advanced/tests/test_annotation_models.py
  • backends/advanced/webui/src/pages/Conversations.tsx
  • backends/advanced/webui/src/services/api.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{py,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx,js,jsx}: ALL imports must be at the top of the file after the docstring. NEVER import modules in the middle of functions or files.
Use lazy imports sparingly and only when absolutely necessary for circular import issues.
Group imports in order: standard library, third-party, local imports.
Always raise errors, never silently ignore them. Use explicit error handling with proper exceptions rather than silent failures.

Files:

  • backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
  • backends/advanced/tests/test_annotation_models.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
  • backends/advanced/src/advanced_omi_backend/routers/api_router.py
  • backends/advanced/src/advanced_omi_backend/app_factory.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
  • backends/advanced/webui/src/services/api.ts
  • backends/advanced/src/advanced_omi_backend/models/annotation.py
  • backends/advanced/tests/integration/test_annotation_flow.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
  • backends/advanced/src/advanced_omi_backend/cron.py
  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
  • backends/advanced/webui/src/pages/Conversations.tsx
  • backends/advanced/src/advanced_omi_backend/models/__init__.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT add defensive hasattr() checks. Instead, research and understand input/response or class structure.
Use Black formatter with 100-character line length for Python code.
Use isort for Python import organization.
Use Wyoming protocol (JSONL + binary) for WebSocket communication for structured audio sessions.
Use job-based tracking system for all audio processing (WebSocket and file uploads) with stage events (audio → transcription → memory) and completion status.
Conversations must only be created when speech is detected, eliminating noise-only sessions from user interfaces.
Always store audio sessions in audio_chunks collection by audio_uuid. Only create entries in conversations collection when speech is detected.
Use email-based authentication with MongoDB ObjectId user IDs.
Use automatic {objectid_suffix}-{device_name} format for client registration.
All data must be scoped by user_id with efficient permission checking for data isolation.
JWT tokens are required for all endpoints and WebSocket connections.
Support pluggable memory providers (Chronicle native with individual facts or OpenMemory MCP delegation).

Files:

  • backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
  • backends/advanced/tests/test_annotation_models.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
  • backends/advanced/src/advanced_omi_backend/routers/api_router.py
  • backends/advanced/src/advanced_omi_backend/app_factory.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
  • backends/advanced/src/advanced_omi_backend/models/annotation.py
  • backends/advanced/tests/integration/test_annotation_flow.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
  • backends/advanced/src/advanced_omi_backend/cron.py
  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
  • backends/advanced/src/advanced_omi_backend/models/__init__.py
**/*.{py,sh,md}

📄 CodeRabbit inference engine (CLAUDE.md)

Use uv run python3 instead of python3 for running Python commands, as uv is the package manager for all Python projects.

Files:

  • backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
  • backends/advanced/tests/test_annotation_models.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
  • backends/advanced/src/advanced_omi_backend/routers/api_router.py
  • backends/advanced/src/advanced_omi_backend/app_factory.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
  • backends/advanced/src/advanced_omi_backend/models/annotation.py
  • backends/advanced/tests/integration/test_annotation_flow.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
  • backends/advanced/src/advanced_omi_backend/cron.py
  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
  • backends/advanced/src/advanced_omi_backend/models/__init__.py
**/docker-compose*.{yml,yaml}

📄 CodeRabbit inference engine (CLAUDE.md)

**/docker-compose*.{yml,yaml}: Use Docker Compose without --no-cache by default for faster builds. Only use --no-cache when explicitly needed for troubleshooting or when cached layers are causing issues.
Check if src/ is volume mounted. If not, do docker compose build so code changes are reflected. Do not simply run docker compose restart as it will not rebuild the image.

Files:

  • backends/advanced/docker-compose.yml
🧠 Learnings (1)
📚 Learning: 2026-01-06T08:43:14.315Z
Learnt from: CR
Repo: SimpleOpenSoftware/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-06T08:43:14.315Z
Learning: Applies to **/*.py : Use job-based tracking system for all audio processing (WebSocket and file uploads) with stage events (audio → transcription → memory) and completion status.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
🧬 Code graph analysis (9)
backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py (1)
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (1)
  • generate_title_summary_job (514-637)
backends/advanced/tests/test_annotation_models.py (1)
backends/advanced/src/advanced_omi_backend/models/annotation.py (3)
  • TranscriptAnnotation (8-39)
  • AnnotationStatus (11-14)
  • AnnotationSource (16-18)
backends/advanced/src/advanced_omi_backend/app_factory.py (1)
backends/advanced/src/advanced_omi_backend/models/annotation.py (1)
  • TranscriptAnnotation (8-39)
backends/advanced/tests/integration/test_annotation_flow.py (3)
backends/advanced/src/advanced_omi_backend/app_factory.py (1)
  • create_app (173-215)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
backends/advanced/src/advanced_omi_backend/models/conversation.py (2)
  • segments (197-199)
  • active_transcript (166-174)
backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py (5)
backends/advanced/src/advanced_omi_backend/models/annotation.py (3)
  • TranscriptAnnotation (8-39)
  • AnnotationStatus (11-14)
  • AnnotationSource (16-18)
backends/advanced/src/advanced_omi_backend/models/conversation.py (3)
  • Conversation (17-314)
  • active_transcript (166-174)
  • segments (197-199)
backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py (1)
  • enqueue_memory_processing (257-305)
backends/advanced/src/advanced_omi_backend/models/job.py (1)
  • JobPriority (68-80)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
backends/advanced/src/advanced_omi_backend/cron.py (2)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)
  • surface_error_suggestions (12-62)
  • finetune_hallucination_model (64-98)
extras/speaker-recognition/src/simple_speaker_recognition/database/__init__.py (1)
  • init_db (30-33)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)
backends/advanced/src/advanced_omi_backend/models/conversation.py (3)
  • Conversation (17-314)
  • active_transcript (166-174)
  • segments (197-199)
backends/advanced/src/advanced_omi_backend/models/annotation.py (3)
  • TranscriptAnnotation (8-39)
  • AnnotationStatus (11-14)
  • AnnotationSource (16-18)
backends/advanced/webui/src/pages/Conversations.tsx (1)
backends/advanced/webui/src/services/api.ts (3)
  • queueApi (199-226)
  • conversationsApi (109-124)
  • annotationsApi (150-159)
backends/advanced/src/advanced_omi_backend/models/__init__.py (1)
backends/advanced/src/advanced_omi_backend/models/annotation.py (1)
  • TranscriptAnnotation (8-39)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py

302-302: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


303-303: Use raise without specifying exception name

Remove exception name

(TRY201)

backends/advanced/src/advanced_omi_backend/models/annotation.py

35-39: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

backends/advanced/tests/integration/test_annotation_flow.py

34-34: Unused function argument: mock_user

(ARG001)

backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py

34-34: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


100-100: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

backends/advanced/src/advanced_omi_backend/cron.py

61-61: Unused function argument: signum

(ARG001)


61-61: Unused function argument: frame

(ARG001)

backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py

34-34: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


35-35: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

🔇 Additional comments (18)
backends/advanced/docker-compose.yml (1)

79-98: LGTM! Cron service is properly configured.

The cron service configuration follows best practices:

  • Uses uv run python as per coding guidelines
  • Mounts src/ for development (code changes reflected without rebuild per guidelines)
  • Properly depends on mongo and redis with health checks
  • Includes necessary environment variables and volumes
backends/advanced/src/advanced_omi_backend/models/__init__.py (1)

11-11: LGTM! Clean model export.

The TranscriptAnnotation import is properly placed at the top of the file after the docstring, following the coding guidelines for import organization.

backends/advanced/tests/test_annotation_models.py (1)

1-6: LGTM! Imports are properly organized.

All imports are correctly placed at the top of the file and grouped appropriately (standard library, third-party, local imports) as per coding guidelines.

backends/advanced/src/advanced_omi_backend/routers/api_router.py (1)

22-22: LGTM! Annotation router is properly integrated.

The annotation_router is correctly imported at the top with other routers and registered with appropriate prefix and tags, following the established pattern in this file.

Also applies to: 42-42

backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py (1)

19-20: LGTM! Imports are properly placed.

The new imports are correctly positioned at the top of the file with other imports from the same package.

backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py (1)

28-28: LGTM! Clean integration of annotation router.

The import and export are properly placed, maintaining consistency with the existing router pattern.

Also applies to: 42-42

backends/advanced/src/advanced_omi_backend/app_factory.py (1)

59-59: LGTM! Proper Beanie model registration.

The TranscriptAnnotation model is correctly imported and registered with Beanie during application startup, following the established pattern for other document models.

Also applies to: 63-63

backends/advanced/webui/src/services/api.ts (1)

150-159: LGTM! Well-structured annotation API.

The new annotationsApi follows the established patterns in the codebase with properly typed parameters and clear endpoint definitions.

backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (1)

12-62: The deduplication logic is correct as designed.

Conversations cannot be shared across users. Each conversation_id is globally unique (marked as unique=True in the model) and scoped to a single user_id. The existing annotation check at lines 39-42 querying by conversation_id and segment_index is sufficient for deduplication, since conversation_id inherently belongs to only one user. Adding user_id to the filter would be redundant.

backends/advanced/tests/integration/test_annotation_flow.py (2)

33-34: The mock_user parameter is correctly used.

Static analysis flagged mock_user as unused, but it's actually required because the app fixture depends on it. The parameter ensures mock_user is instantiated before app.


81-108: Test structure and assertions are comprehensive.

The test properly verifies:

  • API response status and payload
  • Database interactions (conversation lookup, annotation creation, insert call)
  • Transcript segment update with corrected text
  • Memory job enqueue with correct parameters
backends/advanced/src/advanced_omi_backend/cron.py (1)

61-63: Signal handler parameters are required by convention.

The static analysis flagged signum and frame as unused, but these parameters are required by Python's signal handler signature. This is a false positive.
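For reference, the signature is fixed by the `signal` module's handler contract: handlers always receive `(signum, frame)` whether they use them or not. A minimal sketch:

```python
import signal

def handle_shutdown(signum, frame):
    # Both parameters are mandated by the signal-handler contract, even when
    # unused — hence the linter false positive discussed above.
    handle_shutdown.received = signum

signal.signal(signal.SIGTERM, handle_shutdown)
signal.raise_signal(signal.SIGTERM)  # deliver to this process to exercise the handler
assert handle_shutdown.received == signal.SIGTERM
```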

backends/advanced/webui/src/pages/Conversations.tsx (2)

93-128: Job polling implementation looks good.

The useEffect properly:

  • Cleans up the interval on unmount
  • Only polls when there are active jobs
  • Updates state correctly when jobs complete
  • Handles errors gracefully

29-37: Annotation type and UI rendering are well-structured.

The annotation interface properly types all fields, and the UI clearly displays pending suggestions with Accept/Deny actions. The visual distinction with yellow highlighting effectively draws attention to AI suggestions.

Also applies to: 866-902

backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py (3)

31-35: Depends() in argument defaults is idiomatic FastAPI.

The static analysis hint (B008) flags this pattern, but it's the standard and recommended way to declare dependencies in FastAPI. This is a false positive.

Also applies to: 97-101


97-118: get_annotations endpoint correctly filters by user.

The endpoint properly scopes results by user_id, ensuring users can only access their own annotations. The response mapping is clean and consistent with the create endpoint.


74-81: The code is correct. enqueue_memory_processing is a synchronous function (defined at line 257 of memory_jobs.py with def, not async def). It uses RQ's synchronous memory_queue.enqueue() method to queue a job, so no await is needed or appropriate.

Likely an incorrect or invalid review comment.
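The sync-vs-async distinction is easy to see in a sketch (a stub stands in for the real RQ `Queue`; names mirror the review's description and are not the actual implementation):

```python
class FakeQueue:
    """Stands in for an RQ Queue: enqueue is a plain synchronous method."""

    def enqueue(self, func, *args, **kwargs):
        # Real RQ serializes the call and returns a Job handle immediately;
        # here we return a dict so the shape is visible.
        return {"func": func.__name__, "args": args}

memory_queue = FakeQueue()

def process_memory(conversation_id: str) -> str:
    return f"processed {conversation_id}"

def enqueue_memory_processing(conversation_id: str):
    # Synchronous, like the helper discussed above: callers must NOT await it.
    return memory_queue.enqueue(process_memory, conversation_id)

job = enqueue_memory_processing("abc123")
assert job["func"] == "process_memory"
assert job["args"] == ("abc123",)
```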

backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (1)

178-183: The Qdrant client API usage is correct. The project uses qdrant-client 1.16.2, and the code properly calls query_points() which returns a QueryResponse object with a valid .points attribute containing a list of ScoredPoint records. The v1.10+ compatibility reference is accurate—v1.10 introduced the unified Query API consolidating older search helpers into query_points, making this the appropriate method for async Qdrant queries.

No API compatibility issues found. The intermediate variable results = response.points on line 185 is a minor refactoring choice that adds marginal clarity.
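The call shape under discussion can be sketched as follows (a stub stands in for the real `AsyncQdrantClient`; the stub mirrors the v1.10+ unified Query API, where `query_points` returns a response object carrying a `.points` list):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ScoredPoint:
    id: str
    score: float

@dataclass
class QueryResponse:
    points: list

class StubQdrantClient:
    async def query_points(self, collection_name: str, query, limit: int = 10):
        # Mirrors qdrant-client >= 1.10: the unified Query API returns a
        # QueryResponse whose .points holds ScoredPoint records.
        return QueryResponse(points=[ScoredPoint(id="m1", score=0.92)])

async def search(client, vector):
    response = await client.query_points(collection_name="memories", query=vector, limit=5)
    results = response.points  # the intermediate variable noted above
    return [p.id for p in results]

ids = asyncio.run(search(StubQdrantClient(), [0.1, 0.2]))
assert ids == ["m1"]
```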

            )
            await suggestion.insert()
            count += 1
            if count >= 6:  # Surface 5-6 places as requested

⚠️ Potential issue | 🟡 Minor

Minor comment mismatch.

The comment says "Surface 5-6 places" but the code checks >= 6, which always surfaces exactly 6. Note that > 5 is the same predicate as >= 6 for an integer counter, so only changing the threshold changes behavior: either update the comment to say "6", or lower the check to >= 5 if 5 is the intended cap.

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py at
line 57, The inline comment next to the condition in the annotation job (the
line using "if count >= 6:") is inconsistent: either update the comment to read
"Surface 6 places" to match the existing condition, or lower the check to
"if count >= 5:" if the intended cap is 5 (note "if count > 5:" is identical
to "if count >= 6:"); locate the check using
the symbol count and make the corresponding edit so the comment and code reflect
the same semantics.
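The threshold point can be demonstrated directly (a self-contained sketch of the counting pattern, not the worker code itself):

```python
def surfaced(limit_reached) -> int:
    """Count suggestions inserted before the given stop predicate fires."""
    count = 0
    for _ in range(100):  # plenty of candidate segments
        count += 1
        if limit_reached(count):
            break
    return count

assert surfaced(lambda c: c >= 6) == 6
assert surfaced(lambda c: c > 5) == 6   # identical predicate: no behavior change
assert surfaced(lambda c: c >= 5) == 5  # an actual cap of 5
```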

Comment on lines +286 to +304

# Also enqueue title/summary generation to ensure summaries reflect any transcript changes
try:
# Use a timestamp in job_id to avoid conflicts if re-run frequently
summary_job_id = f"title_summary_{conversation_id[:8]}_{int(time.time())}"

default_queue.enqueue(
generate_title_summary_job,
conversation_id,
job_timeout=300,
result_ttl=JOB_RESULT_TTL,
job_id=summary_job_id,
description=f"Generate title and summary for conversation {conversation_id[:8]}",
)
logger.info(f"📥 RQ: Enqueued summary job {summary_job_id} for conversation {conversation_id}")
except Exception as e:
logger.error(f"Failed to enqueue summary job: {e}")
raise e


⚠️ Potential issue | 🟠 Major

Fix error handling to avoid cascading failure and improve logging.

The current error handling has two issues:

  1. Critical: Re-raising the exception (line 303) prevents the memory job from being returned to the caller, making it appear that the entire operation failed even though the memory job was successfully enqueued.

  2. Minor: Use logging.exception instead of logging.error for better stack traces, and use bare raise instead of raise e to preserve the original traceback.

🐛 Proposed fix for error handling
     # Also enqueue title/summary generation to ensure summaries reflect any transcript changes
     try:
         # Use a timestamp in job_id to avoid conflicts if re-run frequently
         summary_job_id = f"title_summary_{conversation_id[:8]}_{int(time.time())}"
         
         default_queue.enqueue(
             generate_title_summary_job,
             conversation_id,
             job_timeout=300,
             result_ttl=JOB_RESULT_TTL,
             job_id=summary_job_id,
             description=f"Generate title and summary for conversation {conversation_id[:8]}",
         )
         logger.info(f"📥 RQ: Enqueued summary job {summary_job_id} for conversation {conversation_id}")
     except Exception as e:
-        logger.error(f"Failed to enqueue summary job: {e}")
-        raise e
+        logger.exception(f"Failed to enqueue summary job for conversation {conversation_id}")
+        # Don't re-raise - memory job was already enqueued successfully
+        # Summary generation is supplementary and should not block the main memory job
 
     return job
🧰 Tools
🪛 Ruff (0.14.10)

302-302: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


303-303: Use raise without specifying exception name

Remove exception name

(TRY201)

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py around
lines 286 - 304, The except block around default_queue.enqueue for
generate_title_summary_job should not re-raise (remove "raise e") to avoid
making the whole memory job appear to fail; instead log the full traceback with
logger.exception (replace logger.error with logger.exception) so failures are
recorded without propagating, keeping summary_job_id, generate_title_summary_job
and default_queue.enqueue logic intact.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @backends/advanced/webui/src/pages/Conversations.tsx:
- Around line 491-502: handleRejectSuggestion currently only mutates local state
and is unnecessarily declared async; make rejection persistent by calling the
backend to mark the annotation as rejected (e.g., call a new or existing API
like markAnnotationRejected(conversationId, annotationId) and handle errors) and
then update state via setConversations to remove the annotation (using
conversation_id and annotation.id), or if backend work is not possible yet,
persist rejected IDs in localStorage (store annotationId keyed by conversationId
and filter them out wherever conversations are loaded/rendered). Also remove the
unnecessary async keyword from handleRejectSuggestion if you implement the
localStorage synchronous approach.
- Around line 1063-1069: The progress bar width logic uses
jobProgress[processingMemories[conversation.conversation_id]] but only treats
'finished' as the terminal successful state; update the width condition in the
progress bar div (the inline style computing width) to also treat 'completed' as
a terminal/success state (i.e., return '100%' when the status is 'finished' OR
'completed'), and consider handling any other terminal statuses your backend may
return the same way so the UI reflects completed jobs correctly.
- Around line 136-171: jobProgress entries are never removed, causing unbounded
growth; when a job reaches a terminal status inside the useEffect polling loop
(the block that checks if job.status is in
['finished','completed','failed','stopped','canceled']), after you update
processingMemories (the setProcessingMemories call), also call setJobProgress to
remove the jobId key (e.g. setJobProgress(prev => { const next = { ...prev };
delete next[jobId]; return next; })); ensure this runs for the same terminal
branch that refreshes the conversation so jobProgress stays in sync with
processingMemories.
- Around line 173-220: handleSaveAnnotation currently performs
annotationsApi.create and conversationsApi.reprocessMemory inside the
setConversations updater (and then sets editingSegment to null), which causes
side effects in a state updater, captures editingSegment after it’s cleared, and
leaves errors unhandled by the outer try/catch; fix by snapshotting the
editingSegment into a local const (e.g. const segment = editingSegment) at the
top of handleSaveAnnotation, build the new conversations array (or call
setConversations with a pure updater that only returns updated state) without
performing API calls, then after setConversations run the async work outside the
state updater using await/try-catch with annotationsApi.create({
conversation_id: segment.conversationId, segment_index: segment.segmentIndex,
original_text, corrected_text, status: 'accepted' }) and then await
conversationsApi.reprocessMemory(segment.conversationId) to
setProcessingMemories with res.data.job_id; only then clear editingSegment
(setEditingSegment(null)) and handle errors in the surrounding try/catch to
revert UI or setError as needed.
🧹 Nitpick comments (2)
backends/advanced/webui/src/pages/Conversations.tsx (2)

78-95: Potential double-save on Enter key.

When Enter is pressed, onSave() is called (line 81), and then the blur event may also fire onSave() (line 95) before the component unmounts. While the guard in handleSaveAnnotation (if (!editingSegment) return) should prevent duplicate API calls, consider either:

  1. Removing onBlur={onSave} and relying only on explicit save/cancel, or
  2. Adding a local isSaving ref to prevent the double-call at the component level.
💡 Optional: Use a ref to prevent double-save
 const AutoResizingTextarea = ({ value, onChange, onSave, onCancel }: AutoResizingTextareaProps) => {
   const textareaRef = useRef<HTMLTextAreaElement>(null)
+  const isSavingRef = useRef(false)

   useEffect(() => {
     if (textareaRef.current) {
       textareaRef.current.style.height = 'auto'
       textareaRef.current.style.height = textareaRef.current.scrollHeight + 'px'
     }
   }, [value])

+  const handleSave = () => {
+    if (isSavingRef.current) return
+    isSavingRef.current = true
+    onSave()
+  }
+
   const handleKeyDown = (e: React.KeyboardEvent) => {
     if (e.key === 'Enter' && !e.shiftKey) {
       e.preventDefault()
-      onSave()
+      handleSave()
     }
     if (e.key === 'Escape') {
       onCancel()
     }
   }

   return (
     <div className="relative w-full ml-1">
       <textarea
         ref={textareaRef}
         value={value}
         onChange={onChange}
         onKeyDown={handleKeyDown}
-        onBlur={onSave}
+        onBlur={handleSave}

430-452: Annotations fetch failure blocks transcript loading.

If annotationsApi.getByConversationId fails, the entire Promise.all rejects, and the transcript won't be displayed. Consider fetching annotations separately so a failure doesn't block the primary functionality.

💡 Proposed fix: Handle annotations fetch independently
     try {
-      const [convResponse, annotationsResponse] = await Promise.all([
-        conversationsApi.getById(conversation.conversation_id),
-        annotationsApi.getByConversationId(conversation.conversation_id)
-      ])
+      const convResponse = await conversationsApi.getById(conversation.conversation_id)
+      
+      // Fetch annotations separately - don't block transcript on failure
+      let annotations: any[] = []
+      try {
+        const annotationsResponse = await annotationsApi.getByConversationId(conversation.conversation_id)
+        annotations = annotationsResponse.data || []
+      } catch (annotationsErr) {
+        console.warn('Failed to fetch annotations:', annotationsErr)
+      }

       if (convResponse.status === 200 && convResponse.data.conversation) {
         // Update the conversation in state with full data and annotations
         setConversations(prev => prev.map(c =>
           c.conversation_id === conversationId
             ? { 
                 ...c, 
                 ...convResponse.data.conversation,
-                annotations: annotationsResponse.data || []
+                annotations
               }
             : c
         ))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3413eec and a2cc30b.

📒 Files selected for processing (1)
  • backends/advanced/webui/src/pages/Conversations.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{py,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx,js,jsx}: ALL imports must be at the top of the file after the docstring. NEVER import modules in the middle of functions or files.
Use lazy imports sparingly and only when absolutely necessary for circular import issues.
Group imports in order: standard library, third-party, local imports.
Always raise errors, never silently ignore them. Use explicit error handling with proper exceptions rather than silent failures.

Files:

  • backends/advanced/webui/src/pages/Conversations.tsx
🧬 Code graph analysis (1)
backends/advanced/webui/src/pages/Conversations.tsx (1)
backends/advanced/webui/src/services/api.ts (3)
  • queueApi (199-226)
  • conversationsApi (109-124)
  • annotationsApi (150-159)
🔇 Additional comments (4)
backends/advanced/webui/src/pages/Conversations.tsx (4)

1-5: LGTM!

Imports are properly organized at the top of the file with standard library (React), third-party (lucide-react), and local imports grouped appropriately.


908-945: LGTM!

The pending suggestions UI is well-structured with clear visual differentiation between original and corrected text. The Accept/Deny buttons are appropriately labeled.


1015-1035: LGTM!

The inline editing implementation is clean. The transition between view and edit modes is handled well, with appropriate keyboard shortcuts (Enter to save, Escape to cancel). Hiding the play button during edit prevents conflicting interactions.


29-37: LGTM!

The annotations interface extension is well-typed with appropriate fields for the annotation feature.

Comment on lines +136 to +171
  useEffect(() => {
    // Poll for job status
    const interval = setInterval(async () => {
      const activeJobs = Object.entries(processingMemories)
      if (activeJobs.length === 0) return

      for (const [convId, jobId] of activeJobs) {
        try {
          const response = await queueApi.getJob(jobId)
          const job = response.data
          setJobProgress(prev => ({ ...prev, [jobId]: job.status }))

          if (['finished', 'completed', 'failed', 'stopped', 'canceled'].includes(job.status)) {
            // Job done, remove from tracking and refresh conversation
            setProcessingMemories(prev => {
              const newState = { ...prev }
              delete newState[convId]
              return newState
            })

            // Refresh this conversation to show new memories
            const convResponse = await conversationsApi.getById(convId)
            if (convResponse.status === 200 && convResponse.data.conversation) {
              setConversations(prev => prev.map(c =>
                c.conversation_id === convId ? { ...c, ...convResponse.data.conversation } : c
              ))
            }
          }
        } catch (err) {
          console.error(`Failed to poll job ${jobId}:`, err)
        }
      }
    }, 2000)

    return () => clearInterval(interval)
  }, [processingMemories])

⚠️ Potential issue | 🟡 Minor

Memory leak: jobProgress entries are never cleaned up.

When a job completes, it's removed from processingMemories (line 150-154), but the corresponding entry in jobProgress is never removed. Over time, this state object will grow unboundedly.

🧹 Proposed fix: Clean up jobProgress when job completes
           if (['finished', 'completed', 'failed', 'stopped', 'canceled'].includes(job.status)) {
             // Job done, remove from tracking and refresh conversation
             setProcessingMemories(prev => {
               const newState = { ...prev }
               delete newState[convId]
               return newState
             })
+            // Clean up job progress entry
+            setJobProgress(prev => {
+              const newState = { ...prev }
+              delete newState[jobId]
+              return newState
+            })
             
             // Refresh this conversation to show new memories
🤖 Prompt for AI Agents
In @backends/advanced/webui/src/pages/Conversations.tsx around lines 136 - 171,
jobProgress entries are never removed, causing unbounded growth; when a job
reaches a terminal status inside the useEffect polling loop (the block that
checks if job.status is in
['finished','completed','failed','stopped','canceled']), after you update
processingMemories (the setProcessingMemories call), also call setJobProgress to
remove the jobId key (e.g. setJobProgress(prev => { const next = { ...prev };
delete next[jobId]; return next; })); ensure this runs for the same terminal
branch that refreshes the conversation so jobProgress stays in sync with
processingMemories.

Comment on lines +173 to +220
  const handleSaveAnnotation = async () => {
    if (!editingSegment) return

    try {
      // Optimistically update UI
      setConversations(prev => prev.map(c => {
        if (c.conversation_id === editingSegment.conversationId && c.segments) {
          const newSegments = [...c.segments]
          if (newSegments[editingSegment.segmentIndex]) {
            // Store original text in case we need to revert (not implemented here for brevity)
            const originalText = newSegments[editingSegment.segmentIndex].text
            newSegments[editingSegment.segmentIndex] = {
              ...newSegments[editingSegment.segmentIndex],
              text: editingSegment.text
            }

            // Call API in background
            annotationsApi.create({
              conversation_id: editingSegment.conversationId,
              segment_index: editingSegment.segmentIndex,
              original_text: originalText,
              corrected_text: editingSegment.text,
              status: 'accepted'
            }).then(() => {
              // Trigger memory reprocessing explicitly to get the job ID for the UI
              conversationsApi.reprocessMemory(editingSegment.conversationId).then(res => {
                setProcessingMemories(prev => ({
                  ...prev,
                  [editingSegment.conversationId]: res.data.job_id
                }))
              })
            }).catch(err => {
              console.error('Failed to save annotation:', err)
              setError('Failed to save correction. Please try again.')
              // Revert UI change would go here
            })
          }
          return { ...c, segments: newSegments }
        }
        return c
      }))

      setEditingSegment(null)
    } catch (err: any) {
      console.error('Error saving annotation:', err)
      setError('Failed to save correction')
    }
  }

⚠️ Potential issue | 🟠 Major

Anti-pattern: API call inside setState callback.

The annotationsApi.create() call at line 190 is inside the setConversations callback. This is problematic because:

  1. setState callbacks should be pure functions without side effects.
  2. React may call the callback multiple times in concurrent mode.
  3. The closure captures editingSegment which is set to null at line 215 before the async operations complete.

Additionally, the outer try-catch (lines 216-219) won't catch errors from the async .then()/.catch() chain.

🛠️ Suggested refactor: Move API call outside setState
 const handleSaveAnnotation = async () => {
   if (!editingSegment) return
+  
+  const { conversationId, segmentIndex, text } = editingSegment
+  
+  // Find original text before updating state
+  const conversation = conversations.find(c => c.conversation_id === conversationId)
+  const originalText = conversation?.segments?.[segmentIndex]?.text || ''

   try {
     // Optimistically update UI
     setConversations(prev => prev.map(c => {
-      if (c.conversation_id === editingSegment.conversationId && c.segments) {
+      if (c.conversation_id === conversationId && c.segments) {
         const newSegments = [...c.segments]
-        if (newSegments[editingSegment.segmentIndex]) {
-          // Store original text in case we need to revert (not implemented here for brevity)
-          const originalText = newSegments[editingSegment.segmentIndex].text
-          newSegments[editingSegment.segmentIndex] = {
-            ...newSegments[editingSegment.segmentIndex],
-            text: editingSegment.text
+        if (newSegments[segmentIndex]) {
+          newSegments[segmentIndex] = {
+            ...newSegments[segmentIndex],
+            text: text
           }
-
-          // Call API in background
-          annotationsApi.create({
-            conversation_id: editingSegment.conversationId,
-            segment_index: editingSegment.segmentIndex,
-            original_text: originalText,
-            corrected_text: editingSegment.text,
-            status: 'accepted'
-          }).then(() => {
-             // Trigger memory reprocessing explicitly to get the job ID for the UI
-             conversationsApi.reprocessMemory(editingSegment.conversationId).then(res => {
-               setProcessingMemories(prev => ({
-                 ...prev,
-                 [editingSegment.conversationId]: res.data.job_id
-               }))
-             })
-          }).catch(err => {
-            console.error('Failed to save annotation:', err)
-            setError('Failed to save correction. Please try again.')
-            // Revert UI change would go here
-          })
         }
         return { ...c, segments: newSegments }
       }
       return c
     }))
     
     setEditingSegment(null)
-  } catch (err: any) {
-    console.error('Error saving annotation:', err)
-    setError('Failed to save correction')
-  }
+    
+    // Call API after state update
+    await annotationsApi.create({
+      conversation_id: conversationId,
+      segment_index: segmentIndex,
+      original_text: originalText,
+      corrected_text: text,
+      status: 'accepted'
+    })
+    
+    // Trigger memory reprocessing
+    const res = await conversationsApi.reprocessMemory(conversationId)
+    setProcessingMemories(prev => ({
+      ...prev,
+      [conversationId]: res.data.job_id
+    }))
+  } catch (err: any) {
+    console.error('Failed to save annotation:', err)
+    setError('Failed to save correction. Please try again.')
+    // TODO: Revert optimistic UI update on failure
+  }
 }
🤖 Prompt for AI Agents
In @backends/advanced/webui/src/pages/Conversations.tsx around lines 173 - 220,
handleSaveAnnotation currently performs annotationsApi.create and
conversationsApi.reprocessMemory inside the setConversations updater (and then
sets editingSegment to null), which causes side effects in a state updater,
captures editingSegment after it’s cleared, and leaves errors unhandled by the
outer try/catch; fix by snapshotting the editingSegment into a local const (e.g.
const segment = editingSegment) at the top of handleSaveAnnotation, build the
new conversations array (or call setConversations with a pure updater that only
returns updated state) without performing API calls, then after setConversations
run the async work outside the state updater using await/try-catch with
annotationsApi.create({ conversation_id: segment.conversationId, segment_index:
segment.segmentIndex, original_text, corrected_text, status: 'accepted' }) and
then await conversationsApi.reprocessMemory(segment.conversationId) to
setProcessingMemories with res.data.job_id; only then clear editingSegment
(setEditingSegment(null)) and handle errors in the surrounding try/catch to
revert UI or setError as needed.

Comment on lines +455 to +489
  const handleAcceptSuggestion = async (conversationId: string, annotation: any) => {
    try {
      // 1. Update annotation status to accepted
      // We'd need an update endpoint, but for now we can create a new one or assume 'create' with same ID updates if we handled it,
      // but simpler is to use the create endpoint to overwrite/confirm.
      // Ideally we should have an update endpoint.
      // Since I didn't create an update endpoint, I'll re-create it as accepted.

      await annotationsApi.create({
        conversation_id: conversationId,
        segment_index: annotation.segment_index,
        original_text: annotation.original_text,
        corrected_text: annotation.corrected_text,
        status: 'accepted'
      })

      // 2. Update local state to reflect change (hide suggestion, update transcript)
      setConversations(prev => prev.map(c => {
        if (c.conversation_id === conversationId && c.segments && c.annotations) {
          const newSegments = [...c.segments]
          if (newSegments[annotation.segment_index]) {
            newSegments[annotation.segment_index].text = annotation.corrected_text
          }
          return {
            ...c,
            segments: newSegments,
            annotations: c.annotations.filter(a => a.id !== annotation.id) // Remove processed suggestion
          }
        }
        return c
      }))
    } catch (err) {
      console.error('Failed to accept suggestion:', err)
    }
  }

⚠️ Potential issue | 🟡 Minor

Silent failure when accepting suggestion.

The catch block (lines 486-488) only logs the error without informing the user. The UI has already been optimistically updated (lines 472-485), so the user won't know the operation failed.

Also, note the comment at lines 457-461 mentions there's no update endpoint, so create is used as a workaround. Verify with the backend that creating a new annotation with the same conversation_id and segment_index properly updates/replaces the existing one rather than creating duplicates.

🛠️ Proposed fix: Add error feedback and consider reverting optimistic update
     } catch (err) {
       console.error('Failed to accept suggestion:', err)
+      setError('Failed to accept suggestion. Please try again.')
+      // Optionally revert the optimistic update here
     }
   }

Comment on lines +491 to +502
  const handleRejectSuggestion = async (conversationId: string, annotationId: string) => {
    // Ideally call API to mark as rejected. For now just remove from UI.
    setConversations(prev => prev.map(c => {
      if (c.conversation_id === conversationId && c.annotations) {
        return {
          ...c,
          annotations: c.annotations.filter(a => a.id !== annotationId)
        }
      }
      return c
    }))
  }

⚠️ Potential issue | 🟡 Minor

Rejection is not persisted - suggestion will reappear on refresh.

The function only removes the annotation from local state. If the user refreshes the page, the "rejected" suggestion will reappear as pending. The comment acknowledges this gap.

Consider either:

  1. Adding an API call to update the annotation status to 'rejected', or
  2. At minimum, storing rejected annotation IDs in localStorage to filter them client-side.

Also, the async keyword on line 491 is unnecessary since there's no await.

🤖 Prompt for AI Agents
In @backends/advanced/webui/src/pages/Conversations.tsx around lines 491 - 502,
handleRejectSuggestion currently only mutates local state and is unnecessarily
declared async; make rejection persistent by calling the backend to mark the
annotation as rejected (e.g., call a new or existing API like
markAnnotationRejected(conversationId, annotationId) and handle errors) and then
update state via setConversations to remove the annotation (using
conversation_id and annotation.id), or if backend work is not possible yet,
persist rejected IDs in localStorage (store annotationId keyed by conversationId
and filter them out wherever conversations are loaded/rendered). Also remove the
unnecessary async keyword from handleRejectSuggestion if you implement the
localStorage synchronous approach.

Comment on lines +1063 to +1069
<div
  className="bg-blue-600 h-1.5 rounded-full transition-all duration-500"
  style={{
    width: jobProgress[processingMemories[conversation.conversation_id]] === 'finished' ? '100%' :
           jobProgress[processingMemories[conversation.conversation_id]] === 'started' ? '50%' : '10%'
  }}
></div>

⚠️ Potential issue | 🟡 Minor

Progress bar doesn't handle all terminal statuses.

Line 148 checks for terminal statuses including 'finished' and 'completed', but the progress bar width calculation (line 1066) only checks for 'finished'. If the backend returns 'completed' instead, the progress bar will show 10% width instead of 100%.

🛠️ Proposed fix
                               style={{ 
-                                width: jobProgress[processingMemories[conversation.conversation_id]] === 'finished' ? '100%' : 
+                                width: ['finished', 'completed'].includes(jobProgress[processingMemories[conversation.conversation_id]]) ? '100%' : 
                                        jobProgress[processingMemories[conversation.conversation_id]] === 'started' ? '50%' : '10%' 
                               }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<div
  className="bg-blue-600 h-1.5 rounded-full transition-all duration-500"
  style={{
    width: jobProgress[processingMemories[conversation.conversation_id]] === 'finished' ? '100%' :
           jobProgress[processingMemories[conversation.conversation_id]] === 'started' ? '50%' : '10%'
  }}
></div>
<div
  className="bg-blue-600 h-1.5 rounded-full transition-all duration-500"
  style={{
    width: ['finished', 'completed'].includes(jobProgress[processingMemories[conversation.conversation_id]]) ? '100%' :
           jobProgress[processingMemories[conversation.conversation_id]] === 'started' ? '50%' : '10%'
  }}
></div>
🤖 Prompt for AI Agents
In @backends/advanced/webui/src/pages/Conversations.tsx around lines 1063 -
1069, The progress bar width logic uses
jobProgress[processingMemories[conversation.conversation_id]] but only treats
'finished' as the terminal successful state; update the width condition in the
progress bar div (the inline style computing width) to also treat 'completed' as
a terminal/success state (i.e., return '100%' when the status is 'finished' OR
'completed'), and consider handling any other terminal statuses your backend may
return the same way so the UI reflects completed jobs correctly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🤖 Fix all issues with AI agents
In @backends/advanced/src/advanced_omi_backend/cron.py:
- Around line 40-57: The scheduled jobs call surface_error_suggestions() and
finetune_hallucination_model() without scoping to a user_id; update the loop to
iterate over active users (e.g., fetch active user IDs) and invoke await
surface_error_suggestions(user_id) and await
finetune_hallucination_model(user_id) for each user, updating
last_suggestion_run and last_training_run after the per-user batch completes; if
the job functions don't accept user_id, add a user_id parameter (with
appropriate defaults or validation) and ensure exceptions are caught per-user
(log user_id on error) so one failing user doesn't stop the whole batch.
- Line 38: Replace the deprecated call to datetime.utcnow() used to set the now
variable with a timezone-aware call datetime.now(timezone.utc); update imports
in cron.py to import timezone (e.g., from datetime import datetime, timezone) or
use datetime.timezone if you keep a module import, and ensure any downstream
code expecting naive datetimes is adjusted to handle an aware datetime.

In @backends/advanced/src/advanced_omi_backend/models/annotation.py:
- Line 20: The annotation model's id is defined as a UUID string (id: str =
Field(default_factory=lambda: str(uuid.uuid4()))) which diverges from the User
model's MongoDB ObjectId pattern; change the Annotation model to use
Beanie/PydanticObjectId for the primary id (match the User model's id type) and
add a string accessor property (e.g., annotation_id) if a string representation
is needed, ensuring imports and any references to Annotation.id elsewhere are
updated to expect PydanticObjectId and use the new string property when
necessary.

In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py:
- Around line 40-43: The existence check currently queries
TranscriptAnnotation.find_one by "conversation_id" and "segment_index" only;
update that query to also include the relevant "user_id" field (e.g., add
"user_id": conv.user_id or job.user_id depending on the surrounding scope) so
the lookup is scoped per user, and ensure the variable you reference
(conv.user_id or job.user_id) is available in the function before using it.
- Around line 22-24: The Conversation.find query in annotation_jobs.py currently
omits user scoping and will retrieve conversations for all users; update the job
to accept a user_id parameter (or obtain the intended user context) and include
a filter like {"user_id": user_id, "created_at": {"$gte": since}} when calling
Conversation.find, or if the job truly must be global add a clear comment near
Conversation.find explaining why user_id scoping is intentionally omitted and
ensure any permission checks are applied; locate this change around the
Conversation.find call and the surrounding job entrypoint to wire the user_id
parameter through.
- Line 21: Replace the naive UTC timestamp usage assigned to variable "since"
(currently using datetime.utcnow()) with a timezone-aware timestamp by calling
datetime.now(timezone.utc) and subtracting timedelta(days=1); also ensure
timezone is imported from datetime (or use datetime.timezone) so the expression
becomes timezone-aware throughout the module.
- Around line 58-59: The comment "Surface 5-6 places as requested" is
inconsistent with the condition `if count >= 6` (which actually limits results
to up to 6); update the comment near the `if count >= 6` check to "Surface up to
6 places" (or "up to 6 suggestions"), or if you truly want to enforce 5–6
results, change the condition logic around the `count` variable (e.g., break
when `count >= 5` and ensure counting semantics produce 5–6 items) so the
comment and the `if count >= 6` behavior match.
- Around line 72-74: The query fetching TranscriptAnnotation entries currently
returns accepted annotations across all users; update the cron job that contains
the corrections = await TranscriptAnnotation.find(...) call to scope by user_id:
either add a user_id parameter to that function and include "user_id": user_id
in the find filter alongside
TranscriptAnnotation.AnnotationStatus.ACCEPTED.value, or modify the cron to
iterate over all user ids and run the existing logic per-user (adding the same
"user_id" filter for each iteration); ensure you update any callers to pass the
user_id when choosing the parameter approach or implement the user-list
iteration and per-user processing path if no request context exists.
🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/models/annotation.py (2)

1-6: Group imports by category.

Imports should be organized: standard library, third-party, then local imports. Currently, standard library imports are mixed with third-party imports.

♻️ Reorganize imports
+# Standard library
 from datetime import datetime, timezone
+from enum import Enum
 from typing import Optional, List
+import uuid
+
+# Third-party
 from pydantic import Field
 from beanie import Document, Indexed
-from enum import Enum
-import uuid

As per coding guidelines, imports should be grouped and ordered consistently.


33-39: Annotate indexes with ClassVar for proper Pydantic handling.

The indexes list should be annotated with typing.ClassVar to indicate it's a class attribute rather than an instance field, preventing Pydantic from treating it as a model field.

♻️ Add ClassVar annotation
 from datetime import datetime, timezone
-from typing import Optional, List
+from typing import Optional, List, ClassVar
 from pydantic import Field
 from beanie import Document, Indexed
 from enum import Enum
 import uuid

 class TranscriptAnnotation(Document):
     """Model for transcript annotations/corrections."""
     
     # ... (enum definitions and fields)
     
     class Settings:
         name = "transcript_annotations"
-        indexes = [
+        indexes: ClassVar[List[str]] = [
             "conversation_id",
             "user_id",
             "status"
         ]

Based on static analysis hints.
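To illustrate why RUF012 flags this: a mutable class attribute is shared across all instances, and without `ClassVar` Pydantic (and type checkers) may treat it as a per-instance model field rather than class-level configuration. A minimal plain-Python sketch of the shared-attribute behavior:

```python
from typing import ClassVar, List

class Settings:
    # ClassVar marks this as shared class-level configuration, not a
    # per-instance field that Pydantic should validate and copy.
    indexes: ClassVar[List[str]] = ["conversation_id", "user_id", "status"]

a, b = Settings(), Settings()
# Both instances see the exact same list object — class-level config,
# which is what ClassVar communicates to Pydantic and static analysis.
```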

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2cc30b and a6eb773.

📒 Files selected for processing (3)
  • backends/advanced/src/advanced_omi_backend/cron.py
  • backends/advanced/src/advanced_omi_backend/models/annotation.py
  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{py,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx,js,jsx}: ALL imports must be at the top of the file after the docstring. NEVER import modules in the middle of functions or files.
Use lazy imports sparingly and only when absolutely necessary for circular import issues.
Group imports in order: standard library, third-party, local imports.
Always raise errors, never silently ignore them. Use explicit error handling with proper exceptions rather than silent failures.

Files:

  • backends/advanced/src/advanced_omi_backend/cron.py
  • backends/advanced/src/advanced_omi_backend/models/annotation.py
  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT add defensive hasattr() checks. Instead, research and understand input/response or class structure.
Use Black formatter with 100-character line length for Python code.
Use isort for Python import organization.
Use Wyoming protocol (JSONL + binary) for WebSocket communication for structured audio sessions.
Use job-based tracking system for all audio processing (WebSocket and file uploads) with stage events (audio → transcription → memory) and completion status.
Conversations must only be created when speech is detected, eliminating noise-only sessions from user interfaces.
Always store audio sessions in audio_chunks collection by audio_uuid. Only create entries in conversations collection when speech is detected.
Use email-based authentication with MongoDB ObjectId user IDs.
Use automatic {objectid_suffix}-{device_name} format for client registration.
All data must be scoped by user_id with efficient permission checking for data isolation.
JWT tokens are required for all endpoints and WebSocket connections.
Support pluggable memory providers (Chronicle native with individual facts or OpenMemory MCP delegation).

Files:

  • backends/advanced/src/advanced_omi_backend/cron.py
  • backends/advanced/src/advanced_omi_backend/models/annotation.py
  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
**/*.{py,sh,md}

📄 CodeRabbit inference engine (CLAUDE.md)

Use uv run python3 instead of python3 for running Python commands, as uv is the package manager for all Python projects.

Files:

  • backends/advanced/src/advanced_omi_backend/cron.py
  • backends/advanced/src/advanced_omi_backend/models/annotation.py
  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
🧠 Learnings (2)
📚 Learning: 2026-01-06T08:43:14.315Z
Learnt from: CR
Repo: SimpleOpenSoftware/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-06T08:43:14.315Z
Learning: Applies to **/*.{py,ts,tsx,js,jsx} : ALL imports must be at the top of the file after the docstring. NEVER import modules in the middle of functions or files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/cron.py
📚 Learning: 2026-01-06T08:43:14.315Z
Learnt from: CR
Repo: SimpleOpenSoftware/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-06T08:43:14.315Z
Learning: Applies to **/*.py : Use job-based tracking system for all audio processing (WebSocket and file uploads) with stage events (audio → transcription → memory) and completion status.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
🧬 Code graph analysis (3)
backends/advanced/src/advanced_omi_backend/cron.py (2)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)
  • surface_error_suggestions (13-63)
  • finetune_hallucination_model (65-98)
backends/advanced/upload_files.py (1)
  • format (32-36)
backends/advanced/src/advanced_omi_backend/models/annotation.py (1)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)
backends/advanced/src/advanced_omi_backend/models/conversation.py (3)
  • Conversation (17-314)
  • active_transcript (166-174)
  • segments (197-199)
backends/advanced/src/advanced_omi_backend/models/annotation.py (3)
  • TranscriptAnnotation (8-39)
  • AnnotationStatus (11-14)
  • AnnotationSource (16-18)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/cron.py

61-61: Unused function argument: signum

(ARG001)


61-61: Unused function argument: frame

(ARG001)

backends/advanced/src/advanced_omi_backend/models/annotation.py

35-39: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py

35-35: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


36-36: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

🔇 Additional comments (7)
backends/advanced/src/advanced_omi_backend/models/annotation.py (2)

27-31: LGTM! Good use of timezone-aware timestamps.

The default values are well-chosen:

  • ACCEPTED status for user edits makes sense and is clearly documented
  • Using datetime.now(timezone.utc) follows best practices for timezone-aware timestamps

21-25: Field types are consistent with related models.

Verification confirms that conversation_id and user_id in the annotation model match the types used in the Conversation model and throughout the codebase. Both fields correctly use Indexed(str), maintaining consistency with other models (conversation.py and audio_file.py).

backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)

1-11: LGTM! Imports are properly organized.

Imports follow the correct order: standard library followed by local imports.


90-96: Mock implementation is clearly documented.

The mock training process using asyncio.sleep(2) is appropriate for this placeholder implementation and well-documented with comments indicating this is temporary.

backends/advanced/src/advanced_omi_backend/cron.py (3)

1-27: LGTM! Good configuration structure.

The interval configuration with DEV_MODE override is well-designed for both production and testing scenarios.


34-35: Jobs will execute immediately on startup.

Initializing last_suggestion_run and last_training_run to datetime.min means both jobs will run immediately on the first scheduler check (line 41-42 and 50-51), as the time difference will always exceed the configured intervals.

If immediate execution on startup is not desired, initialize these to datetime.now(timezone.utc) instead.

⏰ Alternative initialization

If jobs should not run immediately on startup:

     # Initialize DB connection
     await init_db()
     
-    last_suggestion_run = datetime.min
-    last_training_run = datetime.min
+    last_suggestion_run = datetime.now(timezone.utc)
+    last_training_run = datetime.now(timezone.utc)

61-63: Signal handler parameters are correctly defined.

The signum and frame parameters are required by the signal.signal() API, even though they're not used in this handler. This is expected behavior, and the static analysis warning is a false positive.

    last_training_run = datetime.min

    while True:
        now = datetime.utcnow()

⚠️ Potential issue | 🟡 Minor

Replace deprecated datetime.utcnow() with datetime.now(timezone.utc).

datetime.utcnow() is deprecated in Python 3.12+ in favor of timezone-aware alternatives.

🔧 Use timezone-aware datetime
     while True:
-        now = datetime.utcnow()
+        now = datetime.now(timezone.utc)
📝 Committable suggestion


Suggested change
        now = datetime.utcnow()
    while True:
        now = datetime.now(timezone.utc)
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/cron.py at line 38, Replace the
deprecated call to datetime.utcnow() used to set the now variable with a
timezone-aware call datetime.now(timezone.utc); update imports in cron.py to
import timezone (e.g., from datetime import datetime, timezone) or use
datetime.timezone if you keep a module import, and ensure any downstream code
expecting naive datetimes is adjusted to handle an aware datetime.
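A small sketch of the migration and the failure mode it guards against — mixing aware and naive datetimes in a comparison raises `TypeError`, which is why downstream code must be checked:

```python
from datetime import datetime, timezone, timedelta

now = datetime.now(timezone.utc)   # aware replacement for utcnow()
since = now - timedelta(days=1)    # arithmetic is unchanged

# Comparing aware and naive datetimes raises TypeError, which is why
# downstream code that still builds naive datetimes must be adjusted.
naive = datetime(2026, 1, 10)
try:
    _ = now > naive
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False
```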

Comment on lines +40 to +57
# Check Suggestions Job
if (now - last_suggestion_run).total_seconds() >= SUGGESTION_INTERVAL:
logger.info("Running scheduled job: surface_error_suggestions")
try:
await surface_error_suggestions()
last_suggestion_run = now
except Exception as e:
logger.error(f"Error in surface_error_suggestions: {e}", exc_info=True)

# Check Training Job
if (now - last_training_run).total_seconds() >= TRAINING_INTERVAL:
logger.info("Running scheduled job: finetune_hallucination_model")
try:
await finetune_hallucination_model()
last_training_run = now
except Exception as e:
logger.error(f"Error in finetune_hallucination_model: {e}", exc_info=True)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Jobs lack user_id scoping required by coding guidelines.

The surface_error_suggestions() and finetune_hallucination_model() jobs are invoked without user_id parameters, but per the coding guidelines, all data operations must be scoped by user_id. This creates an architectural inconsistency with the worker implementations.

Consider one of these approaches:

  1. Per-user execution: Iterate over active users and invoke jobs for each user
  2. Batch processing: Refactor jobs to process all users in batches while maintaining proper data isolation
  3. Global processing: If intentionally global, document the architectural decision and ensure it aligns with system requirements

Would you like me to generate a solution that iterates over users and invokes these jobs per-user?

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/cron.py around lines 40 - 57, The
scheduled jobs call surface_error_suggestions() and
finetune_hallucination_model() without scoping to a user_id; update the loop to
iterate over active users (e.g., fetch active user IDs) and invoke await
surface_error_suggestions(user_id) and await
finetune_hallucination_model(user_id) for each user, updating
last_suggestion_run and last_training_run after the per-user batch completes; if
the job functions don't accept user_id, add a user_id parameter (with
appropriate defaults or validation) and ensure exceptions are caught per-user
(log user_id on error) so one failing user doesn't stop the whole batch.
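A minimal sketch of the per-user batch shape described above. The helper names (`get_active_user_ids`, the simulated failure in `surface_error_suggestions`) are hypothetical stand-ins, not the real backend functions:

```python
import asyncio
import logging

logger = logging.getLogger("cron")

async def get_active_user_ids():
    # Hypothetical user lookup; the real backend would query the User collection.
    return ["user_a", "user_b", "user_c"]

async def surface_error_suggestions(user_id: str):
    # Hypothetical per-user job body; user_b simulates a failing user.
    if user_id == "user_b":
        raise RuntimeError("simulated failure")
    return f"processed {user_id}"

async def run_suggestion_batch():
    """Run the job per user so one failing user doesn't abort the batch."""
    results = {}
    for user_id in await get_active_user_ids():
        try:
            results[user_id] = await surface_error_suggestions(user_id)
        except Exception as e:
            # Log with the user_id so failures are attributable per user.
            logger.error("surface_error_suggestions failed for user %s: %s", user_id, e)
            results[user_id] = None
    return results

results = asyncio.run(run_suggestion_batch())
```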

        USER = "user"
        MODEL_SUGGESTION = "model_suggestion"

    id: str = Field(default_factory=lambda: str(uuid.uuid4()))

⚠️ Potential issue | 🟠 Major

Inconsistent ID type: use MongoDB ObjectId instead of UUID string.

The id field uses a UUID string, which is inconsistent with other models in the codebase. The User model (see backends/advanced/src/advanced_omi_backend/models/user.py) uses MongoDB ObjectId for id and provides a user_id property for string representation. This inconsistency could cause:

  • Type mismatches in queries and relationships
  • Non-uniform ID patterns across the codebase
  • Potential issues with Beanie's default behavior
🔍 Proposed fix

Based on the Beanie documentation, the default type of the id field is PydanticObjectId. While you can override it with UUID as shown in the documentation, consider aligning with the existing User model pattern for consistency across the codebase.

+from beanie import Document, Indexed, PydanticObjectId
 
 class TranscriptAnnotation(Document):
     """Model for transcript annotations/corrections."""
     
     class AnnotationStatus(str, Enum):
         PENDING = "pending"
         ACCEPTED = "accepted"
         REJECTED = "rejected"

     class AnnotationSource(str, Enum):
         USER = "user"
         MODEL_SUGGESTION = "model_suggestion"

-    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
+    # id field defaults to PydanticObjectId - MongoDB's ObjectId type
     conversation_id: Indexed(str)
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/models/annotation.py at line 20,
The annotation model's id is defined as a UUID string (id: str =
Field(default_factory=lambda: str(uuid.uuid4()))) which diverges from the User
model's MongoDB ObjectId pattern; change the Annotation model to use
Beanie/PydanticObjectId for the primary id (match the User model's id type) and
add a string accessor property (e.g., annotation_id) if a string representation
is needed, ensuring imports and any references to Annotation.id elsewhere are
updated to expect PydanticObjectId and use the new string property when
necessary.
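The id-plus-string-accessor shape being asked for can be sketched without a database, using a plain class as a stand-in for the Beanie `Document` (the `ObjectIdLike` class below is a mock, not a real `bson.ObjectId`):

```python
import uuid

class ObjectIdLike:
    """Stand-in for bson.ObjectId / beanie.PydanticObjectId (mock only)."""
    def __init__(self):
        self._hex = uuid.uuid4().hex[:24]  # ObjectIds render as 24 hex chars
    def __str__(self):
        return self._hex

class TranscriptAnnotation:
    """Sketch of the proposed shape: ObjectId primary key + string accessor."""
    def __init__(self):
        self.id = ObjectIdLike()
    @property
    def annotation_id(self) -> str:
        # Mirrors User.user_id: string view of the ObjectId for API payloads.
        return str(self.id)

ann = TranscriptAnnotation()
```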

    logger.info("Starting surface_error_suggestions job...")

    # Get conversations from the last 24 hours
    since = datetime.utcnow() - timedelta(days=1)

⚠️ Potential issue | 🟡 Minor

Replace deprecated datetime.utcnow() with datetime.now(timezone.utc).

datetime.utcnow() is deprecated in Python 3.12+ in favor of timezone-aware alternatives.

🔧 Use timezone-aware datetime
-    since = datetime.utcnow() - timedelta(days=1)
+    since = datetime.now(timezone.utc) - timedelta(days=1)
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py at
line 21, Replace the naive UTC timestamp usage assigned to variable "since"
(currently using datetime.utcnow()) with a timezone-aware timestamp by calling
datetime.now(timezone.utc) and subtracting timedelta(days=1); also ensure
timezone is imported from datetime (or use datetime.timezone) so the expression
becomes timezone-aware throughout the module.

Comment on lines +22 to +24
    conversations = await Conversation.find(
        {"created_at": {"$gte": since}}
    ).to_list()

⚠️ Potential issue | 🔴 Critical

Critical: Missing user_id scoping violates data isolation requirements.

The query fetches conversations without filtering by user_id, which violates the coding guideline that states: "All data must be scoped by user_id with efficient permission checking for data isolation."

This cron job would process annotations across all users globally, which may not be the intended behavior. If this is meant to run per-user, it should be invoked with a user_id parameter and scope the query accordingly.

💡 Clarify intended behavior

Option 1: If this job should run per-user:

-async def surface_error_suggestions():
+async def surface_error_suggestions(user_id: str):
     """
     Cron job to surface potential errors in transcripts.
     Mocks the behavior of an ML model identifying low-confidence segments.
     """
     logger.info("Starting surface_error_suggestions job...")
     
     # Get conversations from the last 24 hours
     since = datetime.now(timezone.utc) - timedelta(days=1)
     conversations = await Conversation.find(
-        {"created_at": {"$gte": since}}
+        {"created_at": {"$gte": since}, "user_id": user_id}
     ).to_list()

Option 2: If this job is intentionally global (processing all users), add a comment explaining why user_id scoping is not required and ensure this aligns with system architecture decisions.

As per coding guidelines, data must be scoped by user_id.

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py around
lines 22 - 24, The Conversation.find query in annotation_jobs.py currently omits
user scoping and will retrieve conversations for all users; update the job to
accept a user_id parameter (or obtain the intended user context) and include a
filter like {"user_id": user_id, "created_at": {"$gte": since}} when calling
Conversation.find, or if the job truly must be global add a clear comment near
Conversation.find explaining why user_id scoping is intentionally omitted and
ensure any permission checks are applied; locate this change around the
Conversation.find call and the surrounding job entrypoint to wire the user_id
parameter through.
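The scoped filter is easy to build and test in isolation; `build_conversation_filter` below is a hypothetical helper illustrating the shape, not an existing backend function:

```python
from datetime import datetime, timezone, timedelta

def build_conversation_filter(user_id: str, days: int = 1) -> dict:
    """Hypothetical helper: Mongo filter scoped to one user and a time window."""
    since = datetime.now(timezone.utc) - timedelta(days=days)
    return {"user_id": user_id, "created_at": {"$gte": since}}

f = build_conversation_filter("user_123")
# f can be passed straight to Conversation.find(...) to keep queries user-scoped.
```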

Comment on lines +40 to +43
            existing = await TranscriptAnnotation.find_one({
                "conversation_id": conv.conversation_id,
                "segment_index": segment_idx
            })

⚠️ Potential issue | 🟠 Major

Add user_id to annotation existence check.

The query for existing annotations should also be scoped by user_id for consistency with data isolation requirements.

🔧 Scope query by user_id
             # Check if annotation already exists
             existing = await TranscriptAnnotation.find_one({
                 "conversation_id": conv.conversation_id,
-                "segment_index": segment_idx
+                "segment_index": segment_idx,
+                "user_id": conv.user_id
             })

As per coding guidelines, data must be scoped by user_id.

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py around
lines 40 - 43, The existence check currently queries
TranscriptAnnotation.find_one by "conversation_id" and "segment_index" only;
update that query to also include the relevant "user_id" field (e.g., add
"user_id": conv.user_id or job.user_id depending on the surrounding scope) so
the lookup is scoped per user, and ensure the variable you reference
(conv.user_id or job.user_id) is available in the function before using it.

Comment on lines +58 to +59
                if count >= 6: # Surface 5-6 places as requested
                    break

⚠️ Potential issue | 🟡 Minor

Fix comment/logic mismatch.

The comment states "Surface 5-6 places" but the logic checks count >= 6, which surfaces up to 6 suggestions (not 5-6). Either update the comment to "up to 6" or change the condition to match "5-6".

📝 Align comment with logic

Option 1: Update comment to match logic:

-                if count >= 6: # Surface 5-6 places as requested
+                if count >= 6: # Surface up to 6 suggestions

Option 2: Update logic to match comment (surface between 5-6):

-                if count >= 6: # Surface 5-6 places as requested
+                if count >= 5: # Surface 5-6 places as requested
📝 Committable suggestion


Suggested change
                if count >= 6: # Surface 5-6 places as requested
                    break
                if count >= 6: # Surface up to 6 suggestions
                    break
Suggested change
                if count >= 6: # Surface 5-6 places as requested
                    break
                if count >= 5: # Surface 5-6 places as requested
                    break
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py around
lines 58 - 59, The comment "Surface 5-6 places as requested" is inconsistent
with the condition `if count >= 6` (which actually limits results to up to 6);
update the comment near the `if count >= 6` check to "Surface up to 6 places"
(or "up to 6 suggestions"), or if you truly want to enforce 5–6 results, change
the condition logic around the `count` variable (e.g., break when `count >= 5`
and ensure counting semantics produce 5–6 items) so the comment and the `if
count >= 6` behavior match.

Comment on lines +72 to +74
    corrections = await TranscriptAnnotation.find({
        "status": TranscriptAnnotation.AnnotationStatus.ACCEPTED.value
    }).to_list()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Find the file and examine the function context (lines 65-98)
echo "=== Examining the annotation_jobs.py file and function context ==="
cat -n backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py | sed -n '65,98p'

echo -e "\n=== Searching for TranscriptAnnotation model definition ==="
fd "\.py$" | xargs grep -l "class TranscriptAnnotation" | head -5

Repository: SimpleOpenSoftware/chronicle

Length of output: 1597


🏁 Script executed:

#!/bin/bash

# Search for TranscriptAnnotation model and enum field definitions
echo "=== Finding TranscriptAnnotation model ==="
rg -n "class TranscriptAnnotation" -A 30 --type=py

echo -e "\n=== Searching for AnnotationStatus enum definition ==="
rg -n "class AnnotationStatus" -A 10 --type=py | head -40


Length of output: 4356


🏁 Script executed:

#!/bin/bash

# Search for how enums are queried in Beanie models throughout codebase
echo "=== Enum query patterns in codebase ==="
rg -n "\.find\(\{" -A 5 --type=py | grep -A 5 "\.value\|Enum" | head -50

echo -e "\n=== Check for other status field queries ==="
rg -n '"status":\s*' --type=py | grep -E "(\.value|Enum|Status)" | head -20


Length of output: 2356


🏁 Script executed:

#!/bin/bash

# Check if there are user_id fields in similar queries
echo "=== Searching for user_id usage in find queries ==="
rg -n '\.find\(\{' -A 3 --type=py | grep -B 3 "user_id" | head -40

echo -e "\n=== Check TranscriptAnnotation schema for user_id field ==="
rg -n "TranscriptAnnotation" -B 5 -A 20 --type=py | grep -E "(user_id|class TranscriptAnnotation)" | head -30


Length of output: 2785


Add user_id scoping to comply with data isolation requirements.

The query retrieves annotations for all users without filtering by user_id. This violates the data isolation guideline. Since this is a cron job without request context, the function must either accept a user_id parameter and filter accordingly, or iterate over all users and process per-user.

The enum query syntax using .value is correct and consistent with the codebase pattern.

Add user_id parameter and filter
-async def finetune_hallucination_model():
+async def finetune_hallucination_model(user_id: str):
     """
     Cron job to finetune a LORA model on corrections.
     """
     logger.info("Starting finetune_hallucination_model job...")
     
     # Gather accepted corrections
     corrections = await TranscriptAnnotation.find({
         "status": TranscriptAnnotation.AnnotationStatus.ACCEPTED.value,
+        "user_id": user_id
     }).to_list()
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py, around
lines 72-74: the query fetching TranscriptAnnotation entries currently returns
accepted annotations across all users. Update the cron job containing the
`corrections = await TranscriptAnnotation.find(...)` call to scope by user:
either add a `user_id` parameter to the function and include `"user_id": user_id`
in the find filter alongside `TranscriptAnnotation.AnnotationStatus.ACCEPTED.value`,
or have the cron iterate over all user ids and run the existing logic per user,
adding the same `"user_id"` filter on each iteration. If you take the parameter
approach, update any callers to pass the user_id; otherwise implement the
user-list iteration and per-user processing path, since no request context exists.
