feat: Add annotation feature and cron jobs for transcript corrections #256
base: dev
Conversation
- Introduced a new `TranscriptAnnotation` model for managing transcript corrections.
- Added `annotation_routes` for creating and retrieving annotations via API.
- Implemented cron jobs to surface potential errors in transcripts and finetune a model based on accepted corrections.
- Updated Docker Compose to include a cron service for running scheduled tasks.
- Enhanced the web UI to support displaying and managing annotations, including accepting and rejecting suggestions.
- Added tests for annotation model and integration flow to ensure functionality and reliability.
📝 Walkthrough
Adds transcript annotation support: new Beanie model, API routes to create/fetch annotations, UI inline-editing and suggestion controls, background cron + worker jobs to surface suggestions and fine-tune, memory/queue integration (extra title/summary job), docker cron service, and tests.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant API as API\n(annotation_routes)
participant Conv as Conversation\n(MongoDB)
participant DB as TranscriptAnnotation\n(MongoDB)
participant Queue as RQ\n(memory_jobs)
Client->>API: POST /api/annotations/ (annotation payload)
API->>Conv: find_one(conversation_id) (validate ownership)
Conv-->>API: conversation with active transcript
API-->>API: build TranscriptAnnotation instance
API->>DB: insert transcript annotation
DB-->>API: inserted
API->>Conv: update transcript segment text and save()
Conv-->>API: saved
API->>Queue: enqueue memory_processing job (+ title/summary job)
Queue-->>API: job IDs
API-->>Client: 201 Created + AnnotationResponse
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In @backends/advanced/src/advanced_omi_backend/cron.py:
- Around line 1-17: The imports for surface_error_suggestions,
finetune_hallucination_model and init_db are below the logging configuration;
move those three import statements (from
advanced_omi_backend.workers.annotation_jobs and advanced_omi_backend.database)
up to the top of the file with the other imports so all imports appear before
any runtime code (including logging.basicConfig) and keep the existing logger =
logging.getLogger("cron_scheduler") after configuration.
- Around line 37-59: The async finetune_hallucination_model function in
annotation_jobs.py uses blocking time.sleep(2), which freezes the event loop
when awaited by the scheduler; change that call to await asyncio.sleep(2),
ensure asyncio is imported in annotation_jobs.py, and keep the function
signature async def finetune_hallucination_model(...) unchanged so the scheduler
loop in cron.py can await it without blocking.
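For reference, a minimal sketch of the non-blocking pattern this prompt describes (the function name comes from the PR; everything around the sleep is assumed):

```python
import asyncio


async def finetune_hallucination_model() -> None:
    """Sketch only: the real job's training logic is omitted."""
    # ... collect accepted corrections and run a training step ...

    # await asyncio.sleep yields control to the event loop, so the cron
    # scheduler's other tasks keep running; time.sleep(2) here would
    # freeze them all for the duration of the pause.
    await asyncio.sleep(2)
```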
In @backends/advanced/src/advanced_omi_backend/models/annotation.py:
- Around line 30-31: The created_at and updated_at Field default_factories use
the deprecated datetime.utcnow(); replace them with timezone-aware calls by
importing timezone from datetime and changing the default_factory to a lambda
that returns datetime.now(timezone.utc) for both created_at and updated_at
(locate the fields in annotation.py and update the Field(default_factory=...)
for created_at and updated_at accordingly).
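A minimal sketch of the suggested timestamp fields, assuming the rest of the model is unchanged:

```python
from datetime import datetime, timezone

from beanie import Document
from pydantic import Field


class TranscriptAnnotation(Document):
    # Other fields omitted; only the two timestamps from the prompt are shown.
    # datetime.now(timezone.utc) is the timezone-aware replacement for the
    # deprecated datetime.utcnow().
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```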
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py:
- Line 57: The inline comment next to the condition in the annotation job (the
line using "if count >= 6:") is inconsistent: either update the comment to read
"Surface 6 places" to match the existing condition, or change the condition to
"if count > 5:" to preserve the intended "5-6" behavior; locate the check using
the symbol count and make the corresponding edit so the comment and code reflect
the same semantics.
- Around line 95-96: Move the mid-function import and blocking sleep out of the
async flow: remove the in-function "import time" and "time.sleep(2)", add the
required imports at the top of the file (replace the time import with "import
asyncio"), and in the async function replace "time.sleep(2)" with an awaited
non-blocking call "await asyncio.sleep(2)". Ensure the function remains async so
the await is valid and no blocking calls remain.
In @backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py:
- Around line 286-304: The except block around default_queue.enqueue for
generate_title_summary_job should not re-raise (remove "raise e") to avoid
making the whole memory job appear to fail; instead log the full traceback with
logger.exception (replace logger.error with logger.exception) so failures are
recorded without propagating, keeping summary_job_id, generate_title_summary_job
and default_queue.enqueue logic intact.
In @backends/advanced/webui/src/pages/Conversations.tsx:
- Around line 448-459: handleRejectSuggestion currently only mutates local state
so rejected annotations reappear on refresh; update it to persist the rejection
by calling the backend (mirror how handleAcceptSuggestion persists acceptances).
Call the same create/update API used by handleAcceptSuggestion (e.g., invoke the
create annotation endpoint with status: 'rejected' or call a new
updateAnnotationStatus/update endpoint if you add one), await the response, then
update setConversations to remove or mark the annotation only after a successful
API response and handle errors (logging or UI feedback) so state remains
consistent with the backend.
- Around line 130-177: The updater passed to setConversations currently performs
side-effectful API calls (annotationsApi.create and
conversationsApi.reprocessMemory) inside the state mutation which risks
unhandled async errors, memory leaks, and mixing concerns; refactor
handleSaveAnnotation to (1) synchronously compute the new conversations array
(using the same logic that updates the segment text) and call setConversations
with that array, (2) after setConversations perform the async work with await
and try/catch around annotationsApi.create and then
conversationsApi.reprocessMemory, updating setProcessingMemories on success and
reverting the optimistic UI on failure, and (3) guard async callbacks against
unmounted components (use an isMounted flag or AbortController) and clear
setEditingSegment only after the async flow completes or is safely cancelled.
🧹 Nitpick comments (8)
backends/advanced/tests/test_annotation_models.py (1)
8-90: Consider using a pytest fixture for Beanie initialization. Currently, `initialize_beanie()` is called at the start of each test method, creating a new `AsyncMongoMockClient` every time. This approach works but could be optimized. Consider using a pytest async fixture to initialize Beanie once per test and ensure proper cleanup:
♻️ Proposed refactor using pytest fixture
+@pytest.fixture
+async def beanie_setup():
+    """Setup and teardown Beanie for each test."""
+    client = AsyncMongoMockClient()
+    await init_beanie(database=client.db_name, document_models=[TranscriptAnnotation])
+    yield
+    # Cleanup if needed
+
 class TestAnnotationModel:
     """Test TranscriptAnnotation Pydantic/Beanie model."""

     @pytest.mark.asyncio
-    async def test_create_annotation_defaults(self):
+    async def test_create_annotation_defaults(self, beanie_setup):
         """Test creating an annotation with default values."""
-        await initialize_beanie()
         annotation = TranscriptAnnotation(

Apply similar changes to the other test methods.
backends/advanced/src/advanced_omi_backend/models/annotation.py (1)
20-20: Consider using Beanie's native ID generation. While the current UUID-based ID works, Beanie typically uses MongoDB ObjectId by default. Using a string UUID as the primary `id` field may cause confusion since Beanie documents also have an internal `_id` field. Consider either:
- Letting Beanie manage the `_id` field automatically (remove the custom `id` field)
- Renaming to `annotation_id` if you need a secondary UUID identifier
- Using `PydanticObjectId` from Beanie if you want to keep the custom ID pattern

This helps maintain consistency with other Beanie models in the codebase.
backends/advanced/tests/integration/test_annotation_flow.py (2)
28-31: Unused fixture can be removed. The `client` fixture is defined but never used in the test. The test creates its own `AsyncClient` with `ASGITransport` on lines 82-83 instead.

🧹 Remove unused fixture

-@pytest.fixture
-async def client(app):
-    async with AsyncClient(app=app, base_url="http://test") as c:
-        yield c
-
56-57: Simplify the AsyncMock pattern for `find_one`. The current pattern `AsyncMock(return_value=mock_conv)()` works but is unnecessarily convoluted. Using a direct coroutine or simpler async return is clearer.

♻️ Simplified mock pattern

-    # Configure find_one to return our mock conversation (awaitable)
-    MockConversation.find_one.return_value = AsyncMock(return_value=mock_conv)()  # Calling AsyncMock returns an awaitable coroutine
+    # Configure find_one to return our mock conversation (awaitable)
+    async def mock_find_one(*args, **kwargs):
+        return mock_conv
+    MockConversation.find_one = mock_find_one

Alternatively, if you want to keep using `AsyncMock`:

MockConversation.find_one = AsyncMock(return_value=mock_conv)

backends/advanced/src/advanced_omi_backend/cron.py (1)
65-72: Graceful shutdown doesn't cancel running async tasks. When `SIGTERM` is received, the handler calls `sys.exit(0)`, which doesn't gracefully cancel any in-progress async jobs. Consider using an `asyncio.Event` for cooperative shutdown.

♻️ Graceful async shutdown pattern

+shutdown_event = asyncio.Event()
+
 async def run_scheduler():
     logger.info("Starting Cron Scheduler...")
     # Initialize DB connection
     await init_db()
     last_suggestion_run = datetime.min
     last_training_run = datetime.min
-    while True:
+    while not shutdown_event.is_set():
         now = datetime.utcnow()
         # ... job checks ...
-        await asyncio.sleep(60)
+        try:
+            await asyncio.wait_for(shutdown_event.wait(), timeout=60)
+        except asyncio.TimeoutError:
+            pass  # Continue loop after timeout
+
+    logger.info("Scheduler shutdown complete")

 def handle_shutdown(signum, frame):
     logger.info("Shutting down Cron Scheduler...")
-    sys.exit(0)
+    shutdown_event.set()

backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py (2)
87-95: Potential enum serialization issue in response. `new_annotation.status` is of type `AnnotationStatus` (an Enum), but `AnnotationResponse.status` is typed as `str`. While Pydantic may handle this automatically, explicit conversion ensures consistent behavior.

♻️ Explicit enum value conversion

 return AnnotationResponse(
     id=str(new_annotation.id),
     conversation_id=new_annotation.conversation_id,
     segment_index=new_annotation.segment_index,
     original_text=new_annotation.original_text,
     corrected_text=new_annotation.corrected_text,
-    status=new_annotation.status,
+    status=new_annotation.status.value if isinstance(new_annotation.status, TranscriptAnnotation.AnnotationStatus) else new_annotation.status,
     created_at=new_annotation.created_at
 )

Or update the model to use the enum type:

 class AnnotationResponse(BaseModel):
     # ...
-    status: str
+    status: TranscriptAnnotation.AnnotationStatus
60-72: Transcript update is not atomic and may have race conditions. Concurrent annotation requests on the same conversation could lead to lost updates, since the pattern reads the full conversation, modifies it in memory, then saves. Consider using MongoDB's atomic update operators or adding optimistic concurrency control.
For a quick fix, you could use a unique constraint or version field to detect conflicts. For production robustness, consider MongoDB's `$set` with array filters for atomic segment updates.
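To make that suggestion concrete, here is a sketch of an atomic single-segment update using a dotted positional path; the collection and field names are assumptions, not Chronicle's actual schema:

```python
from motor.motor_asyncio import AsyncIOMotorDatabase


async def update_segment_text(
    db: AsyncIOMotorDatabase, conversation_id: str, segment_index: int, new_text: str
) -> bool:
    """Atomically set one segment's text without a read-modify-write cycle."""
    result = await db.conversations.update_one(
        {"conversation_id": conversation_id},
        # The dotted index path targets exactly one array element, so MongoDB
        # applies the change as a single atomic document update.
        {"$set": {f"segments.{segment_index}.text": new_text}},
    )
    return result.modified_count == 1
```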
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (1)
185-188: Consider removing the intermediate variable for simplicity. The `results = response.points` assignment and subsequent iteration could be simplified by directly iterating over `response.points`, as the original code likely did:

for result in response.points:
    memory = MemoryEntry(
        ...

This reduces unnecessary variable assignments without sacrificing readability, since `response.points` is already self-documenting.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- backends/advanced/docker-compose.yml
- backends/advanced/src/advanced_omi_backend/app_factory.py
- backends/advanced/src/advanced_omi_backend/cron.py
- backends/advanced/src/advanced_omi_backend/models/__init__.py
- backends/advanced/src/advanced_omi_backend/models/annotation.py
- backends/advanced/src/advanced_omi_backend/routers/api_router.py
- backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
- backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
- backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
- backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
- backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
- backends/advanced/tests/integration/test_annotation_flow.py
- backends/advanced/tests/test_annotation_models.py
- backends/advanced/webui/src/pages/Conversations.tsx
- backends/advanced/webui/src/services/api.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx,js,jsx}: ALL imports must be at the top of the file after the docstring. NEVER import modules in the middle of functions or files.
Use lazy imports sparingly and only when absolutely necessary for circular import issues.
Group imports in order: standard library, third-party, local imports.
Always raise errors, never silently ignore them. Use explicit error handling with proper exceptions rather than silent failures.
Files:
- backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
- backends/advanced/tests/test_annotation_models.py
- backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
- backends/advanced/src/advanced_omi_backend/routers/api_router.py
- backends/advanced/src/advanced_omi_backend/app_factory.py
- backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
- backends/advanced/webui/src/services/api.ts
- backends/advanced/src/advanced_omi_backend/models/annotation.py
- backends/advanced/tests/integration/test_annotation_flow.py
- backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
- backends/advanced/src/advanced_omi_backend/cron.py
- backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
- backends/advanced/webui/src/pages/Conversations.tsx
- backends/advanced/src/advanced_omi_backend/models/__init__.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT add defensive `hasattr()` checks. Instead, research and understand input/response or class structure.
Use Black formatter with 100-character line length for Python code.
Use isort for Python import organization.
Use Wyoming protocol (JSONL + binary) for WebSocket communication for structured audio sessions.
Use job-based tracking system for all audio processing (WebSocket and file uploads) with stage events (audio → transcription → memory) and completion status.
Conversations must only be created when speech is detected, eliminating noise-only sessions from user interfaces.
Always store audio sessions in the `audio_chunks` collection by `audio_uuid`. Only create entries in the `conversations` collection when speech is detected.
Use email-based authentication with MongoDB ObjectId user IDs.
Use the automatic `{objectid_suffix}-{device_name}` format for client registration.
All data must be scoped by user_id with efficient permission checking for data isolation.
JWT tokens are required for all endpoints and WebSocket connections.
Support pluggable memory providers (Chronicle native with individual facts or OpenMemory MCP delegation).
Files:
- backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
- backends/advanced/tests/test_annotation_models.py
- backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
- backends/advanced/src/advanced_omi_backend/routers/api_router.py
- backends/advanced/src/advanced_omi_backend/app_factory.py
- backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
- backends/advanced/src/advanced_omi_backend/models/annotation.py
- backends/advanced/tests/integration/test_annotation_flow.py
- backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
- backends/advanced/src/advanced_omi_backend/cron.py
- backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
- backends/advanced/src/advanced_omi_backend/models/__init__.py
**/*.{py,sh,md}
📄 CodeRabbit inference engine (CLAUDE.md)
Use `uv run python3` instead of `python3` for running Python commands, as uv is the package manager for all Python projects.
Files:
- backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
- backends/advanced/tests/test_annotation_models.py
- backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
- backends/advanced/src/advanced_omi_backend/routers/api_router.py
- backends/advanced/src/advanced_omi_backend/app_factory.py
- backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
- backends/advanced/src/advanced_omi_backend/models/annotation.py
- backends/advanced/tests/integration/test_annotation_flow.py
- backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
- backends/advanced/src/advanced_omi_backend/cron.py
- backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
- backends/advanced/src/advanced_omi_backend/models/__init__.py
**/docker-compose*.{yml,yaml}
📄 CodeRabbit inference engine (CLAUDE.md)
**/docker-compose*.{yml,yaml}: Use Docker Compose without `--no-cache` by default for faster builds. Only use `--no-cache` when explicitly needed for troubleshooting or when cached layers are causing issues.
Check if src/ is volume mounted. If not, do `docker compose build` so code changes are reflected. Do not simply run `docker compose restart` as it will not rebuild the image.
Files:
backends/advanced/docker-compose.yml
🧠 Learnings (1)
📚 Learning: 2026-01-06T08:43:14.315Z
Learnt from: CR
Repo: SimpleOpenSoftware/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-06T08:43:14.315Z
Learning: Applies to **/*.py : Use job-based tracking system for all audio processing (WebSocket and file uploads) with stage events (audio → transcription → memory) and completion status.
Applied to files:
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
🧬 Code graph analysis (9)
backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py (1)
- backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (1): generate_title_summary_job (514-637)
backends/advanced/tests/test_annotation_models.py (1)
- backends/advanced/src/advanced_omi_backend/models/annotation.py (3): TranscriptAnnotation (8-39), AnnotationStatus (11-14), AnnotationSource (16-18)
backends/advanced/src/advanced_omi_backend/app_factory.py (1)
- backends/advanced/src/advanced_omi_backend/models/annotation.py (1): TranscriptAnnotation (8-39)
backends/advanced/tests/integration/test_annotation_flow.py (3)
- backends/advanced/src/advanced_omi_backend/app_factory.py (1): create_app (173-215)
- backends/advanced/src/advanced_omi_backend/models/user.py (1): user_id (71-73)
- backends/advanced/src/advanced_omi_backend/models/conversation.py (2): segments (197-199), active_transcript (166-174)
backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py (5)
- backends/advanced/src/advanced_omi_backend/models/annotation.py (3): TranscriptAnnotation (8-39), AnnotationStatus (11-14), AnnotationSource (16-18)
- backends/advanced/src/advanced_omi_backend/models/conversation.py (3): Conversation (17-314), active_transcript (166-174), segments (197-199)
- backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py (1): enqueue_memory_processing (257-305)
- backends/advanced/src/advanced_omi_backend/models/job.py (1): JobPriority (68-80)
- backends/advanced/src/advanced_omi_backend/models/user.py (1): user_id (71-73)
backends/advanced/src/advanced_omi_backend/cron.py (2)
- backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2): surface_error_suggestions (12-62), finetune_hallucination_model (64-98)
- extras/speaker-recognition/src/simple_speaker_recognition/database/__init__.py (1): init_db (30-33)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)
- backends/advanced/src/advanced_omi_backend/models/conversation.py (3): Conversation (17-314), active_transcript (166-174), segments (197-199)
- backends/advanced/src/advanced_omi_backend/models/annotation.py (3): TranscriptAnnotation (8-39), AnnotationStatus (11-14), AnnotationSource (16-18)
backends/advanced/webui/src/pages/Conversations.tsx (1)
- backends/advanced/webui/src/services/api.ts (3): queueApi (199-226), conversationsApi (109-124), annotationsApi (150-159)
backends/advanced/src/advanced_omi_backend/models/__init__.py (1)
- backends/advanced/src/advanced_omi_backend/models/annotation.py (1): TranscriptAnnotation (8-39)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py
302-302: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
303-303: Use raise without specifying exception name
Remove exception name
(TRY201)
backends/advanced/src/advanced_omi_backend/models/annotation.py
35-39: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backends/advanced/tests/integration/test_annotation_flow.py
34-34: Unused function argument: mock_user
(ARG001)
backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py
34-34: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
100-100: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
backends/advanced/src/advanced_omi_backend/cron.py
61-61: Unused function argument: signum
(ARG001)
61-61: Unused function argument: frame
(ARG001)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
34-34: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
35-35: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
🔇 Additional comments (18)
backends/advanced/docker-compose.yml (1)
79-98: LGTM! Cron service is properly configured. The cron service configuration follows best practices:
- Uses `uv run python` as per coding guidelines
- Mounts `src/` for development (code changes reflected without rebuild per guidelines)
- Properly depends on `mongo` and `redis` with health checks
- Includes necessary environment variables and volumes
backends/advanced/src/advanced_omi_backend/models/__init__.py (1)
11-11: LGTM! Clean model export. The `TranscriptAnnotation` import is properly placed at the top of the file after the docstring, following the coding guidelines for import organization.
backends/advanced/tests/test_annotation_models.py (1)
1-6: LGTM! Imports are properly organized. All imports are correctly placed at the top of the file and grouped appropriately (standard library, third-party, local imports) as per coding guidelines.
backends/advanced/src/advanced_omi_backend/routers/api_router.py (1)
22-22: LGTM! Annotation router is properly integrated. The `annotation_router` is correctly imported at the top with other routers and registered with appropriate prefix and tags, following the established pattern in this file.
Also applies to: 42-42
backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py (1)
19-20: LGTM! Imports are properly placed. The new imports are correctly positioned at the top of the file with other imports from the same package.
backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py (1)
28-28: LGTM! Clean integration of annotation router. The import and export are properly placed, maintaining consistency with the existing router pattern.
Also applies to: 42-42
backends/advanced/src/advanced_omi_backend/app_factory.py (1)
59-59: LGTM! Proper Beanie model registration. The TranscriptAnnotation model is correctly imported and registered with Beanie during application startup, following the established pattern for other document models.
Also applies to: 63-63
backends/advanced/webui/src/services/api.ts (1)
150-159: LGTM! Well-structured annotation API. The new `annotationsApi` follows the established patterns in the codebase with properly typed parameters and clear endpoint definitions.
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (1)
12-62: The deduplication logic is correct as designed. Conversations cannot be shared across users. Each `conversation_id` is globally unique (marked as `unique=True` in the model) and scoped to a single `user_id`. The existing annotation check at lines 39-42 querying by `conversation_id` and `segment_index` is sufficient for deduplication, since `conversation_id` inherently belongs to only one user. Adding `user_id` to the filter would be redundant.
backends/advanced/tests/integration/test_annotation_flow.py (2)
33-34: The `mock_user` parameter is correctly used. Static analysis flagged `mock_user` as unused, but it's actually required because the `app` fixture depends on it. The parameter ensures `mock_user` is instantiated before `app`.
81-108: Test structure and assertions are comprehensive. The test properly verifies:
- API response status and payload
- Database interactions (conversation lookup, annotation creation, insert call)
- Transcript segment update with corrected text
- Memory job enqueue with correct parameters
backends/advanced/src/advanced_omi_backend/cron.py (1)
61-63: Signal handler parameters are required by convention. The static analysis flagged `signum` and `frame` as unused, but these parameters are required by Python's signal handler signature. This is a false positive.
backends/advanced/webui/src/pages/Conversations.tsx (2)
93-128: Job polling implementation looks good. The `useEffect` properly:
- Cleans up the interval on unmount
- Only polls when there are active jobs
- Updates state correctly when jobs complete
- Handles errors gracefully
29-37: Annotation type and UI rendering are well-structured. The annotation interface properly types all fields, and the UI clearly displays pending suggestions with Accept/Deny actions. The visual distinction with yellow highlighting effectively draws attention to AI suggestions.
Also applies to: 866-902
backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py (3)
31-35: `Depends()` in argument defaults is idiomatic FastAPI. The static analysis hint (B008) flags this pattern, but it's the standard and recommended way to declare dependencies in FastAPI. This is a false positive.
Also applies to: 97-101
97-118: The `get_annotations` endpoint correctly filters by user. The endpoint properly scopes results by `user_id`, ensuring users can only access their own annotations. The response mapping is clean and consistent with the create endpoint.
74-81: The code is correct. `enqueue_memory_processing` is a synchronous function (defined at line 257 of `memory_jobs.py` as `def enqueue_memory_processing(`, not `async def`). It uses RQ's synchronous `memory_queue.enqueue()` method to queue a job, so no `await` is needed or appropriate. Likely an incorrect or invalid review comment.
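For illustration, the synchronous RQ pattern this refers to; the queue name and job function here are placeholders, not Chronicle's real ones:

```python
from redis import Redis
from rq import Queue

redis_conn = Redis()
memory_queue = Queue("memory", connection=redis_conn)

# enqueue() serializes the call and pushes it to Redis synchronously;
# it returns a Job handle immediately, so there is nothing to await.
job = memory_queue.enqueue("myapp.jobs.process_memory", "conversation-123", job_timeout=300)
print(job.id)
```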
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (1)
178-183: The Qdrant client API usage is correct. The project uses qdrant-client 1.16.2, and the code properly calls `query_points()`, which returns a `QueryResponse` object with a valid `.points` attribute containing a list of `ScoredPoint` records. The v1.10+ compatibility reference is accurate: v1.10 introduced the unified Query API consolidating older search helpers into `query_points`, making this the appropriate method for async Qdrant queries. No API compatibility issues found. The intermediate variable `results = response.points` on line 185 is a minor refactoring choice that adds marginal clarity.
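A small sketch of the unified Query API in question, with a placeholder collection name and vector:

```python
from qdrant_client import AsyncQdrantClient


async def search_memories(client: AsyncQdrantClient, vector: list[float]) -> None:
    # query_points() is the v1.10+ unified Query API; it returns a
    # QueryResponse whose .points is a list of ScoredPoint records.
    response = await client.query_points(
        collection_name="memories",  # placeholder name
        query=vector,
        limit=5,
    )
    for point in response.points:
        print(point.id, point.score, point.payload)
```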
        )
        await suggestion.insert()
        count += 1
        if count >= 6:  # Surface 5-6 places as requested
Minor comment mismatch.
The comment says "Surface 5-6 places" but the code checks >= 6, which would actually surface 6 places. Consider updating the comment to just say "6" or adjusting the condition to > 5 if you want "5-6" variability.
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py at
line 57, The inline comment next to the condition in the annotation job (the
line using "if count >= 6:") is inconsistent: either update the comment to read
"Surface 6 places" to match the existing condition, or change the condition to
"if count > 5:" to preserve the intended "5-6" behavior; locate the check using
the symbol count and make the corresponding edit so the comment and code reflect
the same semantics.
# Also enqueue title/summary generation to ensure summaries reflect any transcript changes
try:
    # Use a timestamp in job_id to avoid conflicts if re-run frequently
    summary_job_id = f"title_summary_{conversation_id[:8]}_{int(time.time())}"

    default_queue.enqueue(
        generate_title_summary_job,
        conversation_id,
        job_timeout=300,
        result_ttl=JOB_RESULT_TTL,
        job_id=summary_job_id,
        description=f"Generate title and summary for conversation {conversation_id[:8]}",
    )
    logger.info(f"📥 RQ: Enqueued summary job {summary_job_id} for conversation {conversation_id}")
except Exception as e:
    logger.error(f"Failed to enqueue summary job: {e}")
    raise e
Fix error handling to avoid cascading failure and improve logging.
The current error handling has two issues:
- Critical: Re-raising the exception (line 303) prevents the memory job from being returned to the caller, making it appear that the entire operation failed even though the memory job was successfully enqueued.
- Minor: Use `logging.exception` instead of `logging.error` for better stack traces, and use bare `raise` instead of `raise e` to preserve the original traceback.
🐛 Proposed fix for error handling
 # Also enqueue title/summary generation to ensure summaries reflect any transcript changes
 try:
     # Use a timestamp in job_id to avoid conflicts if re-run frequently
     summary_job_id = f"title_summary_{conversation_id[:8]}_{int(time.time())}"
     default_queue.enqueue(
         generate_title_summary_job,
         conversation_id,
         job_timeout=300,
         result_ttl=JOB_RESULT_TTL,
         job_id=summary_job_id,
         description=f"Generate title and summary for conversation {conversation_id[:8]}",
     )
     logger.info(f"📥 RQ: Enqueued summary job {summary_job_id} for conversation {conversation_id}")
 except Exception as e:
-    logger.error(f"Failed to enqueue summary job: {e}")
-    raise e
+    logger.exception(f"Failed to enqueue summary job for conversation {conversation_id}")
+    # Don't re-raise - memory job was already enqueued successfully
+    # Summary generation is supplementary and should not block the main memory job
 return job

🧰 Tools
🪛 Ruff (0.14.10)
302-302: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
303-303: Use raise without specifying exception name
Remove exception name
(TRY201)
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py around
lines 286 - 304, The except block around default_queue.enqueue for
generate_title_summary_job should not re-raise (remove "raise e") to avoid
making the whole memory job appear to fail; instead log the full traceback with
logger.exception (replace logger.error with logger.exception) so failures are
recorded without propagating, keeping summary_job_id, generate_title_summary_job
and default_queue.enqueue logic intact.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @backends/advanced/webui/src/pages/Conversations.tsx:
- Around line 491-502: handleRejectSuggestion currently only mutates local state
and is unnecessarily declared async; make rejection persistent by calling the
backend to mark the annotation as rejected (e.g., call a new or existing API
like markAnnotationRejected(conversationId, annotationId) and handle errors) and
then update state via setConversations to remove the annotation (using
conversation_id and annotation.id), or if backend work is not possible yet,
persist rejected IDs in localStorage (store annotationId keyed by conversationId
and filter them out wherever conversations are loaded/rendered). Also remove the
unnecessary async keyword from handleRejectSuggestion if you implement the
localStorage synchronous approach.
- Around line 1063-1069: The progress bar width logic uses
jobProgress[processingMemories[conversation.conversation_id]] but only treats
'finished' as the terminal successful state; update the width condition in the
progress bar div (the inline style computing width) to also treat 'completed' as
a terminal/success state (i.e., return '100%' when the status is 'finished' OR
'completed'), and consider handling any other terminal statuses your backend may
return the same way so the UI reflects completed jobs correctly.
- Around line 136-171: jobProgress entries are never removed, causing unbounded
growth; when a job reaches a terminal status inside the useEffect polling loop
(the block that checks if job.status is in
['finished','completed','failed','stopped','canceled']), after you update
processingMemories (the setProcessingMemories call), also call setJobProgress to
remove the jobId key (e.g. setJobProgress(prev => { const next = { ...prev };
delete next[jobId]; return next; })); ensure this runs for the same terminal
branch that refreshes the conversation so jobProgress stays in sync with
processingMemories.
- Around line 173-220: handleSaveAnnotation currently performs
annotationsApi.create and conversationsApi.reprocessMemory inside the
setConversations updater (and then sets editingSegment to null), which causes
side effects in a state updater, captures editingSegment after it’s cleared, and
leaves errors unhandled by the outer try/catch; fix by snapshotting the
editingSegment into a local const (e.g. const segment = editingSegment) at the
top of handleSaveAnnotation, build the new conversations array (or call
setConversations with a pure updater that only returns updated state) without
performing API calls, then after setConversations run the async work outside the
state updater using await/try-catch with annotationsApi.create({
conversation_id: segment.conversationId, segment_index: segment.segmentIndex,
original_text, corrected_text, status: 'accepted' }) and then await
conversationsApi.reprocessMemory(segment.conversationId) to
setProcessingMemories with res.data.job_id; only then clear editingSegment
(setEditingSegment(null)) and handle errors in the surrounding try/catch to
revert UI or setError as needed.
🧹 Nitpick comments (2)
backends/advanced/webui/src/pages/Conversations.tsx (2)
78-95: Potential double-save on Enter key. When Enter is pressed, `onSave()` is called (line 81), and then the blur event may also fire `onSave()` (line 95) before the component unmounts. While the guard in `handleSaveAnnotation` (`if (!editingSegment) return`) should prevent duplicate API calls, consider either:
- Removing `onBlur={onSave}` and relying only on explicit save/cancel, or
- Adding a local `isSaving` ref to prevent the double-call at the component level.

💡 Optional: Use a ref to prevent double-save

 const AutoResizingTextarea = ({ value, onChange, onSave, onCancel }: AutoResizingTextareaProps) => {
   const textareaRef = useRef<HTMLTextAreaElement>(null)
+  const isSavingRef = useRef(false)

   useEffect(() => {
     if (textareaRef.current) {
       textareaRef.current.style.height = 'auto'
       textareaRef.current.style.height = textareaRef.current.scrollHeight + 'px'
     }
   }, [value])

+  const handleSave = () => {
+    if (isSavingRef.current) return
+    isSavingRef.current = true
+    onSave()
+  }
+
   const handleKeyDown = (e: React.KeyboardEvent) => {
     if (e.key === 'Enter' && !e.shiftKey) {
       e.preventDefault()
-      onSave()
+      handleSave()
     }
     if (e.key === 'Escape') {
       onCancel()
     }
   }

   return (
     <div className="relative w-full ml-1">
       <textarea
         ref={textareaRef}
         value={value}
         onChange={onChange}
         onKeyDown={handleKeyDown}
-        onBlur={onSave}
+        onBlur={handleSave}
430-452: Annotations fetch failure blocks transcript loading. If `annotationsApi.getByConversationId` fails, the entire `Promise.all` rejects, and the transcript won't be displayed. Consider fetching annotations separately so a failure doesn't block the primary functionality.

💡 Proposed fix: Handle annotations fetch independently

 try {
-  const [convResponse, annotationsResponse] = await Promise.all([
-    conversationsApi.getById(conversation.conversation_id),
-    annotationsApi.getByConversationId(conversation.conversation_id)
-  ])
+  const convResponse = await conversationsApi.getById(conversation.conversation_id)
+
+  // Fetch annotations separately - don't block transcript on failure
+  let annotations: any[] = []
+  try {
+    const annotationsResponse = await annotationsApi.getByConversationId(conversation.conversation_id)
+    annotations = annotationsResponse.data || []
+  } catch (annotationsErr) {
+    console.warn('Failed to fetch annotations:', annotationsErr)
+  }

   if (convResponse.status === 200 && convResponse.data.conversation) {
     // Update the conversation in state with full data and annotations
     setConversations(prev => prev.map(c =>
       c.conversation_id === conversationId
         ? {
             ...c,
             ...convResponse.data.conversation,
-            annotations: annotationsResponse.data || []
+            annotations
           }
         : c
     ))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backends/advanced/webui/src/pages/Conversations.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx,js,jsx}: ALL imports must be at the top of the file after the docstring. NEVER import modules in the middle of functions or files.
Use lazy imports sparingly and only when absolutely necessary for circular import issues.
Group imports in order: standard library, third-party, local imports.
Always raise errors, never silently ignore them. Use explicit error handling with proper exceptions rather than silent failures.
Files:
backends/advanced/webui/src/pages/Conversations.tsx
🧬 Code graph analysis (1)
backends/advanced/webui/src/pages/Conversations.tsx (1)
backends/advanced/webui/src/services/api.ts (3)
queueApi (199-226), conversationsApi (109-124), annotationsApi (150-159)
🔇 Additional comments (4)
backends/advanced/webui/src/pages/Conversations.tsx (4)
1-5: LGTM! Imports are properly organized at the top of the file with standard library (React), third-party (lucide-react), and local imports grouped appropriately.
908-945: LGTM! The pending suggestions UI is well-structured with clear visual differentiation between original and corrected text. The Accept/Deny buttons are appropriately labeled.
1015-1035: LGTM! The inline editing implementation is clean. The transition between view and edit modes is handled well, with appropriate keyboard shortcuts (Enter to save, Escape to cancel). Hiding the play button during edit prevents conflicting interactions.
29-37: LGTM! The `annotations` interface extension is well-typed with appropriate fields for the annotation feature.
useEffect(() => {
  // Poll for job status
  const interval = setInterval(async () => {
    const activeJobs = Object.entries(processingMemories)
    if (activeJobs.length === 0) return

    for (const [convId, jobId] of activeJobs) {
      try {
        const response = await queueApi.getJob(jobId)
        const job = response.data
        setJobProgress(prev => ({ ...prev, [jobId]: job.status }))

        if (['finished', 'completed', 'failed', 'stopped', 'canceled'].includes(job.status)) {
          // Job done, remove from tracking and refresh conversation
          setProcessingMemories(prev => {
            const newState = { ...prev }
            delete newState[convId]
            return newState
          })

          // Refresh this conversation to show new memories
          const convResponse = await conversationsApi.getById(convId)
          if (convResponse.status === 200 && convResponse.data.conversation) {
            setConversations(prev => prev.map(c =>
              c.conversation_id === convId ? { ...c, ...convResponse.data.conversation } : c
            ))
          }
        }
      } catch (err) {
        console.error(`Failed to poll job ${jobId}:`, err)
      }
    }
  }, 2000)

  return () => clearInterval(interval)
}, [processingMemories])
Memory leak: jobProgress entries are never cleaned up.
When a job completes, it's removed from `processingMemories` (lines 150-154), but the corresponding entry in `jobProgress` is never removed. Over time, this state object will grow without bound.
🧹 Proposed fix: Clean up jobProgress when job completes
 if (['finished', 'completed', 'failed', 'stopped', 'canceled'].includes(job.status)) {
   // Job done, remove from tracking and refresh conversation
   setProcessingMemories(prev => {
     const newState = { ...prev }
     delete newState[convId]
     return newState
   })
+  // Clean up job progress entry
+  setJobProgress(prev => {
+    const newState = { ...prev }
+    delete newState[jobId]
+    return newState
+  })
   // Refresh this conversation to show new memories

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
useEffect(() => {
  // Poll for job status
  const interval = setInterval(async () => {
    const activeJobs = Object.entries(processingMemories)
    if (activeJobs.length === 0) return

    for (const [convId, jobId] of activeJobs) {
      try {
        const response = await queueApi.getJob(jobId)
        const job = response.data
        setJobProgress(prev => ({ ...prev, [jobId]: job.status }))

        if (['finished', 'completed', 'failed', 'stopped', 'canceled'].includes(job.status)) {
          // Job done, remove from tracking and refresh conversation
          setProcessingMemories(prev => {
            const newState = { ...prev }
            delete newState[convId]
            return newState
          })

          // Clean up job progress entry
          setJobProgress(prev => {
            const newState = { ...prev }
            delete newState[jobId]
            return newState
          })

          // Refresh this conversation to show new memories
          const convResponse = await conversationsApi.getById(convId)
          if (convResponse.status === 200 && convResponse.data.conversation) {
            setConversations(prev => prev.map(c =>
              c.conversation_id === convId ? { ...c, ...convResponse.data.conversation } : c
            ))
          }
        }
      } catch (err) {
        console.error(`Failed to poll job ${jobId}:`, err)
      }
    }
  }, 2000)

  return () => clearInterval(interval)
}, [processingMemories])
🤖 Prompt for AI Agents
In @backends/advanced/webui/src/pages/Conversations.tsx around lines 136 - 171,
jobProgress entries are never removed, causing unbounded growth; when a job
reaches a terminal status inside the useEffect polling loop (the block that
checks if job.status is in
['finished','completed','failed','stopped','canceled']), after you update
processingMemories (the setProcessingMemories call), also call setJobProgress to
remove the jobId key (e.g. setJobProgress(prev => { const next = { ...prev };
delete next[jobId]; return next; })); ensure this runs for the same terminal
branch that refreshes the conversation so jobProgress stays in sync with
processingMemories.
const handleSaveAnnotation = async () => {
  if (!editingSegment) return

  try {
    // Optimistically update UI
    setConversations(prev => prev.map(c => {
      if (c.conversation_id === editingSegment.conversationId && c.segments) {
        const newSegments = [...c.segments]
        if (newSegments[editingSegment.segmentIndex]) {
          // Store original text in case we need to revert (not implemented here for brevity)
          const originalText = newSegments[editingSegment.segmentIndex].text
          newSegments[editingSegment.segmentIndex] = {
            ...newSegments[editingSegment.segmentIndex],
            text: editingSegment.text
          }

          // Call API in background
          annotationsApi.create({
            conversation_id: editingSegment.conversationId,
            segment_index: editingSegment.segmentIndex,
            original_text: originalText,
            corrected_text: editingSegment.text,
            status: 'accepted'
          }).then(() => {
            // Trigger memory reprocessing explicitly to get the job ID for the UI
            conversationsApi.reprocessMemory(editingSegment.conversationId).then(res => {
              setProcessingMemories(prev => ({
                ...prev,
                [editingSegment.conversationId]: res.data.job_id
              }))
            })
          }).catch(err => {
            console.error('Failed to save annotation:', err)
            setError('Failed to save correction. Please try again.')
            // Revert UI change would go here
          })
        }
        return { ...c, segments: newSegments }
      }
      return c
    }))

    setEditingSegment(null)
  } catch (err: any) {
    console.error('Error saving annotation:', err)
    setError('Failed to save correction')
  }
}
Anti-pattern: API call inside setState callback.
The annotationsApi.create() call at line 190 is inside the setConversations callback. This is problematic because:
- `setState` callbacks should be pure functions without side effects.
- React may call the callback multiple times in concurrent mode.
- The closure captures `editingSegment`, which is set to `null` at line 215 before the async operations complete.
Additionally, the outer try-catch (lines 216-219) won't catch errors from the async .then()/.catch() chain.
🛠️ Suggested refactor: Move API call outside setState
const handleSaveAnnotation = async () => {
if (!editingSegment) return
+
+ const { conversationId, segmentIndex, text } = editingSegment
+
+ // Find original text before updating state
+ const conversation = conversations.find(c => c.conversation_id === conversationId)
+ const originalText = conversation?.segments?.[segmentIndex]?.text || ''
try {
// Optimistically update UI
setConversations(prev => prev.map(c => {
- if (c.conversation_id === editingSegment.conversationId && c.segments) {
+ if (c.conversation_id === conversationId && c.segments) {
const newSegments = [...c.segments]
- if (newSegments[editingSegment.segmentIndex]) {
- // Store original text in case we need to revert (not implemented here for brevity)
- const originalText = newSegments[editingSegment.segmentIndex].text
- newSegments[editingSegment.segmentIndex] = {
- ...newSegments[editingSegment.segmentIndex],
- text: editingSegment.text
+ if (newSegments[segmentIndex]) {
+ newSegments[segmentIndex] = {
+ ...newSegments[segmentIndex],
+ text: text
}
-
- // Call API in background
- annotationsApi.create({
- conversation_id: editingSegment.conversationId,
- segment_index: editingSegment.segmentIndex,
- original_text: originalText,
- corrected_text: editingSegment.text,
- status: 'accepted'
- }).then(() => {
- // Trigger memory reprocessing explicitly to get the job ID for the UI
- conversationsApi.reprocessMemory(editingSegment.conversationId).then(res => {
- setProcessingMemories(prev => ({
- ...prev,
- [editingSegment.conversationId]: res.data.job_id
- }))
- })
- }).catch(err => {
- console.error('Failed to save annotation:', err)
- setError('Failed to save correction. Please try again.')
- // Revert UI change would go here
- })
}
return { ...c, segments: newSegments }
}
return c
}))
setEditingSegment(null)
- } catch (err: any) {
- console.error('Error saving annotation:', err)
- setError('Failed to save correction')
- }
+
+ // Call API after state update
+ await annotationsApi.create({
+ conversation_id: conversationId,
+ segment_index: segmentIndex,
+ original_text: originalText,
+ corrected_text: text,
+ status: 'accepted'
+ })
+
+ // Trigger memory reprocessing
+ const res = await conversationsApi.reprocessMemory(conversationId)
+ setProcessingMemories(prev => ({
+ ...prev,
+ [conversationId]: res.data.job_id
+ }))
+ } catch (err: any) {
+ console.error('Failed to save annotation:', err)
+ setError('Failed to save correction. Please try again.')
+ // TODO: Revert optimistic UI update on failure
+ }
 }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const handleSaveAnnotation = async () => {
  if (!editingSegment) return

  const { conversationId, segmentIndex, text } = editingSegment

  // Find original text before updating state
  const conversation = conversations.find(c => c.conversation_id === conversationId)
  const originalText = conversation?.segments?.[segmentIndex]?.text || ''

  try {
    // Optimistically update UI
    setConversations(prev => prev.map(c => {
      if (c.conversation_id === conversationId && c.segments) {
        const newSegments = [...c.segments]
        if (newSegments[segmentIndex]) {
          newSegments[segmentIndex] = {
            ...newSegments[segmentIndex],
            text: text
          }
        }
        return { ...c, segments: newSegments }
      }
      return c
    }))

    setEditingSegment(null)

    // Call API after state update
    await annotationsApi.create({
      conversation_id: conversationId,
      segment_index: segmentIndex,
      original_text: originalText,
      corrected_text: text,
      status: 'accepted'
    })

    // Trigger memory reprocessing
    const res = await conversationsApi.reprocessMemory(conversationId)
    setProcessingMemories(prev => ({
      ...prev,
      [conversationId]: res.data.job_id
    }))
  } catch (err: any) {
    console.error('Failed to save annotation:', err)
    setError('Failed to save correction. Please try again.')
    // TODO: Revert optimistic UI update on failure
  }
}
🤖 Prompt for AI Agents
In @backends/advanced/webui/src/pages/Conversations.tsx around lines 173 - 220,
handleSaveAnnotation currently performs annotationsApi.create and
conversationsApi.reprocessMemory inside the setConversations updater (and then
sets editingSegment to null), which causes side effects in a state updater,
captures editingSegment after it’s cleared, and leaves errors unhandled by the
outer try/catch; fix by snapshotting the editingSegment into a local const (e.g.
const segment = editingSegment) at the top of handleSaveAnnotation, build the
new conversations array (or call setConversations with a pure updater that only
returns updated state) without performing API calls, then after setConversations
run the async work outside the state updater using await/try-catch with
annotationsApi.create({ conversation_id: segment.conversationId, segment_index:
segment.segmentIndex, original_text, corrected_text, status: 'accepted' }) and
then await conversationsApi.reprocessMemory(segment.conversationId) to
setProcessingMemories with res.data.job_id; only then clear editingSegment
(setEditingSegment(null)) and handle errors in the surrounding try/catch to
revert UI or setError as needed.
const handleAcceptSuggestion = async (conversationId: string, annotation: any) => {
  try {
    // 1. Update annotation status to accepted
    // We'd need an update endpoint, but for now we can create a new one or assume 'create' with same ID updates if we handled it,
    // but simpler is to use the create endpoint to overwrite/confirm.
    // Ideally we should have an update endpoint.
    // Since I didn't create an update endpoint, I'll re-create it as accepted.

    await annotationsApi.create({
      conversation_id: conversationId,
      segment_index: annotation.segment_index,
      original_text: annotation.original_text,
      corrected_text: annotation.corrected_text,
      status: 'accepted'
    })

    // 2. Update local state to reflect change (hide suggestion, update transcript)
    setConversations(prev => prev.map(c => {
      if (c.conversation_id === conversationId && c.segments && c.annotations) {
        const newSegments = [...c.segments]
        if (newSegments[annotation.segment_index]) {
          newSegments[annotation.segment_index].text = annotation.corrected_text
        }
        return {
          ...c,
          segments: newSegments,
          annotations: c.annotations.filter(a => a.id !== annotation.id) // Remove processed suggestion
        }
      }
      return c
    }))
  } catch (err) {
    console.error('Failed to accept suggestion:', err)
  }
}
Silent failure when accepting suggestion.
The catch block (lines 486-488) only logs the error without informing the user. The UI has already been optimistically updated (lines 472-485), so the user won't know the operation failed.
Also, note the comment at lines 457-461 mentions there's no update endpoint, so create is used as a workaround. Verify with the backend that creating a new annotation with the same conversation_id and segment_index properly updates/replaces the existing one rather than creating duplicates.
🛠️ Proposed fix: Add error feedback and consider reverting optimistic update

    } catch (err) {
      console.error('Failed to accept suggestion:', err)
+     setError('Failed to accept suggestion. Please try again.')
+     // Optionally revert the optimistic update here
    }
  }

  const handleRejectSuggestion = async (conversationId: string, annotationId: string) => {
    // Ideally call API to mark as rejected. For now just remove from UI.
    setConversations(prev => prev.map(c => {
      if (c.conversation_id === conversationId && c.annotations) {
        return {
          ...c,
          annotations: c.annotations.filter(a => a.id !== annotationId)
        }
      }
      return c
    }))
  }
Rejection is not persisted - suggestion will reappear on refresh.
The function only removes the annotation from local state. If the user refreshes the page, the "rejected" suggestion will reappear as pending. The comment acknowledges this gap.
Consider either:
- Adding an API call to update the annotation status to 'rejected', or
- At minimum, storing rejected annotation IDs in localStorage to filter them client-side.
Also, the async keyword on line 491 is unnecessary since there's no await.
🤖 Prompt for AI Agents
In @backends/advanced/webui/src/pages/Conversations.tsx around lines 491 - 502,
handleRejectSuggestion currently only mutates local state and is unnecessarily
declared async; make rejection persistent by calling the backend to mark the
annotation as rejected (e.g., call a new or existing API like
markAnnotationRejected(conversationId, annotationId) and handle errors) and then
update state via setConversations to remove the annotation (using
conversation_id and annotation.id), or if backend work is not possible yet,
persist rejected IDs in localStorage (store annotationId keyed by conversationId
and filter them out wherever conversations are loaded/rendered). Also remove the
unnecessary async keyword from handleRejectSuggestion if you implement the
localStorage synchronous approach.
              <div
                className="bg-blue-600 h-1.5 rounded-full transition-all duration-500"
                style={{
                  width: jobProgress[processingMemories[conversation.conversation_id]] === 'finished' ? '100%' :
                         jobProgress[processingMemories[conversation.conversation_id]] === 'started' ? '50%' : '10%'
                }}
              ></div>
Progress bar doesn't handle all terminal statuses.
Line 148 checks for terminal statuses including 'finished' and 'completed', but the progress bar width calculation (line 1066) only checks for 'finished'. If the backend returns 'completed' instead, the progress bar will show 10% width instead of 100%.
🛠️ Proposed fix

  style={{
-   width: jobProgress[processingMemories[conversation.conversation_id]] === 'finished' ? '100%' :
+   width: ['finished', 'completed'].includes(jobProgress[processingMemories[conversation.conversation_id]]) ? '100%' :
        jobProgress[processingMemories[conversation.conversation_id]] === 'started' ? '50%' : '10%'
  }}
🤖 Prompt for AI Agents
In @backends/advanced/webui/src/pages/Conversations.tsx around lines 1063 -
1069, The progress bar width logic uses
jobProgress[processingMemories[conversation.conversation_id]] but only treats
'finished' as the terminal successful state; update the width condition in the
progress bar div (the inline style computing width) to also treat 'completed' as
a terminal/success state (i.e., return '100%' when the status is 'finished' OR
'completed'), and consider handling any other terminal statuses your backend may
return the same way so the UI reflects completed jobs correctly.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In @backends/advanced/src/advanced_omi_backend/cron.py:
- Around line 40-57: The scheduled jobs call surface_error_suggestions() and
finetune_hallucination_model() without scoping to a user_id; update the loop to
iterate over active users (e.g., fetch active user IDs) and invoke await
surface_error_suggestions(user_id) and await
finetune_hallucination_model(user_id) for each user, updating
last_suggestion_run and last_training_run after the per-user batch completes; if
the job functions don't accept user_id, add a user_id parameter (with
appropriate defaults or validation) and ensure exceptions are caught per-user
(log user_id on error) so one failing user doesn't stop the whole batch.
- Line 38: Replace the deprecated call to datetime.utcnow() used to set the now
variable with a timezone-aware call datetime.now(timezone.utc); update imports
in cron.py to import timezone (e.g., from datetime import datetime, timezone) or
use datetime.timezone if you keep a module import, and ensure any downstream
code expecting naive datetimes is adjusted to handle an aware datetime.
In @backends/advanced/src/advanced_omi_backend/models/annotation.py:
- Line 20: The annotation model's id is defined as a UUID string (id: str =
Field(default_factory=lambda: str(uuid.uuid4()))) which diverges from the User
model's MongoDB ObjectId pattern; change the Annotation model to use
Beanie/PydanticObjectId for the primary id (match the User model's id type) and
add a string accessor property (e.g., annotation_id) if a string representation
is needed, ensuring imports and any references to Annotation.id elsewhere are
updated to expect PydanticObjectId and use the new string property when
necessary.
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py:
- Around line 40-43: The existence check currently queries
TranscriptAnnotation.find_one by "conversation_id" and "segment_index" only;
update that query to also include the relevant "user_id" field (e.g., add
"user_id": conv.user_id or job.user_id depending on the surrounding scope) so
the lookup is scoped per user, and ensure the variable you reference
(conv.user_id or job.user_id) is available in the function before using it.
- Around line 22-24: The Conversation.find query in annotation_jobs.py currently
omits user scoping and will retrieve conversations for all users; update the job
to accept a user_id parameter (or obtain the intended user context) and include
a filter like {"user_id": user_id, "created_at": {"$gte": since}} when calling
Conversation.find, or if the job truly must be global add a clear comment near
Conversation.find explaining why user_id scoping is intentionally omitted and
ensure any permission checks are applied; locate this change around the
Conversation.find call and the surrounding job entrypoint to wire the user_id
parameter through.
- Line 21: Replace the naive UTC timestamp usage assigned to variable "since"
(currently using datetime.utcnow()) with a timezone-aware timestamp by calling
datetime.now(timezone.utc) and subtracting timedelta(days=1); also ensure
timezone is imported from datetime (or use datetime.timezone) so the expression
becomes timezone-aware throughout the module.
- Around line 58-59: The comment "Surface 5-6 places as requested" is
inconsistent with the condition `if count >= 6` (which actually limits results
to up to 6); update the comment near the `if count >= 6` check to "Surface up to
6 places" (or "up to 6 suggestions"), or if you truly want to enforce 5–6
results, change the condition logic around the `count` variable (e.g., break
when `count >= 5` and ensure counting semantics produce 5–6 items) so the
comment and the `if count >= 6` behavior match.
- Around line 72-74: The query fetching TranscriptAnnotation entries currently
returns accepted annotations across all users; update the cron job that contains
the corrections = await TranscriptAnnotation.find(...) call to scope by user_id:
either add a user_id parameter to that function and include "user_id": user_id
in the find filter alongside
TranscriptAnnotation.AnnotationStatus.ACCEPTED.value, or modify the cron to
iterate over all user ids and run the existing logic per-user (adding the same
"user_id" filter for each iteration); ensure you update any callers to pass the
user_id when choosing the parameter approach or implement the user-list
iteration and per-user processing path if no request context exists.
🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/models/annotation.py (2)
1-6: Group imports by category.

Imports should be organized: standard library, third-party, then local imports. Currently, standard library imports are mixed with third-party imports.

♻️ Reorganize imports

+# Standard library
 from datetime import datetime, timezone
+from enum import Enum
 from typing import Optional, List
+import uuid
+
+# Third-party
 from pydantic import Field
 from beanie import Document, Indexed
-from enum import Enum
-import uuid

As per coding guidelines, imports should be grouped and ordered consistently.

33-39: Annotate `indexes` with `ClassVar` for proper Pydantic handling.

The `indexes` list should be annotated with `typing.ClassVar` to indicate it's a class attribute rather than an instance field, preventing Pydantic from treating it as a model field.

♻️ Add ClassVar annotation

 from datetime import datetime, timezone
-from typing import Optional, List
+from typing import Optional, List, ClassVar
 from pydantic import Field
 from beanie import Document, Indexed
 from enum import Enum
 import uuid

 class TranscriptAnnotation(Document):
     """Model for transcript annotations/corrections."""
     # ... (enum definitions and fields)

     class Settings:
         name = "transcript_annotations"
-        indexes = [
+        indexes: ClassVar[List[str]] = [
             "conversation_id",
             "user_id",
             "status"
         ]

Based on static analysis hints.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backends/advanced/src/advanced_omi_backend/cron.py
backends/advanced/src/advanced_omi_backend/models/annotation.py
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx,js,jsx}: ALL imports must be at the top of the file after the docstring. NEVER import modules in the middle of functions or files.
Use lazy imports sparingly and only when absolutely necessary for circular import issues.
Group imports in order: standard library, third-party, local imports.
Always raise errors, never silently ignore them. Use explicit error handling with proper exceptions rather than silent failures.
Files:
backends/advanced/src/advanced_omi_backend/cron.py
backends/advanced/src/advanced_omi_backend/models/annotation.py
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT add defensive `hasattr()` checks. Instead, research and understand input/response or class structure.
Use Black formatter with 100-character line length for Python code.
Use isort for Python import organization.
Use Wyoming protocol (JSONL + binary) for WebSocket communication for structured audio sessions.
Use job-based tracking system for all audio processing (WebSocket and file uploads) with stage events (audio → transcription → memory) and completion status.
Conversations must only be created when speech is detected, eliminating noise-only sessions from user interfaces.
Always store audio sessions in `audio_chunks` collection by `audio_uuid`. Only create entries in `conversations` collection when speech is detected.
Use email-based authentication with MongoDB ObjectId user IDs.
Use automatic `{objectid_suffix}-{device_name}` format for client registration.
All data must be scoped by user_id with efficient permission checking for data isolation.
JWT tokens are required for all endpoints and WebSocket connections.
Support pluggable memory providers (Chronicle native with individual facts or OpenMemory MCP delegation).
Files:
backends/advanced/src/advanced_omi_backend/cron.pybackends/advanced/src/advanced_omi_backend/models/annotation.pybackends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
**/*.{py,sh,md}
📄 CodeRabbit inference engine (CLAUDE.md)
Use `uv run python3` instead of `python3` for running Python commands, as uv is the package manager for all Python projects.
Files:
backends/advanced/src/advanced_omi_backend/cron.py
backends/advanced/src/advanced_omi_backend/models/annotation.py
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
🧠 Learnings (2)
📚 Learning: 2026-01-06T08:43:14.315Z
Learnt from: CR
Repo: SimpleOpenSoftware/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-06T08:43:14.315Z
Learning: Applies to **/*.{py,ts,tsx,js,jsx} : ALL imports must be at the top of the file after the docstring. NEVER import modules in the middle of functions or files.
Applied to files:
backends/advanced/src/advanced_omi_backend/cron.py
📚 Learning: 2026-01-06T08:43:14.315Z
Learnt from: CR
Repo: SimpleOpenSoftware/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-06T08:43:14.315Z
Learning: Applies to **/*.py : Use job-based tracking system for all audio processing (WebSocket and file uploads) with stage events (audio → transcription → memory) and completion status.
Applied to files:
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
🧬 Code graph analysis (3)
backends/advanced/src/advanced_omi_backend/cron.py (2)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)
surface_error_suggestions (13-63)
finetune_hallucination_model (65-98)
backends/advanced/upload_files.py (1)
format (32-36)
backends/advanced/src/advanced_omi_backend/models/annotation.py (1)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
user_id (71-73)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)
backends/advanced/src/advanced_omi_backend/models/conversation.py (3)
Conversation (17-314)
active_transcript (166-174)
segments (197-199)
backends/advanced/src/advanced_omi_backend/models/annotation.py (3)
TranscriptAnnotation (8-39)
AnnotationStatus (11-14)
AnnotationSource (16-18)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/cron.py
61-61: Unused function argument: signum
(ARG001)
61-61: Unused function argument: frame
(ARG001)
backends/advanced/src/advanced_omi_backend/models/annotation.py
35-39: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py
35-35: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
36-36: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
🔇 Additional comments (7)
backends/advanced/src/advanced_omi_backend/models/annotation.py (2)
27-31: LGTM! Good use of timezone-aware timestamps.

The default values are well-chosen:
- `ACCEPTED` status for user edits makes sense and is clearly documented
- Using `datetime.now(timezone.utc)` follows best practices for timezone-aware timestamps

21-25: Field types are consistent with related models.

Verification confirms that `conversation_id` and `user_id` in the annotation model match the types used in the `Conversation` model and throughout the codebase. Both fields correctly use `Indexed(str)`, maintaining consistency with other models (`conversation.py` and `audio_file.py`).

backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py (2)

1-11: LGTM! Imports are properly organized.

Imports follow the correct order: standard library followed by local imports.

90-96: Mock implementation is clearly documented.

The mock training process using `asyncio.sleep(2)` is appropriate for this placeholder implementation and well-documented with comments indicating this is temporary.

backends/advanced/src/advanced_omi_backend/cron.py (3)

1-27: LGTM! Good configuration structure.

The interval configuration with `DEV_MODE` override is well-designed for both production and testing scenarios.

34-35: Jobs will execute immediately on startup.

Initializing `last_suggestion_run` and `last_training_run` to `datetime.min` means both jobs will run immediately on the first scheduler check (lines 41-42 and 50-51), as the time difference will always exceed the configured intervals.

If immediate execution on startup is not desired, initialize these to `datetime.now(timezone.utc)` instead.

⏰ Alternative initialization

If jobs should not run immediately on startup:

 # Initialize DB connection
 await init_db()

-last_suggestion_run = datetime.min
-last_training_run = datetime.min
+last_suggestion_run = datetime.now(timezone.utc)
+last_training_run = datetime.now(timezone.utc)

61-63: Signal handler parameters are correctly defined.

The `signum` and `frame` parameters are required by the `signal.signal()` API, even though they're not used in this handler. This is expected behavior, and the static analysis warning is a false positive.
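For reference, a minimal sketch of the handler shape that `signal.signal()` expects; the body here is illustrative, not the PR's actual handler:

import logging
import signal
import sys

def shutdown_handler(signum, frame):
    # Both parameters are mandated by the signal API; frame often goes unused.
    logging.getLogger("cron_scheduler").info("Received signal %s, shutting down", signum)
    sys.exit(0)

signal.signal(signal.SIGTERM, shutdown_handler)
signal.signal(signal.SIGINT, shutdown_handler)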
    last_training_run = datetime.min

    while True:
        now = datetime.utcnow()
Replace deprecated datetime.utcnow() with datetime.now(timezone.utc).
datetime.utcnow() is deprecated in Python 3.12+ in favor of timezone-aware alternatives.
🔧 Use timezone-aware datetime
 while True:
-    now = datetime.utcnow()
+    now = datetime.now(timezone.utc)
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/cron.py at line 38, Replace the
deprecated call to datetime.utcnow() used to set the now variable with a
timezone-aware call datetime.now(timezone.utc); update imports in cron.py to
import timezone (e.g., from datetime import datetime, timezone) or use
datetime.timezone if you keep a module import, and ensure any downstream code
expecting naive datetimes is adjusted to handle an aware datetime.
        # Check Suggestions Job
        if (now - last_suggestion_run).total_seconds() >= SUGGESTION_INTERVAL:
            logger.info("Running scheduled job: surface_error_suggestions")
            try:
                await surface_error_suggestions()
                last_suggestion_run = now
            except Exception as e:
                logger.error(f"Error in surface_error_suggestions: {e}", exc_info=True)

        # Check Training Job
        if (now - last_training_run).total_seconds() >= TRAINING_INTERVAL:
            logger.info("Running scheduled job: finetune_hallucination_model")
            try:
                await finetune_hallucination_model()
                last_training_run = now
            except Exception as e:
                logger.error(f"Error in finetune_hallucination_model: {e}", exc_info=True)
Jobs lack user_id scoping required by coding guidelines.
The surface_error_suggestions() and finetune_hallucination_model() jobs are invoked without user_id parameters, but per the coding guidelines, all data operations must be scoped by user_id. This creates an architectural inconsistency with the worker implementations.
Consider one of these approaches:
- Per-user execution: Iterate over active users and invoke jobs for each user
- Batch processing: Refactor jobs to process all users in batches while maintaining proper data isolation
- Global processing: If intentionally global, document the architectural decision and ensure it aligns with system requirements
Would you like me to generate a solution that iterates over users and invokes these jobs per-user?
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/cron.py around lines 40 - 57, The
scheduled jobs call surface_error_suggestions() and
finetune_hallucination_model() without scoping to a user_id; update the loop to
iterate over active users (e.g., fetch active user IDs) and invoke await
surface_error_suggestions(user_id) and await
finetune_hallucination_model(user_id) for each user, updating
last_suggestion_run and last_training_run after the per-user batch completes; if
the job functions don't accept user_id, add a user_id parameter (with
appropriate defaults or validation) and ensure exceptions are caught per-user
(log user_id on error) so one failing user doesn't stop the whole batch.
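A minimal sketch of that per-user loop, assuming a `User` Beanie document with an `is_active` field and a string `user_id` property (both are assumptions; the repository's actual user model may differ):

# Sketch only: User, its is_active field, and the per-user job signature
# are assumed for illustration.
if (now - last_suggestion_run).total_seconds() >= SUGGESTION_INTERVAL:
    active_users = await User.find({"is_active": True}).to_list()
    for user in active_users:
        try:
            await surface_error_suggestions(user.user_id)
        except Exception as e:
            # Log per user so one failure does not abort the whole batch
            logger.error(f"surface_error_suggestions failed for {user.user_id}: {e}", exc_info=True)
    last_suggestion_run = now  # advance only after the full per-user batch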
| USER = "user" | ||
| MODEL_SUGGESTION = "model_suggestion" | ||
|
|
||
| id: str = Field(default_factory=lambda: str(uuid.uuid4())) |
Inconsistent ID type: use MongoDB ObjectId instead of UUID string.
The id field uses a UUID string, which is inconsistent with other models in the codebase. The User model (see backends/advanced/src/advanced_omi_backend/models/user.py) uses MongoDB ObjectId for id and provides a user_id property for string representation. This inconsistency could cause:
- Type mismatches in queries and relationships
- Non-uniform ID patterns across the codebase
- Potential issues with Beanie's default behavior
🔍 Proposed fix
Based on the Beanie documentation, the default type of the id field is PydanticObjectId. While you can override it with UUID as shown in the documentation, consider aligning with the existing User model pattern for consistency across the codebase.
+from beanie import Document, Indexed, PydanticObjectId

 class TranscriptAnnotation(Document):
     """Model for transcript annotations/corrections."""

     class AnnotationStatus(str, Enum):
         PENDING = "pending"
         ACCEPTED = "accepted"
         REJECTED = "rejected"

     class AnnotationSource(str, Enum):
         USER = "user"
         MODEL_SUGGESTION = "model_suggestion"

-    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
+    # id field defaults to PydanticObjectId - MongoDB's ObjectId type
     conversation_id: Indexed(str)

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/models/annotation.py at line 20,
The annotation model's id is defined as a UUID string (id: str =
Field(default_factory=lambda: str(uuid.uuid4()))) which diverges from the User
model's MongoDB ObjectId pattern; change the Annotation model to use
Beanie/PydanticObjectId for the primary id (match the User model's id type) and
add a string accessor property (e.g., annotation_id) if a string representation
is needed, ensuring imports and any references to Annotation.id elsewhere are
updated to expect PydanticObjectId and use the new string property when
necessary.
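A minimal sketch of that pattern, mirroring how the User model pairs an ObjectId with a string property; the `annotation_id` name is the reviewer's suggestion, not existing code:

from beanie import Document

class TranscriptAnnotation(Document):
    # Beanie supplies id: PydanticObjectId by default, matching the User model.

    @property
    def annotation_id(self) -> str:
        """String form of the ObjectId for API responses."""
        return str(self.id)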
| logger.info("Starting surface_error_suggestions job...") | ||
|
|
||
| # Get conversations from the last 24 hours | ||
| since = datetime.utcnow() - timedelta(days=1) |
Replace deprecated datetime.utcnow() with datetime.now(timezone.utc).
datetime.utcnow() is deprecated in Python 3.12+ in favor of timezone-aware alternatives.
🔧 Use timezone-aware datetime
- since = datetime.utcnow() - timedelta(days=1)
+ since = datetime.now(timezone.utc) - timedelta(days=1)

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py at
line 21, Replace the naive UTC timestamp usage assigned to variable "since"
(currently using datetime.utcnow()) with a timezone-aware timestamp by calling
datetime.now(timezone.utc) and subtracting timedelta(days=1); also ensure
timezone is imported from datetime (or use datetime.timezone) so the expression
becomes timezone-aware throughout the module.
    conversations = await Conversation.find(
        {"created_at": {"$gte": since}}
    ).to_list()
Critical: Missing user_id scoping violates data isolation requirements.
The query fetches conversations without filtering by user_id, which violates the coding guideline that states: "All data must be scoped by user_id with efficient permission checking for data isolation."
This cron job would process annotations across all users globally, which may not be the intended behavior. If this is meant to run per-user, it should be invoked with a user_id parameter and scope the query accordingly.
💡 Clarify intended behavior
Option 1: If this job should run per-user:
-async def surface_error_suggestions():
+async def surface_error_suggestions(user_id: str):
"""
Cron job to surface potential errors in transcripts.
Mocks the behavior of an ML model identifying low-confidence segments.
"""
logger.info("Starting surface_error_suggestions job...")
# Get conversations from the last 24 hours
since = datetime.now(timezone.utc) - timedelta(days=1)
conversations = await Conversation.find(
- {"created_at": {"$gte": since}}
+ {"created_at": {"$gte": since}, "user_id": user_id}
).to_list()Option 2: If this job is intentionally global (processing all users), add a comment explaining why user_id scoping is not required and ensure this aligns with system architecture decisions.
As per coding guidelines, data must be scoped by user_id.
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py around
lines 22 - 24, The Conversation.find query in annotation_jobs.py currently omits
user scoping and will retrieve conversations for all users; update the job to
accept a user_id parameter (or obtain the intended user context) and include a
filter like {"user_id": user_id, "created_at": {"$gte": since}} when calling
Conversation.find, or if the job truly must be global add a clear comment near
Conversation.find explaining why user_id scoping is intentionally omitted and
ensure any permission checks are applied; locate this change around the
Conversation.find call and the surrounding job entrypoint to wire the user_id
parameter through.
            existing = await TranscriptAnnotation.find_one({
                "conversation_id": conv.conversation_id,
                "segment_index": segment_idx
            })
Add user_id to annotation existence check.
The query for existing annotations should also be scoped by user_id for consistency with data isolation requirements.
🔧 Scope query by user_id
# Check if annotation already exists
existing = await TranscriptAnnotation.find_one({
"conversation_id": conv.conversation_id,
- "segment_index": segment_idx
+ "segment_index": segment_idx,
+ "user_id": conv.user_id
})As per coding guidelines, data must be scoped by user_id.
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py around
lines 40 - 43, The existence check currently queries
TranscriptAnnotation.find_one by "conversation_id" and "segment_index" only;
update that query to also include the relevant "user_id" field (e.g., add
"user_id": conv.user_id or job.user_id depending on the surrounding scope) so
the lookup is scoped per user, and ensure the variable you reference
(conv.user_id or job.user_id) is available in the function before using it.
            if count >= 6:  # Surface 5-6 places as requested
                break
Fix comment/logic mismatch.
The comment states "Surface 5-6 places" but the logic checks count >= 6, which surfaces up to 6 suggestions (not 5-6). Either update the comment to "up to 6" or change the condition to match "5-6".
📝 Align comment with logic
Option 1: Update comment to match logic:
- if count >= 6: # Surface 5-6 places as requested
+ if count >= 6: # Surface up to 6 suggestions

Option 2: Update logic to match comment (surface between 5-6):
- if count >= 6: # Surface 5-6 places as requested
+ if count >= 5: # Surface 5-6 places as requested
🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py around
lines 58 - 59, The comment "Surface 5-6 places as requested" is inconsistent
with the condition `if count >= 6` (which actually limits results to up to 6);
update the comment near the `if count >= 6` check to "Surface up to 6 places"
(or "up to 6 suggestions"), or if you truly want to enforce 5–6 results, change
the condition logic around the `count` variable (e.g., break when `count >= 5`
and ensure counting semantics produce 5–6 items) so the comment and the `if
count >= 6` behavior match.
    corrections = await TranscriptAnnotation.find({
        "status": TranscriptAnnotation.AnnotationStatus.ACCEPTED.value
    }).to_list()
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the file and examine the function context (lines 65-98)
echo "=== Examining the annotation_jobs.py file and function context ==="
cat -n backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py | sed -n '65,98p'
echo -e "\n=== Searching for TranscriptAnnotation model definition ==="
fd "\.py$" | xargs grep -l "class TranscriptAnnotation" | head -5Repository: SimpleOpenSoftware/chronicle
Length of output: 1597
🏁 Script executed:
#!/bin/bash
# Search for TranscriptAnnotation model and enum field definitions
echo "=== Finding TranscriptAnnotation model ==="
rg -n "class TranscriptAnnotation" -A 30 --type=py
echo -e "\n=== Searching for AnnotationStatus enum definition ==="
rg -n "class AnnotationStatus" -A 10 --type=py | head -40Repository: SimpleOpenSoftware/chronicle
Length of output: 4356
🏁 Script executed:
#!/bin/bash
# Search for how enums are queried in Beanie models throughout codebase
echo "=== Enum query patterns in codebase ==="
rg -n "\.find\(\{" -A 5 --type=py | grep -A 5 "\.value\|Enum" | head -50
echo -e "\n=== Check for other status field queries ==="
rg -n '"status":\s*' --type=py | grep -E "(\.value|Enum|Status)" | head -20Repository: SimpleOpenSoftware/chronicle
Length of output: 2356
🏁 Script executed:
#!/bin/bash
# Check if there are user_id fields in similar queries
echo "=== Searching for user_id usage in find queries ==="
rg -n '\.find\(\{' -A 3 --type=py | grep -B 3 "user_id" | head -40
echo -e "\n=== Check TranscriptAnnotation schema for user_id field ==="
rg -n "TranscriptAnnotation" -B 5 -A 20 --type=py | grep -E "(user_id|class TranscriptAnnotation)" | head -30Repository: SimpleOpenSoftware/chronicle
Length of output: 2785
Add user_id scoping to comply with data isolation requirements.
The query retrieves annotations for all users without filtering by user_id. This violates the data isolation guideline. Since this is a cron job without request context, the function must either accept a user_id parameter and filter accordingly, or iterate over all users and process per-user.
The enum query syntax using .value is correct and consistent with the codebase pattern.
Add user_id parameter and filter
-async def finetune_hallucination_model():
+async def finetune_hallucination_model(user_id: str):
     """
     Cron job to finetune a LORA model on corrections.
     """
     logger.info("Starting finetune_hallucination_model job...")

     # Gather accepted corrections
     corrections = await TranscriptAnnotation.find({
         "status": TranscriptAnnotation.AnnotationStatus.ACCEPTED.value,
+        "user_id": user_id
     }).to_list()

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py around
lines 72 - 74, The query fetching TranscriptAnnotation entries currently returns
accepted annotations across all users; update the cron job that contains the
corrections = await TranscriptAnnotation.find(...) call to scope by user_id:
either add a user_id parameter to that function and include "user_id": user_id
in the find filter alongside
TranscriptAnnotation.AnnotationStatus.ACCEPTED.value, or modify the cron to
iterate over all user ids and run the existing logic per-user (adding the same
"user_id" filter for each iteration); ensure you update any callers to pass the
user_id when choosing the parameter approach or implement the user-list
iteration and per-user processing path if no request context exists.
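If the function must stay parameterless, a sketch of the user-iteration alternative; the `User` query and its fields are assumptions for illustration:

async def finetune_hallucination_model():
    """Fine-tune per user so training data never crosses user boundaries."""
    users = await User.find_all().to_list()  # assumed User document
    for user in users:
        corrections = await TranscriptAnnotation.find({
            "status": TranscriptAnnotation.AnnotationStatus.ACCEPTED.value,
            "user_id": user.user_id,
        }).to_list()
        if corrections:
            logger.info(f"Fine-tuning on {len(corrections)} corrections for user {user.user_id}")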
- Introduced a new `TranscriptAnnotation` model for managing transcript corrections.
- Added `annotation_routes` for creating and retrieving annotations via API.
- Implemented cron jobs to surface potential errors in transcripts and finetune a model based on accepted corrections (actual surfacing of errors is not implemented; this is only a cron service with `time.sleep` in `backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py`).
- Updated Docker Compose to include a cron service for running scheduled tasks.
- Enhanced the web UI to support displaying and managing annotations, including accepting and rejecting suggestions.
- Added tests for annotation model and integration flow to ensure functionality and reliability.
Summary by CodeRabbit
New Features
Background Jobs
Tests
Chores