Conversation

@AnkushMalaker
Collaborator

@AnkushMalaker AnkushMalaker commented Dec 14, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for multiple transcription providers (Deepgram and Parakeet).
    • Implemented configurable diarization modes for flexible speaker attribution.
    • Enhanced speaker identification and labeling in transcription results.
  • Chores

    • Updated speaker diarization model to community version.
    • Upgraded PyTorch and audio processing dependencies.
    • Removed legacy CUDA 12.1 support; added CUDA 12.8 variant.
    • Improved Docker and environment configuration for CI/local testing.


…n features

- Updated `.env.template` to include configuration options for transcription providers (Deepgram and Parakeet) and diarization modes.
- Modified `docker-compose.yml` and `docker-compose-test.yml` to support new environment variables for transcription services.
- Refactored `Dockerfile` to use the JSON form of `CMD`, allowing environment variable expansion.
- Added a new `laptop_client.py` script for command-line interactions with the speaker recognition service.
- Introduced a transcription orchestrator to manage transcription and diarization processes.
- Updated dependencies in `pyproject.toml` to use the latest version of `pyannote.audio`.
- Implemented new service structure for transcription and diarization, including base classes and specific providers for Deepgram and Parakeet.
- Enhanced logging and error handling for better service reliability and debugging.
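
The provider abstraction this PR introduces can be sketched roughly as below. The names (`BaseTranscriptionProvider`, `TranscriptionResult`, `TranscriptionWord`) come from the PR itself, but the exact fields and method signatures here are assumptions for illustration:

```python
# Hedged sketch of the provider abstraction described in this PR.
# Field names and signatures are assumptions; the real code may differ.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TranscriptionWord:
    word: str
    start: float
    end: float
    speaker: Optional[str] = None  # filled in later by diarization, if any


@dataclass
class TranscriptionResult:
    text: str
    words: List[TranscriptionWord] = field(default_factory=list)


class BaseTranscriptionProvider(ABC):
    """Common contract each provider (Deepgram, Parakeet, ...) implements."""

    @property
    @abstractmethod
    def has_native_diarization(self) -> bool:
        """Whether the provider can label speakers itself."""

    @abstractmethod
    async def transcribe_http(
        self, audio_data: bytes, sample_rate: int, config: Dict
    ) -> TranscriptionResult:
        """Send audio to the provider and return parsed words."""
```

With a contract like this, the orchestrator can treat Deepgram and Parakeet interchangeably and only branch on `has_native_diarization`.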
@coderabbitai
Contributor

coderabbitai bot commented Dec 14, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces a multi-provider transcription architecture for speaker-recognition. It replaces direct Deepgram HTTP integration with a provider-agnostic orchestrator pattern, adds Parakeet transcription support, creates new diarization and transcription provider abstractions, upgrades the pyannote model version, and updates environment configuration, Docker setup, and test infrastructure accordingly.

Changes

  • Configuration & Environment Setup
    extras/speaker-recognition/.env.template, extras/speaker-recognition/docker-compose.yml, extras/speaker-recognition/docker-compose-test.yml
    Added new environment variables for transcription provider selection (TRANSCRIPTION_PROVIDER, DEEPGRAM_API_KEY, PARAKEET_ASR_URL, DIARIZATION_MODE, SPEAKER_SERVICE_TEST_PORT). Updated docker-compose to export these settings and corrected the variable reference from COMPUTE_MODE to PYTORCH_CUDA_VERSION.

  • Docker & Build Configuration
    extras/speaker-recognition/Dockerfile, extras/speaker-recognition/pyproject.toml
    Converted the CMD instruction to JSON form for proper shell execution. Upgraded pyannote.audio from >=3.3.2 to >=4.0.0, removed the cu121 CUDA variant, and updated cu126/cu128 to use torch>=2.8.0 and torchaudio>=2.8.0.

  • Documentation & Testing
    extras/speaker-recognition/README.md, extras/speaker-recognition/run-test.sh
    Updated model references from pyannote/speaker-diarization-3.1 to pyannote/speaker-diarization-community-1. Enhanced run-test.sh with conditional environment loading (local .env file or CI variables) and comprehensive variable validation.

  • Core Service & API
    extras/speaker-recognition/src/simple_speaker_recognition/api/service.py, extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py, extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py
    Added a Settings field for diarization_mode and initialized TranscriptionOrchestrator in the lifespan. Refactored deepgram_wrapper to use orchestrator-based transcription instead of direct HTTP calls. Updated audio_backend to use the new pyannote model and removed backward compatibility for 3.x output formats.

  • Transcription Provider Abstraction
    extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py, extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py, extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py
    Introduced a BaseTranscriptionProvider abstract class with TranscriptionWord and TranscriptionResult data models. Implemented DeepgramProvider and ParakeetProvider subclasses with HTTP transcription endpoints, response parsing, and health checks.

  • Transcription Service Factory
    extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py, extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py
    Added get_transcription_provider() and get_transcription_orchestrator() factories for provider selection and configuration. Created utility functions for speaker attribution, response formatting, and segment extraction.

  • Diarization Provider Abstraction
    extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py, extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py, extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py
    Introduced a BaseDiarizationProvider abstract class with a DiarizationResult model. Implemented PyannoteDiarizationProvider with lazy pipeline initialization, audio conversion, and speaker extraction.

  • Orchestration & Service Package
    extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py, extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py
    Created TranscriptionOrchestrator to coordinate transcription and diarization workflows with configurable strategies (auto, native, pyannote, none). Added a package docstring.
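
The configurable strategies (auto, native, pyannote, none) could reduce to a small decision function like the following. This is a sketch under assumed semantics: the PR does not spell out how "auto" resolves, so the preference for native diarization with a pyannote fallback is a guess.

```python
# Sketch of how the orchestrator might pick a diarization path.
# The mode names come from the walkthrough; the resolution rules
# for "auto" are an assumption, not the confirmed behavior.
def resolve_diarization_strategy(mode: str, provider_has_native: bool) -> str:
    """Return which diarization path to run: 'native', 'pyannote', or 'none'."""
    if mode in ("none", "native", "pyannote"):
        return mode
    if mode == "auto":
        # Assumed: prefer the provider's built-in diarization,
        # otherwise fall back to the pyannote provider.
        return "native" if provider_has_native else "pyannote"
    raise ValueError(f"Unknown diarization mode: {mode}")
```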

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Router as deepgram_wrapper
    participant Orchestrator
    participant TranscProvider as Transcription<br/>Provider
    participant DiarProvider as Diarization<br/>Provider
    participant Response

    Client->>Router: POST /transcribe
    activate Router
    
    Router->>Router: get_transcription_orchestrator()
    activate Router
    Router->>Router: initialize if needed
    deactivate Router
    
    Router->>Orchestrator: process(audio_data, sample_rate, config)
    activate Orchestrator
    
    Orchestrator->>TranscProvider: transcribe_http(audio_data, ...)
    activate TranscProvider
    TranscProvider-->>Orchestrator: TranscriptionResult<br/>(text, words)
    deactivate TranscProvider
    
    Orchestrator->>Orchestrator: determine_diarization_strategy()
    
    alt Diarization Needed
        Orchestrator->>DiarProvider: diarize(audio_data, ...)
        activate DiarProvider
        DiarProvider-->>Orchestrator: DiarizationResult
        deactivate DiarProvider
        
        Orchestrator->>Orchestrator: enrich_words_with_speakers<br/>(words, diarization)
    end
    
    Orchestrator-->>Router: Enhanced TranscriptionResult
    deactivate Orchestrator
    
    Router->>Router: format_deepgram_response(result)
    Router->>Response: Return Deepgram-compatible JSON
    deactivate Router
    
    Response-->>Client: Transcription + Optional Speakers

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • run-test.sh: Multi-branch environment loading logic with conditional CI/local paths requires careful verification of variable export ordering and precedence
  • orchestrator.py: Complex speaker enrichment strategy with conditional diarization paths and speaker lookup via segment overlap requires thorough logic verification
  • deepgram_wrapper.py: Refactored endpoint logic now delegates to orchestrator; verify correct response format conversion and optional speaker ID enhancement
  • Base abstractions (transcription/diarization): Ensure abstract method contracts and configuration passing are consistently implemented across all providers
  • Provider implementations (Deepgram, Parakeet): Verify response parsing, error handling, and audio format handling are robust
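
The "speaker lookup via segment overlap" flagged for orchestrator.py boils down to assigning each word the speaker whose diarization segment overlaps it the most. A pure-Python sketch of that idea, independent of pyannote's Annotation API (the real code operates on pyannote objects):

```python
# Pure-Python sketch of speaker attribution by segment overlap:
# each word gets the speaker whose diarization segment overlaps it most.
from typing import List, Optional, Tuple

# (start, end, speaker_label) stand-in for a pyannote diarization segment
Segment = Tuple[float, float, str]


def speaker_for_word(
    word_start: float, word_end: float, segments: List[Segment]
) -> Optional[str]:
    best_label: Optional[str] = None
    best_overlap = 0.0
    for seg_start, seg_end, label in segments:
        # Overlap is positive only when the intervals actually intersect
        overlap = min(word_end, seg_end) - max(word_start, seg_start)
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label
```

A word straddling a segment boundary resolves to whichever side covers more of it, which is the subtle case the review asks to verify.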

Possibly related PRs

  • chronicler-ai/chronicle#150: Directly related — both implement transcription provider selection, update TRANSCRIPTION_PROVIDER/PARAKEET_ASR_URL/DEEPGRAM_API_KEY configuration, and integrate Deepgram/Parakeet providers.

Poem

🐰 A multi-provider chorus now plays,
Where Deepgram and Parakeet dance in harmonious ways,
The orchestrator conducts with speaker delight,
Diarization abstracts, the future shines bright!
🎵✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive
    The title 'Diar 5' is vague and non-descriptive, using a cryptic abbreviation that does not convey meaningful information about the changeset's primary objectives.
    Resolution: Revise the title to be more descriptive and specific about the main change. For example: 'Refactor transcription to support multiple providers' or 'Integrate provider-agnostic transcription orchestrator'.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Docstring Coverage: ✅ Passed (coverage is 92.45%, above the required 80.00% threshold)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@AnkushMalaker
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Dec 14, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (12)
extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py (1)

1-1: Expose public API from services package.

The services __init__.py currently has no exports. Per the PR context, new diarization providers, transcription providers, orchestrator, and utilities are being introduced. Consider re-exporting the primary public interfaces (e.g., DiarizationResult, diarization/transcription providers, TranscriptionOrchestrator, and factory functions) to provide a convenient and documented API surface for consumers of the services package.

Verify which entities should be part of the public API by examining the diarization and transcription subpackages introduced in this PR. A typical pattern would be:

"""Services package for speaker recognition."""

from simple_speaker_recognition.services.diarization import (
    BaseDiarizationProvider,
    DiarizationResult,
    PyannoteDiarizationProvider,
)
from simple_speaker_recognition.services.transcription import (
    DeepgramProvider,
    ParakeetProvider,
    TranscriptionOrchestrator,
    # Add factory functions if they exist
)

__all__ = [
    "BaseDiarizationProvider",
    "DiarizationResult",
    "PyannoteDiarizationProvider",
    "DeepgramProvider",
    "ParakeetProvider",
    "TranscriptionOrchestrator",
]

Adjust based on the actual public API you want to expose.

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (1)

17-28: Consider sorting __all__ for consistency.

Static analysis suggests applying isort-style sorting to __all__.

 __all__ = [
     "BaseTranscriptionProvider",
+    "TranscriptionOrchestrator",
     "TranscriptionResult",
     "TranscriptionWord",
-    "TranscriptionOrchestrator",
-    "get_transcription_provider",
-    "get_transcription_orchestrator",
+    "extract_speaker_segments",
+    "find_speaker_at_time",
     "format_deepgram_response",
-    "find_speaker_at_time",
-    "extract_speaker_segments",
+    "get_transcription_orchestrator",
+    "get_transcription_provider",
     "get_speaker_statistics",
 ]
extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (2)

36-46: Use logging.exception() for full stack trace.

Based on learnings, prefer logging.exception() inside except blocks to automatically log the full stack trace.

             except Exception as e:
-                log.error(f"Failed to load Pyannote pipeline: {e}")
+                log.exception("Failed to load Pyannote pipeline")
                 raise

48-50: Unused config parameter.

The config parameter is accepted but never used. Either utilize it for min_speakers/max_speakers from the config dict, or prefix with underscore to indicate intentional non-use.

Based on the relevant code snippet from audio_backend.py showing diarization supports min_speakers, max_speakers, etc., consider:

     async def diarize(
-        self, audio_data: bytes, sample_rate: int, config: Dict
+        self, audio_data: bytes, sample_rate: int, config: Dict
     ) -> DiarizationResult:
         ...
         # Ensure pipeline is loaded
         self._ensure_pipeline()
+
+        # Extract diarization parameters from config
+        min_speakers = config.get("min_speakers")
+        max_speakers = config.get("max_speakers")
+
+        kwargs = {}
+        if min_speakers is not None:
+            kwargs["min_speakers"] = min_speakers
+        if max_speakers is not None:
+            kwargs["max_speakers"] = max_speakers
         ...
         try:
             # Run diarization (returns DiarizeOutput in pyannote 4.x)
-            diarization_output = self.diarization_pipeline(tmp_path)
+            diarization_output = self.diarization_pipeline(tmp_path, **kwargs)
extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1)

214-221: Prefix the unused loop variable with an underscore.

The track_ann variable is not used within the loop body.

-        for segment_ann, track_ann, label in overlapping.itertracks(yield_label=True):
+        for segment_ann, _track_ann, label in overlapping.itertracks(yield_label=True):
             overlap = (segment & segment_ann).duration
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (1)

28-29: Prefix the unused loop variables with underscores.

Multiple loop variables are unused and should be prefixed with underscore per convention.

-    for segment, track, label in overlapping.itertracks(yield_label=True):
+    for _segment, _track, label in overlapping.itertracks(yield_label=True):
         return label
-    for segment, track, label in diarization.itertracks(yield_label=True):
+    for segment, _track, label in diarization.itertracks(yield_label=True):
         segments.append(

Also applies to: 109-109

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (3)

91-94: Use custom exception and chain with from.

Raising a generic Exception loses context. Create a specific exception or use HTTPException equivalent, and chain with from for context preservation.

Based on learnings: chain with "raise ... from e" to preserve the original context.

                 else:
                     error_msg = f"Parakeet error: {response.status_code} - {response.text}"
                     log.error(error_msg)
-                    raise Exception(error_msg)
+                    raise RuntimeError(error_msg)

Or better, create a custom TranscriptionError exception class.


100-102: Use logging.exception() for full stack trace.

Based on learnings, prefer logging.exception() inside except blocks.

         except Exception as e:
-            log.error(f"Error calling Parakeet service: {e}")
+            log.exception("Error calling Parakeet service")
             raise

65-67: Unused config parameter.

The config parameter is accepted but never used. Consider passing language from config to the Parakeet service if supported.

Since get_supported_params() returns ["language"], the config should be utilized:

     async def _parakeet_transcribe(
-        self, audio_data: bytes, sample_rate: int, config: Dict
+        self, audio_data: bytes, sample_rate: int, config: Dict
     ) -> Dict:
         """Call Parakeet ASR service."""
         try:
             ...
+            # Build request params from config
+            params = {}
+            if "language" in config:
+                params["language"] = config["language"]
+
             async with httpx.AsyncClient(timeout=180.0) as client:
                 with open(tmp_filename, "rb") as f:
                     files = {"file": ("audio.wav", f, "audio/wav")}
-                    response = await client.post(self.transcribe_url, files=files)
+                    response = await client.post(self.transcribe_url, files=files, params=params)
extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (1)

122-130: Fix f-string, use logging.exception(), and chain exception.

Multiple issues in the error handling block:

  1. Line 122: f-string has no placeholders
  2. Line 126: Use logging.exception() for stack trace
  3. Line 127: Chain exception with from e

Based on learnings: prefer logging.exception() and chain with raise ... from e.

-        log.info(f"Transcription complete via orchestrator")
+        log.info("Transcription complete via orchestrator")
         return deepgram_response

     except Exception as e:
-        log.error(f"Orchestrator transcription error: {e}")
+        log.exception("Orchestrator transcription error")
         raise HTTPException(
             status_code=500,
-            detail=f"Transcription error: {str(e)}"
-        )
+            detail=f"Transcription error: {e!s}"
+        ) from e
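
The logging.exception() plus raise ... from e pattern recommended throughout this review can be demonstrated in isolation with the standard library only (TranscriptionError here is the kind of domain-specific exception the review suggests creating; it is not in the PR):

```python
# Demonstrates the review's recommended error-handling pattern:
# logging.exception() records the full stack trace automatically,
# and "raise ... from e" preserves the original cause on the new exception.
import logging

log = logging.getLogger(__name__)


class TranscriptionError(RuntimeError):
    """Domain-specific exception, as the review suggests creating."""


def transcribe_or_raise() -> None:
    try:
        raise ValueError("upstream provider failed")
    except ValueError as e:
        # Inside an except block, .exception() implies exc_info=True
        log.exception("Orchestrator transcription error")
        raise TranscriptionError(f"Transcription error: {e!s}") from e
```

Tracebacks produced this way show "The above exception was the direct cause of the following exception", and the original error stays reachable as `__cause__`.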
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (2)

28-30: Document or utilize the unused sample_rate parameter.

The sample_rate parameter is unused in this method. While Deepgram can infer the sample rate from audio data, consider either:

  1. Documenting why it's intentionally unused in the docstring
  2. Passing it to Deepgram if supported (e.g., via encoding or sample_rate query parameters)

This maintains clarity about the interface contract with BaseTranscriptionProvider.


121-130: Consider merging consecutive same-speaker words into single segments.

The current implementation creates a separate segment for each word, resulting in many tiny segments. For better efficiency and semantic correctness, merge consecutive words from the same speaker into single, continuous segments.

Example optimization:

# Group consecutive words by speaker into segments
current_speaker = None
segment_start = None
segment_end = None

for w_data in alternative["words"]:
    speaker = w_data["speaker"]
    
    if speaker != current_speaker:
        # Save previous segment if exists
        if current_speaker is not None:
            speaker_label = f"SPEAKER_{current_speaker}"
            diarization[Segment(segment_start, segment_end)] = speaker_label
        
        # Start new segment
        current_speaker = speaker
        segment_start = w_data["start"]
        segment_end = w_data["end"]
    else:
        # Extend current segment
        segment_end = w_data["end"]

# Don't forget the last segment
if current_speaker is not None:
    speaker_label = f"SPEAKER_{current_speaker}"
    diarization[Segment(segment_start, segment_end)] = speaker_label
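
The grouping pass above can be exercised without pyannote by collapsing to plain (start, end, label) tuples; this self-contained variant implements the same merge of consecutive same-speaker words:

```python
# Standalone version of the segment-merging suggestion above:
# consecutive words from the same speaker collapse into one
# (start, end, "SPEAKER_<n>") tuple instead of one segment per word.
from typing import Dict, List, Tuple


def merge_word_segments(words: List[Dict]) -> List[Tuple[float, float, str]]:
    segments: List[Tuple[float, float, str]] = []
    current_speaker = None
    seg_start = seg_end = 0.0
    for w in words:
        if w["speaker"] != current_speaker:
            # Close out the previous speaker's segment, if any
            if current_speaker is not None:
                segments.append((seg_start, seg_end, f"SPEAKER_{current_speaker}"))
            current_speaker = w["speaker"]
            seg_start = w["start"]
        seg_end = w["end"]
    # Don't forget the trailing segment
    if current_speaker is not None:
        segments.append((seg_start, seg_end, f"SPEAKER_{current_speaker}"))
    return segments
```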
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dededcd and 9b0a4f7.

⛔ Files ignored due to path filters (1)
  • extras/speaker-recognition/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • extras/speaker-recognition/.env.template (1 hunks)
  • extras/speaker-recognition/Dockerfile (1 hunks)
  • extras/speaker-recognition/README.md (3 hunks)
  • extras/speaker-recognition/docker-compose-test.yml (1 hunks)
  • extras/speaker-recognition/docker-compose.yml (1 hunks)
  • extras/speaker-recognition/pyproject.toml (2 hunks)
  • extras/speaker-recognition/run-test.sh (2 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (4 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/api/service.py (5 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py (2 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (1 hunks)
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py
  • extras/speaker-recognition/src/simple_speaker_recognition/api/service.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py
  • extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py
  • extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py
  • extras/speaker-recognition/src/simple_speaker_recognition/api/service.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py
  • extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py
  • extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py
  • extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py
🧬 Code graph analysis (8)
extras/speaker-recognition/run-test.sh (2)
backends/advanced/run-test.sh (1)
  • print_info (16-18)
run-test.sh (1)
  • print_info (17-19)
extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (2)
extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py (1)
  • diarize (53-100)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (1)
  • health_check (92-94)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (1)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (4)
  • TranscriptionResult (33-56)
  • TranscriptionWord (10-30)
  • start (23-25)
  • end (28-30)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (6)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (3)
  • BaseTranscriptionProvider (59-94)
  • TranscriptionResult (33-56)
  • TranscriptionWord (10-30)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (4)
  • extract_speaker_segments (98-118)
  • find_speaker_at_time (10-31)
  • format_deepgram_response (34-95)
  • get_speaker_statistics (121-154)
extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1)
  • TranscriptionOrchestrator (20-245)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (1)
  • ParakeetProvider (18-137)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (1)
  • DeepgramProvider (14-254)
extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (1)
  • get_transcription_orchestrator (57-65)
extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (3)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (2)
  • format_deepgram_response (34-95)
  • extract_speaker_segments (98-118)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (1)
  • get_transcription_orchestrator (83-114)
extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1)
  • process (76-140)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (7)
extras/speaker-recognition/src/simple_speaker_recognition/api/service.py (1)
  • Config (41-45)
extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py (2)
  • Config (17-18)
  • health_check (42-44)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (5)
  • has_native_diarization (24-26)
  • transcribe_http (28-63)
  • transcribe_stream (197-205)
  • get_supported_params (207-240)
  • health_check (242-254)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (5)
  • has_native_diarization (31-33)
  • transcribe_http (35-63)
  • transcribe_stream (105-116)
  • get_supported_params (118-120)
  • health_check (122-137)
app/metro.config.js (1)
  • config (8-8)
extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (1)
  • health_check (101-108)
extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1)
  • health_check (224-245)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (4)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (8)
  • BaseTranscriptionProvider (59-94)
  • TranscriptionResult (33-56)
  • TranscriptionWord (10-30)
  • has_native_diarization (68-70)
  • transcribe_http (73-77)
  • transcribe_stream (80-84)
  • get_supported_params (87-89)
  • health_check (92-94)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (5)
  • has_native_diarization (31-33)
  • transcribe_http (35-63)
  • transcribe_stream (105-116)
  • get_supported_params (118-120)
  • health_check (122-137)
extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (1)
  • health_check (101-108)
extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1)
  • health_check (224-245)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (1)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (8)
  • BaseTranscriptionProvider (59-94)
  • TranscriptionResult (33-56)
  • TranscriptionWord (10-30)
  • has_native_diarization (68-70)
  • transcribe_http (73-77)
  • transcribe_stream (80-84)
  • get_supported_params (87-89)
  • health_check (92-94)
🪛 GitHub Actions: Speaker Recognition Tests
extras/speaker-recognition/run-test.sh

[error] 171-171: syntax error: unexpected end of file

🪛 markdownlint-cli2 (0.18.1)
extras/speaker-recognition/README.md

19-19: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.8)
extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py

28-28: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


49-49: Unused method argument: config

(ARG002)


105-105: Consider moving this statement to an else block

(TRY300)


106-106: Do not catch blind exception: Exception

(BLE001)

extras/speaker-recognition/src/simple_speaker_recognition/api/service.py

119-119: Do not catch blind exception: Exception

(BLE001)


120-120: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py

28-28: Loop control variable segment not used within loop body

Rename unused segment to _segment

(B007)


28-28: Loop control variable track not used within loop body

Rename unused track to _track

(B007)


109-109: Loop control variable track not used within loop body

Rename unused track to _track

(B007)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py

17-28: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)


52-52: Avoid specifying long messages outside the exception class

(TRY003)


60-60: Avoid specifying long messages outside the exception class

(TRY003)


76-80: Avoid specifying long messages outside the exception class

(TRY003)

extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py

122-122: f-string without any placeholders

Remove extraneous f prefix

(F541)


123-123: Consider moving this statement to an else block

(TRY300)


125-125: Do not catch blind exception: Exception

(BLE001)


126-126: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


127-130: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


129-129: Use explicit conversion flag

Replace with conversion flag

(RUF010)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py

29-29: Unused method argument: sample_rate

(ARG002)


57-57: Create your own exception

(TRY002)


57-57: Avoid specifying long messages outside the exception class

(TRY003)


252-252: Do not catch blind exception: Exception

(BLE001)

extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py

66-69: Avoid specifying long messages outside the exception class

(TRY003)


216-216: Loop control variable track_ann not used within loop body

Rename unused track_ann to _track_ann

(B007)


241-241: Do not catch blind exception: Exception

(BLE001)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py

66-66: Unused method argument: config

(ARG002)


94-94: Create your own exception

(TRY002)


101-101: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


129-129: Do not catch blind exception: Exception

(BLE001)


136-136: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (34)
extras/speaker-recognition/README.md (1)

832-832: Laptop client path updates are consistent.

All references to the laptop client have been properly updated from laptop_client.py to scripts/laptop_client.py across the examples. The changes align with the actual relocated script path.

Also applies to: 857-857, 860-860, 863-863, 866-866, 869-869, 872-872

extras/speaker-recognition/pyproject.toml (2)

42-43: Reconcile inconsistent PyTorch version constraints across variants.

The cpu variant pins torch>=2.0.0 and torchaudio>=2.0.0, while cu126 and cu128 now require >=2.8.0. This creates a dependency resolution mismatch:

  • Users installing with the cpu extra will get older PyTorch versions
  • Users installing with cu126 or cu128 will get newer versions
  • If shared code depends on features only available in PyTorch 2.8+, CPU users may encounter runtime errors

Either:

  1. Upgrade the cpu variant to match the CUDA variants, or
  2. Document the minimum PyTorch version if 2.0.0 is intentionally lower for compatibility reasons

Confirm that pyannote.audio 4.0.0 works with torch 2.0.0 to justify the discrepancy.

Also applies to: 47-48, 52-53


11-11: Pyannote.audio 4.0.0 compatibility confirmed.

The upgrade to pyannote.audio>=4.0.0 introduces breaking changes: Python >= 3.10 requirement (satisfied in pyproject.toml line 5) and removal of sox/soundfile I/O backends. The codebase is compatible with these changes—soundfile is used only for writing temporary audio files before passing file paths to pyannote's pipeline, which handles I/O via ffmpeg or in-memory backends. Code explicitly handles pyannote.audio 4.x API (e.g., DiarizeOutput handling in services/diarization/pyannote.py lines 79-80).

extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py (1)

83-85: Correct attribute usage for pyannote.audio 4.x.

The code correctly accesses output.speaker_diarization from the DiarizeOutput wrapper in pyannote.audio 4.x. This attribute name is correct and returns a pyannote.core.Annotation object. The direct attribute access without defensive checks aligns with the coding guidelines (understand data structures rather than adding hasattr() checks), and the pattern is used consistently throughout the codebase.

extras/speaker-recognition/Dockerfile (1)

45-46: LGTM! Proper JSON form CMD with environment variable support.

The switch from shell form to JSON form CMD with sh -c wrapper correctly enables proper signal handling while preserving environment variable expansion. This aligns with Docker best practices and the coding guideline to use uv run for Python execution.

extras/speaker-recognition/docker-compose-test.yml (1)

8-8: LGTM! Variable rename improves consistency.

The change from COMPUTE_MODE to PYTORCH_CUDA_VERSION aligns the test configuration with the Dockerfile's ARG name and provides more specific naming. The default value of cpu remains appropriate for test environments.

extras/speaker-recognition/.env.template (1)

42-71: Excellent configuration documentation!

The new transcription and diarization configuration block is well-structured with clear section headers, comprehensive inline documentation, and sensible defaults. The multi-provider approach with auto-detection fallback provides good flexibility for different deployment scenarios.

extras/speaker-recognition/docker-compose.yml (1)

29-32: LGTM! Environment variables properly configured.

The new TRANSCRIPTION_PROVIDER and PARAKEET_ASR_URL environment variables are well-integrated with sensible defaults that match the .env.template documentation. This enables runtime provider selection for the new orchestrator architecture.

extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py (1)

1-6: LGTM! Clean package interface.

The package initialization properly defines the public API with explicit __all__ and follows coding guidelines by placing all imports at the top. The module provides a clear abstraction for diarization providers.

extras/speaker-recognition/run-test.sh (2)

64-102: Environment loading logic is well-structured.

Once the syntax error at lines 60-61 is fixed, the environment loading logic properly handles both CI and local development scenarios. The CI path creates a .env file for downstream compatibility, and error messages clearly guide users on required credentials.


115-118: Good security practice logging token lengths.

Reporting the length of HF_TOKEN and DEEPGRAM_API_KEY instead of their actual values provides verification that credentials are loaded while maintaining security.

extras/speaker-recognition/src/simple_speaker_recognition/api/service.py (3)

20-23: LGTM! Imports properly organized.

The new transcription orchestrator imports are correctly placed at the top of the file with other local imports, following the coding guidelines.


39-39: LGTM! Settings field properly configured.

The diarization_mode field is well-documented with a sensible default of "auto" that aligns with the .env.template configuration.


66-66: LGTM! Global variable follows existing pattern.

The transcription_orchestrator global follows the same pattern as other service resources (audio_backend, speaker_db) and is properly typed as Optional to indicate potential unavailability.

extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py (2)

10-18: LGTM! Well-designed result model.

The DiarizationResult model is properly configured with arbitrary_types_allowed = True to support pyannote's Annotation type. The optional exclusive_diarization field and extensible metadata dictionary provide good flexibility for different diarization backends.


21-44: LGTM! Clean provider abstraction.

The BaseDiarizationProvider abstract class follows Python conventions with clear async method signatures and comprehensive docstrings. The abstraction provides a clean contract for implementing different diarization backends.

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (1)

83-114: LGTM!

The get_transcription_orchestrator factory function correctly builds the orchestrator with provider, diarization mode, and token configuration.

extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (1)

101-108: LGTM with minor suggestion.

The health check logic is correct. The static analysis suggests moving Line 105 to an else block, but this is a minor style preference that doesn't affect correctness.

extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (3)

30-61: LGTM!

The orchestrator initialization properly validates diarization modes, falls back gracefully on invalid input, and supports lazy-loading of the Pyannote provider.


76-140: LGTM!

The process method correctly orchestrates the transcription-diarization pipeline:

  1. Transcribes via the configured provider
  2. Determines diarization strategy based on mode and provider capabilities
  3. Runs standalone Pyannote if needed
  4. Enriches words with speaker labels

178-195: In-place mutation of input list.

_enrich_words_with_speakers modifies the words list in-place by setting word.speaker. While it returns the list, the caller's original list is also mutated. This is currently fine since Line 137 reassigns transcription_result.words, but could be surprising.

Consider whether this should create new TranscriptionWord instances to avoid side effects, or document that this mutates in-place.
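The side-effect-free alternative can be sketched with a hypothetical immutable variant; `Word` and `enrich_words` below are illustrative stand-ins for `TranscriptionWord` and `_enrich_words_with_speakers`, not the project's actual classes.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Word:
    # Hypothetical stand-in for TranscriptionWord.
    text: str
    start: float
    end: float
    speaker: Optional[str] = None

def enrich_words(words: List[Word], speaker_at: Callable[[float], Optional[str]]) -> List[Word]:
    # Build new Word instances instead of setting word.speaker in place,
    # so the caller's original list is never mutated.
    return [replace(w, speaker=speaker_at(w.start)) for w in words]

words = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9)]
enriched = enrich_words(words, lambda t: "SPEAKER_00")
print(words[0].speaker)     # None: original untouched
print(enriched[0].speaker)  # SPEAKER_00
```

Since Line 137 reassigns `transcription_result.words` anyway, the new-instances version costs one list allocation and removes the surprise.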

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (2)

34-95: LGTM!

The format_deepgram_response function correctly converts the internal TranscriptionResult to Deepgram-compatible JSON format with proper handling of optional fields and speaker label conversion.


121-154: LGTM!

The get_speaker_statistics function properly guards against empty diarization and division by zero, and uses pyannote's chart() method correctly.

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (1)

35-63: LGTM!

The transcribe_http method correctly converts Parakeet response to TranscriptionResult with proper TranscriptionWord construction.

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (3)

10-31: LGTM!

TranscriptionWord is well-designed with pyannote Segment for timing, backward-compatible start/end properties, and proper Pydantic configuration for arbitrary types.


33-57: LGTM!

TranscriptionResult provides a clean data model with optional diarization fields, helper methods for speaker timeline/chart access, and sensible defaults.


59-94: LGTM!

BaseTranscriptionProvider defines a clear abstract interface with sensible defaults for supports_streaming and has_native_diarization. The abstract methods cover HTTP transcription, streaming, parameter discovery, and health checks.

extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (3)

57-66: LGTM!

The get_transcription_orchestrator() helper properly retrieves the orchestrator from the service module with appropriate error handling for uninitialized state.


90-112: LGTM!

The transcribe_with_provider function correctly integrates with the orchestrator pattern, builds provider-agnostic config, and converts results to Deepgram format for backward compatibility.


596-624: LGTM!

The endpoint correctly uses the provider abstraction for transcription and conditionally applies speaker enhancement when user_id is provided.

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (4)

23-26: LGTM!

Correctly identifies that Deepgram has built-in speaker diarization capabilities.


142-195: LGTM!

The parameter building logic correctly handles type conversions (booleans to lowercase strings, lists to comma-separated values) and provides comprehensive coverage of Deepgram API parameters.


197-205: LGTM!

Clear TODO with NotImplementedError for streaming support planned in Phase 3. This is appropriate for incremental feature development.


207-240: LGTM!

The supported parameters list is comprehensive and consistent with the parameters handled in _build_params.

Comment on lines +18 to +19
Accept the terms and conditions for
https://huggingface.co/pyannote/speaker-diarization-community-1

⚠️ Potential issue | 🟡 Minor

Format the bare URL as a markdown link.

Line 19 contains a bare URL that should be wrapped in markdown link syntax for proper formatting and compliance with markdown standards.

Apply this diff to format the URL correctly:

-Accept the terms and conditions for
-https://huggingface.co/pyannote/speaker-diarization-community-1
+Accept the terms and conditions for:
+[pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-Accept the terms and conditions for
-https://huggingface.co/pyannote/speaker-diarization-community-1
+Accept the terms and conditions for:
+[pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

19-19: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In extras/speaker-recognition/README.md around lines 18 to 19, the bare URL
should be formatted as a markdown link; replace the plain URL with markdown link
syntax, for example using the repo name as the link text (e.g.
[pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1))
so the URL is not bare and renders correctly in Markdown.

Comment on lines +59 to 61
# Load environment variables (CI or local)
if [ -f ".env" ]; then
print_info ".env file exists: $([ -f .env ] && echo 'yes' || echo 'no')"

⚠️ Potential issue | 🔴 Critical

Critical syntax error: orphaned if statement.

Lines 60-61 contain an unclosed if statement that causes the "syntax error: unexpected end of file" reported by the pipeline at line 171. This appears to be leftover debug code.

Apply this diff to fix the syntax error:

-
-if [ -f ".env" ]; then
-print_info ".env file exists: $([ -f .env ] && echo 'yes' || echo 'no')"
-
 # Load environment variables (CI or local)
 if [ -f ".env" ] && [ -z "${HF_TOKEN:-}" ]; then

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In extras/speaker-recognition/run-test.sh around lines 59 to 61 there is an
unclosed if statement causing a syntax error; close the conditional by adding a
matching "fi" after the print_info line (or remove the orphaned if/debug block
entirely) so the if [ -f ".env" ]; then ... is properly terminated and the
script no longer errors on unexpected end of file.
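A minimal sketch of the intended guard logic, assuming the fix above: the conditional opens and closes in one block, and `.env` is loaded only when the file exists and the token was not already provided (e.g. by CI secrets). The helper name `should_load_env` is illustrative, not from the script.

```shell
#!/bin/sh
# Illustrative guard: prints "load" only when an env file exists
# AND the token is not already set in the environment.
should_load_env() {
  env_file="$1"
  token="$2"
  if [ -f "$env_file" ] && [ -z "$token" ]; then
    echo "load"
  else
    echo "skip"
  fi
}

should_load_env "/definitely/missing/.env" ""   # -> skip (no file)
tmpfile=$(mktemp)
should_load_env "$tmpfile" ""                   # -> load (file present, no token)
should_load_env "$tmpfile" "hf_example"         # -> skip (token already set)
rm -f "$tmpfile"
```

The orphaned debug `if` in lines 60-61 sat outside any such closed block, which is why the parser only noticed at end of file.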

Comment on lines +119 to +121
except Exception as e:
log.error(f"Failed to initialize transcription orchestrator: {e}")
log.warning("Service will continue with limited functionality")

🛠️ Refactor suggestion | 🟠 Major

Use logging.exception() for better debugging.

Per coding guidelines and learnings, use logging.exception() instead of logging.error() inside except blocks to automatically log the full stack trace. This improves debuggability when orchestrator initialization fails.

Based on learnings and static analysis.

Apply this diff:

 except Exception as e:
-    log.error(f"Failed to initialize transcription orchestrator: {e}")
+    log.exception("Failed to initialize transcription orchestrator")
     log.warning("Service will continue with limited functionality")
🧰 Tools
🪛 Ruff (0.14.8)

119-119: Do not catch blind exception: Exception

(BLE001)


120-120: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
In extras/speaker-recognition/src/simple_speaker_recognition/api/service.py
around lines 119-121, replace the current except-block logging so that the
exception is logged with its full stack trace: use log.exception("Failed to
initialize transcription orchestrator") (logging.exception already records the
traceback) instead of log.error(f"...{e}"), and keep the subsequent
log.warning("Service will continue with limited functionality") unchanged.
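The difference is easy to demonstrate in isolation: `logging.exception()` logs at ERROR level and appends the active traceback automatically, which `logging.error()` does not unless `exc_info=True` is passed. A minimal, self-contained illustration (the logger name and message are placeholders):

```python
import io
import logging

# Capture log output in a buffer so the traceback is visible.
buf = io.StringIO()
log = logging.getLogger("orchestrator-demo")
log.addHandler(logging.StreamHandler(buf))

try:
    raise RuntimeError("simulated init failure")
except Exception:
    # Inside an except block, log.exception records the full traceback.
    log.exception("Failed to initialize transcription orchestrator")
    log.warning("Service will continue with limited functionality")

output = buf.getvalue()
print("Traceback" in output)  # True
```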

Comment on lines 22 to 24

 self.diar = Pipeline.from_pretrained(
-    "pyannote/speaker-diarization-3.1", token=hf_token
+    "pyannote/speaker-diarization-community-1", token=hf_token
 ).to(device)

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for dependency files in the extras/speaker-recognition directory
find extras/speaker-recognition -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "poetry.lock" -o -name "setup.py" \) -exec echo "=== {} ===" \; -exec cat {} \;

Repository: chronicler-ai/chronicle

Length of output: 190


🌐 Web query:

pyannote speaker-diarization-community-1 huggingface model

💡 Result:

Short summary — pyannote/speaker-diarization-community-1 (Hugging Face)

  • What it is: a free community speaker-diarization pipeline (pyannote.audio v4.x) that runs on 16 kHz mono audio and outputs regular + exclusive diarization; improved speaker counting/assignment vs legacy models. [1]

  • Access: model card requires accepting access conditions on Hugging Face and using a Hugging Face token (or clone for offline use). [1][2]

  • Quick start (Python):

    1. pip install pyannote.audio
    2. create HF token (hf.co/settings/tokens) and accept model conditions
    3. example:

       from pyannote.audio import Pipeline
       pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-community-1", token="{HUGGINGFACE_TOKEN}")
       diarization = pipeline("audio.wav")
       for turn, speaker in diarization.speaker_diarization:
           print(f"{speaker} {turn.start:.3f}-{turn.end:.3f}s")

       (You can run the pipeline offline by cloning the repo with git lfs and loading from disk.) [1][2]
  • Notes:
    • Optional hosted/cloud variant runs on pyannoteAI servers (requires pyannoteAI API key). [3]
    • You can control num_speakers, min/max_speakers, send pipeline to GPU, or feed in-memory waveforms. [1][2]

Sources:
[1] pyannote/speaker-diarization-community-1 model card (Hugging Face). (huggingface.co)
[2] pyannote-community/speaker-diarization-community-1 model card (Hugging Face). (huggingface.co)
[3] pyannote/speaker-diarization-community-1-cloud (hosted) model card (Hugging Face). (huggingface.co)



Confirm that pyannote.audio 4.x is pinned in the project's dependencies.

The model name "pyannote/speaker-diarization-community-1" is correct for pyannote.audio 4.x and exists on Hugging Face. The code usage matches the official documentation. However, ensure that your project's requirements file (requirements.txt, pyproject.toml, or poetry.lock) explicitly pins pyannote.audio to version 4.x, as this upgrade is a breaking change from 3.x and the code no longer includes fallback logic for the older output structure.

Comment on lines +53 to +55
from .parakeet import ParakeetProvider

return ParakeetProvider(parakeet_url)

🛠️ Refactor suggestion | 🟠 Major

Imports inside functions violate coding guidelines.

All imports must be at the top of the file. Move DeepgramProvider and ParakeetProvider imports to the top-level imports section.

As per coding guidelines: "NEVER import modules in the middle of functions or files - ALL imports must be at the top."

Apply this diff:

 from .base import BaseTranscriptionProvider, TranscriptionResult, TranscriptionWord
+from .deepgram import DeepgramProvider
+from .parakeet import ParakeetProvider
 from .utils import (
     extract_speaker_segments,
     find_speaker_at_time,

Then remove the inline imports within get_transcription_provider():

     if provider_name == "parakeet":
         parakeet_url = os.getenv("PARAKEET_ASR_URL")
         if not parakeet_url:
             raise ValueError("PARAKEET_ASR_URL required for Parakeet provider")
-        from .parakeet import ParakeetProvider
-
         return ParakeetProvider(parakeet_url)

     elif provider_name == "deepgram":
         deepgram_key = os.getenv("DEEPGRAM_API_KEY")
         if not deepgram_key:
             raise ValueError("DEEPGRAM_API_KEY required for Deepgram provider")
-        from .deepgram import DeepgramProvider
-
         return DeepgramProvider(deepgram_key)

     # Auto-detection fallback (prefers Deepgram)
     if os.getenv("DEEPGRAM_API_KEY"):
-        from .deepgram import DeepgramProvider
-
         return DeepgramProvider(os.getenv("DEEPGRAM_API_KEY"))

     if os.getenv("PARAKEET_ASR_URL"):
-        from .parakeet import ParakeetProvider
-
         return ParakeetProvider(os.getenv("PARAKEET_ASR_URL"))

Also applies to: 61-63, 67-69, 72-74

🤖 Prompt for AI Agents
In
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py
around lines 53-55 (and also apply same change for lines 61-63, 67-69, 72-74),
there are inline imports of DeepgramProvider and ParakeetProvider inside
get_transcription_provider(); move those imports to the top-level import section
of the file and remove the inline imports, then update
get_transcription_provider() to directly reference the already-imported classes
(returning DeepgramProvider(...) or ParakeetProvider(...) as appropriate) so no
imports occur inside the function.
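With both providers imported at module top, the factory reduces to plain branching. This sketch uses stub classes in place of the real `DeepgramProvider`/`ParakeetProvider` (which would be imported via `from .deepgram import ...` and `from .parakeet import ...`); the environment variable names follow the review (`TRANSCRIPTION_PROVIDER`, `DEEPGRAM_API_KEY`, `PARAKEET_ASR_URL`).

```python
import os

# Stand-ins for the real provider classes, imported at module top
# in the actual package.
class DeepgramProvider:
    def __init__(self, api_key):
        self.api_key = api_key

class ParakeetProvider:
    def __init__(self, url):
        self.url = url

def get_transcription_provider(name=None):
    name = (name or os.getenv("TRANSCRIPTION_PROVIDER", "")).lower()
    if name == "parakeet":
        url = os.getenv("PARAKEET_ASR_URL")
        if not url:
            raise ValueError("PARAKEET_ASR_URL required for Parakeet provider")
        return ParakeetProvider(url)
    if name == "deepgram":
        key = os.getenv("DEEPGRAM_API_KEY")
        if not key:
            raise ValueError("DEEPGRAM_API_KEY required for Deepgram provider")
        return DeepgramProvider(key)
    # Auto-detection fallback (prefers Deepgram).
    if os.getenv("DEEPGRAM_API_KEY"):
        return DeepgramProvider(os.environ["DEEPGRAM_API_KEY"])
    if os.getenv("PARAKEET_ASR_URL"):
        return ParakeetProvider(os.environ["PARAKEET_ASR_URL"])
    raise ValueError("No transcription provider configured")

os.environ["DEEPGRAM_API_KEY"] = "dg_example"
provider = get_transcription_provider()
print(type(provider).__name__)  # DeepgramProvider
```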

Comment on lines +54 to +57
if response.status != 200:
error_text = await response.text()
log.error(f"Deepgram API error: {response.status} - {error_text}")
raise Exception(f"Deepgram API error: {error_text}")

⚠️ Potential issue | 🟡 Minor

Improve error handling with proper exception context.

The error handling raises a generic Exception without preserving the HTTP response context. According to retrieved learnings, exceptions should be properly chained or use more specific exception types.

Apply this diff to improve the error handling:

                 if response.status != 200:
                     error_text = await response.text()
-                    log.error(f"Deepgram API error: {response.status} - {error_text}")
-                    raise Exception(f"Deepgram API error: {error_text}")
+                    error_msg = f"Deepgram API error: {response.status} - {error_text}"
+                    log.error(error_msg)
+                    raise ValueError(error_msg)

Based on learnings, prefer chaining exceptions with from e when re-raising, though in this case there's no caught exception to chain from. Using ValueError is more specific than generic Exception.

🧰 Tools
🪛 Ruff (0.14.8)

57-57: Create your own exception

(TRY002)


57-57: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py
around lines 54-57, the current error handling raises a generic Exception
without preserving HTTP response details; change this to raise a more specific
ValueError containing the HTTP status and error body (use the already-read
error_text and response.status in the message) so callers get structured context
— do not swallow the original response, just include its status and text in the
ValueError message (no exception chaining required here).

Comment on lines +242 to +254
async def health_check(self) -> bool:
"""Check Deepgram API accessibility."""
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"{self.base_url}/v1/projects",
headers={"Authorization": f"Token {self.api_key}"},
timeout=aiohttp.ClientTimeout(total=5),
) as response:
return response.status == 200
except Exception as e:
log.warning(f"Deepgram health check failed: {e}")
return False

⚠️ Potential issue | 🟠 Major

Use logging.exception() to capture full stack trace.

The exception handler logs a warning but doesn't capture the full stack trace. According to retrieved learnings, prefer logging.exception() inside except blocks to automatically log the full stack trace for better debuggability.

Apply this diff:

         except Exception as e:
-            log.warning(f"Deepgram health check failed: {e}")
+            log.exception("Deepgram health check failed")
             return False

Based on learnings, as per guidance from AnkushMalaker in PR 178.

🧰 Tools
🪛 Ruff (0.14.8)

252-252: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py
around lines 242 to 254, replace the log.warning call in the except block with
log.exception (or logging.exception if using module) so the full stack trace is
recorded; keep the existing message text (e.g., "Deepgram health check failed:
%s") and return False as before, ensuring the except block logs the exception
with stack trace and preserves behavior.

@AnkushMalaker
Collaborator Author

Should add streaming to this PR to solve #183

@AnkushMalaker AnkushMalaker changed the base branch from main to dev December 23, 2025 15:03