Diar 5 #195
base: dev
Conversation
…n features

- Updated `.env.template` to include configuration options for transcription providers (Deepgram and Parakeet) and diarization modes.
- Modified `docker-compose.yml` and `docker-compose-test.yml` to support new environment variables for transcription services.
- Refactored `Dockerfile` to use JSON form for CMD to allow environment variable expansion.
- Added a new `laptop_client.py` script for command-line interactions with the speaker recognition service.
- Introduced a transcription orchestrator to manage transcription and diarization processes.
- Updated dependencies in `pyproject.toml` to use the latest version of `pyannote.audio`.
- Implemented new service structure for transcription and diarization, including base classes and specific providers for Deepgram and Parakeet.
- Enhanced logging and error handling for better service reliability and debugging.
Important: Review skipped. Draft detected. Please check the settings in the CodeRabbit UI. You can disable this status message in the CodeRabbit settings.

Walkthrough

This PR introduces a multi-provider transcription architecture for speaker-recognition. It replaces direct Deepgram HTTP integration with a provider-agnostic orchestrator pattern, adds Parakeet transcription support, creates new diarization and transcription provider abstractions, upgrades the pyannote model version, and updates environment configuration, Docker setup, and test infrastructure accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Router as deepgram_wrapper
    participant Orchestrator
    participant TranscProvider as Transcription<br/>Provider
    participant DiarProvider as Diarization<br/>Provider
    participant Response

    Client->>Router: POST /transcribe
    activate Router
    Router->>Router: get_transcription_orchestrator()
    activate Router
    Router->>Router: initialize if needed
    deactivate Router
    Router->>Orchestrator: process(audio_data, sample_rate, config)
    activate Orchestrator
    Orchestrator->>TranscProvider: transcribe_http(audio_data, ...)
    activate TranscProvider
    TranscProvider-->>Orchestrator: TranscriptionResult<br/>(text, words)
    deactivate TranscProvider
    Orchestrator->>Orchestrator: determine_diarization_strategy()
    alt Diarization Needed
        Orchestrator->>DiarProvider: diarize(audio_data, ...)
        activate DiarProvider
        DiarProvider-->>Orchestrator: DiarizationResult
        deactivate DiarProvider
        Orchestrator->>Orchestrator: enrich_words_with_speakers<br/>(words, diarization)
    end
    Orchestrator-->>Router: Enhanced TranscriptionResult
    deactivate Orchestrator
    Router->>Router: format_deepgram_response(result)
    Router->>Response: Return Deepgram-compatible JSON
    deactivate Router
    Response-->>Client: Transcription + Optional Speakers
```
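In Python terms, the flow above can be sketched roughly as follows. This is a minimal sketch using names from the diagram; the signatures and bodies are assumptions, and `enrich_words_with_speakers` stands in for the enrichment step (the PR's `_enrich_words_with_speakers`):

```python
class TranscriptionOrchestrator:
    """Routes audio through a transcription provider, then optional diarization."""

    def __init__(self, transcription_provider, diarization_provider=None):
        self.transcription_provider = transcription_provider
        self.diarization_provider = diarization_provider

    async def process(self, audio_data: bytes, sample_rate: int, config: dict):
        # 1. Transcribe with whichever provider is configured
        result = await self.transcription_provider.transcribe_http(
            audio_data, sample_rate, config
        )
        # 2. Run standalone diarization only when the provider lacks native support
        if self.diarization_provider and not self.transcription_provider.has_native_diarization:
            diarization = await self.diarization_provider.diarize(
                audio_data, sample_rate, config
            )
            # 3. Attach a speaker label to each transcribed word
            result.words = enrich_words_with_speakers(result.words, diarization)
        return result
```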
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 7
🧹 Nitpick comments (12)
extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py (1)
1-1: Expose public API from services package.

The services `__init__.py` currently has no exports. Per the PR context, new diarization providers, transcription providers, orchestrator, and utilities are being introduced. Consider re-exporting the primary public interfaces (e.g., `DiarizationResult`, diarization/transcription providers, `TranscriptionOrchestrator`, and factory functions) to provide a convenient and documented API surface for consumers of the services package.

Verify which entities should be part of the public API by examining the diarization and transcription subpackages introduced in this PR. A typical pattern would be:
"""Services package for speaker recognition.""" from simple_speaker_recognition.services.diarization import ( BaseDiarizationProvider, DiarizationResult, PyannoteDiarizationProvider, ) from simple_speaker_recognition.services.transcription import ( DeepgramProvider, ParakeetProvider, TranscriptionOrchestrator, # Add factory functions if they exist ) __all__ = [ "BaseDiarizationProvider", "DiarizationResult", "PyannoteDiarizationProvider", "DeepgramProvider", "ParakeetProvider", "TranscriptionOrchestrator", ]Adjust based on the actual public API you want to expose.
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (1)
17-28: Consider sorting `__all__` for consistency.

Static analysis suggests applying isort-style sorting to `__all__`.

```diff
 __all__ = [
     "BaseTranscriptionProvider",
+    "TranscriptionOrchestrator",
     "TranscriptionResult",
     "TranscriptionWord",
-    "TranscriptionOrchestrator",
-    "get_transcription_provider",
-    "get_transcription_orchestrator",
+    "extract_speaker_segments",
+    "find_speaker_at_time",
     "format_deepgram_response",
-    "find_speaker_at_time",
-    "extract_speaker_segments",
+    "get_transcription_orchestrator",
+    "get_transcription_provider",
     "get_speaker_statistics",
 ]
```

extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (2)
36-46: Use `logging.exception()` for the full stack trace.

Based on learnings, prefer `logging.exception()` inside except blocks to automatically log the full stack trace.

```diff
 except Exception as e:
-    log.error(f"Failed to load Pyannote pipeline: {e}")
+    log.exception("Failed to load Pyannote pipeline")
     raise
```
48-50: Unused `config` parameter.

The `config` parameter is accepted but never used. Either utilize it for `min_speakers`/`max_speakers` from the config dict, or prefix it with an underscore to indicate intentional non-use.

Based on the relevant code snippet from `audio_backend.py` showing diarization supports `min_speakers`, `max_speakers`, etc., consider:

```diff
 async def diarize(
     self, audio_data: bytes, sample_rate: int, config: Dict
 ) -> DiarizationResult:
     ...
     # Ensure pipeline is loaded
     self._ensure_pipeline()
+
+    # Extract diarization parameters from config
+    min_speakers = config.get("min_speakers")
+    max_speakers = config.get("max_speakers")
+
+    kwargs = {}
+    if min_speakers is not None:
+        kwargs["min_speakers"] = min_speakers
+    if max_speakers is not None:
+        kwargs["max_speakers"] = max_speakers
     ...
     try:
         # Run diarization (returns DiarizeOutput in pyannote 4.x)
-        diarization_output = self.diarization_pipeline(tmp_path)
+        diarization_output = self.diarization_pipeline(tmp_path, **kwargs)
```

extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1)
214-221: Rename the unused loop variable with an underscore prefix.

The `track_ann` variable is not used within the loop body.

```diff
-for segment_ann, track_ann, label in overlapping.itertracks(yield_label=True):
+for segment_ann, _track_ann, label in overlapping.itertracks(yield_label=True):
     overlap = (segment & segment_ann).duration
```

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (1)
28-29: Rename unused loop variables with an underscore prefix.

Multiple loop variables are unused and should be prefixed with an underscore per convention.

```diff
-for segment, track, label in overlapping.itertracks(yield_label=True):
+for _segment, _track, label in overlapping.itertracks(yield_label=True):
     return label
```

```diff
-for segment, track, label in diarization.itertracks(yield_label=True):
+for segment, _track, label in diarization.itertracks(yield_label=True):
     segments.append(
```

Also applies to: 109-109
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (3)
91-94: Use a custom exception and chain with `from`.

Raising a generic `Exception` loses context. Create a specific exception or an `HTTPException` equivalent, and chain with `from` for context preservation.

Based on learnings: "raise ... from e" to preserve the original context.

```diff
 else:
     error_msg = f"Parakeet error: {response.status_code} - {response.text}"
     log.error(error_msg)
-    raise Exception(error_msg)
+    raise RuntimeError(error_msg)
```

Or better, create a custom `TranscriptionError` exception class.
100-102: Use `logging.exception()` for the full stack trace.

Based on learnings, prefer `logging.exception()` inside except blocks.

```diff
 except Exception as e:
-    log.error(f"Error calling Parakeet service: {e}")
+    log.exception("Error calling Parakeet service")
     raise
```
65-67: Unused `config` parameter.

The `config` parameter is accepted but never used. Consider passing `language` from config to the Parakeet service if supported.

Since `get_supported_params()` returns `["language"]`, the config should be utilized:

```diff
 async def _parakeet_transcribe(
     self, audio_data: bytes, sample_rate: int, config: Dict
 ) -> Dict:
     """Call Parakeet ASR service."""
     try:
         ...
+        # Build request params from config
+        params = {}
+        if "language" in config:
+            params["language"] = config["language"]
+
         async with httpx.AsyncClient(timeout=180.0) as client:
             with open(tmp_filename, "rb") as f:
                 files = {"file": ("audio.wav", f, "audio/wav")}
-                response = await client.post(self.transcribe_url, files=files)
+                response = await client.post(self.transcribe_url, files=files, params=params)
```

extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (1)
122-130: Fix the f-string, use `logging.exception()`, and chain the exception.

Multiple issues in the error handling block:

- Line 122: f-string has no placeholders
- Line 126: Use `logging.exception()` for the stack trace
- Line 127: Chain the exception with `from e`

Based on learnings: prefer `logging.exception()` and chain with `raise ... from e`.

```diff
-    log.info(f"Transcription complete via orchestrator")
+    log.info("Transcription complete via orchestrator")
     return deepgram_response
 except Exception as e:
-    log.error(f"Orchestrator transcription error: {e}")
+    log.exception("Orchestrator transcription error")
     raise HTTPException(
         status_code=500,
-        detail=f"Transcription error: {str(e)}"
-    )
+        detail=f"Transcription error: {e!s}"
+    ) from e
```

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (2)
28-30: Document or utilize the unused `sample_rate` parameter.

The `sample_rate` parameter is unused in this method. While Deepgram can infer the sample rate from audio data, consider either:

- Documenting why it's intentionally unused in the docstring
- Passing it to Deepgram if supported (e.g., via the `encoding` or `sample_rate` query parameters)

This maintains clarity about the interface contract with `BaseTranscriptionProvider`.
121-130: Consider merging consecutive same-speaker words into single segments.

The current implementation creates a separate segment for each word, resulting in many tiny segments. For better efficiency and semantic correctness, merge consecutive words from the same speaker into single, continuous segments.

Example optimization:

```python
# Group consecutive words by speaker into segments
current_speaker = None
segment_start = None
segment_end = None

for w_data in alternative["words"]:
    speaker = w_data["speaker"]
    if speaker != current_speaker:
        # Save previous segment if exists
        if current_speaker is not None:
            speaker_label = f"SPEAKER_{current_speaker}"
            diarization[Segment(segment_start, segment_end)] = speaker_label
        # Start new segment
        current_speaker = speaker
        segment_start = w_data["start"]
        segment_end = w_data["end"]
    else:
        # Extend current segment
        segment_end = w_data["end"]

# Don't forget the last segment
if current_speaker is not None:
    speaker_label = f"SPEAKER_{current_speaker}"
    diarization[Segment(segment_start, segment_end)] = speaker_label
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`extras/speaker-recognition/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (20)
- `extras/speaker-recognition/.env.template` (1 hunks)
- `extras/speaker-recognition/Dockerfile` (1 hunks)
- `extras/speaker-recognition/README.md` (3 hunks)
- `extras/speaker-recognition/docker-compose-test.yml` (1 hunks)
- `extras/speaker-recognition/docker-compose.yml` (1 hunks)
- `extras/speaker-recognition/pyproject.toml` (2 hunks)
- `extras/speaker-recognition/run-test.sh` (2 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py` (4 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/api/service.py` (5 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py` (2 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py` (1 hunks)
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Use the `uv run python` command instead of direct `python` or `python3` for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive `hasattr()` checks in Python
Files:
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/api/service.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py`
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.
Applied to files:
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/__init__.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/api/service.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py`
- `extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py`
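A minimal illustration of the pattern this learning describes (`_load_model` is a hypothetical stand-in for real initialization code):

```python
import logging

log = logging.getLogger(__name__)

def load_pipeline():
    try:
        return _load_model()  # hypothetical loader, stands in for real init code
    except Exception as e:
        # logging.exception() logs at ERROR level and appends the stack trace
        log.exception("Failed to load pipeline")
        # "from e" keeps the original exception as __cause__ for debugging
        raise RuntimeError("Pipeline initialization failed") from e
```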
🧬 Code graph analysis (8)
extras/speaker-recognition/run-test.sh (2)
- backends/advanced/run-test.sh (1): `print_info` (16-18)
- run-test.sh (1): `print_info` (17-19)

extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (2)
- extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py (1): `diarize` (53-100)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (1): `health_check` (92-94)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (1)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (4): `TranscriptionResult` (33-56), `TranscriptionWord` (10-30), `start` (23-25), `end` (28-30)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (6)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (3): `BaseTranscriptionProvider` (59-94), `TranscriptionResult` (33-56), `TranscriptionWord` (10-30)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (4): `extract_speaker_segments` (98-118), `find_speaker_at_time` (10-31), `format_deepgram_response` (34-95), `get_speaker_statistics` (121-154)
- extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1): `TranscriptionOrchestrator` (20-245)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (1): `ParakeetProvider` (18-137)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (1): `DeepgramProvider` (14-254)
- extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (1): `get_transcription_orchestrator` (57-65)

extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (3)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (2): `format_deepgram_response` (34-95), `extract_speaker_segments` (98-118)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (1): `get_transcription_orchestrator` (83-114)
- extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1): `process` (76-140)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (7)
- extras/speaker-recognition/src/simple_speaker_recognition/api/service.py (1): `Config` (41-45)
- extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py (2): `Config` (17-18), `health_check` (42-44)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (5): `has_native_diarization` (24-26), `transcribe_http` (28-63), `transcribe_stream` (197-205), `get_supported_params` (207-240), `health_check` (242-254)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (5): `has_native_diarization` (31-33), `transcribe_http` (35-63), `transcribe_stream` (105-116), `get_supported_params` (118-120), `health_check` (122-137)
- app/metro.config.js (1): `config` (8-8)
- extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (1): `health_check` (101-108)
- extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1): `health_check` (224-245)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (4)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (8): `BaseTranscriptionProvider` (59-94), `TranscriptionResult` (33-56), `TranscriptionWord` (10-30), `has_native_diarization` (68-70), `transcribe_http` (73-77), `transcribe_stream` (80-84), `get_supported_params` (87-89), `health_check` (92-94)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (5): `has_native_diarization` (31-33), `transcribe_http` (35-63), `transcribe_stream` (105-116), `get_supported_params` (118-120), `health_check` (122-137)
- extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (1): `health_check` (101-108)
- extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (1): `health_check` (224-245)

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (1)
- extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (8): `BaseTranscriptionProvider` (59-94), `TranscriptionResult` (33-56), `TranscriptionWord` (10-30), `has_native_diarization` (68-70), `transcribe_http` (73-77), `transcribe_stream` (80-84), `get_supported_params` (87-89), `health_check` (92-94)
🪛 GitHub Actions: Speaker Recognition Tests
extras/speaker-recognition/run-test.sh
[error] 171-171: syntax error: unexpected end of file
🪛 markdownlint-cli2 (0.18.1)
extras/speaker-recognition/README.md
19-19: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.14.8)
extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py
28-28: Avoid specifying long messages outside the exception class
(TRY003)
45-45: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
49-49: Unused method argument: config
(ARG002)
105-105: Consider moving this statement to an else block
(TRY300)
106-106: Do not catch blind exception: Exception
(BLE001)
extras/speaker-recognition/src/simple_speaker_recognition/api/service.py
119-119: Do not catch blind exception: Exception
(BLE001)
120-120: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py
28-28: Loop control variable segment not used within loop body
Rename unused segment to _segment
(B007)
28-28: Loop control variable track not used within loop body
Rename unused track to _track
(B007)
109-109: Loop control variable track not used within loop body
Rename unused track to _track
(B007)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py
17-28: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
52-52: Avoid specifying long messages outside the exception class
(TRY003)
60-60: Avoid specifying long messages outside the exception class
(TRY003)
76-80: Avoid specifying long messages outside the exception class
(TRY003)
extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py
122-122: f-string without any placeholders
Remove extraneous f prefix
(F541)
123-123: Consider moving this statement to an else block
(TRY300)
125-125: Do not catch blind exception: Exception
(BLE001)
126-126: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
127-130: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
129-129: Use explicit conversion flag
Replace with conversion flag
(RUF010)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py
29-29: Unused method argument: sample_rate
(ARG002)
57-57: Create your own exception
(TRY002)
57-57: Avoid specifying long messages outside the exception class
(TRY003)
252-252: Do not catch blind exception: Exception
(BLE001)
extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py
66-69: Avoid specifying long messages outside the exception class
(TRY003)
216-216: Loop control variable track_ann not used within loop body
Rename unused track_ann to _track_ann
(B007)
241-241: Do not catch blind exception: Exception
(BLE001)
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py
66-66: Unused method argument: config
(ARG002)
94-94: Create your own exception
(TRY002)
101-101: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
129-129: Do not catch blind exception: Exception
(BLE001)
136-136: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (34)
extras/speaker-recognition/README.md (1)
832-832: Laptop client path updates are consistent.

All references to the laptop client have been properly updated from `laptop_client.py` to `scripts/laptop_client.py` across the examples. The changes align with the actual relocated script path.

Also applies to: 857-857, 860-860, 863-863, 866-866, 869-869, 872-872
extras/speaker-recognition/pyproject.toml (2)
42-43: Reconcile inconsistent PyTorch version constraints across variants.

The `cpu` variant pins `torch>=2.0.0` and `torchaudio>=2.0.0`, while `cu126` and `cu128` now require `>=2.8.0`. This creates a dependency resolution mismatch:

- Users installing with the `cpu` extra will get older PyTorch versions
- Users installing with `cu126` or `cu128` will get newer versions
- If shared code depends on features only available in PyTorch 2.8+, CPU users may encounter runtime errors

Either:

- Upgrade the `cpu` variant to match the CUDA variants, or
- Document the minimum PyTorch version if 2.0.0 is intentionally lower for compatibility reasons
Confirm that pyannote.audio 4.0.0 works with torch 2.0.0 to justify the discrepancy.
Also applies to: 47-48, 52-53
11-11: Pyannote.audio 4.0.0 compatibility confirmed.

The upgrade to `pyannote.audio>=4.0.0` introduces breaking changes: a Python >= 3.10 requirement (satisfied in pyproject.toml line 5) and removal of the sox/soundfile I/O backends. The codebase is compatible with these changes; soundfile is used only for writing temporary audio files before passing file paths to pyannote's pipeline, which handles I/O via ffmpeg or in-memory backends. The code explicitly handles the pyannote.audio 4.x API (e.g., DiarizeOutput handling in `services/diarization/pyannote.py` lines 79-80).

extras/speaker-recognition/src/simple_speaker_recognition/core/audio_backend.py (1)
83-85: Correct attribute usage for pyannote.audio 4.x.

The code correctly accesses `output.speaker_diarization` from the `DiarizeOutput` wrapper in pyannote.audio 4.x. This attribute name is correct and returns a `pyannote.core.Annotation` object. The direct attribute access without defensive checks aligns with the coding guidelines (understand data structures rather than adding `hasattr()` checks), and the pattern is used consistently throughout the codebase.
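For reference, a minimal sketch of that access pattern, assuming a token with access to the gated model and a 16 kHz mono WAV (per the pyannote.audio 4.x model card):

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1", token="hf_..."  # placeholder token
)

output = pipeline("audio.wav")            # DiarizeOutput wrapper in 4.x
annotation = output.speaker_diarization   # pyannote.core.Annotation

for turn, _track, speaker in annotation.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.3f}s-{turn.end:.3f}s")
```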
extras/speaker-recognition/Dockerfile (1)

45-46: LGTM! Proper JSON form CMD with environment variable support.

The switch from shell form to JSON form CMD with an `sh -c` wrapper correctly enables proper signal handling while preserving environment variable expansion. This aligns with Docker best practices and the coding guideline to use `uv run` for Python execution.
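For illustration, a CMD of this shape (a hypothetical sketch; the module path and port are assumptions, not taken from the PR):

```dockerfile
# JSON form gives a predictable exec entrypoint; the sh -c wrapper restores
# ${VAR} expansion, which plain JSON form does not perform.
CMD ["sh", "-c", "uv run uvicorn simple_speaker_recognition.api.service:app --host 0.0.0.0 --port ${PORT:-8000}"]
```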
extras/speaker-recognition/docker-compose-test.yml (1)

8-8: LGTM! Variable rename improves consistency.

The change from `COMPUTE_MODE` to `PYTORCH_CUDA_VERSION` aligns the test configuration with the Dockerfile's ARG name and provides more specific naming. The default value of `cpu` remains appropriate for test environments.

extras/speaker-recognition/.env.template (1)
42-71: Excellent configuration documentation!

The new transcription and diarization configuration block is well-structured with clear section headers, comprehensive inline documentation, and sensible defaults. The multi-provider approach with auto-detection fallback provides good flexibility for different deployment scenarios.
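A sketch of what such a block can look like (`TRANSCRIPTION_PROVIDER` and `PARAKEET_ASR_URL` appear elsewhere in this PR; `DIARIZATION_MODE` and the defaults shown are assumptions):

```bash
# --- Transcription provider selection ---
# One of: deepgram, parakeet (unset = auto-detect from available credentials)
TRANSCRIPTION_PROVIDER=deepgram
DEEPGRAM_API_KEY=your-deepgram-key
PARAKEET_ASR_URL=http://parakeet-asr:8000

# --- Diarization ---
# "auto" lets the orchestrator choose native vs. standalone Pyannote
DIARIZATION_MODE=auto
HF_TOKEN=your-huggingface-token
```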
extras/speaker-recognition/docker-compose.yml (1)
29-32: LGTM! Environment variables properly configured.

The new `TRANSCRIPTION_PROVIDER` and `PARAKEET_ASR_URL` environment variables are well-integrated with sensible defaults that match the `.env.template` documentation. This enables runtime provider selection for the new orchestrator architecture.

extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/__init__.py (1)
1-6: LGTM! Clean package interface.

The package initialization properly defines the public API with explicit `__all__` and follows coding guidelines by placing all imports at the top. The module provides a clear abstraction for diarization providers.

extras/speaker-recognition/run-test.sh (2)
64-102: Environment loading logic is well-structured.

Once the syntax error at lines 60-61 is fixed, the environment loading logic properly handles both CI and local development scenarios. The CI path creates a `.env` file for downstream compatibility, and error messages clearly guide users on required credentials.
115-118: Good security practice logging token lengths.

Reporting the length of `HF_TOKEN` and `DEEPGRAM_API_KEY` instead of their actual values provides verification that credentials are loaded while maintaining security.
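A sketch of that pattern using the script's `print_info` helper (bash's `${#VAR}` expands to the string length):

```bash
# Log credential lengths only, never the values themselves
print_info "HF_TOKEN length: ${#HF_TOKEN}"
print_info "DEEPGRAM_API_KEY length: ${#DEEPGRAM_API_KEY}"
```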
extras/speaker-recognition/src/simple_speaker_recognition/api/service.py (3)

20-23: LGTM! Imports properly organized.

The new transcription orchestrator imports are correctly placed at the top of the file with other local imports, following the coding guidelines.
39-39: LGTM! Settings field properly configured.

The `diarization_mode` field is well-documented with a sensible default of "auto" that aligns with the `.env.template` configuration.
66-66: LGTM! Global variable follows existing pattern.

The `transcription_orchestrator` global follows the same pattern as other service resources (`audio_backend`, `speaker_db`) and is properly typed as `Optional` to indicate potential unavailability.

extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/base.py (2)
10-18: LGTM! Well-designed result model.

The `DiarizationResult` model is properly configured with `arbitrary_types_allowed = True` to support pyannote's `Annotation` type. The optional `exclusive_diarization` field and extensible `metadata` dictionary provide good flexibility for different diarization backends.
21-44: LGTM! Clean provider abstraction.

The `BaseDiarizationProvider` abstract class follows Python conventions with clear async method signatures and comprehensive docstrings. The abstraction provides a clean contract for implementing different diarization backends.
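A minimal sketch of such a contract (the `diarize` signature matches the diff shown earlier in this review; the rest is an assumption):

```python
from abc import ABC, abstractmethod
from typing import Dict

class BaseDiarizationProvider(ABC):
    """Contract that every diarization backend implements."""

    @abstractmethod
    async def diarize(
        self, audio_data: bytes, sample_rate: int, config: Dict
    ) -> "DiarizationResult":
        """Run diarization over raw audio and return speaker segments."""

    @abstractmethod
    async def health_check(self) -> bool:
        """Report whether the backend is ready to serve requests."""
```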
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py (1)

83-114: LGTM!

The `get_transcription_orchestrator` factory function correctly builds the orchestrator with provider, diarization mode, and token configuration.

extras/speaker-recognition/src/simple_speaker_recognition/services/diarization/pyannote.py (1)
101-108: LGTM with minor suggestion.

The health check logic is correct. The static analysis suggests moving Line 105 to an `else` block, but this is a minor style preference that doesn't affect correctness.

extras/speaker-recognition/src/simple_speaker_recognition/services/orchestrator.py (3)
30-61: LGTM!

The orchestrator initialization properly validates diarization modes, falls back gracefully on invalid input, and supports lazy-loading of the Pyannote provider.
76-140: LGTM!

The `process` method correctly orchestrates the transcription-diarization pipeline:
- Transcribes via the configured provider
- Determines diarization strategy based on mode and provider capabilities
- Runs standalone Pyannote if needed
- Enriches words with speaker labels
178-195: In-place mutation of input list.
`_enrich_words_with_speakers` modifies the `words` list in-place by setting `word.speaker`. While it returns the list, the caller's original list is also mutated. This is currently fine since Line 137 reassigns `transcription_result.words`, but could be surprising.

Consider whether this should create new `TranscriptionWord` instances to avoid side effects, or document that it mutates in-place.
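If side-effect-free behavior is preferred, a sketch assuming Pydantic v2 models and the existing `find_speaker_at_time` helper (its exact signature is an assumption):

```python
def enrich_words_with_speakers(words, diarization):
    """Return new TranscriptionWord instances rather than mutating the input."""
    enriched = []
    for word in words:
        speaker = find_speaker_at_time(diarization, word.start)  # assumed signature
        # model_copy(update=...) leaves the caller's objects untouched (Pydantic v2)
        enriched.append(word.model_copy(update={"speaker": speaker}))
    return enriched
```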
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/utils.py (2)

34-95: LGTM!

The `format_deepgram_response` function correctly converts the internal `TranscriptionResult` to Deepgram-compatible JSON format with proper handling of optional fields and speaker label conversion.
121-154: LGTM!

The `get_speaker_statistics` function properly guards against empty diarization and division by zero, and uses pyannote's `chart()` method correctly.

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/parakeet.py (1)
35-63: LGTM!

The `transcribe_http` method correctly converts the Parakeet response to `TranscriptionResult` with proper `TranscriptionWord` construction.

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/base.py (3)
10-31: LGTM!

`TranscriptionWord` is well-designed with a pyannote `Segment` for timing, backward-compatible `start`/`end` properties, and proper Pydantic configuration for arbitrary types.
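A sketch of that shape (the property names come from this review; the field names and defaults are assumptions):

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict
from pyannote.core import Segment

class TranscriptionWord(BaseModel):
    # Segment is not a Pydantic-native type, so arbitrary types must be allowed
    model_config = ConfigDict(arbitrary_types_allowed=True)

    word: str
    segment: Segment
    speaker: Optional[str] = None

    @property
    def start(self) -> float:
        """Backward-compatible start time in seconds."""
        return self.segment.start

    @property
    def end(self) -> float:
        """Backward-compatible end time in seconds."""
        return self.segment.end
```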
33-57: LGTM!

`TranscriptionResult` provides a clean data model with optional diarization fields, helper methods for speaker timeline/chart access, and sensible defaults.
59-94: LGTM!

`BaseTranscriptionProvider` defines a clear abstract interface with sensible defaults for `supports_streaming` and `has_native_diarization`. The abstract methods cover HTTP transcription, streaming, parameter discovery, and health checks.

extras/speaker-recognition/src/simple_speaker_recognition/api/routers/deepgram_wrapper.py (3)
57-66: LGTM!

The `get_transcription_orchestrator()` helper properly retrieves the orchestrator from the service module with appropriate error handling for uninitialized state.
90-112: LGTM!

The `transcribe_with_provider` function correctly integrates with the orchestrator pattern, builds provider-agnostic config, and converts results to Deepgram format for backward compatibility.
596-624: LGTM!

The endpoint correctly uses the provider abstraction for transcription and conditionally applies speaker enhancement when `user_id` is provided.

extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py (4)
23-26: LGTM!

Correctly identifies that Deepgram has built-in speaker diarization capabilities.
142-195: LGTM!

The parameter building logic correctly handles type conversions (booleans to lowercase strings, lists to comma-separated values) and provides comprehensive coverage of Deepgram API parameters.
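A sketch of that conversion pattern (the function name comes from this review; the exact mapping rules are assumptions):

```python
def _build_params(config: dict) -> dict:
    """Convert a provider-agnostic config into Deepgram query parameters."""
    params = {}
    for key, value in config.items():
        if isinstance(value, bool):
            # Deepgram expects lowercase string booleans, e.g. diarize=true
            params[key] = str(value).lower()
        elif isinstance(value, list):
            # Lists become comma-separated values, e.g. keywords=a,b,c
            params[key] = ",".join(str(v) for v in value)
        elif value is not None:
            params[key] = value
    return params
```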
197-205: LGTM!

Clear TODO with `NotImplementedError` for streaming support planned in Phase 3. This is appropriate for incremental feature development.
207-240: LGTM!

The supported parameters list is comprehensive and consistent with the parameters handled in `_build_params`.
> Accept the terms and conditions for
> https://huggingface.co/pyannote/speaker-diarization-community-1
Format the bare URL as a markdown link.
Line 19 contains a bare URL that should be wrapped in markdown link syntax for proper formatting and compliance with markdown standards.
Apply this diff to format the URL correctly:
```diff
-Accept the terms and conditions for
-https://huggingface.co/pyannote/speaker-diarization-community-1
+Accept the terms and conditions for:
+[pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1)
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
19-19: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In extras/speaker-recognition/README.md around lines 18 to 19, the bare URL
should be formatted as a markdown link; replace the plain URL with markdown link
syntax, for example using the repo name as the link text (e.g.
[pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1))
so the URL is not bare and renders correctly in Markdown.
```bash
# Load environment variables (CI or local)
if [ -f ".env" ]; then
print_info ".env file exists: $([ -f .env ] && echo 'yes' || echo 'no')"
```
Critical syntax error: orphaned if statement.
Lines 60-61 contain an unclosed if statement that causes the "syntax error: unexpected end of file" reported by the pipeline at line 171. This appears to be leftover debug code.
Apply this diff to fix the syntax error:
```diff
-
-if [ -f ".env" ]; then
-print_info ".env file exists: $([ -f .env ] && echo 'yes' || echo 'no')"
-
 # Load environment variables (CI or local)
 if [ -f ".env" ] && [ -z "${HF_TOKEN:-}" ]; then
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In extras/speaker-recognition/run-test.sh around lines 59 to 61 there is an
unclosed if statement causing a syntax error; close the conditional by adding a
matching "fi" after the print_info line (or remove the orphaned if/debug block
entirely) so the if [ -f ".env" ]; then ... is properly terminated and the
script no longer errors on unexpected end of file.
```python
except Exception as e:
    log.error(f"Failed to initialize transcription orchestrator: {e}")
    log.warning("Service will continue with limited functionality")
```
🛠️ Refactor suggestion | 🟠 Major
Use `logging.exception()` for better debugging.

Per coding guidelines and learnings, use `logging.exception()` instead of `logging.error()` inside except blocks to automatically log the full stack trace. This improves debuggability when orchestrator initialization fails.
Based on learnings and static analysis.
Apply this diff:
```diff
 except Exception as e:
-    log.error(f"Failed to initialize transcription orchestrator: {e}")
+    log.exception("Failed to initialize transcription orchestrator")
     log.warning("Service will continue with limited functionality")
```

🧰 Tools
🪛 Ruff (0.14.8)
119-119: Do not catch blind exception: Exception
(BLE001)
120-120: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In extras/speaker-recognition/src/simple_speaker_recognition/api/service.py
around lines 119-121, replace the current except-block logging so that the
exception is logged with its full stack trace: use log.exception("Failed to
initialize transcription orchestrator") (logging.exception already records the
traceback) instead of log.error(f"...{e}"), and keep the subsequent
log.warning("Service will continue with limited functionality") unchanged.
```diff
 self.diar = Pipeline.from_pretrained(
-    "pyannote/speaker-diarization-3.1", token=hf_token
+    "pyannote/speaker-diarization-community-1", token=hf_token
 ).to(device)
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Search for dependency files in the extras/speaker-recognition directory
find extras/speaker-recognition -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "poetry.lock" -o -name "setup.py" \) -exec echo "=== {} ===" \; -exec cat {} \;
```

Repository: chronicler-ai/chronicle
Length of output: 190
🌐 Web query:
pyannote speaker-diarization-community-1 huggingface model
💡 Result:
Short summary — pyannote/speaker-diarization-community-1 (Hugging Face)
- What it is: a free community speaker-diarization pipeline (pyannote.audio v4.x) that runs on 16 kHz mono audio and outputs regular + exclusive diarization; improved speaker counting/assignment vs legacy models. [1]
- Access: the model card requires accepting access conditions on Hugging Face and using a Hugging Face token (or clone for offline use). [1][2]
- Quick start (Python):
  - pip install pyannote.audio
  - create an HF token (hf.co/settings/tokens) and accept the model conditions
  - example:

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1", token="{HUGGINGFACE_TOKEN}"
)
diarization = pipeline("audio.wav")
for turn, _track, speaker in diarization.speaker_diarization.itertracks(yield_label=True):
    print(f"{speaker} {turn.start:.3f}-{turn.end:.3f}s")
```

  (You can run the pipeline offline by cloning the repo with git lfs and loading from disk.) [1][2]
- Notes:
  - An optional hosted/cloud variant runs on pyannoteAI servers (requires a pyannoteAI API key). [3]
  - You can control num_speakers, min/max_speakers, send the pipeline to GPU, or feed in-memory waveforms. [1][2]
Sources:
[1] pyannote/speaker-diarization-community-1 model card (Hugging Face). (huggingface.co)
[2] pyannote-community/speaker-diarization-community-1 model card (Hugging Face). (huggingface.co)
[3] pyannote/speaker-diarization-community-1-cloud (hosted) model card (Hugging Face). (huggingface.co)
Citations:
- 1: https://huggingface.co/pyannote/speaker-diarization-community-1?utm_source=openai
- 2: https://huggingface.co/pyannote-community/speaker-diarization-community-1?utm_source=openai
- 3: https://huggingface.co/pyannote/speaker-diarization-community-1-cloud?utm_source=openai
Confirm that pyannote.audio 4.x is pinned in the project's dependencies.
The model name "pyannote/speaker-diarization-community-1" is correct for pyannote.audio 4.x and exists on Hugging Face. The code usage matches the official documentation. However, ensure that your project's requirements file (requirements.txt, pyproject.toml, or poetry.lock) explicitly pins pyannote.audio to version 4.x, as this upgrade is a breaking change from 3.x and the code no longer includes fallback logic for the older output structure.
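For example, a minimal sketch of such a pin in `pyproject.toml` (the upper bound is an assumption, added to guard against a future major):

```toml
[project]
dependencies = [
    # 4.0 is a breaking change from 3.x; stay within the 4.x series
    "pyannote.audio>=4.0.0,<5",
]
```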
```python
from .parakeet import ParakeetProvider

return ParakeetProvider(parakeet_url)
```
🛠️ Refactor suggestion | 🟠 Major
Imports inside functions violate coding guidelines.
All imports must be at the top of the file. Move the `DeepgramProvider` and `ParakeetProvider` imports to the top-level imports section.
As per coding guidelines: "NEVER import modules in the middle of functions or files - ALL imports must be at the top."
Apply this diff:
```diff
 from .base import BaseTranscriptionProvider, TranscriptionResult, TranscriptionWord
+from .deepgram import DeepgramProvider
+from .parakeet import ParakeetProvider
 from .utils import (
     extract_speaker_segments,
     find_speaker_at_time,
```

Then remove the inline imports within `get_transcription_provider()`:
```diff
 if provider_name == "parakeet":
     parakeet_url = os.getenv("PARAKEET_ASR_URL")
     if not parakeet_url:
         raise ValueError("PARAKEET_ASR_URL required for Parakeet provider")
-    from .parakeet import ParakeetProvider
-
     return ParakeetProvider(parakeet_url)
 elif provider_name == "deepgram":
     deepgram_key = os.getenv("DEEPGRAM_API_KEY")
     if not deepgram_key:
         raise ValueError("DEEPGRAM_API_KEY required for Deepgram provider")
-    from .deepgram import DeepgramProvider
-
     return DeepgramProvider(deepgram_key)

 # Auto-detection fallback (prefers Deepgram)
 if os.getenv("DEEPGRAM_API_KEY"):
-    from .deepgram import DeepgramProvider
-
     return DeepgramProvider(os.getenv("DEEPGRAM_API_KEY"))
 if os.getenv("PARAKEET_ASR_URL"):
-    from .parakeet import ParakeetProvider
-
     return ParakeetProvider(os.getenv("PARAKEET_ASR_URL"))
```

Also applies to: 61-63, 67-69, 72-74
🤖 Prompt for AI Agents
In
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/__init__.py
around lines 53-55 (and also apply same change for lines 61-63, 67-69, 72-74),
there are inline imports of DeepgramProvider and ParakeetProvider inside
get_transcription_provider(); move those imports to the top-level import section
of the file and remove the inline imports, then update
get_transcription_provider() to directly reference the already-imported classes
(returning DeepgramProvider(...) or ParakeetProvider(...) as appropriate) so no
imports occur inside the function.
```python
if response.status != 200:
    error_text = await response.text()
    log.error(f"Deepgram API error: {response.status} - {error_text}")
    raise Exception(f"Deepgram API error: {error_text}")
```
Improve error handling with proper exception context.
The error handling raises a generic Exception without preserving the HTTP response context. According to retrieved learnings, exceptions should be properly chained or use more specific exception types.
Apply this diff to improve the error handling:
```diff
 if response.status != 200:
     error_text = await response.text()
-    log.error(f"Deepgram API error: {response.status} - {error_text}")
-    raise Exception(f"Deepgram API error: {error_text}")
+    error_msg = f"Deepgram API error: {response.status} - {error_text}"
+    log.error(error_msg)
+    raise ValueError(error_msg)
```

Based on learnings, prefer chaining exceptions with `raise ... from e` when re-raising, though in this case there's no caught exception to chain from. Using `ValueError` is more specific than a generic `Exception`.
🧰 Tools
🪛 Ruff (0.14.8)
57-57: Create your own exception
(TRY002)
57-57: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py
around lines 54-57, the current error handling raises a generic Exception
without preserving HTTP response details; change this to raise a more specific
ValueError containing the HTTP status and error body (use the already-read
error_text and response.status in the message) so callers get structured context
— do not swallow the original response, just include its status and text in the
ValueError message (no exception chaining required here).
```python
async def health_check(self) -> bool:
    """Check Deepgram API accessibility."""
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{self.base_url}/v1/projects",
                headers={"Authorization": f"Token {self.api_key}"},
                timeout=aiohttp.ClientTimeout(total=5),
            ) as response:
                return response.status == 200
    except Exception as e:
        log.warning(f"Deepgram health check failed: {e}")
        return False
```
Use `logging.exception()` to capture the full stack trace.

The exception handler logs a warning but doesn't capture the full stack trace. According to retrieved learnings, prefer `logging.exception()` inside except blocks to automatically log the full stack trace for better debuggability.
Apply this diff:
```diff
 except Exception as e:
-    log.warning(f"Deepgram health check failed: {e}")
+    log.exception("Deepgram health check failed")
     return False
```

Based on learnings, as per guidance from AnkushMalaker in PR 178.
🧰 Tools
🪛 Ruff (0.14.8)
252-252: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In
extras/speaker-recognition/src/simple_speaker_recognition/services/transcription/deepgram.py
around lines 242 to 254, replace the log.warning call in the except block with
log.exception (or logging.exception if using module) so the full stack trace is
recorded; keep the existing message text (e.g., "Deepgram health check failed:
%s") and return False as before, ensuring the except block logs the exception
with stack trace and preserves behavior.
Should add streaming to this PR to solve #183 |