vincentamato/cohere-transcribe-cli

Cohere Transcribe CLI

Speaker-diarized transcription CLI powered by Cohere Transcribe and pyannote

Prerequisites

  • Python 3.12+
  • uv
  • ffmpeg (required for YouTube downloads)

Setup

1. Install dependencies

uv sync

2. Authenticate with Hugging Face

Both the Cohere Transcribe and pyannote speaker-diarization models are gated. You must:

  1. Accept the terms on each model's Hugging Face page.
  2. Log in with the Hugging Face CLI:
uvx hf auth login

This will prompt you for a User Access Token with read access.

Usage

Transcribe a local audio file

uv run python main.py --audio recording.wav

Transcribe from YouTube

uv run python main.py --youtube "https://www.youtube.com/watch?v=VIDEO_ID"
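To transcribe several local files in one go, the command above can be wrapped in a small script. The following is a minimal sketch, not part of this repository: the helper names `build_command` and `batch_commands` and the `recordings` folder are hypothetical, and only flags listed under Options are used.

```python
from pathlib import Path


def build_command(audio: Path, output: Path, language: str = "en") -> list[str]:
    """Build the argv list for one transcription run using documented flags."""
    return [
        "uv", "run", "python", "main.py",
        "--audio", str(audio),
        "--language", language,
        "--output", str(output),
    ]


def batch_commands(folder: Path, pattern: str = "*.wav") -> list[list[str]]:
    """Return one command per matching file, writing each transcript next to its audio."""
    return [
        build_command(audio, audio.with_suffix(".txt"))
        for audio in sorted(folder.glob(pattern))
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, for example over `batch_commands(Path("recordings"))`.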

Options

Flag            Default            Description
--audio         (none)             Path to a local audio file
--youtube       (none)             YouTube URL to download and transcribe
--language      en                 Language code (e.g. en, fr, de, es, ja, zh)
--num-speakers  auto               Fixed number of speakers (auto-detected if omitted)
--backend       auto               ASR backend: mlx, cuda, cpu, or auto
--output        transcription.txt  Output file path
--merge-gap     0.35               Max gap (seconds) to merge same-speaker segments
--min-island    0.20               Min duration (seconds) for isolated speaker segments
--left-pad      0.35               Padding before each segment (seconds)
--right-pad     0.05               Padding after each segment (seconds)

Backend selection

By default (--backend auto), the CLI picks the best available backend:

  • MLX on Apple silicon
  • CUDA on systems with an NVIDIA GPU
  • CPU otherwise

You can override with --backend mlx, --backend cuda, or --backend cpu. Passing --backend cpu also forces diarization to run on CPU.

Example

uv run python main.py --audio meeting.wav --num-speakers 3 --output meeting.txt

Output (meeting.txt):

SPEAKER_00 [00:00.00 - 00:12.34]:
Welcome everyone to the meeting. Let's start with the first agenda item.

SPEAKER_01 [00:12.34 - 00:25.67]:
Thanks. I wanted to discuss the timeline for the next release.
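If you want to post-process the transcript, the output format shown above is easy to parse. The sketch below assumes every turn starts with a header line of the form `SPEAKER_NN [MM:SS.ss - MM:SS.ss]:` exactly as in the example; how the CLI formats recordings longer than an hour is not documented here, so treat the pattern as an assumption. The names `parse_transcript` and `to_seconds` are hypothetical.

```python
import re

# Header line, e.g. "SPEAKER_00 [00:00.00 - 00:12.34]:"
HEADER = re.compile(
    r"^(?P<speaker>SPEAKER_\d+) "
    r"\[(?P<start>\d+:\d{2}\.\d{2}) - (?P<end>\d+:\d{2}\.\d{2})\]:$"
)


def to_seconds(stamp: str) -> float:
    """Convert "MM:SS.ss" to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_transcript(text: str) -> list[dict]:
    """Split a transcript into turns with speaker, start, end, and text."""
    turns: list[dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            turns.append({
                "speaker": match["speaker"],
                "start": to_seconds(match["start"]),
                "end": to_seconds(match["end"]),
                "text": "",
            })
        elif line and turns:
            # Body lines belong to the most recent header.
            turns[-1]["text"] = (turns[-1]["text"] + " " + line).strip()
    return turns
```

For example, `parse_transcript(open("meeting.txt").read())` returns one dictionary per speaker turn, which can then be filtered by speaker or exported to another format.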
