Transcribe Me is a CLI-driven Python application that transcribes audio files using either the OpenAI Whisper API or AssemblyAI.
```mermaid
graph TD
    A[Load Config] --> B[Get Audio Files]
    B --> C{Audio File Exists?}
    C --Yes--> D{Use AssemblyAI?}
    D --Yes--> E[Transcribe with AssemblyAI]
    D --No--> F[Transcribe with OpenAI]
    E --> G[Generate Additional Outputs]
    F --> I[Save Transcription]
    G --> I
    I --> K[Clean Up Temporary Files]
    K --> B
    C --No--> L[Print Warning]
    L --> B
```
Starting with version 1.0.0, you need to explicitly install the provider(s) you want to use. The package no longer installs all providers by default, which keeps your environment lean by only installing what you actually need.
- Audio Transcription: Transcribes audio files using either the OpenAI Whisper API or AssemblyAI. It supports both MP3 and M4A formats.
- AssemblyAI Features: When using AssemblyAI, provides additional outputs including Speaker Diarization, Summary, Sentiment Analysis, Key Phrases, and Topic Detection.
- Supported Audio Formats: Supports audio files in `.m4a` and `.mp3` formats.
- Docker Support: Can be run in a Docker container for easy deployment and reproducibility.
The tool has been tested with Python 3.12 on macOS; your mileage may vary on other operating systems such as Windows, WSL, or Linux.
- Install Python. The recommended way is to use asdf:

  ```shell
  brew install asdf
  asdf plugin add python
  asdf install python 3.12.0
  asdf global python 3.12.0
  ```

- Install FFmpeg using Homebrew:

  ```shell
  brew install ffmpeg
  ```

- Install the application using pip. You'll need to specify which provider(s) you want to use:

  - For OpenAI only:

    ```shell
    pip install "transcribe-me[openai]"
    ```

  - For AssemblyAI only:

    ```shell
    pip install "transcribe-me[assemblyai]"
    ```

  - For all providers:

    ```shell
    pip install "transcribe-me[all]"
    ```

  Or if you're installing from source:

  ```shell
  # Clone the repository
  git clone https://github.com/echohello-dev/transcribe-me.git
  cd transcribe-me

  # Install with the desired providers
  pip install -e ".[openai]"      # For OpenAI
  # or
  pip install -e ".[assemblyai]"  # For AssemblyAI
  # or
  pip install -e ".[all]"         # For all providers
  ```
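After installing, you can quickly confirm which optional providers ended up in your environment. The snippet below is only an illustrative check; it assumes the `openai` extra pulls in the `openai` package and the `assemblyai` extra pulls in the `assemblyai` package:

```python
# Illustrative check -- not part of transcribe-me itself.
# Assumes the "openai" extra installs the `openai` package and the
# "assemblyai" extra installs the `assemblyai` package.
import importlib.util

PROVIDER_MODULES = {"OpenAI": "openai", "AssemblyAI": "assemblyai"}

for provider, module_name in PROVIDER_MODULES.items():
    available = importlib.util.find_spec(module_name) is not None
    status = "installed" if available else "missing (install the matching extra)"
    print(f"{provider}: {status}")
```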
- Bootstrap your current directory with the configuration file:

  ```shell
  transcribe-me install
  ```

  This command will create a `.transcribe.yaml` file in your current directory and prompt you to enter your API keys for OpenAI and AssemblyAI if they are not already provided in environment variables.

- Set up your API keys (if not already done during installation):

  ```shell
  # For OpenAI
  export OPENAI_API_KEY=your_openai_api_key

  # For AssemblyAI
  export ASSEMBLYAI_API_KEY=your_assemblyai_api_key
  ```

- Place your audio files (`.mp3` or `.m4a` format) in the `input` directory (or any directory specified in your configuration).

- Run the application:

  ```shell
  transcribe-me
  ```

  The application will process each audio file in the input directory and save the transcriptions to the output directory.

- (Optional) Archive processed files after transcription:

  ```shell
  transcribe-me archive
  ```
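If you prefer to drive the CLI from a script (for example, a nightly batch job), you can shell out to the same commands shown above. This is just a convenience sketch; it assumes `transcribe-me` is on your `PATH` and a `.transcribe.yaml` file exists in the working directory:

```python
# Convenience wrapper around the CLI -- assumes transcribe-me is on PATH and
# a .transcribe.yaml file exists in the current working directory.
import subprocess

subprocess.run(["transcribe-me"], check=True)             # transcribe everything in the input folder
subprocess.run(["transcribe-me", "archive"], check=True)  # then archive the processed files
```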
When running Transcribe Me, the provider used for transcription is determined by your configuration file. By default, OpenAI is used, but you can switch to AssemblyAI by setting `use_assemblyai: true` in your `.transcribe.yaml` file.
Make sure you've installed the appropriate provider package as described in the installation section. If you try to use a provider that isn't installed, you'll receive a helpful error message with instructions on how to install the missing dependency.
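The exact error handling lives inside transcribe-me, but the general pattern is a lazy import that fails with an actionable message. Here is a minimal sketch of that idea; the function name is hypothetical and it assumes the AssemblyAI provider is backed by the `assemblyai` package:

```python
# Hypothetical sketch of lazy provider loading -- not transcribe-me's actual code.
def load_assemblyai_provider():
    try:
        import assemblyai  # only needed when use_assemblyai is true
    except ImportError as exc:
        raise ImportError(
            "The AssemblyAI provider is not installed. "
            'Install it with: pip install "transcribe-me[assemblyai]"'
        ) from exc
    return assemblyai
```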
The `transcribe-me` command supports several options:

```shell
# Display help information
transcribe-me --help

# Specify a custom configuration file
transcribe-me --config /path/to/custom/config.yaml

# Run in verbose mode for detailed output
transcribe-me --verbose

# Run in debug mode for even more detailed logging
transcribe-me --debug
```

The `.transcribe.yaml` file controls the behavior of the application. Here's a comprehensive example with all available options:
```yaml
# Transcription service selection
use_assemblyai: false # Set to true to use AssemblyAI instead of OpenAI

# Folder Configuration
input_folder: input # Directory containing audio files to transcribe
output_folder: output # Directory where transcriptions will be saved
archive_folder: archive # Directory for archived files (optional)

# AssemblyAI-specific options (when use_assemblyai is true)
assemblyai_options:
  speech_model: nano # Options: base, nano, large
  speaker_labels: true # Enable speaker diarization
  summarization: true # Generate summary
  sentiment_analysis: true # Generate sentiment analysis
  iab_categories: true # Generate topic detection

# OpenAI-specific options (when use_assemblyai is false)
openai_options:
  model: whisper-1 # Whisper model to use
```

Process only specific audio files:

```shell
# Transcribe a single file
transcribe-me --file path/to/your/audio.mp3

# Transcribe multiple files
transcribe-me --files file1.mp3,file2.mp3
```

You can specify custom output formats in your configuration:

```yaml
output_format:
  include_timestamps: true # Include timestamps in transcription
  include_speakers: true # Include speaker labels (AssemblyAI only)
  text_only: false # Output only plain text (no JSON)
```

For large audio files, the application automatically splits them into smaller chunks for processing with OpenAI:

```yaml
splitting_options:
  chunk_size_seconds: 600 # Split files into 10-minute chunks
  overlap_seconds: 5 # 5-second overlap between chunks
```
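The exact chunking code is internal to transcribe-me, but the idea can be sketched with pydub (which, like the application, relies on FFmpeg). The function and file name below are a hypothetical illustration mirroring the `chunk_size_seconds` and `overlap_seconds` options above, not the actual implementation:

```python
# Hypothetical illustration of overlapping chunking -- not transcribe-me's actual code.
from pydub import AudioSegment  # pydub uses FFmpeg under the hood


def split_with_overlap(path, chunk_size_seconds=600, overlap_seconds=5):
    """Yield overlapping chunks so speech at a boundary appears in both chunks."""
    audio = AudioSegment.from_file(path)
    chunk_ms = chunk_size_seconds * 1000
    overlap_ms = overlap_seconds * 1000
    step_ms = chunk_ms - overlap_ms  # advance less than a full chunk to create the overlap

    for start in range(0, len(audio), step_ms):
        yield audio[start:start + chunk_ms]


# Example: export each chunk as MP3 before sending it to the transcription API.
for i, chunk in enumerate(split_with_overlap("input/interview.m4a")):
    chunk.export(f"/tmp/chunk_{i:03d}.mp3", format="mp3")
```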
You can also run the application using Docker. The Docker image comes with all providers pre-installed. If you're building your own Docker image, you can choose which providers to include.

- Install Docker on your machine by following the instructions on the Docker website.

- Pull the pre-built image:

  ```shell
  docker pull ghcr.io/echohello-dev/transcribe-me:latest
  ```

  Or build your own image with specific providers:

  ```dockerfile
  FROM python:3.12-slim

  # Install FFmpeg
  RUN apt-get update && apt-get install -y ffmpeg

  # Copy the application code
  COPY . /app
  WORKDIR /app

  # Install the package with the desired providers
  # Choose one of the following:
  RUN pip install -e ".[openai]"        # For OpenAI only
  # RUN pip install -e ".[assemblyai]"  # For AssemblyAI only
  # RUN pip install -e ".[all]"         # For all providers

  ENTRYPOINT ["transcribe-me"]
  ```

- Create a `.transcribe.yaml` configuration file:

  ```shell
  touch .transcribe.yaml

  docker run \
    --rm \
    -v $(pwd)/.transcribe.yaml:/app/.transcribe.yaml \
    ghcr.io/echohello-dev/transcribe-me:latest install
  ```

- Run the application in Docker:

  ```shell
  docker run \
    --rm \
    -e OPENAI_API_KEY \
    -e ASSEMBLYAI_API_KEY \
    -v $(pwd)/archive:/app/archive \
    -v $(pwd)/input:/app/input \
    -v $(pwd)/output:/app/output \
    -v $(pwd)/.transcribe.yaml:/app/.transcribe.yaml \
    ghcr.io/echohello-dev/transcribe-me:latest
  ```

  This command mounts the `input` and `output` directories and the `.transcribe.yaml` configuration file into the Docker container.

- (Optional) You can also run the application using the provided `docker-compose.yml` file:

  ```yaml
  version: '3'
  services:
    transcribe-me:
      image: ghcr.io/echohello-dev/transcribe-me:latest
      environment:
        - OPENAI_API_KEY
        - ASSEMBLYAI_API_KEY
      volumes:
        - ./input:/app/input
        - ./output:/app/output
        - ./archive:/app/archive
        - ./.transcribe.yaml:/app/.transcribe.yaml
  ```

  Run the following command to start the application using Docker Compose:

  ```shell
  docker compose run --rm transcribe-me
  ```

  This command mounts the `input`, `output`, and `archive` directories and the `.transcribe.yaml` configuration file into the Docker container. See `compose.example.yaml` for an example configuration.

  Make sure `OPENAI_API_KEY` and `ASSEMBLYAI_API_KEY` are set in your environment with your actual API keys. Also make sure to create the `.transcribe.yaml` configuration file in the same directory as the `docker-compose.yml` file.
The Transcribe Me application follows a straightforward workflow (a rough code sketch of this loop follows the list):

- Load Configuration: The application loads the configuration from the `.transcribe.yaml` file, which includes settings for input/output directories and the transcription service.
- Get Audio Files: The application gets a list of audio files from the input directory specified in the configuration.
- Check Existing Transcriptions: For each audio file, the application checks if there is an existing transcription file. If a transcription file exists, it skips to the next audio file.
- Transcribe Audio File: If no transcription file exists, the application transcribes the audio file using either the OpenAI Whisper API or AssemblyAI, based on the configuration.
- Generate Outputs:
  - For OpenAI: The application generates summaries of the transcription using the configured models (OpenAI GPT-4 and Anthropic Claude).
  - For AssemblyAI: The application generates additional outputs including Speaker Diarization, Summary, Sentiment Analysis, Key Phrases, and Topic Detection.
- Save Transcription and Outputs: The application saves the transcription and all generated outputs to separate files in the output directory.
- Clean Up Temporary Files: The application removes any temporary files generated during the transcription process.
- Repeat: The process repeats for each audio file in the input directory.
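To make the flow concrete, here is a condensed, hypothetical sketch of that loop in Python. The function names and the output file extension are placeholders for illustration and do not correspond to transcribe-me's actual internals:

```python
# Condensed, hypothetical sketch of the workflow loop -- not transcribe-me's actual code.
from pathlib import Path


def transcribe_with_openai(audio_file: Path, config: dict) -> str:
    """Placeholder for the OpenAI Whisper call (summaries are generated afterwards)."""
    raise NotImplementedError


def transcribe_with_assemblyai(audio_file: Path, config: dict) -> str:
    """Placeholder for the AssemblyAI call (also yields diarization, summary, sentiment, topics)."""
    raise NotImplementedError


def run(config: dict) -> None:
    input_dir = Path(config.get("input_folder", "input"))
    output_dir = Path(config.get("output_folder", "output"))
    output_dir.mkdir(parents=True, exist_ok=True)

    for audio_file in sorted(list(input_dir.glob("*.m4a")) + list(input_dir.glob("*.mp3"))):
        transcript_path = output_dir / f"{audio_file.stem}.txt"  # output name is illustrative
        if transcript_path.exists():
            continue  # an existing transcription means this file is skipped

        if config.get("use_assemblyai", False):
            text = transcribe_with_assemblyai(audio_file, config)
        else:
            text = transcribe_with_openai(audio_file, config)

        transcript_path.write_text(text)
        # transcribe-me also cleans up temporary chunk files at this point
```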
The application uses a configuration file (`.transcribe.yaml`) to specify settings such as input/output directories, API keys, models, and their configurations. The configuration file is created automatically when you run the `transcribe-me install` command.

Here is an example configuration file:

```yaml
use_assemblyai: false # Set to true to use AssemblyAI instead of OpenAI for transcription
input_folder: input
output_folder: output
```

The project's Makefile also provides helper targets:

- `freeze`: Saves the installed Python package versions to the `requirements.txt` file.
- `install-cli`: Installs the application as a command-line interface (CLI) tool.
- The application requires API keys for both OpenAI and Anthropic. These keys are not provided with the application and must be obtained separately.
- The application is designed to run on a single machine and does not support distributed processing. As a result, the speed of transcription and summary generation is limited by the performance of the machine it is running on.
- The application does not support real-time transcription or summary generation. It processes audio files one at a time and must complete the transcription and summary generation for each file before moving on to the next one.
- Clone the repository.

- Install the required tools using ASDF (for managing tool versions) and Homebrew (for installing dependencies):

  - Install ASDF:

    ```shell
    brew install asdf
    ```

  - Install FFmpeg using Homebrew:

    ```shell
    brew install ffmpeg
    ```

- Install the package with pip. You can choose which providers to install:

  - For OpenAI only:

    ```shell
    pip install -e ".[openai]"
    ```

  - For AssemblyAI only:

    ```shell
    pip install -e ".[assemblyai]"
    ```

  - For all providers:

    ```shell
    pip install -e ".[all]"
    ```
  Or using uv:

  - For OpenAI only:

    ```shell
    uv pip install -e ".[openai]"
    ```

  - For AssemblyAI only:

    ```shell
    uv pip install -e ".[assemblyai]"
    ```

  - For all providers:

    ```shell
    uv pip install -e ".[all]"
    ```
- Install the Python dependencies and create a virtual environment:

  ```shell
  make install
  ```

- Run the `transcribe-me install` command to create the `.transcribe.yaml` configuration file and provide your API keys for OpenAI and AssemblyAI:

  ```shell
  make transcribe-install
  ```

- (Optional) Install the application as a command-line interface (CLI) tool:

  ```shell
  make install-cli
  ```