Small command-line utilities for preparing audio and running speech recognition with Yandex SpeechKit.
- `.env` stores local project defaults for SpeechKit and Object Storage credentials and settings.
- `inspect_audio.py` inspects a local audio file and prints a JSON summary with a SpeechKit usage recommendation.
- `object_storage_upload.py` uploads a local file to Yandex Object Storage using the S3-compatible API.
- `object_storage_presign.py` generates a private pre-signed download URL for an uploaded object.
- `prepare_audio.py` converts source media into SpeechKit-friendly formats with `ffmpeg`.
- `split_audio_by_size.py` splits a local audio file into chunks under a target file size.
- `speechkit_sync_recognize.py` sends a local audio file to the synchronous SpeechKit STT REST endpoint.
- `speechkit_async_recognize_v3.py` submits an Object Storage file URL to the async SpeechKit STT v3 API and can poll until completion.
- `transcribe_file_async.py` prepares a local file, uploads it to Object Storage, creates a private URL, and runs async recognition.
- `transcribe_local_in_parts.py` splits a large local file and transcribes each chunk through the synchronous SpeechKit endpoint.
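For the async path, `speechkit_async_recognize_v3.py` takes `--uri`, `--container-audio-type`, and `--language-code`. The sketch below shows one plausible mapping of those flags onto a v3 `recognizeFileAsync` JSON body, plus a generic polling loop in the spirit of `--poll`. The field names reflect our reading of the SpeechKit v3 REST reference and should be verified. `build_v3_file_request`, `poll_until_done`, and the `fetch_operation` callable are hypothetical. The loop assumes a Yandex Cloud operation object with a boolean `done` and an optional `error`.

```python
import time
from collections.abc import Callable


def build_v3_file_request(uri: str, container_audio_type: str = "WAV",
                          language_code: str = "ru-RU", model: str = "general") -> dict:
    """Hypothetical mapping of the CLI flags onto a recognizeFileAsync body (verify field names)."""
    return {
        "uri": uri,
        "recognitionModel": {
            "model": model,
            "audioFormat": {"containerAudio": {"containerAudioType": container_audio_type}},
            "languageRestriction": {
                "restrictionType": "WHITELIST",
                "languageCode": [language_code],
            },
        },
    }


def poll_until_done(fetch_operation: Callable[[], dict], interval: float = 5.0,
                    timeout: float = 3600.0, sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> dict:
    """Call fetch_operation until it reports done=True, an error, or the timeout passes."""
    deadline = clock() + timeout
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"recognition failed: {operation['error']}")
            return operation
        if clock() >= deadline:
            raise TimeoutError(f"operation not done after {timeout} seconds")
        sleep(interval)
```

`sleep` and `clock` are injectable so the loop can be tested without waiting.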
Requirements:

- Python 3.11+
- `ffmpeg` for audio conversion
- `ffprobe` for richer audio inspection output
- Yandex Cloud credentials:
  - `YANDEX_API_KEY` or `API_KEY`
  - `YANDEX_IAM_TOKEN` or `IAM_TOKEN`
- Yandex Object Storage static access credentials in `.env` or the process environment:
  - `YANDEX_STORAGE_ACCESS_KEY` or `ACCESS_KEY`
  - `YANDEX_STORAGE_SECRET_KEY` or `SECRET_KEY`
  - `YANDEX_STORAGE_BUCKET`
Inspect a file:

    python3 inspect_audio.py ./audio.wav

Convert audio to mono 16 kHz WAV:

    python3 prepare_audio.py ./input.mp3 ./output.wav --format wav --sample-rate 16000 --channels 1

Split a large recording into chunks smaller than 20 MiB:

    python3 split_audio_by_size.py ./meeting.mp3 ./chunks --max-size-mb 20

Split a large local file and transcribe each chunk:

    export YANDEX_API_KEY=your_api_key
    python3 transcribe_local_in_parts.py ./meeting.mp3 ./meeting_workdir --max-size-mb 20 --lang ru-RU

Run synchronous recognition for a local file:

    export YANDEX_API_KEY=your_api_key
    python3 speechkit_sync_recognize.py ./output.wav --lang ru-RU --topic general

Upload a file to Object Storage:

    python3 object_storage_upload.py ./meeting.ogg --object-key speechkit/meeting.ogg

Create a private download URL:

    python3 object_storage_presign.py --object-key speechkit/meeting.ogg --expires-in 86400

Run async v3 recognition for an Object Storage URL:

    export YANDEX_IAM_TOKEN=your_iam_token
    python3 speechkit_async_recognize_v3.py \
      --uri "https://storage.yandexcloud.net/bucket/path/audio.wav" \
      --container-audio-type WAV \
      --language-code ru-RU \
      --poll

Run the full async local-file pipeline:

    python3 transcribe_file_async.py ./meeting.mp4 ./meeting_async_workdir

Notes:

- The synchronous API expects local file bytes and is suitable for shorter requests.
- The async v3 flow expects a remote object URL and is better for longer recordings.
- If Object Storage settings are present in `.env` or the process environment, this project should always use the async path for file transcription.
- The recommended async path in this repo is: local file -> `prepare_audio.py` -> `object_storage_upload.py` -> `object_storage_presign.py` -> `speechkit_async_recognize_v3.py`.
- `prepare_audio.py` currently supports `wav`, `linear16`, `ogg-opus`, and `mp3` output presets.
- When the source file contains video, `prepare_audio.py` maps only the first audio stream and explicitly drops non-audio streams before conversion.
- `split_audio_by_size.py` keeps the original codec and container and uses `ffmpeg` stream copy where possible.
- If a chunk still exceeds the size limit because of container boundaries or variable bitrate, `split_audio_by_size.py` splits that chunk again recursively.
- `speechkit_sync_recognize.py`, `speechkit_async_recognize_v3.py`, and `transcribe_file_async.py` load defaults from `.env` automatically.
- `transcribe_local_in_parts.py` refuses to run when Object Storage settings are configured, which enforces async mode at the orchestration level.
- `transcribe_local_in_parts.py` writes its outputs under `work_dir` (its second positional argument):
  - chunks in `work_dir/chunks`
  - per-part text in `work_dir/results`
  - the merged transcript in `work_dir/transcript.txt`
  - a full run manifest in `work_dir/manifest.json`
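The rule that Object Storage settings force the async path fits in a small predicate. In this sketch, "configured" is assumed to mean that an access key, a secret key, and a bucket are all present under one of their accepted names. The real scripts may use a different test. `storage_configured`, `choose_mode`, and `guard_sync_orchestrator` are hypothetical names.

```python
from collections.abc import Mapping

# Accepted variable names per setting, as listed under Requirements.
STORAGE_SETTINGS = {
    "access_key": ("YANDEX_STORAGE_ACCESS_KEY", "ACCESS_KEY"),
    "secret_key": ("YANDEX_STORAGE_SECRET_KEY", "SECRET_KEY"),
    "bucket": ("YANDEX_STORAGE_BUCKET",),
}


def storage_configured(settings: Mapping[str, str]) -> bool:
    """Assumption: every setting needs a non-empty value under some accepted name."""
    return all(any(settings.get(name) for name in names) for names in STORAGE_SETTINGS.values())


def choose_mode(settings: Mapping[str, str]) -> str:
    return "async" if storage_configured(settings) else "sync"


def guard_sync_orchestrator(settings: Mapping[str, str]) -> None:
    """Refuse the chunked synchronous flow when the async path is available."""
    if storage_configured(settings):
        raise SystemExit("Object Storage is configured; use transcribe_file_async.py instead.")
```

Here `settings` would typically be the `.env` values merged with `os.environ`.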
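The `prepare_audio.py` presets and the video-handling rule map naturally onto an `ffmpeg` argument list. The codec and muxer chosen for each preset here are assumptions. For example, this sketch treats `linear16` as headerless 16-bit little-endian PCM. The `ffmpeg` options themselves (`-map 0:a:0`, `-vn`, `-sn`, `-dn`, `-ac`, `-ar`, `-c:a`, `-f`) are standard. `build_ffmpeg_args` is a hypothetical helper.

```python
# Hypothetical codec/muxer per preset; the real prepare_audio.py may choose differently.
PRESETS: dict[str, list[str]] = {
    "wav": ["-c:a", "pcm_s16le", "-f", "wav"],
    "linear16": ["-c:a", "pcm_s16le", "-f", "s16le"],  # headerless 16-bit little-endian PCM
    "ogg-opus": ["-c:a", "libopus", "-f", "ogg"],
    "mp3": ["-c:a", "libmp3lame", "-f", "mp3"],
}


def build_ffmpeg_args(src: str, dst: str, fmt: str,
                      sample_rate: int = 16000, channels: int = 1) -> list[str]:
    if fmt not in PRESETS:
        raise ValueError(f"unsupported format {fmt!r}; choose from {sorted(PRESETS)}")
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-i", src,
        "-map", "0:a:0",      # keep only the first audio stream of the first input
        "-vn", "-sn", "-dn",  # explicitly drop video, subtitle, and data streams
        "-ac", str(channels),
        "-ar", str(sample_rate),
        *PRESETS[fmt],
        dst,
    ]
```

To run it, pass the list to `subprocess.run(..., check=True)`. Passing a list instead of a shell string avoids quoting problems with unusual file names.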
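The two-stage splitting strategy can be sketched as pure interval arithmetic. First, cut by estimated duration. Then re-split any piece that is still too large. Here `size_of(start, end)` is a hypothetical stand-in for "cut this range with stream copy and measure the file". The 0.95 safety factor, the 1-second floor, and halving as the re-split rule are all assumptions, not the script's exact behavior.

```python
from collections.abc import Callable

Interval = tuple[float, float]
SizeFn = Callable[[float, float], int]


def _refine(start: float, end: float, max_bytes: int, size_of: SizeFn,
            min_seconds: float) -> list[Interval]:
    # Keep the piece if it fits, or if halving would go below the minimum length.
    if size_of(start, end) <= max_bytes or (end - start) / 2 < min_seconds:
        return [(start, end)]
    mid = (start + end) / 2
    return (_refine(start, mid, max_bytes, size_of, min_seconds)
            + _refine(mid, end, max_bytes, size_of, min_seconds))


def plan_chunks(duration: float, total_bytes: int, max_bytes: int, size_of: SizeFn,
                safety: float = 0.95, min_seconds: float = 1.0) -> list[Interval]:
    """Return contiguous (start, end) ranges in seconds covering [0, duration]."""
    if total_bytes <= max_bytes:
        return [(0.0, duration)]
    # Pass 1: assume a constant bitrate and cut equal-length segments with some headroom.
    segment = max(min_seconds, duration * max_bytes / total_bytes * safety)
    plan: list[Interval] = []
    start = 0.0
    while start < duration:
        end = min(duration, start + segment)
        # Pass 2: measure each segment and recursively halve any that is still too big.
        plan.extend(_refine(start, end, max_bytes, size_of, min_seconds))
        start = end
    return plan
```

Because of the `min_seconds` floor, a very dense stretch of audio can still yield a chunk above the limit. The floor trades that edge case for guaranteed termination.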
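Here is a minimal sketch of writing the `transcribe_local_in_parts.py` outputs into the documented `work_dir` layout. The per-part file names (`part_001.txt`, and so on) and the manifest fields are hypothetical. Only the directory and file locations come from the notes above. `write_run_outputs` is an illustrative name, not the script's function.

```python
import json
from pathlib import Path


def write_run_outputs(work_dir: Path, parts: list[tuple[str, str]]) -> Path:
    """Write per-part text, the merged transcript, and a manifest.

    `parts` holds (chunk file name, recognized text) pairs in playback order.
    """
    results_dir = work_dir / "results"
    results_dir.mkdir(parents=True, exist_ok=True)
    entries = []
    for index, (chunk_name, text) in enumerate(parts, start=1):
        result_path = results_dir / f"part_{index:03d}.txt"  # hypothetical naming
        result_path.write_text(text, encoding="utf-8")
        entries.append({
            "index": index,
            "chunk": f"chunks/{chunk_name}",
            "result": result_path.relative_to(work_dir).as_posix(),
            "characters": len(text),
        })
    transcript = work_dir / "transcript.txt"
    # Merge the parts in order, one trimmed part per line.
    transcript.write_text("\n".join(text.strip() for _, text in parts) + "\n", encoding="utf-8")
    manifest = {"transcript": transcript.name, "parts": entries}
    (work_dir / "manifest.json").write_text(
        json.dumps(manifest, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return transcript
```

`ensure_ascii=False` keeps Cyrillic text readable in the manifest instead of escaping it.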
License: MIT. See `LICENSE`.