
docker-transcribers

A dockerized environment to run all kinds of ASR/transcription AI models.

You can run it via the CLI, or as an OpenAI-compatible transcription API server.

Supported local models:

  • openai-whisper
  • faster-whisper
  • glm-asr-nano-2512

Supported remote providers:

  • openai
  • assemblyai
  • lemonfoxai

More coming.

Requirements

  • Docker
  • GPU for local models
  • API Key for remote providers

Run in Docker

touch .env
# override env vars in .env, e.g.:
# PORT=8000
# TRANSCRIBER_BACKEND=glm
make build

# run as api server
make server
./test_server.sh

# or run via cli
make shell
# you should now be inside the container
./transcribers.py --help
./transcribers.py data/audio.mp3 --backend lemonfoxai --format srt

Clients

For a complete speech-to-text experience, you can use these frontend applications that are compatible with this GLM-ASR server:

  • NeuralWhisper - A modern web-based frontend for speech transcription with real-time capabilities
  • WhisperSqueak - A lightweight desktop application for audio transcription

Both frontends are designed to work seamlessly with this GLM-ASR server's OpenAI-compatible API endpoints.

To use the server from Spokenly, configure it as follows (replace the example IP with your server's address):

URL: http://192.168.20.9:8000
API Key: (empty)
Model: (empty)
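All of these clients ultimately call the OpenAI-compatible `POST /v1/audio/transcriptions` endpoint, which expects a multipart form with `file` and `model` fields. Below is a minimal, dependency-free sketch of that request shape; the base URL and the `whisper-1` model name are placeholders to adjust for your deployment:

```python
import mimetypes
import uuid


def build_transcription_request(base_url, audio_path, model="whisper-1"):
    """Build (url, headers, body) for an OpenAI-compatible
    /v1/audio/transcriptions request. Model name is a placeholder."""
    boundary = uuid.uuid4().hex
    url = f"{base_url.rstrip('/')}/v1/audio/transcriptions"
    ctype = mimetypes.guess_type(audio_path)[0] or "application/octet-stream"
    with open(audio_path, "rb") as f:
        audio = f.read()
    # Two multipart parts: the model name and the raw audio file.
    body = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{audio_path}"\r\nContent-Type: {ctype}\r\n\r\n'
    ).encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return url, headers, body
```

Sent with `urllib.request` (or `curl`, or the official `openai` client pointed at your base URL) against a running `make server` instance, a request like this should return JSON containing a `text` field, per the OpenAI transcription API shape.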

Model

The `glm` backend uses the GLM-ASR-Nano-2512 model from the ZAI organization, which provides efficient speech recognition with minimal computational overhead.

The GLM-ASR project is developed by the ZAI team and represents state-of-the-art multimodal speech recognition capabilities.

Performance

  • Input audio is resampled to 16kHz (optimal for the model)
  • Supports up to 30-second chunks, automatically batched for longer audio
  • Inference runs in bfloat16 precision for efficiency
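The 30-second batching above comes down to simple index arithmetic over the resampled sample buffer. This is an illustrative sketch, not the project's actual code; the 16 kHz rate and 30 s window are the defaults documented above:

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_seconds=30):
    """Split an audio buffer into windows of at most chunk_seconds,
    returning (start, end) sample indices for each chunk."""
    chunk = sample_rate * chunk_seconds
    return [
        (start, min(start + chunk, num_samples))
        for start in range(0, num_samples, chunk)
    ]


# 75 s of 16 kHz audio splits into three chunks: 30 s, 30 s, and 15 s.
bounds = chunk_bounds(75 * 16_000)
```

Each `(start, end)` pair can then be sliced out of the waveform and run through the model independently, with the transcripts concatenated in order.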

Acknowledgments

This project builds upon the excellent work of:

  • GLM-ASR - The underlying speech recognition model by the ZAI organization (zai-org/GLM-ASR-Nano-2512)
  • faster-whisper-server - Inspired by Fedir Zadniprovskyi's architecture for OpenAI-compatible speech API servers
  • FastAPI - For the excellent Python web framework
  • HuggingFace - For the Transformers library and model hub

License

MIT License - See LICENSE file for details

Contributing

Contributions are welcome. Please feel free to submit a pull request.

We especially welcome enhancements to the Dockerfile to make it smaller and more modern. If you have ideas for optimizing the Docker image (multi-stage builds, better layer caching, Alpine Linux compatibility, etc.), we'd love to see your contributions.

Citation

If you use GLM-ASR in your research, please cite the original GLM-ASR model from ZAI organization:

@misc{glm-asr,
  title={GLM-ASR: Global Large-scale Multimodal Model for Automatic Speech Recognition},
  author={ZAI Organization},
  year={2024},
  url={https://huggingface.co/zai-org}
}
