
docker-transcribers

A dockerized environment to run all kinds of ASR/transcription AI models.

You can run it via the CLI, or as an OpenAI-compatible transcription API server.

Supported local models:

  • openai-whisper
  • faster-whisper
  • glm-asr-nano-2512

Supported remote providers:

  • openai
  • assemblyai
  • lemonfoxai

More coming.

Requirements

  • Docker
  • GPU for local models
  • API Key for remote providers

Run in Docker

touch .env
# override env vars in .env, e.g.:
# PORT=8000
# TRANSCRIBER_BACKEND=glm
make build

# run as api server
make server
./test_server.sh

# or run via cli
make shell
# you should now be inside the container
./transcribers.py --help
./transcribers.py data/audio.mp3 --backend lemonfoxai --format srt

Clients

For a complete speech-to-text experience, you can use these frontend applications that are compatible with this GLM-ASR server:

  • NeuralWhisper - A modern web-based frontend for speech transcription with real-time capabilities
  • WhisperSqueak - A lightweight desktop application for audio transcription

Both frontends are designed to work seamlessly with this GLM-ASR server's OpenAI-compatible API endpoints.

To use the server from Spokenly, configure it as follows (replace the example IP with your server's address):

URL: http://192.168.20.9:8000
API Key: (empty)
Model: (empty)
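All of these clients ultimately call the OpenAI-compatible `POST /v1/audio/transcriptions` endpoint, which expects a multipart form with `file` and `model` fields. Below is a minimal, dependency-free sketch of that request shape; the base URL and the `whisper-1` model name are placeholders to adjust for your deployment:

```python
import mimetypes
import uuid


def build_transcription_request(base_url, audio_path, model="whisper-1"):
    """Build (url, headers, body) for an OpenAI-compatible
    /v1/audio/transcriptions request. Model name is a placeholder."""
    boundary = uuid.uuid4().hex
    url = f"{base_url.rstrip('/')}/v1/audio/transcriptions"
    ctype = mimetypes.guess_type(audio_path)[0] or "application/octet-stream"
    with open(audio_path, "rb") as f:
        audio = f.read()
    # Two multipart parts: the model name and the raw audio file.
    body = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{audio_path}"\r\nContent-Type: {ctype}\r\n\r\n'
    ).encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return url, headers, body
```

Sent with `urllib.request` (or `curl`, or the official `openai` client pointed at your base URL) against a running `make server` instance, a request like this should return JSON containing a `text` field, per the OpenAI transcription API shape.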

Model

The `glm` backend uses the GLM-ASR-Nano-2512 model from the ZAI organization, which provides efficient speech recognition with minimal computational overhead.

The GLM-ASR project is developed by the ZAI team and represents state-of-the-art multimodal speech recognition capabilities.

Performance

  • Input audio is resampled to 16kHz (optimal for the model)
  • Supports up to 30-second chunks, automatically batched for longer audio
  • Inference runs in bfloat16 precision for efficiency
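The 30-second batching above comes down to simple index arithmetic over the resampled sample buffer. This is an illustrative sketch, not the project's actual code; the 16 kHz rate and 30 s window are the defaults documented above:

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_seconds=30):
    """Split an audio buffer into windows of at most chunk_seconds,
    returning (start, end) sample indices for each chunk."""
    chunk = sample_rate * chunk_seconds
    return [
        (start, min(start + chunk, num_samples))
        for start in range(0, num_samples, chunk)
    ]


# 75 s of 16 kHz audio splits into three chunks: 30 s, 30 s, and 15 s.
bounds = chunk_bounds(75 * 16_000)
```

Each `(start, end)` pair can then be sliced out of the waveform and run through the model independently, with the transcripts concatenated in order.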

Acknowledgments

This project builds upon the excellent work of:

  • GLM-ASR - The underlying speech recognition model by the ZAI organization (zai-org/GLM-ASR-Nano-2512)
  • faster-whisper-server - Inspired by Fedir Zadniprovskyi's architecture for OpenAI-compatible speech API servers
  • FastAPI - For the excellent Python web framework
  • HuggingFace - For the Transformers library and model hub

License

MIT License - See LICENSE file for details

Contributing

Contributions are welcome. Please feel free to submit a pull request.

We especially welcome enhancements to the Dockerfile to make it smaller and more modern. If you have ideas for optimizing the Docker image (multi-stage builds, better layer caching, Alpine Linux compatibility, etc.), we'd love to see your contributions.

Citation

If you use GLM-ASR in your research, please cite the original GLM-ASR model from ZAI organization:

@misc{glm-asr,
  title={GLM-ASR: Global Large-scale Multimodal Model for Automatic Speech Recognition},
  author={ZAI Organization},
  year={2024},
  url={https://huggingface.co/zai-org}
}
