This project leverages the Whisper model for speech recognition, focusing on the Cantonese language. It is designed to transcribe Cantonese audio files into text, enhancing the accessibility and usability of speech recognition technologies for Cantonese speakers. Due to privacy concerns, our specific dataset is not included in this repository. Users are encouraged to use their own datasets by placing audio files in the designated data folder.
## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. Follow these simple steps to start transcribing Cantonese audio files.
### Prerequisites

Before installing the project, make sure you have Python and pip installed on your system. This project uses Flask to run the local API server and requires the other dependencies listed in the requirements.txt file.
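To give a sense of what the Flask API server does, here is a minimal sketch of a `/transcribe` endpoint. The route name, the `file` form field, and the stubbed `run_whisper` function are illustrative assumptions, not the project's actual code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_whisper(audio_bytes: bytes) -> str:
    """Placeholder for the actual Whisper inference call."""
    return "transcribed text goes here"

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Expect the audio under a multipart form field named "file".
    upload = request.files.get("file")
    if upload is None:
        return jsonify({"error": "no file uploaded"}), 400
    text = run_whisper(upload.read())
    return jsonify({"text": text})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8010)
```

The real server replaces `run_whisper` with the Whisper model inference described below.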
### Installation

- **Clone the Project**

  Start by cloning the repository to your local machine:

  ```
  git clone git@github.com:david188888/WhisperTranscriber.git
  cd WhisperTranscriber
  ```
- **Install Dependencies**

  Install the required dependencies using pip:

  ```
  pip install -r requirements.txt
  ```
- **Prepare the Data**

  Place your Cantonese audio files in the `data` folder. The audio files should be in `.wav` or `.mp3` format.
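As a quick sanity check before transcribing, a short script can list the supported audio files in the data folder. The folder name `data` follows the step above; the helper name `list_audio_files` is hypothetical:

```python
from pathlib import Path

# Only these formats are supported, per the data-preparation step.
SUPPORTED_SUFFIXES = {".wav", ".mp3"}

def list_audio_files(data_dir: str = "data") -> list[Path]:
    """Return the supported audio files in data_dir, sorted by name."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir()
                  if p.suffix.lower() in SUPPORTED_SUFFIXES)

if __name__ == "__main__":
    for path in list_audio_files():
        print(path)
```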
- **Import the Whisper Medium Model**

  The `model.safetensors` file containing the model's parameters is not included in this repository due to its size. Follow the steps in `proposs_model.md` to import the model from the Hugging Face Hub.
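One common way to fetch Whisper weights from the Hugging Face Hub is via the `transformers` library. The model id `openai/whisper-medium` and the target directory below are assumptions for illustration; follow `proposs_model.md` for the project's actual steps:

```python
# Sketch: download a Whisper checkpoint from the Hugging Face Hub and save
# it locally so that model.safetensors ends up in TARGET_DIR. The model id
# and directory name are illustrative assumptions.
MODEL_ID = "openai/whisper-medium"
TARGET_DIR = "whisper-model"

if __name__ == "__main__":
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(MODEL_ID)
    model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
    processor.save_pretrained(TARGET_DIR)
    model.save_pretrained(TARGET_DIR)  # writes model.safetensors
```

Note that the medium checkpoint is several gigabytes, so the first download can take a while.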
- **Run the Transcriber Service**

  Start the Flask API server locally.

- **Post an Audio File to the API**

  Send your audio file to the `/transcribe` endpoint with curl:

  ```
  curl -X POST -F "file=@path_to_your_file.wav" http://whisper.kirisame.cc:8010/transcribe
  ```

  or use the provided helper script:

  ```
  python api_post.py
  ```
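The curl call above can also be issued from Python. The contents of `api_post.py` are not shown here, so treat this as an illustrative sketch using the `requests` library rather than the script's actual code:

```python
import requests

def transcribe_file(path: str,
                    url: str = "http://whisper.kirisame.cc:8010/transcribe") -> str:
    """POST an audio file to the transcription endpoint and return the text.

    The "file" form field and the {"text": ...} response shape are
    assumptions about the API, mirroring the curl example.
    """
    with open(path, "rb") as f:
        resp = requests.post(url, files={"file": f})
    resp.raise_for_status()
    return resp.json()["text"]

if __name__ == "__main__":
    print(transcribe_file("path_to_your_file.wav"))
```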
## Acknowledgments

- OpenAI for providing the Whisper model.
- Hugging Face for hosting the Whisper model on their model hub.