The purpose of the service is to provide a transcript and summary of audio recordings of meetings. The transcript is broken down into semantic parts with timestamps, speaker information, and an abstract for each part. There is also a web service that can be used to test and demonstrate how the whole ASR system works.
- OpenAI Whisper models for transcription
- Pyannote model for diarization
- sberbank-ai/ruRoberta-large model for word embeddings
- cointegrated/rut5-base-absum model for summarization
- Flask framework for web-interface
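The segmentation step described above (splitting the transcript into semantic parts using word embeddings) can be sketched as follows. This is a minimal illustration, not the repository's actual code: the `segment` function and its threshold are hypothetical, and the toy vectors stand in for ruRoberta-large sentence embeddings.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def segment(embeddings, threshold=0.7):
    """Group consecutive utterances into semantic parts.

    A new part starts whenever the cosine similarity between
    adjacent utterance embeddings drops below the threshold.
    """
    parts = [[0]]
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) >= threshold:
            parts[-1].append(i)
        else:
            parts.append([i])
    return parts

# Toy 2-D vectors standing in for real sentence embeddings:
emb = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
print(segment(emb))  # first two utterances are similar, the third starts a new part
```

In the real pipeline the embeddings would come from the ruRoberta-large model, and each resulting part would then be passed to the rut5-base-absum summarizer for its abstract.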
The whole project was developed and tested with Python 3.10.
- Clone the repository:
```shell
git clone https://github.com/DefinitelyNik/ASR-service.git
```
- Install Whisper:
```shell
pip install git+https://github.com/openai/whisper.git
```
or visit their repo and follow the instructions there
- Install the Pyannote diarization model from their repo or Hugging Face page
- Install the sberbank-ai/ruRoberta-large model (no Hugging Face page at the moment, so you can try another word-embedding model instead)
- Install the cointegrated/rut5-base-absum model from its Hugging Face page
- Install PyTorch from their website (tested with CUDA 11.8, but it should work fine on other versions)
- Install the other dependencies:
```shell
pip install Flask librosa numpy dotenv matplotlib scikit-learn transformers
```
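Once the dependencies are in place, the Flask web interface mentioned above can be brought up with a minimal app along these lines. This is only a hedged sketch: the route name, port, and response shape are assumptions for illustration, not the repository's actual endpoints.

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # Hypothetical health-check endpoint; a real deployment would also
    # expose routes for uploading audio and fetching transcripts/summaries.
    return jsonify(status="ok")

if __name__ == "__main__":
    # Port 5000 is Flask's default; adjust as needed.
    app.run(host="0.0.0.0", port=5000)
```

Running `python app.py` and visiting `http://localhost:5000/health` should confirm that the web service starts correctly before wiring in the ASR pipeline.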