Team member: Heo Myeongbeom (허명범)

Speech recognition (STT) model development

Project framework: PyTorch
```shell
conda create -n STT
conda activate STT
pip install -r requirements.txt
```

It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:
```shell
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```
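To confirm the ffmpeg dependency is satisfied before running the pipeline, a minimal standard-library check (a sketch, not part of the project code):

```python
import shutil
import subprocess
from typing import Optional

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is found on PATH."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if unavailable."""
    if not ffmpeg_available():
        return None
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0]
```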
If you get an error in the decoder, use the following commands:

```shell
conda install -c conda-forge "ffmpeg>=6,<8" libsndfile
conda install -c conda-forge libffi
```

Train on two GPUs:

```shell
CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node=2 main.py
```
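The launch command `CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node=2 main.py` exposes physical GPUs 2 and 3 and starts two worker processes; `CUDA_VISIBLE_DEVICES` renumbers those GPUs as logical devices 0 and 1, and `torchrun` assigns each worker a `LOCAL_RANK`. A minimal sketch of that mapping (illustrative only; the helper is hypothetical, not project code):

```python
def device_for_rank(local_rank: int, cuda_visible_devices: str = "2,3"):
    """Map a torchrun LOCAL_RANK to (logical CUDA index, physical GPU id).

    CUDA_VISIBLE_DEVICES=2,3 renumbers physical GPUs 2 and 3 as logical
    devices 0 and 1, so rank 0 uses cuda:0 (physical 2) and rank 1 uses
    cuda:1 (physical 3).
    """
    physical = cuda_visible_devices.split(",")
    return local_rank, physical[local_rank]

# Inside main.py each worker would typically read the rank torchrun sets:
#   local_rank = int(os.environ["LOCAL_RANK"])
#   torch.cuda.set_device(local_rank)
```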
For the next training run (the dataset and the model safetensors are already downloaded, so they will not be downloaded again):

```shell
export HF_HOME=/data/.cache/huggingface
export HF_DATASETS_CACHE=$HF_HOME/datasets
CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node=2 main.py
```

Load the model from its safetensors weights:

```python
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID, use_safetensors=True)
```
LoRA setup:

```python
from peft import LoraConfig, get_peft_model

OUTPUT_DIR = "./whisper-LORA/Encoder=FRU-Adapter+LORA_Decoder=LORA"

# Apply LoRA to the attention projections.
target_modules = ["q_proj", "k_proj", "v_proj", "out_proj"]
peft_cfg = LoraConfig(
    r=128,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    target_modules=target_modules,
)
model = get_peft_model(model, peft_cfg)

# Freeze everything, then re-enable only the LoRA parameters.
for p in model.parameters():
    p.requires_grad_(False)
for n, p in model.named_parameters():
    if "lora_" in n:
        p.requires_grad_(True)
```
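The freezing rule above (freeze everything, then re-enable only parameters whose name contains `"lora_"`) can be illustrated with a framework-free stand-in; `Param` here is a hypothetical stub, not a PEFT or PyTorch class:

```python
from dataclasses import dataclass

@dataclass
class Param:
    # Hypothetical stand-in for a named model parameter.
    name: str
    numel: int
    requires_grad: bool = True

def apply_lora_freeze(params):
    """Freeze all parameters, then unfreeze only LoRA ones, as in the loops above."""
    for p in params:
        p.requires_grad = False
    for p in params:
        if "lora_" in p.name:
            p.requires_grad = True

def trainable_count(params):
    """Total number of elements in parameters that will receive gradients."""
    return sum(p.numel for p in params if p.requires_grad)
```

With PEFT, `model.print_trainable_parameters()` reports the same information for the real model.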
- Benchmark (Zeroth-Korean Dataset)

| Model | Denoiser | Trainable Params | WER (↓) |
|---|---|---|---|
| Whisper | X | 769M | 3.64 |
| Whisper | Facebook-denoiser | 769M | 4.40 |
| Whisper | MetricGAN+ | 769M | 23.87 |
| Whisper | DemucsV4 | 769M | 3.72 |
| Whisper+LoRA (ours) | X | 75M | 3.62 |
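A quick arithmetic check on the table's headline comparison (derived from the numbers above, not additional results): the LoRA variant trains about 90% fewer parameters while matching the full model's WER.

```python
def pct_reduction(full: float, ours: float) -> float:
    """Percentage reduction from `full` to `ours`."""
    return (full - ours) / full * 100

param_saving = pct_reduction(769, 75)   # ~90.2% fewer trainable parameters
wer_gain = pct_reduction(3.64, 3.62)    # ~0.55% relative WER reduction
```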

