Team member: Heo Myeongbeom (허명범)

Speech recognition (STT) model development

Project framework: PyTorch
```shell
conda create -n STT
conda activate STT
pip install -r requirements.txt
```

It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:
```shell
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```
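To confirm the ffmpeg dependency is satisfied before running the pipeline, a minimal standard-library check (a sketch, not part of the project code):

```python
import shutil
import subprocess
from typing import Optional

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is found on PATH."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if unavailable."""
    if not ffmpeg_available():
        return None
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0]
```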
If you get an error in the decoder, use the following commands:

```shell
conda install -c conda-forge "ffmpeg>=6,<8" libsndfile
conda install -c conda-forge libffi
```

Train on two GPUs:

```shell
CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node=2 main.py
```
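The launch command `CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node=2 main.py` exposes physical GPUs 2 and 3 and starts two worker processes; `CUDA_VISIBLE_DEVICES` renumbers those GPUs as logical devices 0 and 1, and `torchrun` assigns each worker a `LOCAL_RANK`. A minimal sketch of that mapping (illustrative only; the helper is hypothetical, not project code):

```python
def device_for_rank(local_rank: int, cuda_visible_devices: str = "2,3"):
    """Map a torchrun LOCAL_RANK to (logical CUDA index, physical GPU id).

    CUDA_VISIBLE_DEVICES=2,3 renumbers physical GPUs 2 and 3 as logical
    devices 0 and 1, so rank 0 uses cuda:0 (physical 2) and rank 1 uses
    cuda:1 (physical 3).
    """
    physical = cuda_visible_devices.split(",")
    return local_rank, physical[local_rank]

# Inside main.py each worker would typically read the rank torchrun sets:
#   local_rank = int(os.environ["LOCAL_RANK"])
#   torch.cuda.set_device(local_rank)
```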
For the next training run (the dataset and the model safetensors are already downloaded, so they will not be downloaded again):

```shell
export HF_HOME=/data/.cache/huggingface
export HF_DATASETS_CACHE=$HF_HOME/datasets
CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node=2 main.py
```

Load the model from its safetensors weights:

```python
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID, use_safetensors=True)
```
LoRA setup:

```python
from peft import LoraConfig, get_peft_model

OUTPUT_DIR = "./whisper-LORA/Encoder=FRU-Adapter+LORA_Decoder=LORA"

# Apply LoRA to the attention projections.
target_modules = ["q_proj", "k_proj", "v_proj", "out_proj"]
peft_cfg = LoraConfig(
    r=128,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    target_modules=target_modules,
)
model = get_peft_model(model, peft_cfg)

# Freeze everything, then re-enable only the LoRA parameters.
for p in model.parameters():
    p.requires_grad_(False)
for n, p in model.named_parameters():
    if "lora_" in n:
        p.requires_grad_(True)
```
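The freezing rule above (freeze everything, then re-enable only parameters whose name contains `"lora_"`) can be illustrated with a framework-free stand-in; `Param` here is a hypothetical stub, not a PEFT or PyTorch class:

```python
from dataclasses import dataclass

@dataclass
class Param:
    # Hypothetical stand-in for a named model parameter.
    name: str
    numel: int
    requires_grad: bool = True

def apply_lora_freeze(params):
    """Freeze all parameters, then unfreeze only LoRA ones, as in the loops above."""
    for p in params:
        p.requires_grad = False
    for p in params:
        if "lora_" in p.name:
            p.requires_grad = True

def trainable_count(params):
    """Total number of elements in parameters that will receive gradients."""
    return sum(p.numel for p in params if p.requires_grad)
```

With PEFT, `model.print_trainable_parameters()` reports the same information for the real model.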
- Benchmark (Zeroth-Korean Dataset)

| Model | Denoiser | Trainable Params | WER (↓) |
|---|---|---|---|
| Whisper | X | 769M | 3.64 |
| Whisper | Facebook-denoiser | 769M | 4.40 |
| Whisper | MetricGAN+ | 769M | 23.87 |
| Whisper | DemucsV4 | 769M | 3.72 |
| Whisper+LoRA (ours) | X | 75M | 3.62 |
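A quick arithmetic check on the table's headline comparison (derived from the numbers above, not additional results): the LoRA variant trains about 90% fewer parameters while matching the full model's WER.

```python
def pct_reduction(full: float, ours: float) -> float:
    """Percentage reduction from `full` to `ours`."""
    return (full - ours) / full * 100

param_saving = pct_reduction(769, 75)   # ~90.2% fewer trainable parameters
wer_gain = pct_reduction(3.64, 3.62)    # ~0.55% relative WER reduction
```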

