Fix language model repeated scoring #12

Open
FieldsMedal wants to merge 1 commit into Slyne:master from FieldsMedal:hotwords

@FieldsMedal

This PR fixes repeated language model scoring. When hotwords_scorer->is_character_based and ext_scorer->is_character_based() is false, the language model and hot word scores are calculated repeatedly. In fact, if the language model is word based, it only calls the scorer when space_id is detected. After the modification, we tested all combinations on the dataset.
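
The scoring rule described above can be sketched as follows. This is a minimal illustration, not the code changed in this PR: `should_score` and `count_scorer_calls` are hypothetical names, and only `space_id` and the `is_character_based` flag come from the description. It shows one combination of flags (hot words character based, language model word based).

```python
def should_score(is_character_based: bool, char_id: int, space_id: int) -> bool:
    """Return True if a scorer should be called after appending char_id.

    A character-based scorer is called for every character; a word-based
    scorer is called only when a space closes the current word.
    """
    if is_character_based:
        return True
    return char_id == space_id


def count_scorer_calls(char_ids, space_id, hotwords_char_based, lm_char_based):
    """Count how often each scorer would be called along one path (toy loop)."""
    calls = {"hotwords": 0, "lm": 0}
    for char_id in char_ids:
        # Each scorer is gated by its own flag, so neither one is called
        # more than once for the same character.
        if should_score(hotwords_char_based, char_id, space_id):
            calls["hotwords"] += 1
        if should_score(lm_char_based, char_id, space_id):
            calls["lm"] += 1
    return calls


# Hypothetical path of five labels in which space_id = 45 appears twice.
path = [3, 7, 45, 9, 45]
print(count_scorer_calls(path, 45, hotwords_char_based=True, lm_char_based=False))
# {'hotwords': 5, 'lm': 2}
```

Gating each scorer on its own flag keeps the two scores independent, so a word-based language model contributes once per completed word regardless of how the hot word scorer is configured.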

First audio

Settings: beam_size = 10, num_processes = 1, blank_id = 0, space_id = 45, cutoff_prob = 1 (cutoff_prob is increased so that spaces are generated), alpha = 0.5, beta = 0.5, window_length = 4, hot_words = {'换一': -3.40282e+38, '首歌': -100, '换首歌': 3.40282e+38}.

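| No. | Model | Hot words is_character_based | Language model is_character_based | Decoding result (best path) |
|---|---|---|---|---|
| 1 | Neither | * | * | 换一首歌 |
| 2 | Hot words | TRUE | * | 换首歌a<unk> |
| 3 | Hot words | FALSE | * | 换首歌<space>A<space>爱'爱<unk> |
| 4 | Language model | * | TRUE | 换一首歌 |
| 5 | Language model | * | FALSE | 换一首 |
| 6 | Hot words + language model | TRUE | TRUE | 换换首歌<unk> |
| 7 | Hot words + language model | TRUE | FALSE | 一首 |
| 8 | Hot words + language model | FALSE | TRUE | 换首歌<space>A<space>爱'爱<unk> |
| 9 | Hot words + language model | FALSE | FALSE | 换一首 |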

In No. 7 and No. 9 the hot words did not take effect. When the language model's is_character_based is false, a word generated between two spaces must be in the 1-grams or be a prefix of a 1-gram. The hot word '换首歌' is not in the 1-grams.
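
The constraint above can be sketched as a prefix check against the unigram vocabulary. This is an illustration under assumptions: `toy_unigrams` is a made-up vocabulary, not the 1-grams of the language model used in these tests, and `is_allowed_word` is a hypothetical helper rather than a function from ctc_decoder.

```python
def is_allowed_word(word: str, unigrams: set) -> bool:
    """True if word is a 1-gram or a prefix of some 1-gram."""
    return any(entry.startswith(word) for entry in unigrams)


# Toy vocabulary (an assumption for illustration only).
toy_unigrams = {"换一首", "歌"}

print(is_allowed_word("换一", toy_unigrams))    # True: prefix of "换一首"
print(is_allowed_word("换一首", toy_unigrams))  # True: exact 1-gram
print(is_allowed_word("换首歌", toy_unigrams))  # False: neither
```

Under such a check, a hot word that is missing from the 1-grams can never be completed between two spaces, however large its score is. The linear scan is only for readability; a prefix tree would be the usual choice for a large vocabulary.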

Second audio

Settings: beam_size = 10, num_processes = 1, blank_id = 0, space_id = 45, cutoff_prob = 1 (cutoff_prob is increased so that spaces are generated), alpha = 0.5, beta = 0.5, window_length = 4, hot_words = {'极点': 550}. The space was set to <space> before compiling ctc_decoder.

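| No. | Model | Hot words is_character_based | Language model is_character_based | Decoding result (best path) |
|---|---|---|---|---|
| 1 | Neither | * | * | 几点了 |
| 2 | Hot words | TRUE | * | 极点极点点了 |
| 3 | Hot words | FALSE | * | 极点<space><space><space><space> |
| 4 | Language model | * | TRUE | 几点啦 |
| 5 | Language model | * | FALSE | 几点啦 |
| 6 | Hot words + language model | TRUE | TRUE | 极点极点极点啦 |
| 7 | Hot words + language model | TRUE | FALSE | 极点<space>极点<space>极点 |
| 8 | Hot words + language model | FALSE | TRUE | 极点<space><space><space><space> |
| 9 | Hot words + language model | FALSE | FALSE | 极点<space>是<space>是<space>是<space> |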
