Hi!
Thanks for the interesting paper!
I have a question regarding the prompt mechanism during tokenizer training.
The paper describes: "Specifically, during training, we randomly sample a prefix from the input mel-spectrogram by drawing a segment length 𝑙 ∼ Uniform(0, 0.25𝐿), where 𝐿 denotes the total number of frames in the mel-spectrogram. The prefix is preserved without any added noise, while the loss is computed solely on the noisy portion of the sequence."
I have a question regarding the inference.
Should a clean prefix prompt also be provided during decoder inference?
Providing the prefix prompt seems more reasonable to me, but I just wanted to double-check.
Thanks!