At the default setting, when I upload a WAV of speech data, nearly all frames get voiced and almost none get unvoiced, although the speech doesn't contain only vowels but also consonants, which should lead to unvoiced frames.
Maybe this is because the voiced/unvoiced decision is made from the first parameter k1, if I'm correct, which I don't think is a reliable criterion.
Changing the Unvoiced Threshold in Buzzer Studio doesn't help here either. Normally I'd assume that setting it to 0 makes all frames voiced and setting it to 1 makes no frames voiced, but that's not what happens: the setting hardly makes a difference. Even at 9 (a value that can't be reached with the arrows), almost all frames still get voiced.
I have a different algorithm in my code which I uploaded to the other issue, and I'll explain that algorithm here in more detail:
In my code there are multiple criteria which may cause a frame to be called unvoiced:
To understand this: my code applies de-emphasis to the wave data before encoding. The wave data before de-emphasis is already analyzed for its energy content per frame, which is called original_energy. The energy of the de-emphasized version is then called energy_emphasized. The energy is calculated by summing the squares of the sample values over each frame and then taking the square root of that sum, multiplied by 0.002.
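A minimal sketch of that energy measure, plus a de-emphasis filter. The energy formula (square root of the sum of squares, times 0.002) is from the description above; the first-order de-emphasis recursion and the coefficient 0.9375 are my assumptions, since the post doesn't specify them:

```python
import math

def frame_energy(samples):
    # Energy as described: sqrt of the sum of squared samples, scaled by 0.002.
    return math.sqrt(sum(s * s for s in samples)) * 0.002

def de_emphasize(samples, alpha=0.9375):
    # Hypothetical first-order de-emphasis: y[n] = x[n] + alpha * y[n-1].
    # The actual filter and coefficient in the code may differ.
    out = []
    prev = 0.0
    for s in samples:
        prev = s + alpha * prev
        out.append(prev)
    return out
```

With this, original_energy would be frame_energy(frame) and energy_emphasized would be frame_energy(de_emphasize(frame)).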
To decide if a frame will be unvoiced, the following things are checked:
- If the frame's original_energy is < 3, the frame is marked unvoiced.
- If original_energy divided by energy_emphasized is < 1.2, the frame is marked unvoiced as well.
- If no strong pitch is found when estimating the frame's pitch, the frame is marked unvoiced too. This is decided by the EstimatePeriod subroutine through several criteria which I haven't quite seen through myself. ;-)
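The three checks above can be sketched as a single predicate. The thresholds (3 and 1.2) are taken from the description; the pitch_found flag stands in for whatever the EstimatePeriod subroutine actually returns, which isn't detailed here:

```python
def is_unvoiced(original_energy, energy_emphasized, pitch_found):
    # Criterion 1: very low absolute energy in the original (pre-de-emphasis) signal.
    if original_energy < 3:
        return True
    # Criterion 2: energy ratio of original to de-emphasized signal below 1.2.
    if original_energy / energy_emphasized < 1.2:
        return True
    # Criterion 3: the pitch estimator found no strong pitch in this frame.
    if not pitch_found:
        return True
    return False
```

A frame is voiced only if it passes all three checks, so any one criterion alone is enough to mark it unvoiced.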