Replies: 10 comments 8 replies
-
Yeah, I'm working on those bad ~ timings. Do you have an example song?
-
All songs I tried performed poorly. Take this as an example: https://www.youtube.com/watch?v=6ZW2dWudUZM It starts like this: (I've used the default values for crepe: model "full" and step size 10)
-
I'm still a noob when it comes to machine learning, so I don't know if this can be fixed by tweaking crepe. Otherwise, I was wondering if it would help to introduce an intermediate step that analyses the output from crepe and tries to make sense of it from an UltraStar point of view, like trying to combine nearby notes into one and setting the tone to the average. (I actually pondered doing this as post-processing on the ultrastar.txt file, but it seemed like too much work; if this should be done, it needs to happen at an earlier step in the pipeline.)
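The "combine nearby notes and average the tone" idea could be sketched roughly like this. Note: the `(start, end, midi)` tuple representation and the gap/pitch thresholds are my own assumptions for illustration, not UltraSinger's actual internal note format.

```python
# Hypothetical sketch of merging jittery adjacent notes into one.
# Notes are assumed to be (start_sec, end_sec, midi_pitch) tuples,
# sorted by start time; thresholds are illustrative guesses.

def merge_nearby_notes(notes, max_gap=0.05, max_pitch_diff=1.0):
    """Merge consecutive notes separated by less than max_gap seconds
    and differing by at most max_pitch_diff semitones; the merged note
    gets the duration-weighted average pitch."""
    if not notes:
        return []
    merged = [list(notes[0])]
    for start, end, midi in notes[1:]:
        prev = merged[-1]
        if start - prev[1] <= max_gap and abs(midi - prev[2]) <= max_pitch_diff:
            # Duration-weighted average keeps long stable segments dominant.
            d_prev = prev[1] - prev[0]
            d_new = end - start
            prev[2] = (prev[2] * d_prev + midi * d_new) / (d_prev + d_new)
            prev[1] = end
        else:
            merged.append([start, end, midi])
    return [tuple(n) for n in merged]
```

Something like this would have to run before the ultrastar.txt is written, where the pipeline still has note timings in a structured form.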
-
Hi everyone, I've been working extensively on optimizing UltraSinger for a unique, "air-gapped" style environment (Linux Mint, Python 3.10) with intermittent connectivity. Through this process, I've identified a few areas for improvement that I'm currently "beta testing" in my own scripts:

- Audio quality (WAV to MP3): I've noticed significant over-modulation when the processed audio is converted back to MP3. I'm working on refined logic to handle the levels better during the final merge, to prevent clipping and distortion.
- Environment stability: For those on Linux, I've found that Python 3.10 is the absolute "sweet spot" for the current dependency tree (Torch, Crepe, Demucs).
- Offline workflow: I'm finalizing a "Dynamic Manager" script that allows a fully offline installation using a local "wheelhouse" and model cache. This ensures that once you have the "brains" (the models), you don't need to worry about network drops mid-process.

I've been using Gemini (Flash & Pro) extensively as a thought partner to bridge these gaps. If you're tackling specific bugs, I highly recommend using the "fast" models (Flash) for iterative logic building, then moving to "Pro" tokens once you have a solid candidate for the fix. I'll be sharing more as my beta tests conclude. Looking forward to making UltraSinger more robust across platforms!

Added: If asking Gemini, include your computer model, Windows/Linux, and Python version for a more precise answer. https://gemini.google.com/share/5374f90c2a77 It didn't like the harvester part of my script; I'm making it more universal.
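For anyone who wants to try the offline "wheelhouse" approach without waiting for the script: the standard pip workflow already covers the wheel side of it. This is a minimal sketch, assuming a `requirements.txt` like the one in the UltraSinger repo; the folder names are just examples (the model cache still has to be copied over separately).

```shell
# 1. On a machine WITH connectivity: download all wheels into a local folder.
pip download -r requirements.txt -d ./wheelhouse

# 2. Copy ./wheelhouse to the air-gapped machine, then install without
#    touching the network.
pip install --no-index --find-links=./wheelhouse -r requirements.txt
```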
-
@magicus your issue with these ~ is a combination of many things in the pipeline, mostly down to a hardcoded part. I haven't had time to look closer into it, but I have some small improvements. @RobUmf nice! Waiting for a PR 😄
-
Ah, so it is not just an issue with crepe? Like, the voice separation is too bad for crepe to be able to do its thing properly? I could try to fiddle with some more arguments, and/or tweak the code. Can you give any pointers as to what you think causes this behavior?
-
My plan is to test this with some more songs, and if it turns out well, I'll see if I can stitch it into UltraSinger.
-
I don't know if this helps with the midi file or the txt, but I do think it gets you on the right track.

🛠️ Solution: using FFmpeg to fix over-modulation & volume with the cached WAVs

When merging them back to MP3, FFmpeg's amix filter defaults to lowering the volume to prevent clipping, which makes them even quieter. The fix: go to your cache directory, .../cache/separated/htdemucs_ft/[SongName]/, and run these commands to manually merge with the "sweet spot" levels:

```shell
# Sing-a-long: vocals slightly boosted over the instrumental
ffmpeg -i vocals.wav -i no_vocals.wav -filter_complex "[0:a]volume=1.3[v];[1:a]volume=0.7[i];[v][i]amix=inputs=2:duration=first:dropout_transition=0:normalize=0[aout]" -map "[aout]" -b:a 320k "Sing-a-long.mp3"

# Karaoke style: vocals reduced to a faint guide
ffmpeg -i vocals.wav -i no_vocals.wav -filter_complex "[0:a]volume=0.3[v];[1:a]volume=1.0[i];[v][i]amix=inputs=2:duration=first:dropout_transition=0:normalize=0[aout]" -map "[aout]" -b:a 320k "Karaoke_Style.mp3"

# Instrumental only, boosted
ffmpeg -i no_vocals.wav -af "volume=1.5" -b:a 320k "Baseline_Boosted.mp3"

# Sing-a-long, heavily boosted (great for quiet strings/piano stems)
ffmpeg -i vocals.wav -i no_vocals.wav -filter_complex "[0:a]volume=2.0[v];[1:a]volume=4.0[i];[v][i]amix=inputs=2:duration=first:normalize=0[aout]" -map "[aout]" -b:a 320k "Sing-a-long_Boosted.mp3"
```

Notes:

- pcm_f32le: by working with the cache files, you are using the high-fidelity source before it gets compressed/distorted by the internal script logic.
- volume=X.X: 1.0 = original volume, 0.5 = 50% volume, 4.0 = 400% volume.
- Using Python 3.10 and the htdemucs_ft model, you are getting the cleanest possible stems before the merge.
-
I think I made my first pull request.
-
I just may have the solution for midi, from where I'm at.
-
As the title says. Most of the entire UltraSinger chain works perfectly on the songs I've tried -- the vocal/instrument separation is good, the transcription is correct, and the timing of the lyrics with the music is good. But the pitch is really bad. I've tried correcting it manually, but it is way too much work.
Do I just have too high expectations on how well pitching is supposed to work? Or is something wrong?
What I mean by "bad" is that what should be a single note is split into multiple "~" segments, like 5 or more, each a half tone or so off, jumping up and down the scale. It seems like the pitching is too sensitive, and that it should instead consider this to be a longer note with a constant pitch. I tried changing the crepe window size, but it did not do anything to address the problem (unless I should go even higher, like 100-200? But that seems like it might miss the real, short notes).
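One generic way to tame jumpy per-frame pitch estimates, independent of how UltraSinger actually segments notes (I haven't checked its internals), is a sliding median filter over the frame-wise pitch track before notes are cut. A minimal pure-Python sketch, assuming pitch comes as one MIDI note number per analysis frame:

```python
from statistics import median

# Illustrative smoothing pass over a frame-wise pitch track; the
# window size and the MIDI-number representation are assumptions.
def smooth_pitch(frames, window=9):
    """Replace each frame's pitch with the median of its neighborhood.
    Short one- or two-frame excursions (the jittery '~' segments) get
    absorbed into the surrounding stable pitch; window is in frames."""
    half = window // 2
    smoothed = []
    for i in range(len(frames)):
        lo = max(0, i - half)
        hi = min(len(frames), i + half + 1)
        smoothed.append(median(frames[lo:hi]))
    return smoothed
```

A median (rather than a mean) is the usual choice here because a single outlier frame cannot drag the result toward itself; the trade-off is that genuinely short notes shorter than about half the window would be smoothed away, which matches the concern about very large window values above.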