Replies: 10 comments 8 replies
-
Yeah, I'm working on those bad ~ timings. Do you have an example song?
-
All songs I tried performed poorly. Take this as an example: https://www.youtube.com/watch?v=6ZW2dWudUZM It starts like this: (I've used the default values for crepe: model "full" and step size 10)
-
I'm still a noob when it comes to machine learning, so I don't know if this can be fixed by tweaking crepe. Otherwise, I was wondering if it would help to introduce an intermediate step that analyses the output from crepe and tries to make sense of it from an UltraStar point of view, like trying to combine nearby notes into one and setting the tone to the average. (I actually pondered doing this as post-processing on the ultrastar.txt file, but it seemed like too much work; if this should be done, it needs to happen at an earlier step in the pipeline.)
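The "combine nearby notes and average the tone" idea could be sketched roughly like this. Note: the `(start, end, midi)` tuple representation and the gap/pitch thresholds are my own assumptions for illustration, not UltraSinger's actual internal note format.

```python
# Hypothetical sketch of merging jittery adjacent notes into one.
# Notes are assumed to be (start_sec, end_sec, midi_pitch) tuples,
# sorted by start time; thresholds are illustrative guesses.

def merge_nearby_notes(notes, max_gap=0.05, max_pitch_diff=1.0):
    """Merge consecutive notes separated by less than max_gap seconds
    and differing by at most max_pitch_diff semitones; the merged note
    gets the duration-weighted average pitch."""
    if not notes:
        return []
    merged = [list(notes[0])]
    for start, end, midi in notes[1:]:
        prev = merged[-1]
        if start - prev[1] <= max_gap and abs(midi - prev[2]) <= max_pitch_diff:
            # Duration-weighted average keeps long stable segments dominant.
            d_prev = prev[1] - prev[0]
            d_new = end - start
            prev[2] = (prev[2] * d_prev + midi * d_new) / (d_prev + d_new)
            prev[1] = end
        else:
            merged.append([start, end, midi])
    return [tuple(n) for n in merged]
```

Something like this would have to run before the ultrastar.txt is written, where the pipeline still has note timings in a structured form.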
-
Hi everyone, I've been working extensively on optimizing UltraSinger for a unique, "air-gapped" style environment (Linux Mint, Python 3.10) with intermittent connectivity. Through this process, I've identified a few areas for improvement that I'm currently "beta testing" in my own scripts:

- Audio quality (WAV to MP3): I've noticed significant over-modulation when the processed audio is converted back to MP3. I'm working on refined logic to handle the levels better during the final merge, to prevent clipping and distortion.
- Environment stability: For those on Linux, I've found that Python 3.10 is the absolute "sweet spot" for the current dependency tree (Torch, Crepe, Demucs).
- Offline workflow: I'm finalizing a "Dynamic Manager" script that allows a fully offline installation using a local "wheelhouse" and model cache. This ensures that once you have the "brains" (the models), you don't need to worry about network drops mid-process.

I've been using Gemini (Flash & Pro) extensively as a thought partner to bridge these gaps. If you're tackling specific bugs, I highly recommend using the "fast" models (Flash) for iterative logic building, then moving to "Pro" tokens once you have a solid candidate for the fix. I'll be sharing more as my beta tests conclude. Looking forward to making UltraSinger more robust across platforms!

Added: If asking Gemini, include your computer model, Windows/Linux, and Python version for a more precise answer. https://gemini.google.com/share/5374f90c2a77 It didn't like the harvester part of my script; I'm making it more universal.
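For anyone who wants to try the offline "wheelhouse" approach without waiting for the script: the standard pip workflow already covers the wheel side of it. This is a minimal sketch, assuming a `requirements.txt` like the one in the UltraSinger repo; the folder names are just examples (the model cache still has to be copied over separately).

```shell
# 1. On a machine WITH connectivity: download all wheels into a local folder.
pip download -r requirements.txt -d ./wheelhouse

# 2. Copy ./wheelhouse to the air-gapped machine, then install without
#    touching the network.
pip install --no-index --find-links=./wheelhouse -r requirements.txt
```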
-
@magicus your issue with these ~ is a combination of many things in the pipeline, mostly down to a hardcoded part. I haven't had time to look closer into it, but I have some small improvements. @RobUmf nice! Waiting for a PR 😄
-
Ah, so it is not just an issue with crepe? Like, the voice separation is too bad for crepe to be able to do its thing properly? I could try to fiddle with some more arguments, and/or tweak the code. Can you give any pointers as to what you think causes this behavior?
-
My plan is to test this with some more songs, and if it turns out well, I'll see if I can stitch it into UltraSinger.
-
I don't know if this helps with the midi file or the txt, but I do think it gets you on the right track.

🛠️ Solution: using FFmpeg to fix over-modulation & volume with the cached WAVs

When merging them back to MP3, FFmpeg's amix filter defaults to lowering the volume to prevent clipping, which makes them even quieter. The fix: go to your cache directory, .../cache/separated/htdemucs_ft/[SongName]/, and run these commands to manually merge with the "sweet spot" levels:

```shell
# Sing-a-long: vocals slightly boosted over the instrumental
ffmpeg -i vocals.wav -i no_vocals.wav -filter_complex "[0:a]volume=1.3[v];[1:a]volume=0.7[i];[v][i]amix=inputs=2:duration=first:dropout_transition=0:normalize=0[aout]" -map "[aout]" -b:a 320k "Sing-a-long.mp3"

# Karaoke style: vocals reduced to a faint guide
ffmpeg -i vocals.wav -i no_vocals.wav -filter_complex "[0:a]volume=0.3[v];[1:a]volume=1.0[i];[v][i]amix=inputs=2:duration=first:dropout_transition=0:normalize=0[aout]" -map "[aout]" -b:a 320k "Karaoke_Style.mp3"

# Instrumental only, boosted
ffmpeg -i no_vocals.wav -af "volume=1.5" -b:a 320k "Baseline_Boosted.mp3"

# Sing-a-long, heavily boosted (great for quiet strings/piano stems)
ffmpeg -i vocals.wav -i no_vocals.wav -filter_complex "[0:a]volume=2.0[v];[1:a]volume=4.0[i];[v][i]amix=inputs=2:duration=first:normalize=0[aout]" -map "[aout]" -b:a 320k "Sing-a-long_Boosted.mp3"
```

Notes:

- pcm_f32le: by working with the cache files, you are using the high-fidelity source before it gets compressed/distorted by the internal script logic.
- volume=X.X: 1.0 = original volume, 0.5 = 50% volume, 4.0 = 400% volume.
- Using Python 3.10 and the htdemucs_ft model, you are getting the cleanest possible stems before the merge.
-
I think I made my first pull request.
-
I just may have the solution for midi, from where I'm at.
-
As the title says. Most of the entire UltraSinger chain works perfectly on the songs I've tried -- the vocal/instrument separation is good, the transcription is correct, and the timing of the lyrics with the music is good. But the pitch is really bad. I've tried correcting it manually, but it is way too much work.
Do I just have too high expectations on how well pitching is supposed to work? Or is something wrong?
What I mean by "bad" is that what should be a single note is split into multiple "~" segments, like 5 or more, each a half tone or so off, jumping up and down the scale. It seems like the pitching is too sensitive, and that it should instead consider this to be a longer note with a constant pitch. I tried changing the crepe window size, but it did not do anything to address the problem (unless I should go even higher, like 100-200? But that seems like it might miss the real, short notes).
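One generic way to tame jumpy per-frame pitch estimates, independent of how UltraSinger actually segments notes (I haven't checked its internals), is a sliding median filter over the frame-wise pitch track before notes are cut. A minimal pure-Python sketch, assuming pitch comes as one MIDI note number per analysis frame:

```python
from statistics import median

# Illustrative smoothing pass over a frame-wise pitch track; the
# window size and the MIDI-number representation are assumptions.
def smooth_pitch(frames, window=9):
    """Replace each frame's pitch with the median of its neighborhood.
    Short one- or two-frame excursions (the jittery '~' segments) get
    absorbed into the surrounding stable pitch; window is in frames."""
    half = window // 2
    smoothed = []
    for i in range(len(frames)):
        lo = max(0, i - half)
        hi = min(len(frames), i + half + 1)
        smoothed.append(median(frames[lo:hi]))
    return smoothed
```

A median (rather than a mean) is the usual choice here because a single outlier frame cannot drag the result toward itself; the trade-off is that genuinely short notes shorter than about half the window would be smoothed away, which matches the concern about very large window values above.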