In your testing, what tooling have you used to process the vocal wav before pitching? Have you found any values that work best? I have also been missing the ability to re-run partial steps of the processing pipeline. That is definitely something worth doing in its own right. Also, please prompt your AI to provide much shorter and to-the-point texts to paste here, or skip it altogether... I think you have some good ideas but it gets drowned out by the slop.
The Discovery: Signal Level = Pitch Accuracy
I’ve been testing in a Python 3.10 "Sweet Spot" environment and found a simple solution for the "Low Pitching Quality" many users experience. The issue isn't the AI models (CREPE/Whisper)—it’s Input Overmodulation (Clipping).
When a vocal stem is too "hot," the pitch detector sees distorted square waves instead of clean frequencies. This causes octave jumps and "shaky" notes. By manually leveling the vocals to -3 dB before processing, the AI produces smooth, professional-grade MIDI data.
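A minimal sketch of that leveling step, assuming float samples in the -1.0..1.0 range (the function name `peak_normalize` is illustrative, not part of the project):

```python
import math

def peak_normalize(samples, target_db=-3.0):
    """Scale a float sample buffer so its loudest peak sits at target_db dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    target_peak = 10 ** (target_db / 20.0)  # -3 dB is roughly 0.708 linear
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Running this on the vocal stem before it reaches CREPE keeps the waveform shape intact while pulling the peaks out of the clipping zone.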
🛠️ The Technical Barrier: The "Separation Loop"
The current logic makes this manual improvement nearly impossible to implement efficiently. Even when using `--disable_separation`, the script:
- **Triggers model downloads:** it still tries to verify/download the `.ph` and `.pht` (MFA) models from the internet.
- **Ignores pre-separated inputs:** if I provide a cleaned `vocals.wav`, the script still forces a check of the Demucs/separation logic.

🧪 Requested Improvement: The "Manual-Vocal" Short-Circuit
I am proposing a "MIDI-Only Pass" that runs on local assets, allowing manual audio refinement before the AI "math" happens.
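A hypothetical sketch of that short-circuit (function and parameter names are illustrative, not the project's actual API):

```python
from pathlib import Path

def resolve_vocal_stem(input_path: str, disable_separation: bool) -> Path:
    """Return the vocal stem to feed the pitch detector.

    If separation is disabled and the input is already a .wav, trust it as a
    pre-separated stem and skip the Demucs step (and its model check) entirely.
    """
    path = Path(input_path)
    if disable_separation and path.suffix.lower() == ".wav":
        return path  # manual stem: no separation, no model download
    return run_separation(path)

def run_separation(path: Path) -> Path:
    # Placeholder for the existing Demucs/separation pipeline.
    raise NotImplementedError("existing separation pipeline goes here")
```

The key point is that the `.wav` check happens *before* any model verification code runs, so a manually refined stem never touches the downloader.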
Target Workflow:
Two key changes needed:
1. **Force Local Models:** a way to tell the script "the models are in my local cache; do not check the internet for MFA/Whisper weights."
2. **Input Recognition:** if the input is already a `.wav` file, skip the separation logic and use an existing `.txt` as the timing reference (Source of Truth), updating only the Pitch/Note data.

🕵️‍♂️ Evidence: Signal vs. Quality
| Input Audio | AI Pitch Behavior | Final MIDI Result |
| -- | -- | -- |
| Raw/Clipped | AI "hallucinates" frequencies. | Jagged, unstable notes. |
| Refined (-3 dB) | CREPE sees clean peaks. | Clean, Studio-Grade MIDI. |

🚀 Conclusion
My expectations for pitch quality aren't too high—the software just needs to let us provide the high-quality signal it needs. Has anyone successfully modified the script to accept a manual vocal stem without triggering the model downloader or the separation check?
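On the model-downloader half of that question: if the Whisper weights are fetched through Hugging Face, the hub's offline switches might be enough to pin them to the local cache. This is an assumption on my part; the MFA downloader may well need its own local-path check.

```python
import os

# Force Hugging Face-backed downloads (e.g. Whisper weights) to resolve from
# the local cache only. Set these before the pipeline imports its model code.
# Whether the MFA model fetch honors these is an open question.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"
```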