In your testing, what tooling have you used to process the vocal wav before pitching? Have you found any values that work best? I have also been missing the ability to re-run partial steps of the processing pipeline. That is definitely something worth doing in its own right. Also, please prompt your AI to provide much shorter and to-the-point texts to paste here, or skip it altogether... I think you have some good ideas but it gets drowned out by the slop.
The Discovery: Signal Level = Pitch Accuracy
I’ve been testing in a Python 3.10 "Sweet Spot" environment and found a simple solution for the "Low Pitching Quality" many users experience. The issue isn't the AI models (CREPE/Whisper)—it’s Input Overmodulation (Clipping).
When a vocal stem is too "hot," the pitch detector sees distorted square waves instead of clean frequencies. This causes octave jumps and "shaky" notes. By manually leveling the vocals to -3 dB before processing, the AI produces smooth, professional-grade MIDI data.
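A minimal sketch of that leveling step, assuming float samples in the -1.0..1.0 range (the function name `peak_normalize` is illustrative, not part of the project):

```python
import math

def peak_normalize(samples, target_db=-3.0):
    """Scale a float sample buffer so its loudest peak sits at target_db dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    target_peak = 10 ** (target_db / 20.0)  # -3 dB is roughly 0.708 linear
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Running this on the vocal stem before it reaches CREPE keeps the waveform shape intact while pulling the peaks out of the clipping zone.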
🛠️ The Technical Barrier: The "Separation Loop"
The current logic makes this manual improvement nearly impossible to implement efficiently. Even when using `--disable_separation`, the script:
- **Triggers model downloads:** it still tries to verify/download the `.ph` and `.pht` (MFA) models from the internet.
- **Ignores pre-separated inputs:** if I provide a cleaned `vocals.wav`, the script still forces a check of the Demucs/separation logic.

🧪 Requested Improvement: The "Manual-Vocal" Short-Circuit
I am proposing a "MIDI-Only Pass" that runs on local assets, allowing manual audio refinement before the AI "math" happens.
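A hypothetical sketch of that short-circuit (function and parameter names are illustrative, not the project's actual API):

```python
from pathlib import Path

def resolve_vocal_stem(input_path: str, disable_separation: bool) -> Path:
    """Return the vocal stem to feed the pitch detector.

    If separation is disabled and the input is already a .wav, trust it as a
    pre-separated stem and skip the Demucs step (and its model check) entirely.
    """
    path = Path(input_path)
    if disable_separation and path.suffix.lower() == ".wav":
        return path  # manual stem: no separation, no model download
    return run_separation(path)

def run_separation(path: Path) -> Path:
    # Placeholder for the existing Demucs/separation pipeline.
    raise NotImplementedError("existing separation pipeline goes here")
```

The key point is that the `.wav` check happens *before* any model verification code runs, so a manually refined stem never touches the downloader.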
Target Workflow:
Two key changes needed:
1. **Force Local Models:** a way to tell the script "the models are in my local cache; do not check the internet for MFA/Whisper weights."
2. **Input Recognition:** if the input is already a `.wav` file, skip the separation logic and use an existing `.txt` as the timing reference (Source of Truth), updating only the Pitch/Note data.

🕵️‍♂️ Evidence: Signal vs. Quality
| Input Audio | AI Pitch Behavior | Final MIDI Result |
| -- | -- | -- |
| Raw/Clipped | AI "hallucinates" frequencies. | Jagged, unstable notes. |
| Refined (-3 dB) | CREPE sees clean peaks. | Clean, Studio-Grade MIDI. |

🚀 Conclusion
My expectations for pitch quality aren't too high—the software just needs to let us provide the high-quality signal it needs. Has anyone successfully modified the script to accept a manual vocal stem without triggering the model downloader or the separation check?
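On the model-downloader half of that question: if the Whisper weights are fetched through Hugging Face, the hub's offline switches might be enough to pin them to the local cache. This is an assumption on my part; the MFA downloader may well need its own local-path check.

```python
import os

# Force Hugging Face-backed downloads (e.g. Whisper weights) to resolve from
# the local cache only. Set these before the pipeline imports its model code.
# Whether the MFA model fetch honors these is an open question.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"
```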