@TheApeMachine

  • Implement reference audio
  • Automatically download OpenMuQ/MuQ-MuLan-large (or another compatible pre-trained model)
  • Add deterministic or random selection within the audio prompt

Notes: I'm not sure what your intentions are for the reference audio, but it seems to work pretty well as far as I can tell. I'm opening this as a pull request to hear your thoughts, but feel free to ignore or close it if it isn't of interest; I just wanted to experiment with it :) Also, this branch may contain some small pieces of code related to an analysis harness I was working on to try to pinpoint the cause of the AI "shimmer" that seems to be common in music generation models, so apologies for that. Finally, I had to base this on my other pull request's branch because I don't have a CUDA-compatible machine here at the moment, so I can only work on Metal.
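To illustrate what deterministic versus random selection within an audio prompt could look like, here is a minimal sketch. It is not the code from this branch: the function name `select_segment_start`, its parameters, and the `"start"`/`"middle"`/`"random"` modes are all hypothetical, chosen only to show the idea of picking a fixed-length window from a longer reference clip.

```python
import random
from typing import Optional


def select_segment_start(
    total_samples: int,
    segment_samples: int,
    mode: str = "middle",
    seed: Optional[int] = None,
) -> int:
    """Return the start index of a window inside a reference clip.

    Hypothetical helper (not from the PR): `mode` picks the window either
    deterministically ("start", "middle") or randomly ("random").
    """
    if segment_samples <= 0:
        raise ValueError("segment_samples must be positive")

    # If the clip is shorter than the window, the only valid start is 0.
    if total_samples <= segment_samples:
        return 0

    max_start = total_samples - segment_samples

    if mode == "start":
        return 0
    if mode == "middle":
        return max_start // 2
    if mode == "random":
        # A seeded generator keeps the "random" choice reproducible.
        rng = random.Random(seed)
        return rng.randint(0, max_start)

    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # A 30 s clip at 24 kHz, with a 10 s window (illustrative numbers only).
    clip, window = 30 * 24_000, 10 * 24_000
    print(select_segment_start(clip, window, mode="middle"))
    print(select_segment_start(clip, window, mode="random", seed=0))
```

Passing a fixed `seed` makes the random mode repeatable between runs, which is handy when comparing generations against the same reference window.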

…e selection. Update argument handling in `run_music_generation.py` and improve `HeartMuLaGenPipeline` class for better input processing and model execution.
…odec model. Update `run_lyrics_transcription.py` to dynamically select device based on availability, and modify `HeartCodec` to determine device from input tensor or model parameters. Improve `HeartMuLaGenPipeline` to support autocast on MPS for better performance.
…mize audio token padding. Introduce a context manager for autocast that gracefully handles unsupported cases, and preallocate buffers for audio tokens to enhance performance during generation.
…ce on MPS. Update `pyproject.toml` to include the optimizer package directory. Enhance `HeartMuLaGenPipeline` to optionally enable Metal optimizations during model execution, improving performance for Llama blocks.
…w Metal kernels and Python wrappers. Update `pyproject.toml` to remove the optimizer package directory. Enhance runtime detection for Metal support and build tools availability.
…add optional dependencies for MuQ-MuLan. Modify `README.md` to reflect new Python version recommendations and installation instructions for optional features. Enhance `run_music_generation.py` and `HeartMuLaGenPipeline` to support reference audio conditioning and auto-download of MuQ-MuLan, improving music generation capabilities.
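For readers following the commits above, selecting a device based on availability might look roughly like the sketch below. This is an assumption about the general approach, not the code in `run_lyrics_transcription.py`: the helper name `pick_device` is hypothetical, and the fallback to `"cpu"` when PyTorch is not installed is added only so that the snippet runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what is available.

    Hypothetical helper (not from the PR). Prefers CUDA, then Apple's
    Metal backend (MPS), then falls back to CPU.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so default to CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds have no `torch.backends.mps`, so look it up safely.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

The returned string can be passed to `torch.device(...)` or to a model's `.to(...)` call.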
@frink

frink commented Jan 25, 2026

Is this an attempt to figure out how to do style transfer from one song to a new one?

Are you getting shimmer in this model too?

The problem in Suno appeared to be the 10 Hz generation rate and overfitting on the highs, due to a lot of music having pads that rise and fall in that band. The fix, then, is to move to a 32 Hz codec space and use an RNN-type network (think RWKV-X) instead of straight transformers or diffusion. But that means a NEW model architecture.
