Feature/reference audio prompt #28

TheApeMachine · 2026-01-19T21:54:23Z

Implement reference audio
Automatically download OpenMuQ/MuQ-MuLan-large (or other compatible pre-trained model)
Add deterministic or random selection within the audio prompt
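As a rough illustration of the last item, here is a minimal sketch of choosing a window inside a reference clip, either reproducibly (seeded) or at random. The helper name `pick_window` and its parameters are hypothetical and are not taken from this branch; it assumes the reference clip may be longer than the window that gets encoded.

```python
import random
from typing import Optional, Tuple


def pick_window(total_samples: int, window_samples: int,
                seed: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) sample indices of a window inside a reference clip.

    Hypothetical helper: with a seed the same window is chosen on every call,
    without one the start offset is drawn at random.
    """
    if window_samples <= 0:
        raise ValueError("window_samples must be positive")
    if total_samples <= window_samples:
        # The clip is no longer than the window, so use all of it.
        return 0, total_samples
    max_start = total_samples - window_samples
    rng = random.Random(seed) if seed is not None else random.Random()
    start = rng.randint(0, max_start)  # inclusive on both ends
    return start, start + window_samples
```

Passing the same `seed` twice yields the same window, which makes comparisons between runs with the same reference audio repeatable.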

Notes: I'm not sure what your intentions are for the reference audio, but as far as I can tell it works pretty well. I'm opening this as a pull request to get your thoughts; feel free to ignore or close it if it isn't of interest, I just wanted to experiment with it :) This branch may also contain some small pieces of code related to an analysis harness I was working on to pinpoint the cause of the AI "shimmer" that seems to be common in music generation models, so apologies for that. Finally, I had to base this on my other pull request's branch: I don't have a CUDA-compatible machine here at the moment, so I can only work on Metal.

…odec model. Update `run_lyrics_transcription.py` to dynamically select device based on availability, and modify `HeartCodec` to determine device from input tensor or model parameters. Improve `HeartMuLaGenPipeline` to support autocast on MPS for better performance.

…ce on MPS. Update `pyproject.toml` to include the optimizer package directory. Enhance `HeartMuLaGenPipeline` to optionally enable Metal optimizations during model execution, improving performance for Llama blocks.

…add optional dependencies for MuQ-MuLan. Modify `README.md` to reflect new Python version recommendations and installation instructions for optional features. Enhance `run_music_generation.py` and `HeartMuLaGenPipeline` to support reference audio conditioning and auto-download of MuQ-MuLan, improving music generation capabilities.
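The commit notes above mention selecting the device dynamically based on availability. A minimal sketch of that idea follows, assuming a CUDA, then MPS, then CPU preference order. The function names are hypothetical and this is not the code from the branch; only `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are real PyTorch calls.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Metal (MPS), then CPU. The order is an assumption."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not ship the MPS backend module at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)
```

Keeping the preference logic in a pure function makes it easy to check without any particular hardware present.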

frink · 2026-01-25T22:17:35Z

Is this an attempt to figure out how to do style transfer from one song to a new one?

Are you getting shimmer in this model too?

The problem in Suno appeared to be the 10 Hz generation rate and overfitting on the highs, due to a lot of music having pads that rise and fall in that band. The fix, then, is to move to a 32 Hz codec space and use an RNN-type network (think RWKV-X) instead of straight transformers or diffusion. But that means a NEW model architecture.
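To put the two rates mentioned above in concrete terms, here is a small arithmetic sketch. The 10 Hz and 32 Hz figures come from the comment; the 180-second track length is an arbitrary assumption used only for illustration.

```python
def frame_ms(rate_hz: float) -> float:
    """Duration of audio covered by one codec frame, in milliseconds."""
    return 1000.0 / rate_hz


def frame_count(duration_s: float, rate_hz: float) -> int:
    """Number of codec frames needed to cover a track of the given length."""
    return round(duration_s * rate_hz)


for rate in (10, 32):
    print(f"{rate} Hz: {frame_ms(rate):.2f} ms per frame, "
          f"{frame_count(180, rate)} frames for a 180 s track")
# 10 Hz: 100.00 ms per frame, 1800 frames for a 180 s track
# 32 Hz: 31.25 ms per frame, 5760 frames for a 180 s track
```

The higher rate gives finer time resolution per frame at the cost of a sequence that is 3.2 times longer.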

TheApeMachine added 6 commits January 19, 2026 13:04

Refactor music generation pipeline to support dynamic device and dtype selection. Update argument handling in `run_music_generation.py` and improve `HeartMuLaGenPipeline` class for better input processing and model execution.

c2fc45c

Refactor HeartMuLaGenPipeline to improve autocast handling and optimize audio token padding. Introduce a context manager for autocast that gracefully handles unsupported cases, and preallocate buffers for audio tokens to enhance performance during generation.

082f715

Implement Metal support for RMSNorm and RoPE operations, including new Metal kernels and Python wrappers. Update `pyproject.toml` to remove the optimizer package directory. Enhance runtime detection for Metal support and build tools availability.

b56ec87
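One of the commits above describes a context manager for autocast that gracefully handles unsupported cases. A minimal sketch of that pattern follows, assuming "graceful" means running the body without autocast when the context cannot be set up. The name `safe_autocast` and the `factory` parameter are hypothetical, and this is not the implementation from the branch.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def safe_autocast(factory):
    """Enter the context manager built by `factory`, or run without it.

    `factory` is a zero-argument callable, for example (an assumption):
        lambda: torch.autocast(device_type="mps", dtype=torch.float16)
    If building or entering that context raises, the body still runs,
    just without autocast. Yields True when the context is active.
    """
    with ExitStack() as stack:
        try:
            stack.enter_context(factory())
            active = True
        except (RuntimeError, ValueError, TypeError):
            # Unsupported device/dtype combination: fall back silently.
            active = False
        # The yield sits outside the try, so errors raised by the caller's
        # body are never swallowed here; only setup failures are.
        yield active
```

Used as `with safe_autocast(make_ctx) as active:`, the caller can log whether mixed precision was actually enabled.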
