@TheApeMachine

  • Adds Mac MPS/Metal support
  • Refactors the hard-coded CUDA implementation to detect the platform
  • Begins adding custom Metal kernels to improve performance on Mac
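
For readers unfamiliar with what "detect platform" usually means in a PyTorch project, here is a minimal sketch. It is not the code from this PR: `select_device` and `detect_device` are hypothetical names, and the preference order (CUDA, then MPS, then CPU) is an assumption.

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device string; the CUDA > MPS > CPU order is an assumption."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Hypothetical helper: probe PyTorch for available backends."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so fall back to CPU.
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return select_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the decision in a pure function makes the fallback order easy to test on machines that have neither a CUDA GPU nor Apple silicon.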

Generation currently takes about 11 minutes on a MacBook Pro M4 Max with 128 GB of unified memory (96 GB assignable as VRAM).

Manual inspection of the output (listening to the .mp3) confirms it is working.

…e selection. Update argument handling in `run_music_generation.py` and improve `HeartMuLaGenPipeline` class for better input processing and model execution.
…odec model. Update `run_lyrics_transcription.py` to dynamically select device based on availability, and modify `HeartCodec` to determine device from input tensor or model parameters. Improve `HeartMuLaGenPipeline` to support autocast on MPS for better performance.
…mize audio token padding. Introduce a context manager for autocast that gracefully handles unsupported cases, and preallocate buffers for audio tokens to enhance performance during generation.
…ce on MPS. Update `pyproject.toml` to include the optimizer package directory. Enhance `HeartMuLaGenPipeline` to optionally enable Metal optimizations during model execution, improving performance for Llama blocks.
…w Metal kernels and Python wrappers. Update `pyproject.toml` to remove the optimizer package directory. Enhance runtime detection for Metal support and build tools availability.
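
The commit messages above mention an autocast context manager that gracefully handles unsupported cases. One way such a wrapper could look is sketched below. It is an illustration only, not the PR's implementation: `safe_autocast` is a hypothetical name, and it assumes the failure surfaces as an exception when the autocast context is constructed.

```python
from contextlib import nullcontext


def safe_autocast(make_ctx):
    """Return the context manager built by make_ctx, or a no-op fallback.

    make_ctx is a zero-argument callable, for example
    lambda: torch.autocast(device_type="mps", dtype=torch.float16).
    If building the context raises (assumed here to be how an unsupported
    device or dtype shows up), run without autocast instead.
    """
    try:
        return make_ctx()
    except (RuntimeError, ValueError, TypeError):
        return nullcontext()


if __name__ == "__main__":
    def unsupported():
        raise RuntimeError("autocast not supported on this device")

    # Falls back to a no-op context, so the body still runs.
    with safe_autocast(unsupported):
        print("running without autocast")
```

Passing a factory rather than a ready-made context keeps the sketch independent of any particular PyTorch version.
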
@iamwavecut

I tested it on an MBP 16" M2 Max 64GB: the default prompt took 24 minutes, with 33 GB of RAM allocated.

@TheApeMachine
Author

@iamwavecut Damn... Well, let's start by saying "cool, it works" :p But of course it's not great to have to wait that long for a song.
There is still a lot that can be done, most notably writing more, and better, custom Metal kernels to fuse operations.

Ah, that reminds me: did you have everything in place to support the custom Metal kernels that are there? You need xcode-tools, which I am sure you have, but also two additional libraries, which I would have to look up. You also need to set:

HEARTLIB_ENABLE_MPS_METAL=1
HEARTLIB_MPS_METAL_VERBOSE=1
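
As a rough illustration of how flags like these are commonly read on the Python side (how the project itself parses them is not shown in this thread, so `env_flag` and its accepted values are assumptions):

```python
import os


def env_flag(name, default=False, environ=None):
    """Hypothetical helper: interpret an environment variable as a boolean."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    # Accepting these spellings is an assumption, not the project's rule.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    # The variable names come from the comment above.
    print("Metal kernels enabled:", env_flag("HEARTLIB_ENABLE_MPS_METAL"))
    print("Verbose Metal logging:", env_flag("HEARTLIB_MPS_METAL_VERBOSE"))
```

Running the script with and without the variables set in the shell shows which values the process actually sees.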

I do see a couple of things right now; let me take a stab at making it faster...

@tonywestonuk

Generated on a 'modest' MacBook Air M4 with 32 GB. Memory pressure went red for a few seconds, but... it did it in about an hour. I'm guessing there was a fair amount of thermal throttling.

But, wow. I didn't think it would work... but it did. Thanks for your efforts getting this going.

output.mp3

3 participants