# yue-inference

Clean Python API for YuE music generation.

All credit to the YuE team at M-A-P and HKUST Audio for the model, training, and research. This package is just a convenience wrapper.

Paper: [YuE: Scaling Open Foundation Models for Long-Form Music Generation](https://arxiv.org/abs/2503.08638)

Code in this repository was written by Claude (Anthropic) in collaboration with @atobey.
## Installation

```bash
pip install yue-inference
```

Or install from source:

```bash
git clone https://github.com/atobey/yue-inference
cd yue-inference
pip install -e .
```

For development (with `uv`):

```bash
git clone https://github.com/atobey/yue-inference
cd yue-inference
uv venv
uv pip install -e ".[dev]"
source .venv/bin/activate
```

## Model Download

YuE uses a two-stage generation pipeline. Models are downloaded automatically on first use, or you can pre-download them:
```bash
# Stage 1: 7B semantic model (pick your language)
huggingface-cli download m-a-p/YuE-s1-7B-anneal-en-cot     # English (Chain-of-Thought)
huggingface-cli download m-a-p/YuE-s1-7B-anneal-en-icl     # English (In-Context Learning)
huggingface-cli download m-a-p/YuE-s1-7B-anneal-zh-cot     # Chinese
huggingface-cli download m-a-p/YuE-s1-7B-anneal-jp-kr-cot  # Japanese/Korean

# Stage 2: 1B acoustic model
huggingface-cli download m-a-p/YuE-s2-1B-general

# Audio codec
huggingface-cli download m-a-p/xcodec_mini_infer
```

## Quick Start

```python
from yue_inference import YuE

# Load models (downloads automatically if needed)
model = YuE.from_pretrained()

# Generate a song
audio = model.generate(
    lyrics="""
[verse]
Walking through the city lights
Stars are shining oh so bright

[chorus]
This is my song, my melody
Dancing wild and feeling free
""",
    genre="Pop, upbeat, female vocals",
)

# Save to file
audio.save("my_song.wav")

# Check duration
print(f"Generated {audio.duration_seconds:.1f}s of audio")
```
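Lyrics are passed as a plain string with `[verse]`/`[chorus]`-style section tags separated by blank lines. A small helper can assemble that string from labeled sections; this is a convenience sketch only, and the `build_lyrics` function is hypothetical, not part of the package API:

```python
def build_lyrics(sections):
    """Assemble a YuE lyrics string from (tag, text) pairs.

    `sections` is a list like [("verse", "line1\nline2"), ("chorus", "...")].
    Hypothetical helper: YuE only needs the final tagged string.
    """
    parts = []
    for tag, text in sections:
        # Each section becomes "[tag]\n<lines>", sections joined by blank lines
        parts.append(f"[{tag}]\n{text.strip()}")
    return "\n\n".join(parts)


lyrics = build_lyrics([
    ("verse", "Walking through the city lights\nStars are shining oh so bright"),
    ("chorus", "This is my song, my melody\nDancing wild and feeling free"),
])
print(lyrics)
```

The resulting string can be passed directly as `lyrics=...` to `model.generate()`.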
## API

### `YuE.from_pretrained()`

Load the YuE model from HuggingFace.

```python
import torch

model = YuE.from_pretrained(
    stage1="m-a-p/YuE-s1-7B-anneal-en-cot",  # Stage 1 model ID
    stage2="m-a-p/YuE-s2-1B-general",        # Stage 2 model ID
    device="cuda",                           # Device ("cuda" or "cpu")
    dtype=torch.bfloat16,                    # Model dtype
)
```
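`bfloat16` on a CUDA GPU is the usual choice for the `device`/`dtype` parameters, while `float32` is the safe fallback on CPU or on GPUs without bfloat16 support. A small chooser like the following can feed those parameters; the `pick_device_dtype` helper is a hypothetical sketch, not part of the package:

```python
def pick_device_dtype(cuda_available: bool, bf16_supported: bool = True):
    """Return (device, dtype_name) suitable for YuE.from_pretrained().

    Hypothetical helper: bfloat16 on capable CUDA GPUs, float32 otherwise.
    Dtypes are returned as strings so this sketch stays torch-free; map them
    to torch.bfloat16 / torch.float32 before passing to from_pretrained().
    """
    if cuda_available and bf16_supported:
        return "cuda", "bfloat16"
    if cuda_available:
        # CUDA GPU without bfloat16 support: fall back to float32
        return "cuda", "float32"
    return "cpu", "float32"


print(pick_device_dtype(True))   # ('cuda', 'bfloat16')
print(pick_device_dtype(False))  # ('cpu', 'float32')
```

In practice you would fill in the flags with `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`.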
### `model.generate()`

Generate audio from lyrics and genre tags.

```python
audio = model.generate(
    lyrics="[verse]\nHello world\n\n[chorus]\nGoodbye world",
    genre="Pop, female vocals",
    max_tokens=3000,    # Max tokens for Stage 1 (more = longer audio)
    run_n_segments=2,   # Number of lyric segments to process
    seed=42,            # Random seed for reproducibility
)
```

The result is a container for generated audio:
```python
audio.save("output.wav")  # Save to WAV file
audio.samples             # Raw numpy array (float32)
audio.sample_rate         # Sample rate (16000 Hz)
audio.duration_seconds    # Duration in seconds
```

## Available Models

| Stage 1 (Semantic) | Language | HuggingFace ID |
|---|---|---|
| English (CoT) | EN | m-a-p/YuE-s1-7B-anneal-en-cot |
| English (ICL) | EN | m-a-p/YuE-s1-7B-anneal-en-icl |
| Chinese (CoT) | ZH | m-a-p/YuE-s1-7B-anneal-zh-cot |
| Japanese/Korean | JP/KR | m-a-p/YuE-s1-7B-anneal-jp-kr-cot |

| Stage 2 (Acoustic) | HuggingFace ID |
|---|---|
| General | m-a-p/YuE-s2-1B-general |
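The Stage 1 checkpoints differ only in training language, so the right one can be picked with a small lookup. The mapping below mirrors the table above; the `stage1_model_for` helper itself is a hypothetical convenience, not part of the package API:

```python
# Mapping mirrors the Stage 1 table above.
STAGE1_MODELS = {
    "en": "m-a-p/YuE-s1-7B-anneal-en-cot",
    "en-icl": "m-a-p/YuE-s1-7B-anneal-en-icl",
    "zh": "m-a-p/YuE-s1-7B-anneal-zh-cot",
    "jp": "m-a-p/YuE-s1-7B-anneal-jp-kr-cot",
    "kr": "m-a-p/YuE-s1-7B-anneal-jp-kr-cot",  # shared JP/KR checkpoint
}


def stage1_model_for(language: str) -> str:
    """Return the Stage 1 HuggingFace ID for a language key (hypothetical helper)."""
    try:
        return STAGE1_MODELS[language.lower()]
    except KeyError:
        raise ValueError(
            f"No Stage 1 model for language {language!r}; "
            f"choose one of {sorted(STAGE1_MODELS)}"
        ) from None


print(stage1_model_for("zh"))  # m-a-p/YuE-s1-7B-anneal-zh-cot
```

You could then load the Chinese checkpoint with `YuE.from_pretrained(stage1=stage1_model_for("zh"))`; the Stage 2 acoustic model is shared across languages.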
## Genre Tags

YuE supports a wide variety of genre and style tags:
```python
# Single genre
genre="Pop"

# Multiple tags
genre="Pop, upbeat, female vocals"

# Specific style
genre="Electronic, ambient, instrumental"

# Mood + genre
genre="Sad, acoustic, singer-songwriter"
```

## Requirements

- Python 3.10+
- CUDA GPU with 24GB+ VRAM recommended (for 7B model)
- ~16GB for model weights
## License

Apache 2.0 (same as YuE)
## Citation

If you use this package, please cite the original YuE paper:
```bibtex
@article{yuan2025yue,
  title={YuE: Scaling Open Foundation Models for Long-Form Music Generation},
  author={Yuan, Ruibin and others},
  journal={arXiv preprint arXiv:2503.08638},
  year={2025}
}
```