The official PyTorch implementation of the paper "Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis".
Details
conda create -n event-t2m python==3.10.14
conda activate event-t2m
# install pytorch
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
# install requirements
pip install -r requirements.txt

We conduct experiments on the HumanML3D and KIT-ML datasets. You can download both by following the instructions here.
You can download the completed HumanML3D-E dataset from here.
If you want to prepare the dataset from scratch, follow the steps below:
Since an LLM (Gemini 2.5 Flash) was used for HumanML3D-E data preprocessing, an API key is required.
Please enter the issued API key on line 6 of src/tools/data_decompose.py.
GOOGLE_API_KEY = "" # your api key here
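Rather than hardcoding the key, one alternative (a sketch, not part of the released code) is to read it from an environment variable:

```python
import os

def get_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the Gemini API key from the environment, or '' if unset.

    This could replace the hardcoded assignment on line 6 of
    src/tools/data_decompose.py; the fallback matches its placeholder.
    """
    return os.environ.get(env_var, "")

GOOGLE_API_KEY = get_api_key()
```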
For processing, run:

python src/tools/data_decompose.py
python src/tools/data_preprocess_decomposed.py --dataset hml3d
python src/tools/data_preprocess_decomposed.py --dataset kit

This will add the following files to the directory:
./dataset/HumanML3D
├── ...
├── data_train.npy
├── data_val.npy
└── data_test.npy
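Once generated, the split files can be inspected with NumPy. A minimal sketch, assuming the .npy files store pickled Python objects (e.g. per-sample dicts), which is why allow_pickle=True is passed; adjust if the actual format differs:

```python
import numpy as np

def load_split(path: str):
    """Load one preprocessed split file, e.g. dataset/HumanML3D/data_train.npy.

    allow_pickle=True is needed if the array stores Python objects
    (such as per-sample dicts) rather than plain numeric data.
    """
    return np.load(path, allow_pickle=True)
```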
We also release test subsets, split by the number of conditions, for event-stratified evaluation.
./dataset/HumanML3D
├── ...
├── data_test_condition2.npy
├── data_test_condition3.npy
└── data_test_condition4.npy
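As a quick sanity check, the size of each stratified subset can be reported; a sketch under the same assumption that the files are NumPy object arrays (file names follow the tree above):

```python
import numpy as np

def subset_sizes(root: str, conditions=(2, 3, 4)) -> dict:
    """Return {n_conditions: n_samples} for the event-stratified test subsets."""
    return {
        n: len(np.load(f"{root}/data_test_condition{n}.npy", allow_pickle=True))
        for n in conditions
    }
```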
Download and unzip dependencies from here.
Download and unzip pre-trained models from here.
./
├── checkpoints
│   ├── hml3d.ckpt
│   └── kit.ckpt
├── deps
│   ├── glove
│   └── t2m_guo
└── ...
Details
- For HumanML3D
python src/train.py trainer.devices=\"0,1\" logger=wandb data=hml3d_event_final \
data.batch_size=128 data.repeat_dataset=5 trainer.max_epochs=600 \
callbacks/model_checkpoint=t2m +model/lr_scheduler=cosine model.guidance_scale=4 \
model.noise_scheduler.prediction_type=sample trainer.precision=bf16-mixed

- For KIT-ML
python src/train.py trainer.devices=\"2,3\" logger=wandb data=kit_event_final \
data.batch_size=128 data.repeat_dataset=5 trainer.max_epochs=1000 \
callbacks/model_checkpoint=t2m +model/lr_scheduler=cosine model.guidance_scale=4 \
model.noise_scheduler.prediction_type=sample trainer.precision=bf16-mixed

Details
Set model.metrics.enable_mm_metric to True to evaluate Multimodality. Setting it to False can speed up the evaluation.
python src/eval.py trainer.devices=\"0,\" data=hml3d_event_final data.test_batch_size=128 \
model=event_final \
model.guidance_scale=4 model.noise_scheduler.prediction_type=sample \
model.denoiser.stage_dim=\"256\*4\" \
ckpt_path=\"checkpoints/hml3d.ckpt\" model.metrics.enable_mm_metric=true