ViTex

ViTex: Visual Texture Control for Multi-track Symbolic Music Generation via Discrete Diffusion Models

This repository contains the official implementation of ViTex, a discrete diffusion-based model for controllable multi-track symbolic music generation with visual texture conditioning.


🧩 Environment Setup

We recommend using Python 3.12.

Install the dependencies with:

pip install torch tensorboard librosa muspy accelerate pydub

📦 Dataset and Checkpoints

Our processed dataset and pretrained checkpoint are available here: 👉 Google Drive Folder

Contents:

  • d3pm.ckpt: pretrained discrete diffusion model weights
  • all_pkl.tar.gz: training set
  • pkl_test.tar.gz: test set
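
For example, extract the archives with:

    tar -xzvf all_pkl.tar.gz
    tar -xzvf pkl_test.tar.gz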

🚀 Inference

  1. Download and extract the checkpoint and test set.

  2. Modify the file paths in utils/inference_utils.py, specifically in the functions:

    • get_model()
    • get_dataset()

    Update these paths to match where you extracted the files (a hypothetical sketch of this edit follows the list).
  3. Run the provided inference script:

    bash run.sh

    This script randomly samples chord progressions and ViTex conditions from the test set, and calls pipeline.py to generate new music examples.
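
As a reference for step 2, here is a hypothetical sketch of the edited paths in utils/inference_utils.py. The variable names and function bodies are illustrative assumptions, not the repository's actual code:

    # Hypothetical sketch: names are illustrative. Point the paths at
    # the files you extracted from the Google Drive folder.
    def get_model():
        ckpt_path = "/data/vitex/d3pm.ckpt"  # pretrained diffusion weights
        ...

    def get_dataset():
        data_dir = "/data/vitex/pkl_test"  # extracted test set
        ...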

You can modify pipeline.py or the command-line arguments to adjust configurations such as:

  • Conditional vs. unconditional generation
  • Guidance scales for each condition
  • Inpainting or continuation modes
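
For example, a hypothetical invocation could look like the following; the flag names are illustrative assumptions, so check pipeline.py for the arguments it actually accepts:

    # Flag names are hypothetical, not the repository's actual CLI.
    python pipeline.py --uncond                            # unconditional generation
    python pipeline.py --chord_scale 2.0 --tex_scale 1.5   # per-condition guidance
    python pipeline.py --mode inpaint                      # inpainting mode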

🧠 Training

Preparing the Training Dataset

If you wish to use our preprocessed dataset, simply extract the provided archive. To train on your own MIDI collection, follow these steps inside the data_preprocess folder:

  1. Filter invalid MIDI files

    python filter.py

    This removes MIDI files that are not in 4/4, have extreme BPM values, lack a drum track, and so on. (See the process_midi() function for the full filtering rules; a rough sketch of these checks follows these steps.)

  2. Normalize tempo and instrumentation

    python normalize.py

    This step standardizes the BPM to 120 and maps instruments to a predefined set of 12 categories.

  3. Preprocess multi-track data

    python preprocess_multi.py

    Extracts chord and instrumentation information and saves the results as .pkl files.

  4. Split into training and test sets

    python split.py

    Splits the preprocessed data at the song level.

After these steps, you’ll have a directory containing .pkl files ready for training.
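
To illustrate the checks in step 1, here is a minimal sketch using muspy (installed above). The BPM bounds are illustrative assumptions; process_midi() in the repository remains the authoritative reference:

    import muspy

    def passes_filters(path, min_bpm=40, max_bpm=240):
        """Return True if a MIDI file survives the basic filtering rules."""
        try:
            music = muspy.read_midi(path)
        except Exception:
            return False  # unreadable or corrupt MIDI
        # Keep only songs that stay in 4/4 throughout.
        if any(ts.numerator != 4 or ts.denominator != 4
               for ts in music.time_signatures):
            return False
        # Reject extreme tempo values (bounds here are assumptions).
        if any(t.qpm < min_bpm or t.qpm > max_bpm for t in music.tempos):
            return False
        # Require at least one drum track.
        if not any(track.is_drum for track in music.tracks):
            return False
        return True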


Running Training

Set the path to your dataset in the training_config section of train.py, then launch training:

accelerate launch train.py

If you are training on multiple GPUs, configure Accelerate to match your hardware before launching.
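
You can run the interactive setup once or pass the settings at launch; both are standard Accelerate commands (adjust the process count to your number of GPUs):

    # One-time interactive setup (select multi-GPU and the number of processes):
    accelerate config

    # Or pass the settings directly at launch, e.g. for 4 GPUs:
    accelerate launch --multi_gpu --num_processes 4 train.py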

