This repository contains the data preprocessing pipelines, supervised fine-tuning (SFT) scripts, and evaluation framework used in the VectorGym benchmark paper.
VectorGym benchmarks large multimodal language models on SVG code generation tasks. The primary task is text-to-SVG: given a natural language description, generate valid, visually accurate SVG code.
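For illustration, a task instance pairs a short description with SVG markup, and a generation only counts if it parses as well-formed XML. The prompt and SVG below are hypothetical examples, not drawn from the dataset:

```python
import xml.etree.ElementTree as ET

# Hypothetical example pair (illustrative only, not from the benchmark data).
prompt = "A red circle centered on a white background"
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="64" height="64">'
    '<rect width="64" height="64" fill="white"/>'
    '<circle cx="32" cy="32" r="20" fill="red"/>'
    "</svg>"
)

# A minimal validity check: the output must parse as well-formed XML
# with an <svg> root element.
root = ET.fromstring(svg)
```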
The pipeline covers:
- Data annotation — generating captions for SVG images using a vision-language model
- Fine-tuning — LoRA-based SFT of Qwen2.5-VL-32B-Instruct
- Evaluation — multi-metric validation (L2, LPIPS, SSIM, FID, CLIP Score, DINO Score, token length)
```
svg-research/
├── configs/
│   ├── generation/
│   │   └── text2svg.yaml          # Inference/validation config
│   └── sft/lora/qwen_2.5vl_instruct_32b/
│       └── text2svg_config.yaml   # LoRA fine-tuning config
├── eval/
│   ├── metrics/                   # Individual metric implementations
│   ├── svg_validator_base.py      # Abstract validator base class
│   ├── svg_validator_hf.py        # HuggingFace inference backend
│   ├── vllm_svg_validator.py      # vLLM inference backend
│   └── validate.py                # Validation CLI entry point
├── train/
│   ├── train.py                   # SFT training script
│   └── util.py                    # Model loading and data preprocessing
├── utils/
│   ├── annotate.py                # SVG caption generation (vLLM)
│   ├── dataset_utils.py           # Dataset splitting and Hub upload
│   ├── svg_util.py                # SVG rendering and processing
│   └── utils.py                   # Shared helpers
├── scripts/
│   └── upload_lora_adapters.py    # Upload adapters to Hugging Face Hub
└── pyproject.toml
```
Python >= 3.11 is required.
```
git clone https://github.com/alys28/svg-research.git
cd svg-research
pip install -e .
```

For training with DeepSpeed and WandB logging:
```
pip install -e ".[train]"
```

utils/annotate.py uses a vLLM-hosted vision-language model to generate natural language descriptions for SVG images. These captions serve as the text prompts during SFT.
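The shape of such a captioning request can be sketched as an OpenAI-compatible chat payload with a base64-encoded rendering of the SVG. This is an assumption-laden illustration; the actual prompt wording and client code in utils/annotate.py may differ:

```python
import base64


def build_caption_request(image_png: bytes,
                          model: str = "Qwen/Qwen2.5-VL-32B-Instruct") -> dict:
    """Build an OpenAI-compatible chat payload asking a VLM to caption a
    rendered SVG. Illustrative sketch only: the real annotate.py prompt,
    model choice, and sampling settings are assumptions here."""
    b64 = base64.b64encode(image_png).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # Rendered SVG image, inlined as a data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                # Instruction asking for a caption usable as a text-to-SVG prompt.
                {"type": "text",
                 "text": "Describe this image in one concise sentence "
                         "suitable as a text-to-SVG prompt."},
            ],
        }],
        "temperature": 0.2,
    }
```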
utils/dataset_utils.py provides utilities for chunking large datasets and uploading splits to the Hugging Face Hub:
```python
from utils.dataset_utils import split_into_one_repo, copy_splits_to_repo
```

The training data is hosted at svg-hub/svg-stack-annotated-sample.
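The chunking idea behind these helpers can be sketched with a small standalone function; the actual signatures of split_into_one_repo and copy_splits_to_repo are not shown here, and the helper below is a hypothetical illustration:

```python
def chunk_dataset(records: list, num_shards: int) -> list[list]:
    """Split records into num_shards near-equal contiguous shards,
    e.g. for uploading a large dataset to the Hub split by split.
    Illustrative sketch; not the repo's actual implementation."""
    base, rem = divmod(len(records), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        # The first `rem` shards absorb one extra record each.
        size = base + (1 if i < rem else 0)
        shards.append(records[start:start + size])
        start += size
    return shards
```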
Training uses TRL's SFTTrainer with LoRA (via PEFT) applied to all attention and MLP projection layers of Qwen2.5-VL-32B-Instruct.
Edit configs/sft/lora/qwen_2.5vl_instruct_32b/text2svg_config.yaml to set training hyperparameters. Key defaults:
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-VL-32B-Instruct |
| LoRA rank | 64 |
| LoRA alpha | 32 |
| Learning rate | 5e-5 |
| Batch size (per device) | 4 |
| Gradient accumulation | 8 |
| Max sequence length | 16384 |
| Precision | bfloat16 |
| LR scheduler | cosine |
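Expressed as a YAML fragment, the defaults above might look like the following. The key names are an assumption based on common TRL/PEFT conventions and may not match the repo's actual config schema:

```yaml
# Hypothetical key names; check text2svg_config.yaml for the real schema.
model_name_or_path: Qwen/Qwen2.5-VL-32B-Instruct
lora_r: 64
lora_alpha: 32
learning_rate: 5.0e-5
per_device_train_batch_size: 4
gradient_accumulation_steps: 8
max_seq_length: 16384
bf16: true
lr_scheduler_type: cosine
```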
```
python train/train.py --config configs/sft/lora/qwen_2.5vl_instruct_32b/text2svg_config.yaml
```

Checkpoints are saved every 250 steps. Training metrics (loss, gradient norm, learning rate) are logged to WandB and exported to plots/.
After training, upload LoRA adapters to the Hugging Face Hub:
```
python scripts/upload_lora_adapters.py \
    --folder_path outputs/checkpoint-2500 \
    --repo_id svg-hub/qwen_2.5vl_instruct_text2svg_ckpt_2500 \
    --token $HF_TOKEN
```

Edit configs/generation/text2svg.yaml. Key settings:
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-VL-32B-Instruct |
| PEFT adapter | svg-hub/qwen_2.5vl_instruct_text2svg_ckpt_2500 |
| Dataset split | test_00 |
| Rasterize / render size | 512 × 512 |
| Temperature | 0.7 |
| Max new tokens | 16384 |
```
python eval/validate.py --config configs/generation/text2svg.yaml
```

CLI flags override any config value, e.g.:

```
python eval/validate.py --config configs/generation/text2svg.yaml --batch_size 8 --temperature 0.9
```

| Metric | Description |
|---|---|
| L2 | Pixel-level Euclidean distance |
| LPIPS | Learned perceptual image patch similarity |
| SSIM | Structural similarity index |
| FID | Fréchet Inception Distance |
| CLIP Score | Semantic text–image alignment |
| DINO Score | Feature-based image similarity |
| Token Length | SVG code token count |
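As a concrete example of the pixel-level metrics, the L2 distance can be sketched as the mean per-pixel Euclidean distance between the rendered and reference images. This is an illustrative implementation; the repo's eval/metrics code may normalize or aggregate differently:

```python
import numpy as np


def l2_distance(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Mean per-pixel Euclidean (L2) distance between two H x W x C images
    with values in [0, 1]. Sketch only; the benchmark's exact normalization
    is an assumption here."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    # Euclidean norm across channels, then averaged over all pixels.
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())
```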
Results are logged to WandB and written to the output directory specified in the config.
Two backends are supported via the validator registry:
- SVGHFValidator — HuggingFace transformers (default)
- VLLMValidator — vLLM for faster batch inference
Set generation_engine in the generation config to switch backends.
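The registry pattern this implies can be sketched as a name-to-class mapping keyed by the generation_engine value. The class names mirror the repo, but the registry keys and wiring below are assumptions:

```python
# Minimal sketch of a validator registry; the repo's actual registration
# mechanism and engine key names may differ.
VALIDATOR_REGISTRY: dict[str, type] = {}


def register_validator(name: str):
    """Decorator that registers a validator class under an engine name."""
    def wrap(cls: type) -> type:
        VALIDATOR_REGISTRY[name] = cls
        return cls
    return wrap


@register_validator("hf")
class SVGHFValidator:
    """HuggingFace transformers backend (stub for illustration)."""


@register_validator("vllm")
class VLLMValidator:
    """vLLM backend for faster batch inference (stub for illustration)."""


def get_validator(engine: str) -> type:
    """Look up the validator class for the configured generation_engine."""
    return VALIDATOR_REGISTRY[engine]
```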
Create a .env file in the project root with:
```
HF_TOKEN=your_huggingface_token
WANDB_API_KEY=your_wandb_key
```
VectorGym: A Multi-Task Benchmark for SVG Code Generation and Manipulation