# VectorGym: A Multi-Task Benchmark for SVG Code Generation and Manipulation

This repository contains the data preprocessing pipelines, supervised fine-tuning (SFT) scripts, and evaluation framework used in the VectorGym benchmark paper.


## Overview

VectorGym benchmarks large multimodal language models on SVG code generation tasks. The primary task is text-to-SVG: given a natural language description, generate valid, visually accurate SVG code.

The pipeline covers:

- Data annotation — generating captions for SVG images using a vision-language model
- Fine-tuning — LoRA-based SFT of Qwen2.5-VL-32B-Instruct
- Evaluation — multi-metric validation (L2, LPIPS, SSIM, FID, CLIP Score, DINO Score, token length)

## Repository Structure

```
svg-research/
├── configs/
│   ├── generation/
│   │   └── text2svg.yaml              # Inference/validation config
│   └── sft/lora/qwen_2.5vl_instruct_32b/
│       └── text2svg_config.yaml       # LoRA fine-tuning config
├── eval/
│   ├── metrics/                       # Individual metric implementations
│   ├── svg_validator_base.py          # Abstract validator base class
│   ├── svg_validator_hf.py            # HuggingFace inference backend
│   ├── vllm_svg_validator.py          # vLLM inference backend
│   └── validate.py                    # Validation CLI entry point
├── train/
│   ├── train.py                       # SFT training script
│   └── util.py                        # Model loading and data preprocessing
├── utils/
│   ├── annotate.py                    # SVG caption generation (vLLM)
│   ├── dataset_utils.py               # Dataset splitting and Hub upload
│   ├── svg_util.py                    # SVG rendering and processing
│   └── utils.py                       # Shared helpers
├── scripts/
│   └── upload_lora_adapters.py        # Upload adapters to Hugging Face Hub
└── pyproject.toml
```

## Installation

Python >= 3.11 is required.

```bash
git clone https://github.com/alys28/svg-research.git
cd svg-research
pip install -e .
```

For training with DeepSpeed and WandB logging:

```bash
pip install -e ".[train]"
```

## Data Preprocessing

### Annotating SVGs with Captions

`utils/annotate.py` uses a vLLM-hosted vision-language model to generate natural language descriptions for SVG images. These captions serve as the text prompts during SFT.
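As a rough illustration of the annotation step, the sketch below builds a chat-completions payload for a vision-language model behind vLLM's OpenAI-compatible server. The prompt wording, payload shape, and helper name are assumptions for illustration, not the exact code in `utils/annotate.py`.

```python
# Hypothetical sketch: build a chat request asking a VLM to caption a
# rendered SVG (as PNG bytes). Field names follow the OpenAI-compatible
# chat-completions format that vLLM serves.
import base64


def build_caption_request(png_bytes: bytes, model: str) -> dict:
    """Return a chat-completions payload asking the VLM to caption an image."""
    image_b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                    {"type": "text",
                     "text": "Describe this image in one concise sentence."},
                ],
            }
        ],
        "max_tokens": 128,
    }
```

The payload would then be POSTed to the server's `/v1/chat/completions` endpoint, with the returned caption stored alongside the SVG source.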

### Managing Dataset Splits

`utils/dataset_utils.py` provides utilities for chunking large datasets and uploading splits to the Hugging Face Hub:

```python
from utils.dataset_utils import split_into_one_repo, copy_splits_to_repo
```

The training data is hosted at `svg-hub/svg-stack-annotated-sample`.
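The core idea behind splitting a large dataset into fixed-size shards can be sketched as below; the actual signature and behavior of `split_into_one_repo` may differ, and the real code presumably uses the `datasets` library rather than raw index math.

```python
# Illustrative sketch of chunking a dataset of n_rows rows into
# consecutive shards of at most chunk_size rows each, e.g. before
# uploading each shard as a separate split.
def chunk_indices(n_rows: int, chunk_size: int) -> list[range]:
    """Split n_rows into consecutive ranges of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [range(start, min(start + chunk_size, n_rows))
            for start in range(0, n_rows, chunk_size)]
```

Each resulting range selects one shard; the last shard may be smaller than `chunk_size`.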


## Fine-Tuning

Training uses TRL's SFTTrainer with LoRA (via PEFT) applied to all attention and MLP projection layers of Qwen2.5-VL-32B-Instruct.

### Configuration

Edit `configs/sft/lora/qwen_2.5vl_instruct_32b/text2svg_config.yaml` to set training hyperparameters. Key defaults:

| Parameter | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-VL-32B-Instruct` |
| LoRA rank | 64 |
| LoRA alpha | 32 |
| Learning rate | 5e-5 |
| Batch size (per device) | 4 |
| Gradient accumulation | 8 |
| Max sequence length | 16384 |
| Precision | bfloat16 |
| LR scheduler | cosine |

### Running Training

```bash
python train/train.py --config configs/sft/lora/qwen_2.5vl_instruct_32b/text2svg_config.yaml
```

Checkpoints are saved every 250 steps. Training metrics (loss, gradient norm, learning rate) are logged to WandB and exported to `plots/`.

### Uploading Adapters

After training, upload LoRA adapters to the Hugging Face Hub:

```bash
python scripts/upload_lora_adapters.py \
  --folder_path outputs/checkpoint-2500 \
  --repo_id svg-hub/qwen_2.5vl_instruct_text2svg_ckpt_2500 \
  --token $HF_TOKEN
```

## Evaluation

### Configuration

Edit `configs/generation/text2svg.yaml`. Key settings:

| Parameter | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-VL-32B-Instruct` |
| PEFT adapter | `svg-hub/qwen_2.5vl_instruct_text2svg_ckpt_2500` |
| Dataset split | `test_00` |
| Rasterize / render size | 512 × 512 |
| Temperature | 0.7 |
| Max new tokens | 16384 |

### Running Validation

```bash
python eval/validate.py --config configs/generation/text2svg.yaml
```

CLI flags override any config value, e.g.:

```bash
python eval/validate.py --config configs/generation/text2svg.yaml --batch_size 8 --temperature 0.9
```
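The override behavior can be sketched as a simple merge where any CLI flag that was actually passed takes precedence over the config file; the function name and merge rule here are illustrative, and `validate.py` may implement this differently (e.g. directly through argparse defaults).

```python
# Hypothetical sketch: CLI overrides win over config-file values, but
# flags the user did not pass (represented as None) leave the config as-is.
def merge_overrides(config: dict, overrides: dict) -> dict:
    """Return config with non-None override values taking precedence."""
    merged = dict(config)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged
```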

### Metrics

| Metric | Description |
|---|---|
| L2 | Pixel-level Euclidean distance |
| LPIPS | Learned perceptual image patch similarity |
| SSIM | Structural similarity index |
| FID | Fréchet Inception Distance |
| CLIP Score | Semantic text–image alignment |
| DINO Score | Feature-based image similarity |
| Token Length | SVG code token count |
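As a concrete reference point for the simplest metric above, pixel-level L2 is just the Euclidean distance between the two rendered images treated as vectors. A minimal sketch on flat pixel lists (the implementations in `eval/metrics/` presumably operate on image tensors instead):

```python
# Minimal sketch of the pixel-level L2 metric: Euclidean distance
# between a predicted render and a ground-truth render, both flattened
# to equal-length lists of pixel values.
import math


def l2_distance(pred: list[float], target: list[float]) -> float:
    """Euclidean distance between two equal-length pixel vectors."""
    if len(pred) != len(target):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
```

Identical renders score 0; larger values mean greater pixel-wise divergence.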

Results are logged to WandB and written to the output directory specified in the config.

## Inference Backends

Two backends are supported via the validator registry:

- `SVGHFValidator` — HuggingFace transformers (default)
- `VLLMValidator` — vLLM for faster batch inference

Set `generation_engine` in the generation config to switch backends.
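A validator registry of this kind is typically a mapping from the `generation_engine` value to a backend class. The sketch below illustrates the pattern; the actual registry keys and class constructors in `eval/` may differ.

```python
# Hypothetical sketch of a validator registry: map the generation_engine
# config value to the backend class that should run inference.
class SVGHFValidator:
    """Placeholder for the HuggingFace transformers backend."""


class VLLMValidator:
    """Placeholder for the vLLM batch-inference backend."""


VALIDATOR_REGISTRY = {
    "hf": SVGHFValidator,
    "vllm": VLLMValidator,
}


def get_validator(engine: str):
    """Look up the backend class for a generation_engine value."""
    try:
        return VALIDATOR_REGISTRY[engine]
    except KeyError:
        raise ValueError(f"unknown generation_engine: {engine!r}") from None
```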


## Environment Variables

Create a .env file in the project root with:

```
HF_TOKEN=your_huggingface_token
WANDB_API_KEY=your_wandb_key
```
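Projects usually load such a file with `python-dotenv`, but the mechanics are simple enough to sketch with the standard library alone (this is an illustration, not the repo's actual loading code):

```python
# Minimal stdlib sketch of loading KEY=VALUE pairs from a .env file
# into os.environ, skipping blanks and comments. Existing environment
# variables are not overwritten.
import os


def load_dotenv_file(path: str) -> None:
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```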

## Citation

VectorGym: A Multi-Task Benchmark for SVG Code Generation and Manipulation
