
Polar-VQ Codec

A neural image codec exploring Polar Vector Quantization — decomposing latent representations into radius (magnitude) and direction (unit hypersphere), then applying Product Quantization + Residual VQ for learned image compression.

Rate-Distortion curve on Kodak dataset

Status: Research prototype. Demonstrates a working end-to-end neural codec pipeline with a novel quantization approach. Does not yet match JPEG at equal bitrate — see honest assessment and what we learned.


Key Idea

Traditional VQ treats all dimensions equally. Polar-VQ separates each latent vector into:

  • Radius r = ‖v‖: how strong a feature is (4-bit scalar quantization)
  • Direction d = v / ‖v‖: what kind of feature it is (codebook lookup on the unit hypersphere)

This separation is motivated by the observation that in high-dimensional spaces, most information is in the direction of vectors (cosine similarity), while magnitude varies smoothly and can be cheaply quantized.

Latent vector v ∈ ℝ²⁵⁶
    │
    ├── Radius: r = ‖v‖           → 4-bit scalar (16 learned levels)
    │
    └── Direction: d = v/‖v‖      → Product Quantization (8 heads × 32-D)
         │                            → 4-stage Residual VQ
         │                               Stage 1: Spherical (cosine sim)
         │                               Stages 2-4: Cartesian (L2)
         └── Entropy coding via checkerboard context model
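In code, the decomposition sketched above looks like this (a minimal illustration; the function name and the exact clamping are assumptions, not the repository's `quantizer.py` API):

```python
import torch

def polar_decompose(v: torch.Tensor, num_heads: int = 8):
    """Split latents of shape (..., 256) into radius and per-head unit directions."""
    r = v.norm(dim=-1, keepdim=True)                 # radius: feature strength
    d = v / r.clamp_min(1e-8)                        # direction on the unit hypersphere
    heads = d.reshape(*d.shape[:-1], num_heads, -1)  # (..., 8, 32) sub-vectors for PQ
    return r, heads

v = torch.randn(4, 256)
r, heads = polar_decompose(v)
# Before quantization, the decomposition is lossless:
recon = heads.reshape(4, 256) * r
```

The radius then goes to the 4-bit scalar quantizer and each 32-D head to its own RVQ stack.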

Architecture

| Component | Design | Parameters |
| --- | --- | --- |
| Encoder | CNN: 3→128→192→256, 8× downsample | 3 strided convs + 1 refinement |
| Polar-VQ | Radius quant + PQ + RVQ | 8 heads × 32-D, 1024 codebook entries, 4 stages |
| Context Model | Checkerboard two-pass entropy predictor | Spatial CNN + per-stage MLPs |
| Decoder | Transposed CNN, Sigmoid output | Mirror of encoder |
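A minimal sketch of the encoder row above: 3→128→192→256 channels with three stride-2 convolutions (8× downsampling) plus one refinement layer. The channel counts and stride pattern come from the table; kernel sizes and activations are assumptions.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """3 -> 128 -> 192 -> 256 channels; three stride-2 convs give 8x downsampling."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 128, kernel_size=5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(128, 192, kernel_size=5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(192, 256, kernel_size=5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),  # refinement
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

z = Encoder()(torch.randn(1, 3, 256, 256))  # -> (1, 256, 32, 32): one 256-D latent per 8x8 block
```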

See docs/ARCHITECTURE.md for the full technical deep-dive.

Results

Evaluated on the Kodak dataset (24 images, 768×512).
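For intuition, BPP maps to file size as bits × pixels ÷ 8; at a Kodak image's 768×512 resolution:

```python
# File size implied by a BPP figure at Kodak resolution.
width, height = 768, 512
bpp = 3.35
size_kb = width * height * bpp / 8 / 1024
print(round(size_kb, 1))  # 160.8 KB per image at 3.35 BPP
```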

Current Best (V4 — 15 epochs of 100)

| Metric | Polar-VQ V4 | JPEG Q90 (similar BPP) |
| --- | --- | --- |
| BPP | 3.35 | 2.35 |
| PSNR | 34.60 dB | 38.03 dB |
| MS-SSIM | 0.9857 | 0.9930 |

Training Evolution

V1 through V4 evolution on RD curve

| Version | BPP | PSNR (dB) | MS-SSIM | Training Data | Key Change |
| --- | --- | --- | --- | --- | --- |
| V1 | 2.36 | 26.06 | 0.9488 | DIV2K (800) | Baseline implementation |
| V2 | 2.56 | 15.00 | 0.6965 | DIV2K (800) | ❌ Aggressive λ warmup → rate collapse |
| V3 | 4.44 | 26.32 | 0.9609 | DIV2K (800) | Context pre-training, gradual warmup |
| V4 | 3.35 | 34.60 | 0.9857 | COCO+DIV2K (124K) | Bug fixes, larger dataset, AMP |

V4 achieved +8.5 dB PSNR over V1 and is still training (epoch 15 of 100). Full results and analysis in docs/RESULTS.md.

Installation

git clone https://github.com/acieslik/Polar-VQ-Codec.git
cd Polar-VQ-Codec
pip install -r requirements.txt

# For .pvq file compression/decompression:
pip install -e ".[full]"

Requires Python 3.10+ and PyTorch 2.0+ with CUDA.

Quick Start

Download Data

# Kodak benchmark only (~15 MB)
python scripts/download_data.py --kodak-only

# Full training data (COCO 2017 Unlabeled + DIV2K, ~22 GB)
python scripts/download_data.py

Train

# Default: 3-stage curriculum, 100 epochs
python scripts/train.py --data-dir data/train --epochs 100 --target-lambda 0.01

# With validation on Kodak every 5 epochs
python scripts/train.py --data-dir data/train --target-lambda 0.01 \
    --val-dir data/kodak --val-interval 5

# Resume from checkpoint
python scripts/train.py --data-dir data/train --resume checkpoints/latest.pth

Training stages:

| Stage | Epochs | Focus | λ |
| --- | --- | --- | --- |
| A | 0–10 | Geometric foundation (MSE only) | 1e-6 |
| B | 10–60 | Rate-Distortion optimization | 0 → target λ (15-epoch warmup) |
| C | 60–100 | Perceptual fine-tuning (MS-SSIM+L1) | target λ / 20 |
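The schedule in the table can be sketched as a single function of the epoch (the piecewise boundaries come from the table; the exact interpolation inside the warmup is an assumption):

```python
def lambda_schedule(epoch: int, target_lambda: float = 0.01) -> float:
    """Per-epoch rate-penalty weight for the three-stage curriculum."""
    if epoch < 10:                          # Stage A: geometric foundation
        return 1e-6
    if epoch < 60:                          # Stage B: 15-epoch linear warmup, then hold
        warmup = min((epoch - 10) / 15, 1.0)
        return warmup * target_lambda
    return target_lambda / 20               # Stage C: perceptual fine-tuning
```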

Benchmark

python scripts/benchmark.py --dataset data/kodak --weights checkpoints/latest.pth

Generates R-D curves (BPP vs PSNR/MS-SSIM) comparing against JPEG, WebP, and PNG.
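A JPEG anchor point on such a curve can be computed with Pillow and NumPy; this is a hypothetical sketch of the idea, not the repository's `benchmark.py`:

```python
import io
import numpy as np
from PIL import Image

def jpeg_rd_point(img: Image.Image, quality: int):
    """Return (bpp, psnr_db) for one JPEG quality setting."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    bpp = buf.getbuffer().nbytes * 8 / (img.width * img.height)
    x = np.asarray(img, dtype=np.float64)
    y = np.asarray(Image.open(buf), dtype=np.float64)
    mse = max(((x - y) ** 2).mean(), 1e-10)   # guard against a lossless round-trip
    psnr = 10 * np.log10(255.0 ** 2 / mse)
    return bpp, psnr

# Example on a synthetic image; sweeping `quality` traces the full JPEG curve.
rng = np.random.default_rng(0)
img = Image.fromarray(rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8))
bpp, psnr = jpeg_rd_point(img, quality=90)
```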

Compress / Decompress

python scripts/compress.py photo.png photo.pvq --weights checkpoints/latest.pth
python scripts/decompress.py photo.pvq decoded.png --weights checkpoints/latest.pth

Project Structure

Polar-VQ-Codec/
├── polar_vq/                  # Core library
│   ├── encoder.py             # CNN encoder (8× downsample)
│   ├── decoder.py             # CNN decoder (Sigmoid output)
│   ├── quantizer.py           # PolarVQ: radius + PQ + hybrid RVQ
│   ├── context_model.py       # Checkerboard entropy predictor
│   ├── codec.py               # Full pipeline + .pvq bitstream
│   └── losses.py              # MSE / MS-SSIM+L1 R-D loss
├── scripts/
│   ├── train.py               # Multi-stage training with curriculum
│   ├── benchmark.py           # R-D curves vs JPEG/WebP/PNG
│   ├── compress.py            # Image → .pvq
│   ├── decompress.py          # .pvq → image
│   └── download_data.py       # Dataset downloader
├── docs/
│   ├── ARCHITECTURE.md        # Technical deep-dive
│   ├── RESULTS.md             # Benchmark analysis (V1–V4)
│   └── ROADMAP.md             # Future directions
├── tests/                     # Unit tests (pytest)
├── checkpoints/               # Saved model weights
└── results/                   # Benchmark outputs

Current Limitations

This is an honest assessment of where the project stands:

  1. Not yet competitive with JPEG. At 3.35 BPP, JPEG achieves ~38 dB vs our 34.6 dB. The context model has not yet learned to compress the index entropy well enough: the 3.35 BPP output is still too close to the raw index bit budget of 5.06 BPP.

  2. Only one operating point per trained model. JPEG/WebP can sweep quality parameters at encode time. Each Polar-VQ quality level requires a separately trained model (~4 days each on a single GPU).

  3. Training is ongoing. V4 has only completed 15 of 100 epochs. Stage B (rate optimization) has just begun — BPP is actively declining (5.13 → 3.35 over 5 epochs).

  4. Simple CNN architecture. State-of-the-art neural codecs use attention mechanisms, hyperprior networks, and deeper architectures. Our 7-layer CNN is deliberately minimal.

  5. GPU required for decode. No hardware decoder exists — inference requires PyTorch + CUDA.
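The 5.06 BPP raw bit budget in point 1 follows directly from the architecture: 4 radius bits plus one 10-bit index (log₂ 1024) per head per RVQ stage, amortized over the 8×8 pixel block that each latent vector covers.

```python
import math

radius_bits = 4
heads, stages = 8, 4
index_bits = int(math.log2(1024))                    # 10 bits per codebook index
bits_per_latent = radius_bits + heads * stages * index_bits  # 4 + 320 = 324
pixels_per_latent = 8 * 8                            # 8x downsampling in each dimension
raw_bpp = bits_per_latent / pixels_per_latent
print(raw_bpp)  # 5.0625 — the ~5.06 BPP ceiling the context model must beat
```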

What We Learned

The iterative development from V1 to V4 produced several insights about training VQ-based neural codecs:

  • Dataset diversity matters more than quantity for VQ. Switching from 800 DIV2K images to 124K COCO+DIV2K images gave the largest single improvement (+8.5 dB). Codebooks need semantic diversity to populate properly.

  • λ warmup is critical. Jumping to full rate penalty caused catastrophic collapse (V2: 15 dB). A 15-epoch linear warmup into Stage B prevented this.

  • Detach BPP gradients during geometric training. Stage A trains encoder/decoder/codebooks without rate pressure, letting the quantizer establish a stable latent topology first.

  • Dead codebook restarts only in Stage A. Restarting entries during rate optimization (Stage B) destabilizes the learned distributions.

  • The PQ scale factor is not optional. Each head of a unit vector has expected norm 1/√num_heads, not 1. Without scaling, RVQ stages waste capacity correcting a scale error.
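The last point can be checked numerically: a 32-D slice of a random unit vector in ℝ²⁵⁶ has expected squared norm 1/8, so its typical norm is near 1/√8 ≈ 0.3536 rather than 1 (illustrative stdlib-only simulation, not repository code):

```python
import math
import random

random.seed(0)
num, dim, heads = 2000, 256, 8
total = 0.0
for _ in range(num):
    v = [random.gauss(0, 1) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    d = [x / n for x in v]                       # unit direction
    # Norm of the first 32-D head of the unit direction:
    total += math.sqrt(sum(x * x for x in d[: dim // heads]))
mean_head_norm = total / num
print(mean_head_norm)  # close to 1/sqrt(8) ≈ 0.3536
# Multiplying each head by sqrt(num_heads) restores unit scale for the RVQ stages.
```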

Roadmap

The Polar-VQ quantization approach may be more impactful outside image compression. See docs/ROADMAP.md for analysis of:

  • Completing the image codec (hyperprior, attention, multi-λ sweep)
  • LLM weight compression — Polar-VQ preserves directional information that scalar quantizers (GPTQ, AWQ) discard
  • Vector database compression — cosine similarity is the standard metric, and Polar-VQ directly optimizes for angular preservation
  • Satellite/medical imaging — high-dimensional multispectral data is a natural fit

Citation

If you use this code in your research, please cite:

@software{polar_vq_codec,
  title={Polar-VQ Codec: Neural Image Compression with Hyperspherical Vector Quantization},
  year={2026},
  url={https://github.com/acieslik/Polar-VQ-Codec}
}

License

AGPL-3.0 — see LICENSE. Free for research and personal use. Commercial use requires opening your source code or obtaining a commercial license.
