
bskrzypek/jammy_flows

 
 


jammy_flows

This package implements (conditional) PDFs with Joint Autoregressive Manifold (MY) normalizing flows. It grew out of work for the paper Unifying supervised learning and VAEs - automating statistical inference in high-energy physics [arXiv:2008.05825] and includes the methodology described in that paper for coverage calculation on joint manifolds. For Euclidean manifolds, it includes an updated version of the official implementation of Gaussianization flows [arXiv:2003.01941], in which the inverse is now differentiable (Newton iterations are added to the bisection) and more stable thanks to better approximations of the inverse Gaussian CDF.

The package has a simple syntax: a PDF is defined in a single line of code. To define a 10-d PDF with 4 Euclidean dimensions, followed by a 2-sphere, followed again by 4 Euclidean dimensions, one could for example write:

import jammy_flows

pdf=jammy_flows.pdf("e4+s2+e4", "gggg+n+gggg")

The first argument describes the manifold structure, the second argument the flow layers for each sub-manifold. Here "g" and "n" stand for particular normalizing-flow layers that are pre-implemented (see Features below). The Euclidean parts in this example use 4 "g" layers each.

[animation]

Have a look at the script that generates the above animation or at the example notebook.
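To make the two string arguments described above more concrete, here is a minimal sketch of how such a specification could be split into its parts. The helper `parse_pdf_spec` is hypothetical and is not part of jammy_flows; it assumes each sub-manifold token is a single letter followed by a dimension, and it does not cover any richer token formats the package may accept.

```python
import re

# Hypothetical helper for illustration only; it is not part of jammy_flows.
# Assumption: each sub-manifold token is one letter ("e" = Euclidean,
# "s" = sphere) followed by its dimension, and tokens are joined by "+".
def parse_pdf_spec(manifolds: str, layers: str):
    manifold_tokens = manifolds.split("+")
    layer_tokens = layers.split("+")
    if len(manifold_tokens) != len(layer_tokens):
        raise ValueError("need one layer group per sub-manifold")

    parsed = []
    for m_tok, l_tok in zip(manifold_tokens, layer_tokens):
        match = re.fullmatch(r"([a-z])(\d+)", m_tok)
        if match is None:
            raise ValueError(f"cannot parse manifold token {m_tok!r}")
        kind, dim = match.group(1), int(match.group(2))
        # One entry per sub-manifold: its type, dimension and layer letters.
        parsed.append({"manifold": kind, "dim": dim, "layers": list(l_tok)})
    return parsed

spec = parse_pdf_spec("e4+s2+e4", "gggg+n+gggg")
total_dim = sum(part["dim"] for part in spec)
```

For the example above, `spec` has three entries and `total_dim` adds up to the 10 dimensions mentioned in the text.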

Getting started

Non-Conditional PDF

Let's say we have Euclidean 3-d samples "x" from a 3-d target distribution we want to approximate with a PDF p(x). First we initialize a 3-d PDF:

import jammy_flows

pdf=jammy_flows.pdf("e3", "gggg")

We chose to do so with 4 Gaussianization flow layers. Next we just loop through the batches and minimize negative log-probability, assuming we have a torch optimizer:

# target_samples = array of 3-d samples from the target distribution
# batch_size: the given batch size
# num_batches_per_epoch: number of batches in the epoch
# optimizer: the given pytorch optimizer

batch_size=10

for ind in range(num_batches_per_epoch):
    # take the ind-th non-overlapping slice of the samples
    start=ind*batch_size
    this_label_batch=target_samples[start:start+batch_size]

    optimizer.zero_grad()

    log_pdf,_,_=pdf(this_label_batch)

    neg_log_loss=-log_pdf.mean()

    neg_log_loss.backward()

    optimizer.step()

### after training ###

## evaluation
# target_point = point "x" to evaluate the PDF at
# log_pdf, base_log_pdf, base_point = pdf(target_point)

## sampling
# target_sample, base_sample, target_log_pdf, base_log_pdf = pdf.sample(samplesize=1000)
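
The quantity minimized in the loop above is the batch-averaged negative log-likelihood. Writing the flow parameters as \theta and the batch size as B (both symbols are notation introduced here, not taken from the package), each optimizer step reduces:

```latex
\mathcal{L}(\theta) = -\frac{1}{B} \sum_{i=1}^{B} \log p_{\theta}(x_i)
```

This corresponds to `-log_pdf.mean()` in the loop, under the assumption that `log_pdf` holds one log-density value per sample in the batch.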

Conditional PDF

Let's now say we want to describe a 3-d conditional PDF p(x;y) that depends on some 2-dimensional input 'y'.

import jammy_flows

pdf=jammy_flows.pdf("e3", "gggg", conditional_input_dim=2)

Here we have to specify the conditional input dimension, and then in the training loop add the input. The rest is very similar to the non-conditional PDF:

# target_samples = array of 3-d samples from the target distribution
# input_data = array of 2-d input data of same length as target_samples
# batch_size: the given batch size
# num_batches_per_epoch: number of batches in the epoch
# optimizer: the given pytorch optimizer

for ind in range(num_batches_per_epoch):
    # take aligned, non-overlapping slices of targets and inputs
    start=ind*batch_size
    this_label_batch=target_samples[start:start+batch_size]
    this_data_batch=input_data[start:start+batch_size]

    optimizer.zero_grad()

    log_pdf,_,_=pdf(this_label_batch, conditional_input=this_data_batch)

    neg_log_loss=-log_pdf.mean()

    neg_log_loss.backward()

    optimizer.step()

### after training ###

## evaluation
# target_point = point "x" to evaluate the PDF at
# log_pdf, base_log_pdf, base_point = pdf(target_point, conditional_input=some_input)

## sampling, shape of 'some_input' defines number of samples in conditional pdf
# target_sample, base_sample, target_log_pdf, base_log_pdf = pdf.sample(conditional_input=some_input) 
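
The conditional loop relies on each target sample staying paired with its input. As a sketch of that pairing, the hypothetical helper below (not part of jammy_flows) yields aligned, non-overlapping mini-batches from two sequences of equal length. It uses plain Python lists as toy data; the same slicing should also work for arrays or tensors that support `len` and slice indexing.

```python
# Hypothetical helper, not part of jammy_flows: yields aligned,
# non-overlapping mini-batches from two sequences of equal length.
def paired_batches(targets, inputs, batch_size):
    if len(targets) != len(inputs):
        raise ValueError("targets and inputs must have the same length")
    for start in range(0, len(targets), batch_size):
        stop = start + batch_size
        # The same slice is applied to both sequences, so row i of the
        # target batch always belongs to row i of the input batch.
        yield targets[start:stop], inputs[start:stop]

# Toy stand-ins for `target_samples` (3-d) and `input_data` (2-d).
toy_targets = [[float(i), 0.0, 0.0] for i in range(5)]
toy_inputs = [[float(i), 1.0] for i in range(5)]

batches = list(paired_batches(toy_targets, toy_inputs, batch_size=2))
```

Note that the last batch can be smaller than `batch_size` when the number of samples is not a multiple of it.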

Documentation

Documentation will hopefully follow soon. In the meantime, check out the example script and example notebook.

Features

General

  • Autoregressive conditional structure is taken care of behind the scenes

  • Coverage calculation is straightforward: everything, including spherical dimensions, is based on a Gaussian base distribution (arXiv:2008.05825).

  • Bisection & Newton iterations for differentiable inverse (used for Gaussianization flow/Moebius flow/Exponential map flow)

  • amortizable MLPs that use low-rank approximations

  • unit tests that make sure backward and forward passes of the implemented flow layers agree

  • log-lambda can be included as an additional flow parameter to define parametrized Poisson processes

  • easily extendible: define new Euclidean / spherical flow layers by subclassing Euclidean or spherical base classes

  • Interval flows

  • Categorical variables for joint regression/classification
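
The "Bisection & Newton iterations" item in the list above can be illustrated with a small standalone sketch. This is not the package's implementation: `invert_monotone` and the toy function are hypothetical, and the iteration counts are arbitrary. Bisection first narrows a bracket around the solution, then a few Newton steps refine it.

```python
import math

# Illustrative sketch only; not the package's implementation.
# Inverts a strictly increasing function f on [lo, hi]: bisection narrows
# the bracket, then Newton steps refine the midpoint using the derivative df.
def invert_monotone(f, df, y, lo, hi, n_bisect=20, n_newton=5):
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    for _ in range(n_newton):
        x = x - (f(x) - y) / df(x)
    return x

# Toy monotone map (hypothetical, chosen only for the demonstration).
def f(x):
    return x + math.tanh(x)

def df(x):
    return 1.0 + 1.0 / math.cosh(x) ** 2

y_target = f(0.7)
x_rec = invert_monotone(f, df, y_target, lo=-10.0, hi=10.0)
```

One common reason to finish with Newton steps is that they are written in terms of the function and its derivative, so an automatic-differentiation framework can propagate gradients through them, whereas the branching in bisection carries no gradient. The details of how the package combines the two are best checked in its source.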

Euclidean flows:

  • Gaussianization flow (described in arXiv:2003.01941) ("g")
  • Hybrid of Nonlinear Scalings and HouseHolder Rotations ("Polynomial Stretch flow") ("p")

Spherical flows:

S1:

S2:

Interval Flows:

Requirements

  • pytorch (>=1.7)
  • numpy (>=1.18.5)
  • scipy (>=1.5.4)
  • matplotlib (>=3.3.3)
  • torchdiffeq (>=0.2.1)

The package has been built and tested with these versions, but might work just fine with older ones.

Installation

pip install git+https://github.com/thoglu/jammy_flows.git

Contributions

If you want to implement your own layer or have bug / feature suggestions, just file an issue.

About

A package to describe joint autoregressive (conditional) normalizing-flow PDFs on manifolds with coverage control.
