Normalization.jl

This package normalizes arrays over any combination of dimensions, with a range of methods (z-score, sigmoid, centering, min-max, etc.) and modifiers (robust, mixed, NaN-safe).

Usage

Each normalization method is a subtype of AbstractNormalization. Each AbstractNormalization subtype has its own estimators and forward methods that define how parameters are calculated and the normalization formula. Each AbstractNormalization instance contains the concrete parameter values for a normalization, fit to a given input array.
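
As a rough illustration of this split between type-level definitions (estimators and forward formula) and instance-level parameters, here is a minimal sketch in plain Julia. The names ToyZScore, toy_estimators, toy_forward, toy_fit and toy_normalize are hypothetical and exist only for this example; they are not the package's internals.

```julia
using Statistics

# Hypothetical toy type for illustration; not part of Normalization.jl
struct ToyZScore{D,P}
    dims::D      # dimensions the parameters were estimated over
    params::P    # fitted parameter arrays, here (mean, std)
end

# Type-level definitions: how to estimate parameters, and the formula to apply
toy_estimators(::Type{<:ToyZScore}) = (mean, std)
toy_forward(::Type{<:ToyZScore}) = (μ, σ) -> (x -> (x - μ) / σ)

# Fitting applies each estimator over `dims` and stores the results
function toy_fit(::Type{ToyZScore}, X; dims)
    ps = map(f -> f(X; dims=dims), toy_estimators(ToyZScore))
    return ToyZScore(dims, ps)
end

# Normalizing broadcasts the forward formula with the stored parameters
function toy_normalize(X, N::ToyZScore)
    f = toy_forward(typeof(N))
    return broadcast((x, ps...) -> f(ps...)(x), X, N.params...)
end

X = randn(100, 10)
N = toy_fit(ToyZScore, X; dims=1)   # instance holding concrete parameters
Y = toy_normalize(X, N)
size.(N.params)                     # ((1, 10), (1, 10))
```

The type carries the recipe and the instance carries the numbers, which is why the same instance can be reused to normalize further arrays without refitting.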

You can work with AbstractNormalizations as either types or instances. The type approach is useful for concise code, whereas the instance approach is useful for performant mutations. In the examples below we use the ZScore normalization, but the same syntax applies to all Normalizations.

Fit to a type

using Normalization

X = randn(100, 10)
N = fit(ZScore, X; dims=nothing) # eltype inferred from X
N = fit(ZScore{Float32}, X; dims=nothing) # eltype set to Float32
N isa AbstractNormalization && N isa ZScore # true: fit returns a concrete AbstractNormalization

Fit to an instance

X = randn(100, 10)
N = ZScore{Float64}(; dims=2) # Initializes with empty parameters
N isa AbstractNormalization && N isa ZScore # true: a concrete AbstractNormalization
!isfit(N) # true: no parameters have been fit yet

fit!(N, X; dims=1) # Fit normalization in-place, and update the `dims`
Normalization.dims(N) == 1 # true: `dims` was updated from 2 to 1

Normalization and denormalization

With a fit normalization, there are two approaches to normalizing data: in-place and out-of-place.

_X = copy(X)
normalize!(_X, N) # Normalizes in-place, updating _X
Y = normalize(X, N) # Normalizes out-of-place, returning a new array
normalize(X, ZScore; dims=1) # For convenience, fits and then normalizes

For most normalizations, there is a corresponding denormalization that transforms data to the original space.

Z = denormalize(Y, N) # Denormalizes out-of-place, returning a new array
Z ≈ X
denormalize!(Y, N) # Denormalizes in-place, updating Y
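
For ZScore, for example, the denormalization follows from solving the forward formula for $x$. This worked step is added for illustration and assumes $\mu$ and $\sigma$ are the fitted mean and standard deviation:

```latex
y = \frac{x - \mu}{\sigma}
\quad\Longrightarrow\quad
x = \sigma y + \mu
```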

Both syntaxes allow you to specify the dimensions to normalize over. For example, to normalize each 2D slice (i.e. iterating over the 3rd dimension) of a 3D array:

using Statistics # provides std for the check below

X = rand(100, 100, 10)
N = fit(ZScore, X; dims=[1, 2])
normalize!(X, N) # Each [1, 2] slice is normalized independently
all(std(X; dims=[1, 2]) .≈ 1) # true
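
To see what normalizing over dims=[1, 2] amounts to, the same computation can be written by hand with the Statistics standard library. This is only a sketch of the arithmetic, not the package's implementation:

```julia
using Statistics

X = rand(100, 100, 10)

# Mean and standard deviation of each 100×100 slice; both have size (1, 1, 10)
μ = mean(X; dims=(1, 2))
σ = std(X; dims=(1, 2))

# Broadcasting applies each slice's own parameters to that slice
Y = (X .- μ) ./ σ

size(μ) == (1, 1, 10)            # one parameter per slice
all(std(Y; dims=(1, 2)) .≈ 1)    # true
```

The parameter arrays keep a length of 1 along the normalized dimensions, so only the remaining dimension (here the 3rd) indexes distinct parameters.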

Normalization methods

Any of these normalizations will work in place of ZScore in the examples above:

| Normalization | Formula | Description |
|---|---|---|
| `ZScore` | $(x - \mu)/\sigma$ | Subtract the mean and scale by the standard deviation (aka standardization) |
| `Sigmoid` | $(1 + \exp(-\frac{x-\mu}{\sigma}))^{-1}$ | Map to the interval $(0, 1)$ by applying a sigmoid transformation |
| `MinMax` | $(x-\inf{x})/(\sup{x}-\inf{x})$ | Scale to the unit interval |
| `Center` | $x - \mu$ | Subtract the mean |
| `UnitEnergy` | $x/\sqrt{\sum x^2 \cdot \Delta t}$ | Scale to have unit energy |
| `UnitPower` | $x/\sqrt{\langle x^2 \rangle}$ | Scale to have unit average power |
| `HalfZScore` | $\sqrt{1-2/\pi} \cdot (x - \inf{x})/\sigma$ | Normalization to the standard half-normal distribution |
| `OutlierSuppress` | $\max(\min(x, \mu + 5\sigma), \mu - 5\sigma)$ | Clip values outside of $\mu \pm 5\sigma$ |
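
A few of these formulas, written out in plain Julia on a small vector, may make the table more concrete. The helper names below (minmax_scale, center, unit_power, sigmoid_scale) are hypothetical and only mirror the formulas; they are not functions exported by the package.

```julia
using Statistics

x = [1.0, 2.0, 4.0, 9.0]

# Hypothetical helpers, each mirroring one row of the table
minmax_scale(x)  = (x .- minimum(x)) ./ (maximum(x) - minimum(x))
center(x)        = x .- mean(x)
unit_power(x)    = x ./ sqrt(mean(abs2, x))
sigmoid_scale(x) = 1 ./ (1 .+ exp.(-(x .- mean(x)) ./ std(x)))

extrema(minmax_scale(x))          # (0.0, 1.0)
mean(center(x))                   # 0.0
mean(abs2, unit_power(x)) ≈ 1     # true: unit average power
all(0 .< sigmoid_scale(x) .< 1)   # true: values lie in (0, 1)
```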

Note

MinMax and constant inputs
When all values are identical (min == max), the standard MinMax formula is undefined; MinMax therefore returns NaN values for such arrays. If you prefer a bounded fallback, use MinMaxClip, which maps constant inputs to the midpoint of the unit interval and clips out-of-range values to [0, 1].
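
The NaN result comes straight from the arithmetic, as the sketch below shows. The minmax_clip function is a hypothetical stand-in written from the description above, and is not necessarily how MinMaxClip is implemented.

```julia
x = fill(3.0, 5)                  # constant input, so min == max

lo, hi = extrema(x)
y = (x .- lo) ./ (hi - lo)        # 0.0 / 0.0 for every element
all(isnan, y)                     # true

# Hypothetical bounded fallback: midpoint for constant inputs, clipping otherwise
function minmax_clip(x, lo, hi)
    lo == hi && return fill(0.5, size(x))
    return clamp.((x .- lo) ./ (hi - lo), 0.0, 1.0)
end

minmax_clip(x, lo, hi)                     # every element is 0.5
minmax_clip([-1.0, 0.5, 2.0], 0.0, 1.0)    # [0.0, 0.5, 1.0]
```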

Normalization modifiers

What if the input data contains NaNs or outliers? We provide AbstractModifier types that can wrap an AbstractNormalization to modify its behavior.

Any concrete modifier type Modifier <: AbstractModifier (for example, NaNSafe) can be applied to a concrete normalization type Normalization <: AbstractNormalization:

    N = NaNSafe{ZScore} # A combined type with a free `eltype` of `Any`
    N = NaNSafe{ZScore{Float64}} # A concrete `eltype` of `Float64`

A modified normalization can then be used in the same way as any other AbstractNormalization.

NaN-safe normalizations

If the input array contains any NaN values, the ordinary normalizations given above will fit with NaN parameters and return NaN arrays. To avoid this, wrap any normalization in the NaNSafe modifier, which makes it 'NaN-safe': it ignores NaN values in the input array.
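
The problem and the idea behind the fix can be reproduced with the Statistics standard library alone. This is a sketch of the concept (estimate parameters from the non-NaN values only), not the NaNSafe implementation:

```julia
using Statistics

x = [1.0, 2.0, NaN, 4.0]

mean(x), std(x)               # both NaN: a single NaN poisons the estimates

# Estimate the parameters from the non-NaN values only
valid = filter(!isnan, x)
μ, σ = mean(valid), std(valid)

y = (x .- μ) ./ σ             # in this sketch the NaN entry stays NaN
count(isnan, y)               # 1
```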

Robust modifier

The Robust modifier can be used with any AbstractNormalization that has mean and standard deviation parameters. It replaces the mean with the median and the standard deviation with iqr/1.35 (the interquartile range divided by 1.35), giving a normalization that is less sensitive to outliers.
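
The factor 1.35 reflects the fact that, for normally distributed data, the interquartile range is about 1.35 standard deviations. The sketch below compares the ordinary and robust estimates on a vector with one extreme outlier. The iqr helper is defined here with Statistics.quantile for illustration and may differ from the estimator the package uses.

```julia
using Statistics

# Hypothetical helper: interquartile range from the default quantile definition
iqr(x) = quantile(x, 0.75) - quantile(x, 0.25)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]   # one extreme outlier

mean(x), std(x)              # both dominated by the outlier
median(x), iqr(x) / 1.35     # barely affected by it

# Robust z-score: the five ordinary points stay on a comparable scale
z = (x .- median(x)) ./ (iqr(x) / 1.35)
```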

Mixed modifier

The Mixed modifier defaults to the behavior of Robust but uses the regular parameters (mean and std) if the iqr is 0.
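
The fallback matters when more than half of the values are identical, since the iqr can then be exactly 0 and the robust scale would cause a division by zero. A minimal sketch of the selection logic follows; the mixed_params and iqr helpers are hypothetical names, not part of the package.

```julia
using Statistics

# Hypothetical helpers for illustration
iqr(x) = quantile(x, 0.75) - quantile(x, 0.25)

function mixed_params(x)
    s = iqr(x)
    # Fall back to mean and std when the interquartile range vanishes
    return s == 0 ? (mean(x), std(x)) : (median(x), s / 1.35)
end

x = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # iqr(x) == 0
mixed_params(x) == (mean(x), std(x))            # true: regular parameters

w = [1.0, 2.0, 3.0, 4.0, 5.0]                   # iqr(w) == 2
mixed_params(w) == (median(w), 2.0 / 1.35)      # true: robust parameters
```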

Properties and traits

The following are common methods defined for all AbstractNormalization subtypes and instances.

Type traits

Normalization.estimators(N::Union{<:AbstractNormalization,Type{<:AbstractNormalization}}) returns the estimators of N as a tuple of functions
forward(N::Union{<:AbstractNormalization,Type{<:AbstractNormalization}}) returns the forward normalization function (e.g. $x \mapsto (x - \mu)/\sigma$ for ZScore)
inverse(N::Union{<:AbstractNormalization,Type{<:AbstractNormalization}}) returns the inverse normalization function, e.g. `forward(N)(ps...) |> InverseFunctions.inverse`
eltype(N::Union{<:AbstractNormalization,Type{<:AbstractNormalization}}) returns the eltype of the normalization parameters

Concrete properties

Normalization.dims(N::AbstractNormalization) returns the dimensions of the normalization. The dimensions are determined by dims and correspond to the mapped slices of the input array.
params(N::AbstractNormalization) returns the parameters of N as a tuple of arrays. The dimensions of these arrays are the complement of dims.
isfit(N::AbstractNormalization) checks whether all parameters are non-empty.

brendanjohnharris/Normalization.jl
