Understanding the Codec

Welcome, sweet voyager. This page is a mathematical walkthrough of the BLOS codec. We go from first principles (what prediction even is), to the exact algorithms (Levinson–Durbin, Rice mapping, bit-packing), to the practical tweaks (per-frame normalization, weighted autocorrelation, escape/bypass strategies) that make the implementation behave in the real world.


Table of contents

  1. Intuition & goals

  2. Linear Prediction — theory and practice

    • Autocorrelation
    • Normal equations
    • Levinson–Durbin recursion
    • Prediction error, residuals, prediction gain
  3. Mid/Side (M/S) decorrelation

  4. Residual coding: Rice (and Golomb) basics

    • Signed ↔ unsigned mapping
    • Unary + binary remainder coding
    • Selecting parameter k or m
  5. Frame structure & file layout (bit-level)

  6. Practical enhancements used in BLOS

    • Per-frame normalization (energy equalization)
    • Weighted autocorrelation (weighted LPC)
    • Adaptive escape threshold (transient bypass)
    • Residual scaling for Rice efficiency
    • Coefficient quantization & storage
  7. Complexity, numerical stability & performance tips

  8. Worked mini-example (toy frame)

  9. Tuning knobs & recommended defaults

  10. Further reading & references


1 — Intuition & goals

BLOS is a framed LPC-based codec with mid/side decorrelation and Rice (Golomb-like) residual coding. The high-level idea is:

  • Use a linear predictor to estimate the next sample from prior samples.
  • Store the residual (actual − predicted), whose magnitude distribution is much narrower than that of the raw audio samples.
  • Entropy-code those residuals with a small, fast scheme (Rice/Rice-like), using per-sample escape for large transients.
  • Repeat per frame to keep latency low and allow per-frame adaptation.

The math below shows why linear prediction helps and how we compute it efficiently and stably.


2 — Linear Prediction (LPC)

2.1 Autocorrelation and the normal equations

Given a real-valued discrete-time signal $x[n]$, an order-$p$ linear predictor produces the estimate

$$\hat{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k].$$

(Sign conventions vary across texts; throughout this page, as in the code, the coefficients are chosen so that the residual is $x[n] - \hat{x}[n]$.) The residual (prediction error) is

$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^p a_k x[n-k].$$

The autocorrelation method chooses the $a_k$ to minimize the total squared error over a frame of length $N$:

$$E = \sum_{n=p}^{N-1} e[n]^2 = \sum_{n=p}^{N-1} \Big(x[n] - \sum_{k=1}^p a_k x[n-k]\Big)^2.$$

Differentiating with respect to each $a_j$ and setting the result to zero yields the normal equations in Toeplitz form:

$$\sum_{k=1}^p a_k \, r[|j-k|] = r[j], \quad j = 1 \dots p,$$

where the autocorrelation sequence is

$$r[m] = \sum_{n=m}^{N-1} x[n] \, x[n-m].$$

(If you prefer unbiased estimates, divide each sum by its number of terms; our implementation uses the raw sums.)
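A minimal Python sketch of the raw-sum autocorrelation (the function name is ours, not the codec's source):

def autocorr(x, p):
    """Raw-sum autocorrelation r[0..p] for one frame (no windowing)."""
    N = len(x)
    return [sum(x[n] * x[n - m] for n in range(m, N)) for m in range(p + 1)]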

2.2 Levinson–Durbin recursion (efficient solver)

Solving the Toeplitz system directly costs $O(p^3)$. Levinson–Durbin gives an $O(p^2)$ recurrence producing $a^{(i)}$ from $a^{(i-1)}$, order by order.

For order $i$, compute the reflection (PARCOR) coefficient $k_i$:

$$k_i = \frac{r[i] - \sum_{j=1}^{i-1} a_j^{(i-1)} r[i-j]}{E_{i-1}},$$

then update the coefficients:

$$a_j^{(i)} = a_j^{(i-1)} - k_i \, a_{i-j}^{(i-1)}, \quad j = 1 \dots i-1, \qquad a_i^{(i)} = k_i,$$

and update the prediction error energy:

$$E_i = E_{i-1} \, (1 - k_i^2), \qquad E_0 = r[0].$$

Levinson–Durbin is stable for well-behaved autocorrelation sequences and is the standard choice in audio codecs.
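A sketch of the recursion in Python, following the sign convention above (names are ours, not the codec's source):

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for predictor coefficients.

    r: autocorrelation r[0..p].  Returns (a, E) with a[j-1] multiplying
    x[n-j] in x_hat[n], and E the final prediction error energy."""
    a = [0.0] * (p + 1)                      # a[0] is unused
    E = r[0]
    if E <= 0.0:
        return a[1:], 0.0                    # silent frame: no prediction
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / E                          # reflection (PARCOR) coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        E *= (1.0 - k * k)                   # error energy shrinks each order
    return a[1:], E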

2.3 Prediction error and gain

After computing the coefficients $\{a_k\}$, the residuals are

$$e[n] = x[n] - \sum_{k=1}^p a_k x[n-k].$$

We measure prediction gain as

$$G = 10 \log_{10} \frac{\sigma_x^2}{\sigma_e^2}$$

where $\sigma_x^2$ and $\sigma_e^2$ are the variances of the original signal and of the residuals. Larger gains mean the predictor removed more redundancy, so the residuals are easier to entropy-code.
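A small sketch that computes the residuals and the prediction gain for one frame (helper names are ours; mean-square stands in for variance, assuming roughly zero-mean audio):

import math

def residuals_and_gain(x, a):
    """Residuals e[n] = x[n] - sum_k a[k-1] * x[n-k], plus gain in dB."""
    p = len(a)
    e = [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1))
         for n in range(p, len(x))]
    msq = lambda v: sum(s * s for s in v) / len(v)   # mean-square (zero-mean assumption)
    gain_db = 10.0 * math.log10(msq(x) / max(msq(e), 1e-12))
    return e, gain_db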


3 — Mid/Side (M/S) decorrelation

The M/S transform converts left/right to mid/side:

$$M = \left\lfloor \frac{L+R}{2} \right\rfloor, \qquad S = L - R.$$

This often reduces inter-channel redundancy: if L and R are similar, M concentrates most of the energy while S stays small. We encode M and S independently (with separate LPCs and Rice parameters) and reconstruct L/R at decode time:

$$L = M + \left\lfloor \frac{S}{2} \right\rfloor, \qquad R = M - \left\lfloor \frac{S}{2} \right\rfloor.$$

(Integer arithmetic keeps encoder and decoder in lockstep; note that this floor-based inverse can reconstruct L one LSB off when L + R is odd, which our lossy-ish pipeline tolerates.)
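In Python, where // already floors, the transform pair looks like this (a sketch; function names are ours):

def ms_encode(L, R):
    """Forward transform; Python's // is floor division, matching the math."""
    return (L + R) // 2, L - R                   # (M, S)

def ms_decode(M, S):
    """Inverse; L comes back one LSB off when L + R was odd (see note above)."""
    half = S // 2
    return M + half, M - half                    # (L, R)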


4 — Residual coding: Rice (and Golomb-style) basics

Rice coding is the subset of Golomb coding where the Golomb parameter is a power of two. Signed integers are first mapped to unsigned with the standard zig-zag mapping:

$$u = (s \ll 1) \oplus (s \gg 31)$$

(for 32-bit two's-complement $s$ with an arithmetic right shift; equivalently, non-negative $s$ maps to the even value $2s$ and negative $s$ maps to the odd value $2|s| - 1$).

A Rice code with parameter $k$ encodes $u$ as the quotient $q = u \gg k$ in unary, followed by the remainder $r = u \,\&\, (2^k - 1)$ in $k$ binary bits. The codeword length is $q + 1 + k$ bits.

Choosing $k$: if $\mathbb{E}[u] \approx \mu$, a good rule is $k \approx \log_2 \mu$. In practice we compute the mean absolute residual and choose $k = \lfloor \log_2(\mu + 1) \rfloor$, clamped to reasonable bounds.
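A sketch of the mapping, the code-length formula, and the k-selection rule in Python (names and the k_max clamp value are our assumptions):

import math

def zigzag(s):
    """Signed -> unsigned: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return (s << 1) if s >= 0 else (-(s << 1) - 1)

def rice_length(u, k):
    """Codeword length in bits: unary quotient + stop bit + k-bit remainder."""
    return (u >> k) + 1 + k

def pick_k(residuals, k_max=15):
    """k ~ floor(log2(mu + 1)) from the mean absolute residual, clamped."""
    mu = sum(abs(r) for r in residuals) / max(len(residuals), 1)
    return min(k_max, max(0, int(math.floor(math.log2(mu + 1)))))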

Escape (bypass): if residual magnitude is huge (transients), we avoid long unary runs by writing a 1-bit escape flag and storing the raw 16-bit sample. This both controls worst-case growth and reduces audible artifacts.


5 — Frame structure & bit layout (precise)

All multi-byte integers in BLOS are little-endian. The stream layout is:

Global header

[0]  4 bytes  : magic "BLOS"
[4]  4 bytes  : sample rate (uint32)
[8]  8 bytes  : total frames (uint64)
[16] 1 byte   : channels (uint8)
[17] 4 bytes  : frame_size (uint32)
[21] 4 bytes  : num_frames (uint32)
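For concreteness, a hedged Python sketch of parsing this 25-byte global header with the standard struct module (the helper name is ours; the format string simply mirrors the table above):

import struct

GLOBAL_HEADER = struct.Struct("<4sIQBII")        # little-endian, 25 bytes total

def parse_global_header(buf):
    """Unpack the global header laid out above; a sketch, not the codec's API."""
    magic, rate, total, ch, frame_size, num_frames = GLOBAL_HEADER.unpack_from(buf)
    assert magic == b"BLOS", "bad magic"
    return dict(sample_rate=rate, total_frames=total, channels=ch,
                frame_size=frame_size, num_frames=num_frames)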

Per-frame (repeated for each frame)

uint32 frame_len;      // usually FRAME_SIZE
uint8  order;          // LPC order
uint8  kA;             // Rice k for M
uint8  kB;             // Rice k for S
int16  seedsM[order];  // seed samples (int16)
int16  seedsS[order];
double coeffsM[order]; // double precision LPC coeffs
double coeffsS[order];
uint32 payload_size;   // bytes of following payload
payload bytes:         // either raw arrays or compressed bitstream

Payload (compressed) — M residuals then S residuals, bit-packed:

For n = order..frame_len-1

  • 1 bit flag: 1 -> raw sample (followed by 16 bits int16), 0 -> Rice-coded residual
  • If flag==0: Rice code with kA or kB accordingly

After encoding both channels, the bit-writer is flushed and the payload is byte-aligned.
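To make the bit-level description concrete, here is a sketch of the per-sample loop with a toy MSB-first bit-writer. The bit order within a byte and the unary polarity (ones then a zero stop bit) are our assumptions; the format above only fixes the flag/raw/Rice structure and the final byte alignment.

class BitWriter:
    """Toy MSB-first bit packer; flush() byte-aligns with zero padding."""
    def __init__(self):
        self.bits = []
    def put(self, value, nbits):
        for i in range(nbits - 1, -1, -1):
            self.bits.append((value >> i) & 1)
    def flush(self):
        while len(self.bits) % 8:
            self.bits.append(0)                  # byte-align the payload
        out = bytearray()
        for i in range(0, len(self.bits), 8):
            byte = 0
            for b in self.bits[i:i + 8]:
                byte = (byte << 1) | b
            out.append(byte)
        return bytes(out)

def encode_channel(bw, residuals, raw_samples, k, threshold):
    """Per-sample escape: flag 1 -> raw 16-bit sample, flag 0 -> Rice residual."""
    for res, raw in zip(residuals, raw_samples):
        if abs(res) > threshold:
            bw.put(1, 1)                         # escape flag
            bw.put(raw & 0xFFFF, 16)             # raw int16 (two's complement)
        else:
            bw.put(0, 1)
            u = (res << 1) if res >= 0 else (-(res << 1) - 1)  # zig-zag map
            for _ in range(u >> k):
                bw.put(1, 1)                     # unary quotient
            bw.put(0, 1)                         # unary stop bit
            bw.put(u & ((1 << k) - 1), k)        # k-bit remainder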


6 — Practical enhancements used in BLOS (math + rationale)

6.1 Per-frame normalization (energy equalization)

Problem: frames with very low energy produce very small residuals, while loud frames stress quantization and the escape logic differently. To make the Rice parameters and coefficients behave more uniformly, we scale each frame before the LPC computation:

Let frame RMS be

$$\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x[n]^2 }.$$

Pick a norm_target (e.g. 10000). Compute scale = norm_target / RMS, clamp it so the scaled samples stay within int16 range, and compute the LPC on $x'[n] = x[n] \cdot \mathrm{scale}$. Store scale as a float in the frame header so the decoder can undo it.

Effect: predictor sees a more consistent-level signal → more consistent residual statistics → better Rice parameter selection.
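A sketch of the normalization step in Python (the silent-frame guard and the exact clamp are our choices):

import math

def normalize_frame(x, norm_target=10000.0, int16_max=32767):
    """Scale a frame toward a target RMS before LPC analysis."""
    rms = math.sqrt(sum(s * s for s in x) / len(x))
    if rms < 1e-9:
        return list(x), 1.0                      # silent frame: leave untouched
    scale = norm_target / rms
    peak = max(abs(s) for s in x)
    scale = min(scale, int16_max / peak)         # clamp: stay inside int16 range
    return [s * scale for s in x], scale         # store scale in the frame header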

6.2 Weighted autocorrelation (weighted LPC)

To reduce quiet-sound bias, weight samples in autocorrelation:

$$r_w[k] = \sum_{n=k}^{N-1} w[n] x[n] x[n-k],$$

where the weights are chosen to favor quiet samples, e.g.

$$w[n] = \frac{1}{1 + \beta \cdot \frac{|x'[n]|}{\max |x'|}}$$

with a small $\beta$ (e.g. 0.5). This gives low-amplitude samples more influence when determining the coefficients, so the predictor doesn't ignore them.
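A sketch of the weighted autocorrelation in Python (names are ours):

def weighted_autocorr(x, p, beta=0.5):
    """Autocorrelation with weights that favour quiet samples."""
    N = len(x)
    peak = max(1e-9, max(abs(s) for s in x))
    w = [1.0 / (1.0 + beta * abs(s) / peak) for s in x]
    return [sum(w[n] * x[n] * x[n - m] for n in range(m, N))
            for m in range(p + 1)]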

6.3 Adaptive escape threshold

Let frame_rms be the pre-normalization RMS of the frame. The escape threshold is

$$T = \max\big(T_{\min},\ \alpha \cdot \mathrm{frame\_rms}\big)$$

($T_{\min}$ keeps the threshold above a safe floor). This means quiet frames get lower thresholds, so rare but large residuals in quiet passages still trigger the escape (which helps avoid artifacts), while loud frames tolerate larger residuals before escaping.

6.4 Residual scaling before Rice

Residuals often still have an RMS larger than we want for Rice coding. We divide residuals by

$$\mathrm{rscale} = \max\!\left(1,\ \frac{\mathrm{resid\_rms}}{R_{\mathrm{target}}}\right)$$

and store rscale per frame. The encoder Rice-encodes round(res / rscale); the decoder multiplies back when reconstructing. This shortens codewords while introducing controlled quantization.
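A sketch of the per-frame residual scaling (r_target stands in for TARGET_RESID_RMS from section 9):

import math

def scale_residuals(residuals, r_target=12.0):
    """Divide residuals by a per-frame rscale before Rice coding."""
    rms = math.sqrt(sum(r * r for r in residuals) / max(len(residuals), 1))
    rscale = max(1.0, rms / r_target)
    coded = [round(r / rscale) for r in residuals]   # encoder side
    return coded, rscale                             # decoder: r ~= c * rscale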

6.5 Coefficient quantization & storage

We store LPC coefficients as 64-bit doubles per frame to preserve prediction accuracy. For smaller files you can quantize to float32 or fixed-point (e.g. Q15); the tradeoff is a little quantization noise in exchange for fewer bytes per frame.
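A sketch of optional Q15 quantization (not the BLOS default, which stores doubles; the clamp assumes $|a_k| < 1$, and real predictors can exceed that, so a production version would also store a shared shift factor):

def quantize_q15(coeffs):
    """Quantize LPC coefficients to Q15: 2 bytes each instead of 8."""
    return [max(-32768, min(32767, round(a * 32768))) for a in coeffs]

def dequantize_q15(q):
    """Inverse mapping used by the decoder."""
    return [v / 32768.0 for v in q]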


7 — Complexity, numerical stability & performance tips

  • Levinson–Durbin costs $O(p^2)$ per frame and prediction costs $O(p)$ per sample, so encoding complexity per frame is $O(Np + p^2)$.
  • Keep $p$ moderate (e.g. 4..12) for reasonable CPU use on mobile or older machines.
  • Use double-precision math inside Levinson–Durbin to reduce numerical drift, but store coefficients as floats if you need smaller files.
  • When computing the autocorrelation, apply a window if you want smoother coefficient transitions; however, Hanning windows can worsen compression on transients, so BLOS omits windowing by default and relies on small frames instead.
  • Always clamp energies to avoid divide-by-zero.

8 — Worked mini-example (toy frame)

Assume a small frame of 8 samples (toy):

$x = [1000,\ 980,\ 1020,\ 1010,\ 995,\ 1005,\ 1008,\ 1002]$ and order $p = 2$.

  1. Compute the autocorrelations $r[0], r[1], r[2]$.
  2. Levinson–Durbin yields $a_1, a_2$.
  3. Compute the residuals $e[n] = x[n] - a_1 x[n-1] - a_2 x[n-2]$ for $n \ge p$.
  4. Measure the mean absolute residual $\mu$ and choose the Rice parameter $k \approx \lfloor \log_2(\mu + 1) \rfloor$.
  5. Encode the residuals: small residuals become short Rice codes.

This little example demonstrates how correlated samples give small residuals — low entropy.

(You can compute the numeric values by hand, or run a tiny Python snippet like the one below to see the actual coefficients and residuals.)
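Here is such a snippet, self-contained and reusing the recursions from sections 2.1 and 2.2:

import math

x = [1000, 980, 1020, 1010, 995, 1005, 1008, 1002]
p = 2

# 1. raw-sum autocorrelation r[0..p]
r = [sum(x[n] * x[n - m] for n in range(m, len(x))) for m in range(p + 1)]

# 2. Levinson–Durbin (same recursion as section 2.2)
a, E = [0.0] * (p + 1), r[0]
for i in range(1, p + 1):
    k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / E
    new_a = a[:]
    new_a[i] = k
    for j in range(1, i):
        new_a[j] = a[j] - k * a[i - j]
    a = new_a
    E *= 1.0 - k * k

# 3. residuals for n >= p
e = [x[n] - sum(a[j] * x[n - j] for j in range(1, p + 1)) for n in range(p, len(x))]

# 4. Rice parameter from the mean absolute residual
mu = sum(abs(v) for v in e) / len(e)
rice_k = int(math.floor(math.log2(mu + 1)))

print("a =", [round(c, 4) for c in a[1:]])
print("residuals =", [round(v, 1) for v in e], " k =", rice_k)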


9 — Tuning knobs & recommended defaults

  • FRAME_SIZE: 512 (default). For very transient, percussive music choose 256. For smooth material choose 1024+.
  • MIN_ORDER / MAX_ORDER: typically 2..12. Higher improves harmonic tracking at CPU cost.
  • ESCAPE_THRESHOLD: defaults chosen experimentally; lower it to reduce clicks (store more raw samples) at cost of size.
  • TARGET_RESID_RMS: choose 8..16 to keep Rice coding efficient.
  • beta (weighting): 0.3..0.7 is a good sweet spot.

10 — Further reading & references

  • Burrus: "Signals and Systems" (autocorrelation fundamentals)
  • Proakis & Manolakis: "Digital Signal Processing" (LPC chapters)
  • Levinson, N. (1947). "The Wiener RMS error criterion in filter design and prediction." Journal of Mathematics and Physics, 25.
  • Golomb, S. W. (1966). "Run-length encodings." IEEE Transactions on Information Theory, 12(3). Rice coding is the power-of-two special case.

Appendix: Useful formulas reference

  • Signed→unsigned mapping for Rice/Golomb:

$$u(s) = \begin{cases} 2s & s \ge 0 \\ -2s - 1 & s < 0 \end{cases}$$

  • Rice code length for value $u$ and parameter $k$:

$$\ell(u) = (u \gg k) + 1 + k$$

  • Levinson update (compact): see section 2.2 above