
SDM-CAR: Spectral-Density-Modulated Conditional Autoregressive Models

Designing Flexible and Scalable Spatial Gaussian Models

Spatial models often force a tradeoff between interpretability (classical CAR), flexibility (Gaussian processes), and computational scalability.

SDM-CAR is a modular framework designed to make these tradeoffs explicit and controllable. Rather than fixing a neighborhood-based precision matrix, the framework parameterizes spatial dependence in the graph spectral domain, allowing controlled movement between rigid CAR models and flexible, nonparametric spectral priors—while retaining scalable inference.

The system supports:

  • exact recovery of classical CAR models,
  • progressively more flexible spectral families,
  • collapsed variational inference (VI) for fast experimentation,
  • collapsed Metropolis-within-Gibbs MCMC for validation,
  • and a filter-agnostic experimental pipeline for fair comparison.

All inference—VI or MCMC, across any filter family—is executed through a single, filter-agnostic runner.


Design decisions and tradeoffs

SDM-CAR was designed around several practical constraints:

Scalability vs Expressiveness

Dense Gaussian process models scale as $O(n^3)$ and are impractical for large spatial graphs. Classical CAR models are computationally efficient but structurally rigid.

Decision: Represent spatial covariance in the graph spectral domain so that:

  • covariance is diagonal in the eigenbasis,
  • inference reduces to elementwise operations,
  • flexibility is introduced via $F(\lambda)$ instead of dense matrices.

Flexibility vs Identifiability

Highly flexible spectral parameterizations can introduce ridges and non-identifiability.

Decision:

  • enforce positivity through constrained parameterizations,
  • structure MCMC proposals in parameter blocks,
  • implement ridge diagnostics and spectrum error metrics,
  • benchmark VI against collapsed MCMC.

Approximate vs Exact Inference

Variational inference is fast but may underestimate uncertainty. MCMC is accurate but computationally heavier.

Decision:

  • implement both collapsed VI and collapsed MCMC,
  • ensure they operate on identical abstractions,
  • directly quantify discrepancies between them.

1. Overview

Conditional Autoregressive (CAR) models are widely used for spatially indexed data, but they rely on fixed neighborhood-based precision structures that restrict flexibility and impose strong structural assumptions.

SDM-CAR replaces fixed CAR precision matrices with parametric spectral filters of a graph Laplacian, yielding a flexible covariance model of the form

$$ \Sigma_\phi = U \mathrm{diag}\big(F(\lambda;\theta)\big) U^\top, $$

where $L = U \mathrm{diag}(\lambda) U^\top$ is the Laplacian of a user-defined graph and $F(\lambda;\theta) \ge 0$ is a learnable spectral filter.

This formulation:

  • strictly generalizes classical CAR models,
  • provides interpretable parameters controlling variance, range, and smoothness,
  • recovers CAR exactly as a special case,
  • separates graph construction from covariance modeling,
  • enables scalable inference via spectral diagonalization.

Importantly, SDM-CAR is graph-based rather than distance-based. Spatial dependence is defined relative to the spectrum of an arbitrary graph Laplacian — not directly as a function of Euclidean distance. This allows the framework to operate on any domain where a meaningful graph structure can be defined.

Both collapsed variational inference (VI) and collapsed Metropolis-within-Gibbs MCMC are implemented under a shared abstraction, enabling principled comparison between approximate and exact inference.


Graph construction in this repository

The current implementation supports:

  • k-nearest-neighbor (kNN) graph construction on regular grids,
  • weighted Laplacian construction,
  • full eigendecomposition for spectral diagonalization,
  • filter-agnostic inference over arbitrary Laplacians.

All experiments in this repository are conducted on grid-based graphs constructed via kNN, demonstrating that:

  • SDM-CAR does not require explicit covariance kernels of the form $k(|x_i - x_j|)$,
  • spatial smoothness is controlled entirely in the spectral domain,
  • model flexibility arises from $F(\lambda;\theta)$ rather than fixed precision templates.
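A grid-based kNN graph and its Laplacian eigendecomposition can be sketched in a few lines of numpy. This is illustrative code, not the repository's graph.py implementation:

```python
# Sketch: build a kNN graph over a regular grid, form its combinatorial
# Laplacian, and eigendecompose it for spectral diagonalization.
import numpy as np

def grid_knn_laplacian(side, k=4):
    """Laplacian of a kNN graph on a side x side regular grid."""
    xs, ys = np.meshgrid(np.arange(side), np.arange(side))
    coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    n = coords.shape[0]
    # Pairwise squared distances; connect each node to its k nearest neighbours.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[:k]] = 1.0
    W = np.maximum(W, W.T)          # symmetrize the adjacency
    L = np.diag(W.sum(1)) - W       # combinatorial Laplacian D - W
    lam, U = np.linalg.eigh(L)      # eigenvalues lam (ascending), eigenvectors U
    return L, lam, U

L, lam, U = grid_knn_laplacian(5)
```

For large graphs, the full `eigh` call is the bottleneck; the sparse eigensolvers listed under future work would replace it.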

Future work

Because SDM-CAR depends only on the graph Laplacian, the framework naturally extends to:

  • irregular spatial lattices (e.g., administrative region adjacency graphs),
  • transportation or road networks,
  • social and communication networks,
  • feature-similarity graphs (e.g., kNN in embedding space),
  • community-structured or modular graphs,
  • non-Euclidean domains such as brain connectivity networks.

Planned future directions include:

  • experiments on non-geometric graph constructions,
  • robustness analysis under graph rewiring,
  • learned or data-driven graph structures,
  • sparse eigensolvers for large-scale graphs,
  • structured priors over graph spectra.

These extensions would further demonstrate the generality of the spectral framework beyond grid-based spatial settings.

2. Model formulation

Let

$$ L = U \mathrm{diag}(\lambda) U^\top $$

denote the eigendecomposition of a graph Laplacian constructed from spatial locations.

The spatial random effect is defined as

$$ \phi = U z, \qquad z \sim \mathcal{N}\left(0, \mathrm{diag}(F(\lambda;\theta))\right), $$

where $F(\lambda;\theta) \ge 0$ is a parametric spectral filter.

This induces the covariance

$$ \Sigma_\phi = U \mathrm{diag}(F(\lambda;\theta)) U^\top. $$

The observation model is

$$ y = X\beta + \phi + \varepsilon, \qquad \varepsilon \sim \mathcal N(0,\sigma^2 I). $$

All inference is performed after analytically marginalizing $\phi$.
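Because the covariance is diagonal in the eigenbasis, the collapsed likelihood can be evaluated elementwise. A minimal numpy sketch (illustrative, not the repository's models.py), verified against the dense multivariate normal density:

```python
# Sketch: collapsed Gaussian log-likelihood after marginalizing phi.
# In the eigenbasis of L, the residual r = y - X @ beta has diagonal
# covariance F(lam) + sigma^2, so the likelihood costs O(n) given U.
import numpy as np

def collapsed_loglik(y, X, beta, U, F_lam, sigma2):
    r = y - X @ beta
    rt = U.T @ r                     # rotate residual into the eigenbasis
    v = F_lam + sigma2               # diagonal marginal variances
    return -0.5 * (np.sum(np.log(2 * np.pi * v)) + np.sum(rt**2 / v))

# Tiny random instance for demonstration.
rng = np.random.default_rng(0)
n, p = 6, 2
A = rng.normal(size=(n, n))
lam, U = np.linalg.eigh(A @ A.T)     # any orthonormal eigenbasis will do here
F_lam = 1.0 / (lam + 0.5)            # e.g. an inverse-linear CAR filter
X = rng.normal(size=(n, p))
beta = np.array([0.3, -1.2])
y = rng.normal(size=n)
ll = collapsed_loglik(y, X, beta, U, F_lam, 0.1)
```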


3. CAR as a special case

When the spectral filter is chosen as

$$ F(\lambda) = \frac{\tau^2}{\lambda + \rho_0}, $$

the resulting covariance satisfies

$$ \Sigma_\phi^{-1} \propto L + \rho_0 I, $$

which corresponds exactly to a proper CAR model.

This guarantees that SDM-CAR strictly contains classical CAR as a special case and allows direct empirical validation against established spatial models.
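This containment is easy to check numerically. A small sketch on a 3-node path graph (illustrative code, not from the repository):

```python
# Sketch: confirm that F(lam) = tau^2 / (lam + rho0) induces the
# proper-CAR precision (L + rho0 I) / tau^2.
import numpy as np

# Path-graph Laplacian (3 nodes in a line).
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)

tau2, rho0 = 2.0, 0.3
F_lam = tau2 / (lam + rho0)
Sigma = U @ np.diag(F_lam) @ U.T     # Sigma_phi = U diag(F) U^T
Prec = np.linalg.inv(Sigma)          # should equal (L + rho0 I) / tau2
```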


4. Implemented spectral filter families

All filters are implemented under a unified interface and support both VI and MCMC.

| Filter family | Spectrum $F(\lambda)$ | Interpretation |
|---|---|---|
| Classic CAR | $\tau^2 / (\lambda + \varepsilon_{\text{car}})$ | Classical intrinsic CAR (fixed ridge) |
| Inverse-linear CAR | $\tau^2 / (\lambda + \rho_0)$ | Proper CAR with learnable ridge |
| Leroux CAR | $\tau^2 / \big((1-\rho) + \rho \lambda\big)$ | Convex blend of IID and CAR |
| Matérn-like | $\tau^2 (\lambda + \rho_0)^{-\nu}$ | Learnable smoothness exponent |
| Polynomial / Rational | Low-order polynomial or rational functions of $\lambda$ | Structured parametric flexibility |
| Multiscale bump mixture | $\tau^2 \sum_{k=1}^{K} w_k \exp\left(a_k - \tfrac12 \left(\frac{\log(\lambda+\varepsilon_{\text{car}})-m_k}{s_k}\right)^2\right)$ | Mixture of localized spectral bumps (multi-scale spatial structure) |
| Log-spline | $\tau^2(\lambda+\rho_0)^{-1} e^{s(\lambda)}$ | Semi-nonparametric spectral correction |

Where:

  • Multiscale bump mixture models the spectrum as a weighted sum of Gaussian bumps in log-frequency space, enabling the model to capture localized spectral energy and multi-scale spatial structure.
  • The bump centers $m_k$ are constrained to the valid log-frequency domain $\log(\lambda + \varepsilon_{\text{car}}) \in [\log(\varepsilon_{\text{car}}), \log(\lambda_{\max} + \varepsilon_{\text{car}})]$ via a sigmoid transformation.
  • Mixture weights $w_k$ are obtained through a softmax transformation of unconstrained logits.
  • Width parameters $s_k$ are constrained to be positive via a softplus transform with a minimum width for numerical stability.
  • $s(\lambda)$ in Log-spline is a B-spline expansion over $[0, \lambda_{\max}]$.
  • Polynomial/Rational filters allow low-degree flexible approximations to unknown spectra.
  • Leroux provides a proper CAR with bounded spectrum.
  • Classic CAR fixes the ridge parameter to a known $\varepsilon_{\text{car}}$.
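The bump-mixture constraints above can be sketched directly. Parameter names (`logits`, `m_raw`, `a`, `s_raw`) mirror the table but are illustrative; the actual parameterization lives in sdmcar/filters.py:

```python
# Sketch: multiscale bump-mixture spectrum with softmax weights,
# sigmoid-constrained centers, and softplus-with-floor widths.
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def bump_mixture(lam, tau2, logits, m_raw, a, s_raw,
                 eps_car=1e-3, s_min=0.05):
    log_lam = np.log(lam + eps_car)
    lo, hi = np.log(eps_car), np.log(lam.max() + eps_car)
    w = np.exp(logits - logits.max())
    w = w / w.sum()                            # softmax weights, sum to 1
    m = lo + (hi - lo) / (1 + np.exp(-m_raw))  # sigmoid-constrained centers
    s = s_min + softplus(s_raw)                # positive widths, bounded below
    z = (log_lam[:, None] - m[None, :]) / s[None, :]
    return tau2 * (w[None, :] * np.exp(a[None, :] - 0.5 * z**2)).sum(1)

lam = np.linspace(0.0, 8.0, 50)
F = bump_mixture(lam, tau2=1.5,
                 logits=np.zeros(3), m_raw=np.array([-2.0, 0.0, 2.0]),
                 a=np.zeros(3), s_raw=np.zeros(3))
```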

Unified design

Each filter family defines:

  • unconstrained parameterization,
  • positivity-preserving transforms,
  • KL divergence to priors (for VI),
  • block structure for MCMC proposals,
  • pack() / unpack() API for sampler compatibility,
  • compatibility with the benchmark registry system.
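The shape of that interface can be sketched with the inverse-linear family. The `pack()`/`unpack()` names come from the list above; everything else in the signature is illustrative, not the repository's exact API:

```python
# Sketch: a filter family exposing an unconstrained parameterization,
# positivity-preserving transforms, MCMC block structure, and pack/unpack.
import numpy as np

class InverseLinearCAR:
    """Proper CAR: F(lam) = tau^2 / (lam + rho0), both learnable."""

    # Block structure used for joint MCMC proposals.
    blocks = [("log_tau2",), ("log_rho0",)]

    def __init__(self, log_tau2=0.0, log_rho0=0.0):
        self.log_tau2, self.log_rho0 = log_tau2, log_rho0

    def pack(self):
        """Flatten unconstrained parameters into a vector for the sampler."""
        return np.array([self.log_tau2, self.log_rho0])

    def unpack(self, vec):
        self.log_tau2, self.log_rho0 = float(vec[0]), float(vec[1])

    def spectrum(self, lam):
        # Positivity enforced via exp transforms of unconstrained parameters.
        tau2, rho0 = np.exp(self.log_tau2), np.exp(self.log_rho0)
        return tau2 / (lam + rho0)

f = InverseLinearCAR(log_tau2=np.log(2.0), log_rho0=np.log(0.5))
F = f.spectrum(np.array([0.0, 1.0, 3.0]))
```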

5. Inference methods

5.1 Collapsed Variational Inference

  • Spatial effect $\phi$ integrated out analytically
  • Exact conditional Gaussian posterior for $\beta$
  • Monte Carlo integration only over spectral hyperparameters
  • Efficient for large graphs and rapid experimentation

5.2 Collapsed Metropolis-within-Gibbs MCMC

  • Spatial effect analytically marginalized
  • Gibbs updates for regression coefficients
  • Random-walk MH updates for spectral hyperparameters
  • Blockwise proposals aligned with filter structure
  • Used as a gold standard for validation

Both inference methods operate on the same model and filter abstractions.
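The random-walk Metropolis step can be illustrated on a toy collapsed target with a single hyperparameter. This is a sketch only; the repository's sampler additionally Gibbs-updates $\beta$ and proposes in filter-defined blocks:

```python
# Sketch: random-walk Metropolis on a log-variance with a flat prior on
# the log scale, targeting the collapsed likelihood y ~ N(0, exp(log_v) I).
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(scale=2.0, size=200)   # toy data; true variance 4

def loglik(log_v):
    v = np.exp(log_v)
    return -0.5 * (y.size * np.log(2 * np.pi * v) + np.sum(y**2) / v)

def rw_mh(log_v, n_iter=2000, step=0.2):
    chain = np.empty(n_iter)
    cur_ll = loglik(log_v)
    for t in range(n_iter):
        prop = log_v + step * rng.normal()
        prop_ll = loglik(prop)
        # Accept with probability min(1, exp(prop_ll - cur_ll)).
        if np.log(rng.uniform()) < prop_ll - cur_ll:
            log_v, cur_ll = prop, prop_ll
        chain[t] = log_v
    return chain

chain = rw_mh(log_v=0.0)
```

After burn-in, the chain concentrates around the maximizer $\log(\bar{y^2})$, which is the behaviour the VI approximation is benchmarked against.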


6. Repository structure

The repository is organized to cleanly separate modeling and inference logic from experimental configuration and execution.

sdm-car/
│
├── sdmcar/                     # Core research library (model + inference)
│   ├── graph.py                # Graph construction and Laplacian eigendecomposition
│   ├── filters.py              # Spectral filter families (VI + MCMC compatible)
│   ├── models.py               # Collapsed variational inference engine
│   ├── mcmc.py                 # Collapsed Metropolis-within-Gibbs sampler
│   ├── diagnostics.py          # Posterior diagnostics and visualization
│   └── utils.py                # Shared utilities (transforms, KLs, helpers)
│
├── examples/
│   ├── run_benchmark.py            # Universal benchmark runner (VI + MCMC comparison)
│   ├── run_misspec_demo.py         # Spectral misspecification experiments
│   ├── run_surface_block_missing.py # Block-missing surface reconstruction experiments
│   │
│   └── benchmarks/
│       ├── base.py                 # CaseSpec / FilterSpec abstractions
│       ├── registry.py             # Global filter registry
│       ├── matern.py               # Matérn-like SDM-CAR filter family
│       ├── invlinear_car.py        # Proper CAR (inverse-linear spectrum)
│       ├── classic_car.py          # Intrinsic CAR with fixed ridge ε
│       ├── leroux.py               # Leroux CAR (convex blend of IID and CAR)
│       ├── polyrational.py         # Polynomial / rational spectral filters
│       ├── logspline.py            # Log-spline semi-nonparametric spectral filter
│       └── __init__.py             # Auto-registration of benchmark modules
│
├── examples/figures/
│   ├── benchmarks/                 # VI vs MCMC spectrum recovery results
│   ├── misspec/                    # Misspecified-truth experiments
│   └── surface_block_missing/      # Block-missing imputation outputs
│
└── README.md

7. Design philosophy

7.1 sdmcar/: model- and inference-level code only

Everything under sdmcar/ is experiment-agnostic and mirrors the mathematical structure of the model.

  • graph.py Constructs spatial graphs, Laplacians, and eigendecompositions. This is the only place where spatial structure enters the model.

  • filters.py Defines spectral covariance families $F(\lambda;\theta)$. Filters expose a common interface used by both VI and MCMC and optionally define parameter blocks for joint MCMC proposals.

  • models.py Implements collapsed variational inference with exact marginalization of spatial effects and analytic posteriors for regression coefficients.

  • mcmc.py Implements collapsed Metropolis-within-Gibbs MCMC. The sampler is constructed directly from a fitted VI model, ensuring consistency between inference methods.

  • diagnostics.py Posterior diagnostics and visualization utilities.

Nothing in sdmcar/ is aware of specific experiments, benchmarks, ablations, or datasets.


7.2 examples/benchmarks/: declarative filter and case definitions

All filter families and experimental configurations are defined declaratively, without embedding inference logic.

  • base.py

    • FilterSpec: defines a filter family
    • CaseSpec: defines a specific experimental configuration (baseline, fixed parameters, ablations)
  • registry.py Maintains a global registry mapping filter names to FilterSpecs, enabling dynamic discovery from the command line.

  • matern.py, invlinear_car.py, leroux.py, polyrational.py, classic_car.py, logspline.py Each file defines a filter family, its valid cases, and registers itself automatically on import.

Adding a new filter family or ablation case does not require modifying inference code or experiment runners.


7.3 Experiment runners (execution layer)

All experiments are executed through filter-agnostic entry points under examples/.


7.3.1 run_benchmark.py: correctly specified CAR recovery

Run via:

python -m examples.run_benchmark --filter <name> --cases <ids>

This runner:

  1. builds a spatial graph and Laplacian,
  2. generates synthetic data under a CAR ground truth,
  3. runs collapsed variational inference,
  4. initializes and runs collapsed MCMC from the VI solution,
  5. produces diagnostics, plots, and summaries.

This setting evaluates:

  • parameter recovery,
  • spectrum recovery,
  • VI–MCMC agreement.

The runner is filter-agnostic and case-agnostic.


7.3.2 run_misspec_demo.py: spectral misspecification analysis

Run via:

python -m examples.run_misspec_demo --truth <type> --filters <names>

This experiment:

  1. constructs a graph Laplacian,
  2. generates data under a deliberately misspecified spectrum $F_{\text{true}}(\lambda)$,
  3. fits multiple filter families,
  4. compares log-spectrum RMSE and surface recovery error,
  5. benchmarks VI against MCMC under misspecification.

This evaluates:

  • robustness to model misspecification,
  • ability of flexible filters (e.g. log-spline, rational) to capture non-CAR structure,
  • stability of approximate inference.

7.3.3 run_surface_block_missing.py: structured missingness and imputation

Run via:

python -m examples.run_surface_block_missing --filter <name> --case <id>

This runner:

  1. generates deterministic latent surfaces (e.g. $f_1, f_2$),
  2. constructs a graph Laplacian over the full grid,
  3. removes structured blocks of observations,
  4. performs iterative VI-based reconstruction,
  5. optionally performs conditional Gaussian imputation,
  6. evaluates MSE and correlation on held-out regions.

This setting evaluates:

  • spatial extrapolation capability,
  • robustness to structured missing data,
  • sensitivity to graph construction and filter family,
  • performance across different missing block locations.
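The conditional Gaussian imputation in step 5 reduces to the standard conditional-mean formula for a partitioned Gaussian. A generic numpy sketch (not the repository's implementation):

```python
# Sketch: fill a missing block with its conditional mean given the
# observed nodes, for a joint Gaussian y ~ N(mu, Sigma).
import numpy as np

def conditional_mean(mu, Sigma, obs_idx, mis_idx, y_obs):
    """E[y_mis | y_obs] = mu_m + S_mo S_oo^{-1} (y_obs - mu_o)."""
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    return mu[mis_idx] + S_mo @ np.linalg.solve(S_oo, y_obs - mu[obs_idx])

# Tiny 3-node example: node 2 is missing, nodes 0 and 1 observed.
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.8, 0.2],
                  [0.8, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
y_hat = conditional_mean(mu, Sigma, obs_idx=[0, 1], mis_idx=[2],
                         y_obs=np.array([1.0, 0.5]))
```

In the block-missing experiments, `Sigma` would be the fitted $\Sigma_\phi + \sigma^2 I$ and the missing index set a structured block of grid nodes.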

7.4 Architectural principle

The repository cleanly separates:

| Layer | Responsibility |
|---|---|
| sdmcar/ | Mathematical model + inference |
| examples/benchmarks/ | Declarative filter definitions |
| examples/run_*.py | Experimental protocols |
| examples/figures/ | Generated outputs |

This separation ensures:

  • inference code is reusable across experiments,
  • filters are interchangeable,
  • experiments are reproducible,
  • extensions (new graphs, new filters, new protocols) do not require rewriting core logic.

8. Outputs and reproducibility

All experimental results are written automatically to structured directories under:

examples/figures/

The exact subdirectory depends on the experiment type:

examples/figures/
├── benchmarks/<filter>/<case>/
├── misspec/truth_<type>/<filter>/<case>/
└── surface_block_missing/

Benchmark experiments

examples/figures/benchmarks/<filter>/<case>/

Each directory contains:

  • spectrum recovery plots (linear + log scale),
  • posterior mean surface comparisons,
  • VI–MCMC diagnostics,
  • parameter summaries.

Spectral misspecification experiments

examples/figures/misspec/truth_<type>/<filter>/<case>/

Each directory contains:

  • true vs estimated spectra,
  • log-spectrum RMSE comparisons,
  • VI–MCMC performance summaries.

Block-missing surface experiments

examples/figures/surface_block_missing/

This directory contains:

  • latent surface visualizations,
  • observed surface with structured missing blocks,
  • imputed surfaces across outer iterations,
  • MSE and correlation summaries across missing regions,
  • metadata files recording experiment settings.

Reproducibility guarantees

This structure ensures:

  • no manual bookkeeping,
  • deterministic results given a fixed random seed,
  • clean separation between modeling code and experiment outputs,
  • direct VI–MCMC comparison for validation,
  • consistent evaluation across filters and experimental regimes.

All experiment scripts are filter-agnostic and registry-driven, allowing systematic comparison without modifying inference code.


9. Extensibility

New spectral filters can be added by:

  1. Implementing a filter class in sdmcar/filters.py,
  2. Defining experimental cases in examples/benchmarks/<name>.py,
  3. Registering the filter via FilterSpec.

No changes to inference code or the runner are required.
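A hypothetical sketch of this registration pattern; `FilterSpec`'s fields and the `register` call here are simplified stand-ins for the actual base.py and registry.py APIs:

```python
# Sketch: a global registry maps filter names to declarative specs so
# runners can discover filters from the command line. Simplified stand-in.
from dataclasses import dataclass, field

REGISTRY = {}

@dataclass
class FilterSpec:
    name: str
    make_filter: callable
    cases: dict = field(default_factory=dict)

def register(spec):
    if spec.name in REGISTRY:
        raise ValueError(f"duplicate filter name: {spec.name}")
    REGISTRY[spec.name] = spec
    return spec

# A new filter family registers itself on import:
register(FilterSpec(name="my_filter",
                    make_filter=lambda lam, tau2=1.0: tau2 / (lam + 1.0),
                    cases={"baseline": {}}))
```

Because discovery happens through the registry, the runners never import filter modules by name, which is what keeps them filter-agnostic.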


10. Intended use

This repository is intended for:

  • methodological research in spatial statistics and GMRFs,
  • development of structured covariance models on graphs,
  • reproducible comparison of CAR and CAR-generalized models.

It is not optimized as a production library. Instead, it is structured to demonstrate architectural decisions, inference tradeoffs, and robustness under misspecification—core concerns in research and advanced ML system design.


11. Citation

@misc{sdmcar2026,
  title  = {Spectral-Density-Modulated Conditional Autoregressive Models},
  author = {Pratik Dahal},
  year   = {2026},
  note   = {Contact: pd006@uark.edu, mapratikdahal@gmail.com}
}
