
"Predicted covariance" is not always what the user thinks it is. Perhaps a better API? #400

@odunbar

Issue

Currently, predictions are made through predict:

function predict(
    emulator::Emulator{FT},
    new_inputs::AM;
    transform_to_real = false,
    mlt_kwargs...,
) where {FT <: AbstractFloat, AM <: AbstractMatrix}

With transform_to_real=false (the default), this is conveniently how the MCMC obtains both the posterior mean $$G(u)$$ and the estimated observational noise covariance $$\Gamma(u)$$ for use in the MCMC objective (here a negative log-likelihood):

$$ L(u, y) = (y-G(u))^\top \Gamma(u)^{-1} (y-G(u)) + \log\det(\Gamma(u)) $$
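
To make the role of $$\Gamma(u)$$ concrete, here is a minimal sketch of evaluating the objective exactly as written above. The function name neg_log_likelihood and the numbers in the usage line are illustrative assumptions, not part of the package; g and Γ stand in for the pair returned by predict at a single input.

```julia
using LinearAlgebra

# Hypothetical helper (not part of the package): evaluates
# L(u, y) = (y - G(u))' * Γ(u)^{-1} * (y - G(u)) + logdet(Γ(u))
# for one input u, given the mean g = G(u) and covariance Γ = Γ(u).
function neg_log_likelihood(y::AbstractVector, g::AbstractVector, Γ::AbstractMatrix)
    r = y .- g                          # residual y - G(u)
    return dot(r, Γ \ r) + logdet(Γ)    # solve with Γ instead of forming its inverse
end

# Illustrative values only: with Γ = I the objective reduces to the squared residual norm.
L_val = neg_log_likelihood([1.0, 2.0], [0.0, 0.0], Matrix(1.0I, 2, 2))  # 5.0
```

Note that the covariance passed here must be $$\Gamma(u)$$, which already includes the noise term; that is why predict returns it by default.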

Thus y_pred, y_cov = predict(emulator, new_inputs) does not return the (GP mean, GP covariance $$C(u)$$) pair, but rather the (GP mean, predicted observational covariance $$\Gamma(u)$$) pair.
In the encoded space these are typically related by $$\Gamma(u) = C(u) + I$$. However, users often call predict with transform_to_real=true to get a prediction, and there it is harder to disaggregate the additional effect of our linear encoder E: $$C(u)$$ is only obtainable as y_cov - E*E', which requires calling our data encoder.
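
A small numerical check of this relationship, as a sketch under stated assumptions: E is taken to be the linear map from encoded to real space, and covariances are assumed to transform as E * Γ * E'. All matrices below are made-up stand-ins, not output from the package.

```julia
using LinearAlgebra

# Stand-in values (assumptions for illustration only).
C_enc = [0.5 0.1; 0.1 0.3]     # GP posterior covariance C(u) in encoded space
E     = [2.0 0.0; 1.0 3.0]     # hypothetical linear map, encoded space -> real space

Γ_enc  = C_enc + I             # encoded-space observational covariance, Γ(u) = C(u) + I
Γ_real = E * Γ_enc * E'        # assumed form of y_cov when transform_to_real = true

# Removing the transformed noise term E*E' recovers the transformed GP covariance.
C_real = Γ_real - E * E'
recovered = C_real ≈ E * C_enc * E'
```

Under these assumptions the identity holds because E * (C + I) * E' = E * C * E' + E * E'.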

I think this leads users to interpret the observational covariances as posterior covariances, so they will often see uncertainty that looks overly broad.

Possible Solution

It would be nice to have a switch, e.g. (obs_or_post_cov="obs"...), or to use separate API functions, for example predict_with_noise and predict. We should also add documentation (and/or docstrings) that clearly distinguishes these cases.
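
One way the switch could look, purely as a sketch: the helper select_cov and its noise_cov keyword are hypothetical and do not exist in the package. It post-processes a covariance that already includes the noise term, with noise_cov = I for encoded-space output and noise_cov = E*E' for real-space output (the latter following the relation described in the issue).

```julia
using LinearAlgebra

# Hypothetical post-processing helper (not an existing API).
# obs_cov is the covariance currently returned by predict, i.e. Γ(u).
function select_cov(obs_cov::AbstractMatrix; obs_or_post_cov::String = "obs", noise_cov = I)
    if obs_or_post_cov == "obs"
        return obs_cov                  # Γ(u): what the MCMC objective needs
    elseif obs_or_post_cov == "post"
        return obs_cov - noise_cov      # C(u): the GP posterior covariance
    else
        throw(ArgumentError("obs_or_post_cov must be \"obs\" or \"post\""))
    end
end

# Illustrative encoded-space values only.
Γ = [1.5 0.1; 0.1 1.3]
C = select_cov(Γ; obs_or_post_cov = "post")
```

Keeping "obs" as the default would preserve the current behavior for the MCMC, while "post" gives users the narrower covariance they usually expect from a prediction.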
