[BUG] isotonic-regression calibration applies a proportion-space model to raw std values (domain mismatch) #516

@smcolby

Description

CommitteeRegressor.calibrate_uncertainty(method="isotonic-regression") fits an isotonic regression model in proportion space (inputs and outputs both in [0, 1]) but then applies it directly to raw standard deviation values, which have entirely different semantics and range. The calibrated std returned by _predict is therefore meaningless when this method is used.

Note also that calibrate_uncertainty defaults to method="isotonic-regression", so any caller relying on the default is affected.

Root cause

In openadmet/models/active_learning/committee.py:

Step 1 — fitting (_isotonic_regression_calibration):

y_exp_props, y_obs_props = uct.metrics_calibration.get_proportion_lists_vectorized(
    y_pred_mean[:, i], y_pred_std[:, i], y[:, i]
)
iso_model = uct.recalibration.iso_recal(y_exp_props, y_obs_props).predict

get_proportion_lists_vectorized returns expected and observed coverage proportions — both arrays live in [0, 1]. iso_recal fits an IsotonicRegression mapping exp_props → obs_props, also in [0, 1].

Step 2 — applying (_get_calibration_function):

return lambda x: np.stack(
    [self._calibration_model["isotonic-regression"][i](x[:, i])
     for i in range(x.shape[-1])],
    axis=1,
)

x here is the raw ensemble std array (e.g. values 0.2–1.5 for a pEC50 model). The proportion-space isotonic model is applied directly to these values. The output is a proportion-like number with no uncertainty semantics, which is then returned as the "calibrated std" and used to drive acquisition.
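
To make the mismatch concrete, here is a minimal sketch in plain Python. It does not call uncertainty_toolbox: the helper make_clipped_monotone_map and the proportion values are hypothetical stand-ins for a fitted isotonic model, and the sketch assumes the fitted model clips inputs outside its training range (as scikit-learn's IsotonicRegression does with out_of_bounds="clip").

```python
from bisect import bisect_right


def make_clipped_monotone_map(xs, ys):
    """Hypothetical stand-in for a fitted isotonic model's .predict.

    Interpolates linearly between the fitted points and clips any input
    outside [xs[0], xs[-1]] to the boundary outputs.
    """
    def predict(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        j = bisect_right(xs, x)
        x0, x1 = xs[j - 1], xs[j]
        y0, y1 = ys[j - 1], ys[j]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return predict


# Made-up calibration curve: expected vs. observed coverage, both in [0, 1].
exp_props = [0.0, 0.25, 0.5, 0.75, 1.0]
obs_props = [0.0, 0.40, 0.70, 0.90, 1.0]
recal = make_clipped_monotone_map(exp_props, obs_props)

# Raw ensemble std values in the range quoted above (not proportions).
raw_std = [0.2, 0.6, 1.0, 1.2, 1.5]
calibrated = [recal(s) for s in raw_std]
print(calibrated)
```

Under these assumptions every std at or above 1.0 collapses to the same output, and every output is confined to [0, 1], so samples with clearly different ensemble spread become indistinguishable to the acquisition function.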

The correct uncertainty_toolbox approach

uncertainty_toolbox provides get_interval_recalibrator for the isotonic path — it keeps everything in proportion space and produces calibrated intervals, not calibrated std. There is no uct utility that maps isotonic recalibration back to the std domain, because the natural std-space recalibrator is a scalar multiplier — exactly what "scaling-factor" (get_std_recalibrator) does.

Proposed fix

Option A (minimal): Remove "isotonic-regression" from _calibration_methods and keep only "scaling-factor", which is already correct (optimize_recalibration_ratio → scale_factor * std).

Option B (full fix): Replace the isotonic calibration with an interval-based implementation that stores (y_pred_mean, iso_model) and produces calibrated prediction intervals at query time, rather than a calibrated std. This is a more significant API change since _predict currently returns (mean, std).
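
A rough sketch of what Option B implies for the return type, in plain Python. Everything here is an assumption for illustration: calibrated_interval, observed_to_nominal, and overconfident_map are hypothetical names, the interval is a centered Gaussian one, and the recalibration map is a made-up function rather than a fitted isotonic model. It is not how uncertainty_toolbox implements get_interval_recalibrator.

```python
from statistics import NormalDist


def calibrated_interval(mean, std, coverage, observed_to_nominal):
    """Return (lower, upper) for a centered Gaussian interval.

    observed_to_nominal is a hypothetical proportion-space map: given the
    coverage we want to observe, it returns the nominal coverage to request.
    The std itself is never passed through the map.
    """
    nominal = observed_to_nominal(coverage)
    z = NormalDist().inv_cdf(0.5 + nominal / 2.0)
    return mean - z * std, mean + z * std


def identity_map(p):
    # No recalibration: request exactly the desired coverage.
    return p


def overconfident_map(p):
    # Made-up map for an overconfident model: request a wider nominal level.
    return p ** 0.5


raw_lo, raw_hi = calibrated_interval(6.0, 0.5, 0.81, identity_map)
cal_lo, cal_hi = calibrated_interval(6.0, 0.5, 0.81, overconfident_map)
print((raw_lo, raw_hi), (cal_lo, cal_hi))
```

The point of the sketch is the shape of the result: the recalibration model only ever sees proportions, and the caller gets (lower, upper) bounds instead of a std, which is why this option changes what _predict can return.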

Current workaround

Use method="scaling-factor" explicitly:

committee.calibrate_uncertainty(X_cal, y_cal, method="scaling-factor")

This is mathematically correct and equivalent to uncertainty_toolbox's own get_std_recalibrator.
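
For intuition about why a scalar multiplier is a well-defined std-space recalibrator, here is a minimal sketch in plain Python. It is an illustration only: fit_std_scale is a hypothetical helper, the data are made up, and the scale is chosen by a closed-form Gaussian maximum-likelihood criterion, which is a swapped-in technique and not necessarily the criterion that optimize_recalibration_ratio optimizes.

```python
import math


def fit_std_scale(y_true, y_mean, y_std):
    """Hypothetical helper: Gaussian maximum-likelihood scale for the std.

    Returns s such that s * y_std has unit mean squared z-score on the
    calibration set.
    """
    z_squared = [((y - m) / s) ** 2 for y, m, s in zip(y_true, y_mean, y_std)]
    return math.sqrt(sum(z_squared) / len(z_squared))


# Made-up calibration data for a single target.
y_true = [6.1, 5.4, 7.0, 6.6]
y_mean = [6.0, 5.0, 6.5, 7.0]
y_std = [0.1, 0.2, 0.25, 0.4]

scale = fit_std_scale(y_true, y_mean, y_std)
calibrated_std = [scale * s for s in y_std]
print(scale, calibrated_std)
```

Because the output is scale * std, it stays in the same units as the input and preserves the ranking of samples by uncertainty, which is what an acquisition function that consumes std needs.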
