Fix sigmoid calibration for Pipeline-wrapped estimators#1247

Open
huntermills707 wants to merge 1 commit into onnx:main from huntermills707:main

@huntermills707

Fix sigmoid calibration for Pipeline-wrapped estimators

Fixes #1246

Problem

When CalibratedClassifierCV(method='sigmoid') wraps a Pipeline, the ONNX export produces incorrect probabilities. The raw_scores option is set on the Pipeline by the calibrated classifier converter, but the Pipeline converter never propagates it to the inner classifier. The sigmoid calibration expit(-(a*x + b)) is then applied to probability outputs instead of raw decision function scores.
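To make the failure mode concrete, here is a minimal numeric sketch in plain Python. It is not taken from the PR: the coefficients `a` and `b`, the sample scores, and the use of a logistic function to stand in for the inner classifier's probability output are all illustrative assumptions. It only shows that feeding probabilities into expit(-(a*x + b)) gives different values than feeding raw decision scores.

```python
import math


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_calibrate(x, a, b):
    """Sigmoid (Platt) calibration as written in the description above."""
    return expit(-(a * x + b))


# Hypothetical fitted calibration coefficients (assumption, not from the PR).
a, b = -1.5, 0.2

# Hypothetical raw decision-function scores from the inner classifier.
scores = [-3.0, -0.5, 0.0, 0.5, 3.0]

# Intended path: calibration is applied to the raw scores.
correct = [sigmoid_calibrate(s, a, b) for s in scores]

# Buggy path: the inner classifier emits probabilities (modelled here as
# expit(score), an assumption), and calibration is applied on top of them.
wrong = [sigmoid_calibrate(expit(s), a, b) for s in scores]

max_diff = max(abs(c - w) for c, w in zip(correct, wrong))
print(f"max_diff={max_diff:.6f}")
```

The two paths disagree by a wide margin on the extreme scores, which matches the kind of mismatch described in the issue.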

Fix

Propagate raw_scores from the Pipeline to its classifier steps in convert_pipeline (skl2onnx/operator_converters/pipelines.py). Reading the option and forwarding it to sub-estimators follows the pattern already used in skl2onnx/operator_converters/bagging.py (lines 30–32).
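The forwarding logic can be sketched with a toy model in plain Python. This is not the skl2onnx code: `Step`, `supported_options`, and `propagate_raw_scores` are hypothetical names invented for illustration, and the real converter works on skl2onnx operators and its own option lookup rather than on these objects.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical stand-in for a pipeline step and its converter options."""
    name: str
    supported_options: set
    options: dict = field(default_factory=dict)


def propagate_raw_scores(pipeline_options, steps):
    """Forward raw_scores from the pipeline to steps that support it.

    Steps that do not declare raw_scores (e.g. transformers) are left alone.
    """
    raw_scores = pipeline_options.get("raw_scores")
    if raw_scores is None:
        return  # Option not set on the pipeline: nothing to forward.
    for step in steps:
        if "raw_scores" in step.supported_options:
            step.options["raw_scores"] = raw_scores


steps = [
    Step("scaler", supported_options=set()),
    Step("clf", supported_options={"raw_scores", "zipmap"}),
]
propagate_raw_scores({"raw_scores": True}, steps)
print({s.name: s.options for s in steps})
```

Checking support before forwarding keeps transformer steps untouched, which mirrors the "classifier steps that support it" wording in the change list.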

Changes

  • skl2onnx/operator_converters/pipelines.py: Read the Pipeline's raw_scores option and forward it to classifier steps that support it.
  • tests/test_sklearn_calibrated_classifier_cv_converter.py: Add test_model_calibrated_classifier_cv_sigmoid_pipeline covering both ensemble=True and ensemble=False.

Verification

Reproducer output after fix:

[OK] bare classifier                 max_diff=0.000000  broken=0/500
[OK] Pipeline(identity+clf)          max_diff=0.000000  broken=0/500

No regressions in existing pipeline or calibrated classifier tests.

…teps

This change ensures that if the `raw_scores` option is set for a Pipeline,
it is correctly passed down to its constituent classifiers or sub-pipelines.
This is particularly important for models like `CalibratedClassifierCV`
when they wrap a Pipeline, as they might rely on raw scores for calibration.

- Updated `skl2onnx/operator_converters/pipelines.py` to check for and
  propagate the `raw_scores` option.
- Added a regression test in
  `tests/test_sklearn_calibrated_classifier_cv_converter.py` verifying
  `CalibratedClassifierCV` functionality when using a Pipeline as the
  base estimator.

Signed-off-by: Hunter Mills <hunter.mills@pm.me>

Development

Successfully merging this pull request may close these issues.

CalibratedClassifierCV sigmoid calibration incorrect when base estimator is a Pipeline
