- core.py: fix TPOT metrics/model mismatch by fitting the full pipeline once and computing metrics from the saved pipeline
- evo_learn/config.py: filter unknown YAML keys in from_yaml to prevent TypeError on unexpected config entries
- evo_learn/logging_utils.py: handle non-float metric values in log_experiment_end with graceful format fallback
- evo_learn/cross_validation.py: fix KeyError from conditional roc_auc key by intersecting metric keys across all folds
- evo_learn/explainability.py: fix feature name mismatch in plot_summary/plot_feature_importance and wrong data source in explain_prediction by storing transformed test data and names

Co-authored-by: TurboRx <187360786+TurboRx@users.noreply.github.com>
Pull request overview
This PR fixes 5 production bugs across the AutoML pipeline that caused crashes, silent errors, and metric/model mismatches. The fixes improve correctness and reliability of core functionality including TPOT model training, configuration loading, cross-validation aggregation, logging formatting, and SHAP explainability.
Changes:
- Fixed TPOT pipeline to ensure metrics and saved model describe the same fitted object (previously metrics came from raw-data fit, model from preprocessed-data refit)
- Added defensive filtering in YAML config loading to handle unknown keys gracefully instead of crashing
- Changed CV metric aggregation to use set intersection, preventing KeyError when metrics like roc_auc are conditionally computed
- Added try-except fallback for non-float metric values in logging to prevent TypeError crashes
- Fixed SHAP feature name extraction to use the transformed feature names, which match the dimensions of the SHAP values
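The unknown-key filtering for config loading could look roughly like the sketch below. It is not the code from evo_learn/config.py: the `ExperimentConfig` dataclass, its fields, and the `from_dict` helper are hypothetical names chosen for illustration, and a plain dict stands in for the parsed YAML.

```python
import dataclasses
import warnings
from dataclasses import dataclass


@dataclass
class ExperimentConfig:
    # Hypothetical fields, chosen only for illustration.
    generations: int = 5
    population_size: int = 20
    random_state: int = 42

    @classmethod
    def from_dict(cls, raw):
        """Build a config from parsed YAML, ignoring keys the dataclass lacks."""
        known = {f.name for f in dataclasses.fields(cls)}
        unknown = [key for key in raw if key not in known]
        if unknown:
            # Warn instead of letting cls(**raw) raise TypeError.
            warnings.warn(f"Ignoring unknown config keys: {unknown}")
        return cls(**{k: v for k, v in raw.items() if k in known})


# "n_jobs" is not a field, so it is dropped with a warning.
cfg = ExperimentConfig.from_dict({"generations": 10, "n_jobs": 4})
print(cfg)
```

Warning rather than raising keeps older or hand-edited config files loadable, at the cost of a typo in a key name going unnoticed unless warnings are surfaced.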
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| core.py | Unified TPOT training pipeline - create pipeline first, fit once, compute metrics and save the same model. Added defensive try-except for predict_proba with shape validation. |
| evo_learn/config.py | Added dataclass field filtering when loading YAML configs to ignore unknown keys with warnings instead of raising TypeError. |
| evo_learn/cross_validation.py | Use set intersection across folds to aggregate only metrics present in all folds, fixing KeyError from conditionally-computed metrics like roc_auc. |
| evo_learn/logging_utils.py | Added try-except fallback when formatting metric values, preventing TypeError when metrics are strings or other non-float types. |
| evo_learn/explainability.py | Store transformed test data and feature names to ensure SHAP plots use correct feature names matching transformed dimensions. |
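
The set-intersection aggregation described for evo_learn/cross_validation.py can be sketched as follows. The function name `aggregate_fold_metrics` and the shape of the per-fold dicts are assumptions; the actual module may also report other statistics.

```python
from statistics import mean


def aggregate_fold_metrics(fold_metrics):
    """Average only the metrics that every fold reported."""
    if not fold_metrics:
        return {}
    # Keys present in all folds; a metric missing from any fold is dropped.
    common = set(fold_metrics[0]).intersection(*(set(m) for m in fold_metrics[1:]))
    return {name: mean(m[name] for m in fold_metrics) for name in sorted(common)}


folds = [
    {"accuracy": 0.90, "roc_auc": 0.95},
    {"accuracy": 0.80},  # roc_auc was not computed for this fold
    {"accuracy": 0.85, "roc_auc": 0.91},
]
print(aggregate_fold_metrics(folds))
```

The trade-off is that a metric missing from even one fold disappears from the summary entirely, which is safer than averaging over a different number of folds per metric.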
@copilot apply changes based on the comments in this thread
…plain_model loop to sampled SHAP rows

- explain_prediction: switch from label-based indexing (X[sample_idx]) to iloc for DataFrames so non-sequential indices from .sample() don't cause KeyError
- explain_model: iterate over range(min(3, n_shap_rows)) instead of range(min(3, len(X_test))) so indices are always valid within the (potentially sampled) _X_test_transformed

Co-authored-by: TurboRx <187360786+TurboRx@users.noreply.github.com>
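
A rough sketch of the two ideas in this commit, positional row access and a loop bounded by the number of SHAP rows. The helper names `row_at` and `positions_to_explain` are hypothetical, and a plain list stands in for the sampled test data so the example runs without pandas.

```python
def row_at(X, position):
    """Return the row at an integer position."""
    if hasattr(X, "iloc"):
        # DataFrame-like: .iloc is positional and ignores index labels,
        # so it still works after .sample() shuffles the index.
        return X.iloc[position]
    # Plain sequences and NumPy arrays are already positional.
    return X[position]


def positions_to_explain(n_shap_rows, limit=3):
    """Positions that are valid within the (possibly sampled) SHAP rows."""
    return range(min(limit, n_shap_rows))


sampled_rows = [[0.1, 0.2], [0.3, 0.4]]  # stand-in for two sampled test rows
for i in positions_to_explain(len(sampled_rows)):
    print(i, row_at(sampled_rows, i))
```

Bounding the loop by the SHAP row count rather than by len(X_test) matters whenever the explainer ran on a subsample smaller than the full test set.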
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
@copilot apply changes based on the comments in this thread
… coloring

shap.summary_plot requires the actual feature data matrix (features=) to color points in dot/violin plots by feature value. Previously only feature_names was passed, which broke feature-value coloring. Now passes features=self._X_test_transformed alongside feature_names.

Co-authored-by: TurboRx <187360786+TurboRx@users.noreply.github.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@copilot apply changes based on the comments in this thread
…ected_value

In explain_prediction, when the explainer returns a list of SHAP values (multi-class), expected_value is also array-like (one per class). Passing the full array as base_values caused shape mismatches and incorrect waterfall plots. Now resolves base_values to a scalar by selecting the element matching the chosen class_idx (1 for binary/positive class, clamped to the array length), with a fallback to the first element when class_idx is unavailable or out of range.

Co-authored-by: TurboRx <187360786+TurboRx@users.noreply.github.com>
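
The scalar resolution described in this commit might be approximated as below. `resolve_base_value` is a hypothetical helper, not the code in explain_prediction, and this sketch simplifies the described behavior by falling back to the first element for any missing or out-of-range class_idx.

```python
def resolve_base_value(expected_value, class_idx=None):
    """Reduce an explainer's expected_value to a single float."""
    try:
        n_classes = len(expected_value)
    except TypeError:
        # Already a scalar (single-output explainer).
        return float(expected_value)
    if n_classes == 0:
        raise ValueError("expected_value is empty")
    if class_idx is None or not 0 <= class_idx < n_classes:
        # Fallback: class index unavailable or out of range.
        class_idx = 0
    return float(expected_value[class_idx])


print(resolve_base_value(0.3))                       # scalar input
print(resolve_base_value([0.4, 0.6], class_idx=1))   # binary, positive class
print(resolve_base_value([0.4, 0.6], class_idx=5))   # out of range, falls back
```

The same class_idx would also have to select the matching slice of the SHAP values, otherwise the waterfall plot would combine the base value of one class with the contributions of another.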
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
- core.py: TPOT metrics/model mismatch
- evo_learn/config.py: from_yaml crashes on unknown YAML keys
- evo_learn/logging_utils.py: log_experiment_end crashes with non-float metric values
- evo_learn/cross_validation.py: KeyError when roc_auc missing from some folds
- evo_learn/explainability.py (initial): feature name mismatch and wrong data source in SHAP plots
- evo_learn/explainability.py (follow-up): use iloc for positional indexing in explain_prediction; bound explain_model loop to sampled SHAP row count
- evo_learn/explainability.py (follow-up 2): pass features=self._X_test_transformed to shap.summary_plot
- evo_learn/explainability.py (follow-up 3): resolve base_values to a scalar in explain_prediction; multi-class explainers return array-like expected_value, so select the element matching class_idx to prevent shape mismatches in waterfall plots
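
For the log_experiment_end item, the formatting fallback could be as small as the sketch below. `format_metric` is a hypothetical helper name, and the four-decimal format is an assumption rather than the format the logger actually uses.

```python
def format_metric(name, value):
    """Format a metric for a log line, tolerating non-float values."""
    try:
        return f"{name}: {value:.4f}"
    except (TypeError, ValueError):
        # Strings, None, lists and similar values reject the float format
        # spec; fall back to their default string form.
        return f"{name}: {value}"


print(format_metric("accuracy", 0.91234))
print(format_metric("roc_auc", "n/a"))
print(format_metric("f1", None))
```

Catching both exception types is deliberate: depending on the value's type, Python raises either TypeError or ValueError for an unsupported format spec.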