The `StackingEnsemble` class builds multi-layer stacking and blending models, providing a framework for ensemble learning that is particularly suited to regression tasks. It supports two distinct ensemble strategies: stacking (with K-fold out-of-fold predictions) and blending (with a hold-out validation set). It also validates its inputs and raises descriptive errors for incorrect inputs or failures during fitting and predicting.
- `layers`: `list of lists`
  - A list of lists, where each inner list contains models (i.e., estimators) for a particular layer in the ensemble.
  - Each model should be a scikit-learn compatible model, meaning it must implement the `fit()` and `predict()` methods.
  - Example: `[[Model1, Model2], [Model3, Model4]]` would define two layers, with two models in each layer.
  - Note: The order of layers matters. Models in later layers will use predictions from models in earlier layers as input features.
- `meta_model`: `estimator`
  - A single scikit-learn compatible model that combines the predictions from the final layer into a final prediction.
  - This model typically performs regression or classification on the predictions from the previous layer's models (depending on the task).
  - Example: A `LinearRegression()` or `RandomForestRegressor()` might serve as a good meta-model for regression tasks.
- `n_folds`: `int, default=5`
  - Specifies the number of folds for K-fold cross-validation, which is used for generating out-of-fold predictions during the stacking process.
  - The default is 5, but users can choose any integer greater than or equal to 2.
  - Note: Only used when `blending=False`.
- `blending`: `bool, default=False`
  - If `True`, the model uses a hold-out validation set for blending instead of K-fold cross-validation.
  - In blending mode, a portion of the training data (specified by `blend_size`) is reserved as a hold-out set. The base models are trained on the remaining data, and their predictions on the hold-out set serve as the inputs for fitting the meta-model.
  - Default: False (indicating stacking mode).
- `blend_size`: `float, default=0.2`
  - Specifies the proportion of the training data to hold out for blending (i.e., used as a validation set in blending mode).
  - The value must be between 0 and 1, where a value of 0.2 means 20% of the data is used as the hold-out set.
  - Note: Only used when `blending=True`.
- `random_state`: `int, default=None`
  - A seed value for controlling the randomness in splitting the dataset (for cross-validation in stacking or train/hold-out split in blending).
  - Default: None (which means the random state is not fixed).
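
To make the difference between the two modes concrete, here is a minimal sketch of how the inputs for the next stage could be produced in each case. It uses scikit-learn's `KFold` and `train_test_split` directly and is not the `StackingEnsemble` implementation itself. The variable names and the use of `shuffle=True` are assumptions made for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)
model = LinearRegression()
n_folds, blend_size, random_state = 5, 0.2, 0

# Stacking: each sample is predicted by a model that was trained without it.
# shuffle=True is an assumption; it is what makes random_state matter here.
kfold = KFold(n_splits=n_folds, shuffle=True, random_state=random_state)
oof_pred = np.zeros(len(y))
for train_idx, valid_idx in kfold.split(X):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    oof_pred[valid_idx] = fold_model.predict(X[valid_idx])

# Blending: train on the larger part, predict only on the hold-out part.
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X, y, test_size=blend_size, random_state=random_state
)
hold_pred = clone(model).fit(X_fit, y_fit).predict(X_hold)

print(oof_pred.shape)   # (100,) one out-of-fold prediction per training sample
print(hold_pred.shape)  # (20,) predictions for the 20% hold-out set only
```

Stacking yields a prediction for every training sample at the cost of fitting each model `n_folds` times, while blending fits each model once but gives the next stage only `blend_size` of the data to learn from.
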
- `layer_models_`: `list`
  - A list that stores the fitted models for each layer after the `fit()` method is called.
  - This includes the base models from each layer and their predictions used as inputs for the subsequent layers.
This method initializes the ensemble class and validates input parameters.
- `layers`, `meta_model`, `n_folds`, `blending`, `blend_size`, `random_state` (see the Parameters section above for detailed descriptions).

- `ValueError`: If `layers` is not a non-empty list of non-empty lists, or if any model in `layers` doesn't have `fit` or `predict` methods.
- `ValueError`: If `meta_model` doesn't have `fit` or `predict` methods.
- `ValueError`: If `n_folds` is less than 2 or not an integer.
- `ValueError`: If `blend_size` is not between 0 and 1 when `blending=True`.
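
The checks listed above could be written roughly as follows. This is a sketch under stated assumptions: `validate_ensemble_args` and `DummyModel` are hypothetical names introduced here, and the actual constructor may structure its checks and messages differently.

```python
def validate_ensemble_args(layers, meta_model, n_folds=5, blending=False, blend_size=0.2):
    """Hypothetical helper mirroring the documented constructor checks."""
    if not isinstance(layers, list) or not layers:
        raise ValueError("layers must be a non-empty list of non-empty lists")
    for layer in layers:
        if not isinstance(layer, list) or not layer:
            raise ValueError("each layer must be a non-empty list of models")
        for model in layer:
            if not (hasattr(model, "fit") and hasattr(model, "predict")):
                raise ValueError(f"{type(model).__name__} lacks fit or predict")
    if not (hasattr(meta_model, "fit") and hasattr(meta_model, "predict")):
        raise ValueError("meta_model must implement fit and predict")
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(n_folds, bool) or not isinstance(n_folds, int) or n_folds < 2:
        raise ValueError("n_folds must be an integer greater than or equal to 2")
    if blending and not (isinstance(blend_size, (int, float)) and 0 < blend_size < 1):
        raise ValueError("blend_size must be between 0 and 1 when blending=True")


class DummyModel:
    """Stand-in estimator that exposes the two required methods."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0.0] * len(X)


validate_ensemble_args([[DummyModel()]], DummyModel())  # passes silently
try:
    validate_ensemble_args([[DummyModel()]], DummyModel(), n_folds=1)
except ValueError as exc:
    print(f"Rejected: {exc}")
```
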
Fits the stacking ensemble model to the provided training data (X, y). This method processes each layer of models and trains them accordingly using either stacking (K-fold CV) or blending (hold-out set).
- `X`: `pandas.DataFrame` or `numpy.ndarray`
  - The feature matrix containing training data.
- `y`: `pandas.Series`, `numpy.ndarray`, or `list`
  - The target vector containing the labels or outputs for each sample.
- `self`: `object`
  - The fitted `StackingEnsemble` object.
- `TypeError`: If `X` is not a pandas DataFrame or numpy array, or if `y` is not a pandas Series, numpy array, or list.
- `ValueError`: If the number of samples in `X` and `y` does not match.
- `RuntimeError`: If an error occurs during the fitting process, such as failure to split data correctly, errors in model training, or predictions.
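
As an illustration of the type and length checks described above, a helper along these lines would raise the documented `TypeError` and `ValueError`. The name `check_fit_inputs` is hypothetical, and converting the inputs to numpy arrays is an assumption rather than documented behaviour.

```python
import numpy as np
import pandas as pd


def check_fit_inputs(X, y):
    """Hypothetical helper mirroring the documented checks in fit()."""
    if not isinstance(X, (pd.DataFrame, np.ndarray)):
        raise TypeError("X must be a pandas DataFrame or numpy array")
    if not isinstance(y, (pd.Series, np.ndarray, list)):
        raise TypeError("y must be a pandas Series, numpy array, or list")
    if len(X) != len(y):
        raise ValueError(
            f"X has {len(X)} samples but y has {len(y)}; the counts must match"
        )
    # Assumption: downstream code works on plain numpy arrays.
    return np.asarray(X), np.asarray(y)


X_arr, y_arr = check_fit_inputs(pd.DataFrame({"a": [1.0, 2.0, 3.0]}), [1, 2, 3])
print(X_arr.shape, y_arr.shape)  # (3, 1) (3,)
```
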
- Input Validation: Ensures `X` and `y` are of correct types and dimensions.
- Layer-wise Training:
  - For each layer, the method either generates out-of-fold predictions using K-fold cross-validation (stacking) or trains models on the training set and generates predictions on a hold-out set (blending).
- Final Model Training: The meta-model is trained using the predictions from the final layer as input.
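
The steps above can be sketched for stacking mode as follows. This is an illustrative outline built on scikit-learn's `cross_val_predict`, not the class's actual code. It assumes that only the previous layer's predictions (and not the original features) are passed forward, and that each model is refitted on all rows for use at prediction time.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict

X, y = make_regression(n_samples=120, n_features=6, noise=0.5, random_state=0)
layers = [
    [LinearRegression(), RandomForestRegressor(n_estimators=20, random_state=0)],
    [Ridge(alpha=1.0)],
]
meta_model = LinearRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)

features = X
fitted_layers = []
for layer in layers:
    # One column of out-of-fold predictions per model in this layer.
    oof = np.column_stack(
        [cross_val_predict(clone(model), features, y, cv=cv) for model in layer]
    )
    # Refit every model on all rows so it can be reused at prediction time.
    fitted_layers.append([clone(model).fit(features, y) for model in layer])
    # The next layer sees only this layer's predictions (assumption).
    features = oof

# The meta-model is trained on the final layer's out-of-fold predictions.
meta_model.fit(features, y)
print(features.shape)  # (120, 1): one column, because the last layer has one model
```
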
Uses the fitted ensemble model to make predictions on new data (X).
- `X`: `pandas.DataFrame` or `numpy.ndarray`
  - The feature matrix for which predictions are needed.
- `y_pred`: `numpy.ndarray`
  - The predicted values based on the ensemble model.
- `TypeError`: If `X` is not a pandas DataFrame or numpy array.
- `RuntimeError`: If an error occurs during prediction (e.g., failure in model predictions).
- Layer-wise Prediction: For each layer, predictions are made using the models from that layer.
- Meta-Model Prediction: The final predictions are obtained by passing the predictions from the last layer through the meta-model.
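
A rough sketch of this two-step prediction flow is shown below. The function `predict_through_layers` is a hypothetical name, and the sketch assumes each layer receives only the previous layer's predictions. For brevity the models are fitted on in-sample predictions, whereas a real stacking fit would use out-of-fold or hold-out predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge


def predict_through_layers(fitted_layers, meta_model, X):
    """Pass X through each fitted layer, then through the meta-model."""
    features = np.asarray(X)
    for layer in fitted_layers:
        # Each model contributes one column of predictions.
        features = np.column_stack([model.predict(features) for model in layer])
    return meta_model.predict(features)


X, y = make_regression(n_samples=60, n_features=4, noise=0.5, random_state=0)

# Simplified in-sample fitting, only to obtain fitted models for the demo.
layer_1 = [LinearRegression().fit(X, y), Ridge(alpha=1.0).fit(X, y)]
layer_1_out = np.column_stack([model.predict(X) for model in layer_1])
layer_2 = [Ridge(alpha=0.5).fit(layer_1_out, y)]
layer_2_out = np.column_stack([model.predict(layer_1_out) for model in layer_2])
meta_model = LinearRegression().fit(layer_2_out, y)

y_pred = predict_through_layers([layer_1, layer_2], meta_model, X[:5])
print(y_pred.shape)  # (5,)
```
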
Prints the entire structure of the ensemble model, including each layer, the models within each layer, and the meta-model, in a detailed tree format.
```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

# StackingEnsemble is assumed to be imported already from wherever it is defined.

# Define models for the ensemble layers
layer_1_models = [LinearRegression(), RandomForestRegressor(n_estimators=50)]
layer_2_models = [SVR(kernel='rbf', C=1.0, epsilon=0.1)]

# Meta model
meta_model = LinearRegression()

# Create an instance of the StackingEnsemble
ensemble = StackingEnsemble(layers=[layer_1_models, layer_2_models], meta_model=meta_model)

# Example training data (synthetic; substitute your own X and y)
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the ensemble
ensemble.fit(X_train, y_train)

# Make predictions
y_pred = ensemble.predict(X_test)

# Print the ensemble structure
ensemble.print_structure()
```

Example Output:
Stacking Model Structure:

Meta Model: `LinearRegression`
- Parameters: `{'fit_intercept': True, 'normalize': False}`

Layer 1:
- Model 1: `LinearRegression`
  - Parameters: `{'fit_intercept': True, 'normalize': False}`
- Model 2: `RandomForestRegressor`
  - Parameters: `{'n_estimators': 50}`

Layer 2:
- Model 1: `SVR`
  - Parameters: `{'kernel': 'rbf', 'C': 1.0, 'epsilon': 0.1}`

Blending Enabled: False

Returns only the parameters that were explicitly changed by the user for a given model.
- Parameters:
  - `model`: The model whose parameters you want to check.
- Returns:
  - A dictionary of changed parameters or `"No changes (using defaults)"` if no changes were made.
- `model`: `sklearn` model
  - The model to inspect for changed parameters.
- `dict` or `str`: A dictionary of changed parameters (key-value pairs), or a string indicating that no changes were made (i.e., using the default parameters).
```python
{
    'n_estimators': 50
}
```
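
One way such a comparison could work is to check each value from `get_params()` against the default declared in the estimator's constructor signature. The sketch below is an assumption about the approach, not the class's actual code, and `changed_params` is a hypothetical name. It relies on a plain `!=` comparison, so parameters holding arrays or `NaN` would need special handling.

```python
import inspect

from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def changed_params(model):
    """Return parameters whose values differ from the constructor defaults."""
    signature = inspect.signature(type(model).__init__)
    changed = {}
    for name, value in model.get_params(deep=False).items():
        param = signature.parameters.get(name)
        default = param.default if param is not None else inspect.Parameter.empty
        # Parameters without a default are always reported.
        if default is inspect.Parameter.empty or value != default:
            changed[name] = value
    return changed or "No changes (using defaults)"


print(changed_params(RandomForestRegressor(n_estimators=50)))  # {'n_estimators': 50}
print(changed_params(LinearRegression()))  # No changes (using defaults)
```

Note that a value passed explicitly but equal to the default (for example `SVR(C=1.0)`) is indistinguishable from an omitted one under this approach.
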