caret xgb na.action=na.pass does not work while using recipes #1381

@EmilHvitfeldt

copied from: tidymodels/recipes#636

The same dataset, containing a single NA value, fails to train through the caret + recipes pipeline, whereas caret's formula interface alone handles it without any issues.

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step

data(cars)
cars$Mileage[100] <- NA
## Without recipes 
train(Price ~., 
      trControl = trainControl(
        method = 'CV',
        number = 3 # Reduced the number of CV folds; otherwise we would get a bunch of warnings
      ),
      data = cars,
      tuneLength = 1,
      method = "xgbLinear",
      objective = "reg:squarederror",
      na.action = na.pass)
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.

#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.

#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.

#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.
#> eXtreme Gradient Boosting 
#> 
#> 804 samples
#>  17 predictor
#> 
#> No pre-processing
#> Resampling: Cross-Validated (3 fold) 
#> Summary of sample sizes: 536, 536, 536 
#> Resampling results:
#> 
#>   RMSE      Rsquared   MAE     
#>   2442.675  0.9394766  1674.597
#> 
#> Tuning parameter 'nrounds' was held constant at a value of 50
#> Tuning
#>  'alpha' was held constant at a value of 0
#> Tuning parameter 'eta' was
#>  held constant at a value of 0.3

## With recipes
rec <- recipe(Price ~., data = cars)
train(rec,
      data = cars,
      trControl = trainControl(
        method = 'CV',
        number = 3
      ),
      tuneLength = 1,
      method = "xgbLinear",
      objective = "reg:squarederror",
      na.action = na.pass)
#> 
#> Attaching package: 'xgboost'
#> The following object is masked from 'package:dplyr':
#> 
#>     slice
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.
#> Warning: model fit failed for Fold1: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) : 
#>   cannot coerce type 'closure' to vector of type 'character'
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.
#> Warning: model fit failed for Fold2: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) : 
#>   cannot coerce type 'closure' to vector of type 'character'
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.
#> Warning: model fit failed for Fold3: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) : 
#>   cannot coerce type 'closure' to vector of type 'character'
#> Warning in train_rec(rec = x, dat = data, info = trainInfo, method = models, :
#> There were missing values in resampled performance measures.
#> Something is wrong; all the RMSE metric values are missing:
#>       RMSE        Rsquared        MAE     
#>  Min.   : NA   Min.   : NA   Min.   : NA  
#>  1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
#>  Median : NA   Median : NA   Median : NA  
#>  Mean   :NaN   Mean   :NaN   Mean   :NaN  
#>  3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
#>  Max.   : NA   Max.   : NA   Max.   : NA  
#>  NA's   :1     NA's   :1     NA's   :1
#> Error: Stopping
