Description
Hi,

I was testing the `preProcess` imputation methods and noticed that with `method = "knnImpute"`, `predict()` changes the original values even where they were not missing. The behaviour of `"bagImpute"` is as expected: it leaves the known values as they are and imputes only the missing ones. Is this a bug? Does it affect downstream tasks when `knnImpute` is used? Thank you.
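Example code:

```r
library(caret)  # provides preProcess() and its predict() method

# 150 x 4 data frame of standard-normal values, with 30 NAs in each of X1 and X2
df <- data.frame(matrix(rnorm(600, 0, 1), nrow = 150, ncol = 4))
df[, 1][sample(1:150, 30)] <- NA
df[, 2][sample(1:150, 30)] <- NA

pp1 <- preProcess(df, method = "bagImpute")
pp2 <- preProcess(df, method = "knnImpute")
df1 <- predict(pp1, df)
df2 <- predict(pp2, df)

# Compare the first column before and after each imputation method
round(cbind("original" = df$X1, "bagImpute" = df1$X1, "knnImpute" = df2$X1), 4)
```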
Output:

```
      original bagImpute knnImpute
 [1,]   0.5374    0.5374    0.5743
 [2,]   0.5118    0.5118    0.5449
 [3,]  -0.2418   -0.2418   -0.3195
 [4,]  -0.6785   -0.6785   -0.8204
 [5,]   0.8837    0.8837    0.9716
 [6,]   1.3987    1.3987    1.5622
 [7,]  -0.1245   -0.1245   -0.1849
 [8,]   0.8306    0.8306    0.9106
 [9,]  -0.9723   -0.9723   -1.1575
[10,]       NA    0.1627   -0.1757
[11,]   0.1003    0.1003    0.0729
[12,]  -0.2509   -0.2509   -0.3300
[13,]   0.8210    0.8210    0.8996
[14,]  -2.4520   -2.4520   -2.8547
[15,]       NA    0.0507   -0.1487
[16,]       NA   -0.3666   -0.0224
```

Row 1 shows the problem: the non-missing value 0.5374 is kept by bagImpute but becomes 0.5743 under knnImpute. Why does knnImpute change values that were not missing to begin with?
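A quick check using only the rows printed above: if knnImpute merely filled the gaps, its non-missing values would equal the originals. The sketch below (base R only; the vectors are the 13 non-missing rows copied from the output) regresses the knnImpute column on the original column. A near-perfect linear fit with a slope different from 1 would suggest the known values were rescaled rather than replaced, which is what you would expect if knnImpute also centers and scales the predictors. That last point is an assumption here; `print(pp2)` should show whether any centering or scaling steps were added. The names `original`, `knn_value`, `implied_mean` and `implied_sd` are just illustrative.

```r
# Non-missing rows copied from the printed output (rounded to 4 decimals)
original  <- c(0.5374, 0.5118, -0.2418, -0.6785, 0.8837, 1.3987, -0.1245,
               0.8306, -0.9723, 0.1003, -0.2509, 0.8210, -2.4520)
knn_value <- c(0.5743, 0.5449, -0.3195, -0.8204, 0.9716, 1.5622, -0.1849,
               0.9106, -1.1575, 0.0729, -0.3300, 0.8996, -2.8547)

# Fit knn_value = a + b * original; an exact linear map gives R^2 of ~1
fit <- lm(knn_value ~ original)
r_squared <- summary(fit)$r.squared

# If knn_value = (original - m) / s (standardisation), then b = 1 / s and a = -m / s
implied_sd   <- 1 / coef(fit)[["original"]]
implied_mean <- -coef(fit)[["(Intercept)"]] * implied_sd

print(round(c(r_squared = r_squared, slope = coef(fit)[["original"]],
              implied_mean = implied_mean, implied_sd = implied_sd), 4))
```

If `implied_mean` and `implied_sd` come out close to `mean(df$X1, na.rm = TRUE)` and `sd(df$X1, na.rm = TRUE)` on the full data (only approximately, since they are estimated from 13 rounded rows), the knnImpute output would be on the standardised scale rather than the original one.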