preProcess with knnImpute returns skewed non-missing values #1405

@dianashams

Hi,
I was testing preProcess imputation and noticed that with method = "knnImpute", predict() changes the original values even where they were not missing. With "bagImpute" the behaviour is as expected: known values are left as they are and only the missing ones are imputed. Is this a bug? Does it affect downstream tasks when knnImpute is used? Thank you

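Example code:
library(caret)
# 150 x 4 data frame of standard normal draws (columns X1..X4)
df <- data.frame(matrix(rnorm(600, 0, 1), nrow = 150, ncol = 4))
# set 30 random entries to NA in each of the first two columns
df[, 1][sample(1:150, 30)] <- NA
df[, 2][sample(1:150, 30)] <- NA
# fit both imputation methods and apply them to the same data
pp1 <- preProcess(df, method = "bagImpute")
pp2 <- preProcess(df, method = "knnImpute")
df1 <- predict(pp1, df)
df2 <- predict(pp2, df)
# compare the first column before and after imputation
round(cbind("original" = df$X1, "bagImpute" = df1$X1, "knnImpute" = df2$X1), 4)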

      original bagImpute knnImpute
 [1,]   0.5374    0.5374    0.5743   <= why does knnImpute change values that were not missing to start with?
 [2,]   0.5118    0.5118    0.5449
 [3,]  -0.2418   -0.2418   -0.3195
 [4,]  -0.6785   -0.6785   -0.8204
 [5,]   0.8837    0.8837    0.9716
 [6,]   1.3987    1.3987    1.5622
 [7,]  -0.1245   -0.1245   -0.1849
 [8,]   0.8306    0.8306    0.9106
 [9,]  -0.9723   -0.9723   -1.1575
[10,]       NA    0.1627   -0.1757
[11,]   0.1003    0.1003    0.0729
[12,]  -0.2509   -0.2509   -0.3300
[13,]   0.8210    0.8210    0.8996
[14,]  -2.4520   -2.4520   -2.8547
[15,]       NA    0.0507   -0.1487
[16,]       NA   -0.3666   -0.0224
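
One possible explanation, offered as a sketch and not a confirmed diagnosis: the caret documentation notes that "knnImpute" centers and scales the data regardless of what else is passed in method, so the knnImpute column above may simply be on a standardized scale. The check below assumes the fitted preProcess object stores the per-column centering and scaling values in its mean and std components. It back-transforms X1 and compares the originally observed entries. The set.seed call and the x1_back and observed names are additions for illustration only.

```r
library(caret)

set.seed(1)  # added only to make this sketch reproducible
df <- data.frame(matrix(rnorm(600, 0, 1), nrow = 150, ncol = 4))
df[, 1][sample(1:150, 30)] <- NA
df[, 2][sample(1:150, 30)] <- NA

pp2 <- preProcess(df, method = "knnImpute")
df2 <- predict(pp2, df)

# Undo the standardization for X1: x_original = x_scaled * sd + mean
x1_back <- df2$X1 * pp2$std[["X1"]] + pp2$mean[["X1"]]

# Compare only the entries that were observed in the original data
observed <- !is.na(df$X1)
all.equal(x1_back[observed], df$X1[observed])
```

If the comparison reports equality, the non-missing values were rescaled and not altered, and the imputed entries are on the same standardized scale. Applying the same back-transformation to every column would return the imputed data in the original units.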
