Description
Renaming multiple columns at once is a hassle,
especially when you want to use column accessors or the compiler plugin.
The options currently are:
df.rename { all() }.into {
    when (it.name) {
        "arter" -> "species"
        "ø" -> "island"
        "næblængde_mm" -> "bill_length_mm"
        "næbdybde_mm" -> "bill_depth_mm"
        "luffelængde_mm" -> "flipper_length_mm"
        "kropsmasse_g" -> "body_mass_g"
        "køn" -> "sex"
        "måledato" -> "measurement_date"
        else -> error("for ${it.name}")
    }
}
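This is unsafe: the names are plain strings, so a typo or an unexpected column only fails at runtime in the else branch. On top of that, the compiler plugin cannot interpret it.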
df.rename(
    "arter" to "species",
    "ø" to "island",
    "næblængde_mm" to "bill_length_mm",
    "næbdybde_mm" to "bill_depth_mm",
    "luffelængde_mm" to "flipper_length_mm",
    "kropsmasse_g" to "body_mass_g",
    "køn" to "sex",
    "måledato" to "measurement_date",
)
While the compiler plugin can interpret this code, it can still contain typos, so I'd count it as unsafe too.
df.rename { all() }.into(
    "species",
    "island",
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
    "sex",
    "measurement_date",
)
This is order-dependent, which is seldom a good thing.
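To illustrate why positional renaming is fragile, here is a minimal sketch that models it on plain lists of column names. It does not use the DataFrame API; `renameByPosition` is a hypothetical helper that only mimics pairing columns with new names by index.

```kotlin
// Hypothetical helper: pairs each column with the new name at the same index.
// This models positional renaming on plain lists; no DataFrame API is used.
fun renameByPosition(columns: List<String>, newNames: List<String>): Map<String, String> {
    require(columns.size == newNames.size) {
        "Expected ${columns.size} names, got ${newNames.size}"
    }
    return columns.zip(newNames).toMap()
}

fun main() {
    val newNames = listOf("species", "island", "sex")

    // Original column order: the mapping is what was intended.
    println(renameByPosition(listOf("arter", "ø", "køn"), newNames))
    // prints {arter=species, ø=island, køn=sex}

    // Same columns, but "køn" now comes before "ø" in the source data.
    println(renameByPosition(listOf("arter", "køn", "ø"), newNames))
    // prints {arter=species, køn=island, ø=sex}
}
```

The second call succeeds without any error, yet two columns end up with the wrong names.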
df
    .rename { arter }.into("species")
    .rename { ø }.into("island")
    .rename { næblængde_mm }.into("bill_length_mm")
    .rename { næbdybde_mm }.into("bill_depth_mm")
    .rename { luffelængde_mm }.into("flipper_length_mm")
    .rename { kropsmasse_g }.into("body_mass_g")
    .rename { køn }.into("sex")
    .rename { måledato }.into("measurement_date")
This is the safest solution, but it is a hassle to type out every time and not very readable.
df.select {
    cols(
        arter into "species",
        ø into "island",
        næblængde_mm into "bill_length_mm",
        næbdybde_mm into "bill_depth_mm",
        luffelængde_mm into "flipper_length_mm",
        kropsmasse_g into "body_mass_g",
        køn into "sex",
        måledato into "measurement_date",
    )
}
Funnily enough, the most readable solution doesn't even use the rename
operation ;P That should be a good indication that we need an improvement.
I'd suggest:
df.renameAll {
    arter into "species"
    ø into "island"
    næblængde_mm into "bill_length_mm"
    næbdybde_mm into "bill_depth_mm"
    luffelængde_mm into "flipper_length_mm"
    kropsmasse_g into "body_mass_g"
    køn into "sex"
    måledato into "measurement_date"
}
We cannot reuse rename {} because it doesn't return a DataFrame.
We would need to carefully shadow into from the ColumnsSelectionDsl, but otherwise, we should be fine.
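As a rough idea of how such a builder could collect its mappings, here is a minimal, self-contained sketch. Everything in it is hypothetical: `ToyFrame` is a toy stand-in that only tracks column names, `RenameAllScope` is an assumed scope class, and the receivers are plain strings rather than column accessors. It is not the library's implementation, only an illustration of a scope-local `into` that records pairs instead of producing a renamed column.

```kotlin
// Toy stand-in for a data frame: only column names matter here.
// All names in this block are hypothetical and not part of the DataFrame library.
data class ToyFrame(val columnNames: List<String>)

class RenameAllScope(private val frame: ToyFrame) {
    // Collected old-name -> new-name pairs, in declaration order.
    val mappings = linkedMapOf<String, String>()

    // Scope-local `into`: records the pair and fails fast on
    // unknown or duplicate source columns.
    infix fun String.into(newName: String) {
        require(this in frame.columnNames) { "Unknown column: $this" }
        require(this !in mappings) { "Column renamed twice: $this" }
        mappings[this] = newName
    }
}

// Runs the block, then applies all collected renames in one pass.
// Columns that were not mentioned keep their names.
fun ToyFrame.renameAll(block: RenameAllScope.() -> Unit): ToyFrame {
    val scope = RenameAllScope(this).apply(block)
    return ToyFrame(columnNames.map { scope.mappings[it] ?: it })
}

fun main() {
    val df = ToyFrame(listOf("arter", "ø", "køn"))
    val renamed = df.renameAll {
        "arter" into "species"
        "køn" into "sex"
    }
    println(renamed.columnNames) // prints [species, ø, sex]
}
```

Because `into` is declared as a member of the scope class, it takes precedence over any outer extension with the same name inside the block, which is the kind of shadowing mentioned above. In the real proposal the receiver would be a column accessor, so the typo check would happen at compile time rather than through `require`.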