Add renameAll {} DSL #1168

Open
@Jolanrensen

Description

Renaming multiple columns at once is a hassle,
especially when you want to use column accessors or the compiler plugin.

The options currently are:

df.rename { all() }.into {
    when (it.name) {
        "arter" -> "species"
        "ø" -> "island"
        "næblængde_mm" -> "bill_length_mm"
        "næbdybde_mm" -> "bill_depth_mm"
        "luffelængde_mm" -> "flipper_length_mm"
        "kropsmasse_g" -> "body_mass_g"
        "køn" -> "sex"
        "måledato" -> "measurement_date"
        else -> error("for ${it.name}")
    }
}

This is unsafe: the column names are plain strings, so a typo or an unexpected column only surfaces as a runtime error. The compiler plugin cannot interpret it either.

df.rename(
    "arter" to "species",
    "ø" to "island",
    "næblængde_mm" to "bill_length_mm",
    "næbdybde_mm" to "bill_depth_mm",
    "luffelængde_mm" to "flipper_length_mm",
    "kropsmasse_g" to "body_mass_g",
    "køn" to "sex",
    "måledato" to "measurement_date",
)

While the compiler plugin can interpret this code, it can still contain typos, so I'd count it as unsafe too.

df.rename { all() }.into(
    "species",
    "island",
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
    "sex",
    "measurement_date"
)

This is order-dependent, which is seldom a good thing.
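To illustrate the order dependence, here is a minimal sketch in plain Kotlin. `renameByPosition` is a hypothetical helper written for this example, not part of the Kotlin DataFrame API. It pairs old and new names purely by position, which is roughly what the positional `into(...)` call above relies on.

```kotlin
// Hypothetical helper (not the DataFrame API): pairs existing column names
// with new names strictly by position.
fun renameByPosition(columns: List<String>, newNames: List<String>): Map<String, String> {
    require(columns.size == newNames.size) {
        "Expected ${columns.size} new names, got ${newNames.size}"
    }
    return columns.zip(newNames).toMap()
}

fun positionalDemo(): Pair<Map<String, String>, Map<String, String>> {
    val newNames = listOf("species", "island", "sex")

    // The order the code was written against.
    val expected = renameByPosition(listOf("arter", "ø", "køn"), newNames)

    // Same columns, but the source now delivers them in a different order.
    // Nothing fails; "ø" is silently renamed to "species".
    val reordered = renameByPosition(listOf("ø", "arter", "køn"), newNames)

    return expected to reordered
}
```

In the second call nothing throws, yet `ø` ends up named `species` and `arter` ends up named `island`, which is exactly the kind of silent breakage a name-based mapping avoids.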

df
    .rename { arter }.into("species")
    .rename { ø }.into("island")
    .rename { næblængde_mm }.into("bill_length_mm")
    .rename { næbdybde_mm }.into("bill_depth_mm")
    .rename { luffelængde_mm }.into("flipper_length_mm")
    .rename { kropsmasse_g }.into("body_mass_g")
    .rename { køn }.into("sex")
    .rename { måledato }.into("measurement_date")

This is the safest solution, but it's a hassle to type out every time and not very readable.

df.select {
    cols(
        arter into "species",
        ø into "island",
        næblængde_mm into "bill_length_mm",
        næbdybde_mm into "bill_depth_mm",
        luffelængde_mm into "flipper_length_mm",
        kropsmasse_g into "body_mass_g",
        køn into "sex",
        måledato into "measurement_date",
    )
}

Funnily enough, the most readable solution doesn't even use the rename operation ;P That should be a good indication that rename needs improvement.

I'd suggest:

df.renameAll {
    arter into "species"
    ø into "island"
    næblængde_mm into "bill_length_mm"
    næbdybde_mm into "bill_depth_mm"
    luffelængde_mm into "flipper_length_mm"
    kropsmasse_g into "body_mass_g"
    køn into "sex"
    måledato into "measurement_date"
}

We cannot reuse rename {} because it doesn't return a DataFrame.
We would need to carefully shadow into from the ColumnsSelectionDsl, but otherwise, we should be fine.
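As a rough sketch of how such a DSL could collect its mappings, here is a self-contained toy version. Everything in it (`ColumnRef`, `RenameAllScope`, `ToyFrame`) is a hypothetical stand-in invented for illustration and is not the Kotlin DataFrame API. The real implementation would work on column accessors and would have to deal with the existing `into` in the ColumnsSelectionDsl. The idea is only that `into` is declared inside the scope class, so each statement in the block records one rename instead of returning a value.

```kotlin
// Hypothetical stand-in for a column accessor (not the DataFrame API).
class ColumnRef(val name: String)

// Receiver of the renameAll { } block: every `into` call records one rename.
class RenameAllScope {
    private val renames = LinkedHashMap<String, String>()

    // Declared inside the scope, so it is only available within the block.
    infix fun ColumnRef.into(newName: String) {
        require(name !in renames) { "Column '$name' is renamed twice" }
        renames[name] = newName
    }

    fun build(): Map<String, String> = renames.toMap()
}

// Toy "frame": column name -> values, kept in insertion order.
class ToyFrame(val columns: Map<String, List<Any?>>) {
    fun renameAll(block: RenameAllScope.() -> Unit): ToyFrame {
        val mapping = RenameAllScope().apply(block).build()
        val missing = mapping.keys - columns.keys
        require(missing.isEmpty()) { "Unknown columns: $missing" }

        // Rebuild the columns in their original order, applying the renames.
        val renamed = LinkedHashMap<String, List<Any?>>()
        for ((name, values) in columns) {
            renamed[mapping[name] ?: name] = values
        }
        return ToyFrame(renamed)
    }
}

// Hypothetical accessors, standing in for generated column properties.
val arter = ColumnRef("arter")
val ø = ColumnRef("ø")
val køn = ColumnRef("køn")

fun renameAllDemo(): List<String> {
    val frame = ToyFrame(
        linkedMapOf(
            "arter" to listOf("Adelie"),
            "ø" to listOf("Biscoe"),
            "køn" to listOf("F"),
        )
    )
    val result = frame.renameAll {
        arter into "species"
        ø into "island"
    }
    return result.columns.keys.toList() // [species, island, køn]
}
```

Columns that are not mentioned in the block keep their names, and the result is a frame again, which is the property `rename {}` lacks.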

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
