Documentation and tests for the `first` and `firstOrNull` functions #1547

Allex-Nik · 2025-11-06T13:38:23Z

Allex-Nik · 2025-11-06T14:11:54Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+        df.drop(df.nrow).firstOrNull { isHappy } shouldBe null
+    }
+
+    @Test


For now I haven't added a test for an empty dataframe, because emptyDf.groupBy { age }.first() does not return the group column. This is a known issue; Jolan has fixed it, but the fix is not merged yet.

related to #1531


Allex-Nik · 2025-11-06T14:20:21Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+    }
+
+    @Test
+    fun `first on GroupBy with predicate`() {


There might be an issue here:

df.groupBy { isHappy }.first { age > 10 }

works fine: the ReducedGroupBy contains the columns isHappy, name (firstName, lastName), age, city, and weight. But

df.groupBy { isHappy }.first { age > 100 }

returns a ReducedGroupBy without the name column.

For example, this test passes, although it seems to me that it should not:

grouped.first { age > 100 }.values()[0] shouldBe dataFrameOf("isHappy", "age", "city", "weight")(true, null, null, null)[0]

That's why, for now, I haven't added a test for a predicate that doesn't match any row.

Same reason as #1531 I assume?

Maybe, but it is not completely clear to me why it is the name column that disappears. For example, for

val students = dataFrameOf(
    "name" to columnOf("Alice", "Bob"),
    "age" to columnOf(15, 20),
    "group" to columnOf(1, 2),
)
students.groupBy { name }.count { name == "Charlie" }

we get the column count with 0s.
And when we do students.groupBy { name }.first { age == 30 }, no column disappears.

students.groupBy { name }.count { name == "Charlie" }

is a shortcut for

students.groupBy { name }.aggregate { count { name == "Charlie" } into "count" }

count() is an aggregator: it operates on each group individually and aggregates the group's values per column into single values, usually a statistic (in this case, a count). The result is a normal DataFrame again.

In contrast, first()/last()/minBy/maxBy are reducers: they work per row, reducing each group to a single row and producing a ReducedGroupBy, which offers a few ways of turning the result back into a dataframe:

You can concat() the reduced rows, turn them into("") {} a column group, or take the values {} of specific columns (which, unlike concat, produces nulls if no value is present) and then concatenate them. (Notebooks show the .values() version of a ReducedGroupBy when you display it.)

I agree this distinction is quite arbitrary and unclear looking just at the function names. There's yet another set of functions that operate on the entire GroupBy, like sortBy {} and forEach {}, complicating things even further... I would vote for a GroupBy overhaul but we can only do that in 1.1 most likely: #686 (comment)
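To make the aggregator/reducer distinction concrete, here is a minimal sketch using plain Kotlin collections only. It is an analogy, not the DataFrame API: `Student`, `countPerGroup`, and `firstPerGroup` are hypothetical names introduced for illustration. It shows why a non-matching predicate gives a 0 per group for a count, but "no row" (null) per group for a first-style reduction.

```kotlin
// Plain Kotlin collections only; an analogy, not the DataFrame API.
data class Student(val name: String, val age: Int, val group: Int)

val students = listOf(
    Student("Alice", 15, 1),
    Student("Bob", 20, 2),
)

// "Aggregator": every group yields a value, even when nothing matches (the count is 0).
fun countPerGroup(predicate: (Student) -> Boolean): Map<String, Int> =
    students.groupBy { it.name }.mapValues { (_, rows) -> rows.count(predicate) }

// "Reducer": every group yields at most one row; a group without a match yields null.
fun firstPerGroup(predicate: (Student) -> Boolean): Map<String, Student?> =
    students.groupBy { it.name }.mapValues { (_, rows) -> rows.firstOrNull(predicate) }

fun main() {
    println(countPerGroup { it.name == "Charlie" }) // {Alice=0, Bob=0}
    println(firstPerGroup { it.age == 30 })         // {Alice=null, Bob=null}
}
```

How a missing row is then turned back into columns is a separate step, and in the library that step depends on whether concat(), into(), or values() is used.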

Allex-Nik · 2025-11-06T14:25:44Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+        )[0]
+    }
+
+    @Test


I faced an issue here:

ReducedPivot (i.e. pivot.first()) and ReducedPivotGroupBy are not rendered correctly in notebooks when there is null in any row resulting from first().

If I replace null in such a row with some value, it is displayed correctly.

how do they look when they are rendered incorrectly?

I meant this format.

Ah yes, another one of #1546 in the wild :) It's been fixed

Allex-Nik · 2025-11-06T14:30:14Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+        )[0]
+    }
+
+    @Test


I took a simpler dataframe here for readability. Otherwise it was more laborious for a reader to validate the result.

Allex-Nik · 2025-11-06T14:32:33Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

        ).shouldAllBeEqual()
    }
+
+    @Test


Is it fine to put these tests in the same class with FirstColumnsSelectionDsl? Or is it better to put them in a different class?
If it is better to put them in a different class, is it worth inheriting from TestBase to reuse df?

It's OK to put them here. It's common to keep functions with the same name in the same file, even when one belongs to the CS DSL and the other to the DF API.

koperagen · 2025-11-06T18:38:13Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+/**
+ * Returns the first value in this [DataColumn].
+ *
+ * @param T The type of the values in the [DataColumn].


This line seems redundant to me because the type parameter T is always inferred from the DataColumn.

Yes, I'd omit (everywhere) a @param with type parameter, and add @return!

koperagen · 2025-11-06T18:40:45Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+/**
+ * Selects the first row from each group of the given [GroupBy]
+ * and returns a [ReducedGroupBy] containing these rows
+ * (one row per group, each row is the first row in its group).


or null if group is empty

I have faced an issue in this case that might be unexpected behavior.

I started with df.take(0).groupBy { age }.first() - it doesn't return null (returns ReducedGroupBy), which is probably fine because there are no groups at all.

But now I have tried

val grouped = df.groupBy { age }
grouped.updateGroups {
    if (it == grouped.groups[0]) { it.take(0) } else it
}.first()

to make the first group empty. Applying first in this case causes an exception:
The problem is found in one of the loaded libraries: check library renderers
java.lang.IllegalStateException: Can not insert column `age` because column with this path already exists in DataFrame.

This problem does not occur if every group has at least one row, or if I remove the column age from every group. For example, this works:

grouped.updateGroups {
    val new = it.remove { age }
    if (it == grouped.groups[0]) { new.take(0) } else new
}.first()

We get null for an empty group and the first row for others.

But is it expected behavior that we get such an error about conflicting columns? Or maybe I am just obtaining an empty group incorrectly.

The df in this case is:

val df = dataFrameOf(
    "name" to columnOf("Alice", "Bob", "Charlie"),
    "age" to columnOf(15, 20, 25),
)

Or, for a slightly more natural example: do a fullJoin of df with val ages = dataFrameOf("age" to columnOf(30)), group by age, and filter out the row with the null value in the group for key 30. Applying first in this case causes the same exception.

I am reporting this just in case it is not a known issue :)

Your first issue can also be reproduced in notebook by

df.groupBy { age }.updateGroups { it }.first()

I'm not entirely sure why..

Ah! df.groupBy { age }.updateGroups { it }.first().concat() works fine again. So the issue is in the renderer for ReducedGroupBy in notebooks

Can be reproduced outside notebooks with df.groupBy { age }.updateGroups { it }.first().values()

koperagen · 2025-11-06T18:41:28Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * employees.groupBy { jobTitle }.first()
+ * ```
+ *
+ * @param T The type of the values in the [GroupBy].


I think these lines are redundant because they are always inferred

koperagen · 2025-11-06T18:46:01Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * ```kotlin
+ * // Select the first row for each city.
+ * // Returns a ReducedPivot with one column per city and the first row from the group in each column.
+ * df.pivot { city }.first()


Please see if you can come up with a representative example. In what situation would you use this function? What would the df typically look like, and what conclusions can one draw from the result? It would be good if the example could convey this.

koperagen · 2025-11-06T18:50:23Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * the structure remains unchanged — only the contents of each group
+ * are replaced with the first row from that group.
+ *
+ * Equivalent to `reduce { firstOrNull() }`.


reduce is an internal function; people won't be able to use it like this

koperagen · 2025-11-06T18:54:56Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * Reduces this [Pivot] by selecting the first row from each group.
+ *
+ * Returns a [ReducedPivot] where:
+ * - each column corresponds to a [pivot] group — if multiple pivot keys were used,


I think textual explanations of pivot make it scarier than it is. For first, I suggest not including the common pivot logic and referring to the pivot KDoc instead.
A reference to the website with HTML tables, or ASCII tables, might do a better job of conveying what's going on.

you may also look at #1554; Andrei and I have been discussing a lot about how to explain Pivot from text. It may have some good example phrases you can reuse :)

AndreiKingsley

Great job!
Regarding the overloads for GroupBy and Pivot: I'm working on a general KDoc system for these operations and will be reworking them in the future anyway, so you can leave them as they are.

AndreiKingsley · 2025-11-07T14:27:40Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+/**
+ * Returns the first value in this [DataColumn].
+ *
+ * @param T The type of the values in the [DataColumn].


Yes, I'd omit (everywhere) a @param with type parameter, and add @return!

AndreiKingsley · 2025-11-07T14:27:53Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * Returns the first value in this [DataColumn].
+ *
+ * @param T The type of the values in the [DataColumn].
+ *


Please, add here and in all other places "See also" section with related operations. For example

See also [firstOrNull], [last], [take].

Jolanrensen · 2025-11-10T11:03:46Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ *
+ * ### Example
+ * ```kotlin
+ * // Select from the column "age" the first value where the age is greater than 17


"select" is confusing, as we also have the select operation. I'd say "get"

same applies below

Jolanrensen · 2025-11-10T11:04:23Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * ```
+ *
+ * @param T The type of the values in the [DataColumn].
+ * @param predicate A lambda expression used to select a value


*the first value

I see that it might look a bit ambiguous (especially because of "used"), but when I wrote this, my idea was that the logic of determining the first value is outside the scope of the predicate, and the predicate as a function just returns true if the input satisfies the condition. Do you think we still need to change it to "the first value"?

Maybe it's not that important, but I am clarifying because this part occurs in several places in this file and in last.kt as well :)

It may be important to know that it stops calling the predicate after the first true is found.
That's not conveyed when you say it's "used to select a value" :)
So yes, you explained in the broad sense what a "predicate" is, but I think it will be more valuable if you describe more specifically what it's used for in this specific case.

(similarly when you have a function that takes age: Int and you start explaining what an Int is ;P)

I see, will fix it, thank you!
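The short-circuiting mentioned above can be observed directly with the Kotlin standard library's `first`, which `DataColumn.first(predicate)` delegates to according to the diff in this PR (`values.first(predicate)`). A minimal sketch follows; `countPredicateCalls` is a hypothetical helper that exists only for this illustration.

```kotlin
// Counts how often the predicate is invoked before `first` returns.
fun countPredicateCalls(values: List<Int>, threshold: Int): Pair<Int, Int> {
    var calls = 0
    val found = values.first { calls++; it > threshold }
    return found to calls
}

fun main() {
    val (found, calls) = countPredicateCalls(listOf(15, 20, 25, 30), threshold = 17)
    // 15 fails, 20 matches, and 25 and 30 are never tested.
    println("found=$found after $calls predicate calls") // found=20 after 2 predicate calls
}
```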

Jolanrensen · 2025-11-10T11:05:24Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * This predicate takes a value from the [DataColumn] as an input
+ * and returns `true` if the value satisfies the condition or `false` otherwise.
+ *
+ * @throws [NoSuchElementException] if the [DataColumn] contains no element matching the [predicate]


Helpful! We don't add these enough. Though I would add "@see firstOrNull" somewhere around here so people will know how to avoid this exception

holds for the other functions as well
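As a sketch of why the pointer to firstOrNull helps, the example below contrasts the two on standard-library lists; it assumes the DataColumn overloads behave like the stdlib functions they wrap. `firstAdultOrNull` and `firstAdultThrows` are hypothetical names used only here.

```kotlin
// Returns null instead of throwing when nothing matches.
fun firstAdultOrNull(ages: List<Int>): Int? = ages.firstOrNull { it > 17 }

// Reports whether `first` threw NoSuchElementException for the given input.
fun firstAdultThrows(ages: List<Int>): Boolean =
    try {
        ages.first { it > 17 }
        false
    } catch (e: NoSuchElementException) {
        true
    }

fun main() {
    println(firstAdultOrNull(listOf(15, 16))) // null
    println(firstAdultThrows(listOf(15, 16))) // true: no element matches
    println(firstAdultThrows(emptyList()))    // true: an empty list has no match either
}
```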

Jolanrensen · 2025-11-10T11:06:51Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

 // region DataFrame

+/**
+ * Returns the first row in this [DataFrame].


you could link to DataRow from "row" :) may be helpful

Jolanrensen · 2025-11-10T11:36:13Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * ### Example
+ * ```kotlin
+ * // Select the first row for each city where the population is greater than 100 000.
+ * df.pivot { city }.first { population > 100000 }


100_000 is better readable ;P (and compiles!)

or 10e5 or something ;P
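A small note on the literals, as a sketch in plain Kotlin: `100_000` is the same Int as `100000`, whereas `10e5` is a Double equal to one million, so the two suggestions are not interchangeable. The constant names below are hypothetical.

```kotlin
val readable: Int = 100_000    // underscores are only digit separators
val plain: Int = 100000        // the same Int value
val scientific: Double = 10e5  // 10 * 10^5, i.e. 1_000_000.0

fun main() {
    check(readable == plain)
    check(scientific == 1_000_000.0)
    println(scientific > readable) // true
}
```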

Jolanrensen · 2025-11-10T11:39:35Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * students.pivot { faculty }.groupBy { enrollmentYear }.first { age > 21 }
+ * ```
+ *
+ * @param predicate A lambda expression used to select a value


oh you can actually also link to RowFilter :) it has some useful kdoc too


Jolanrensen · 2025-11-10T12:41:43Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+    }
+
+    @Test
+    fun `first on GroupBy with predicate`() {


Same reason as #1531 I assume?

Jolanrensen · 2025-11-10T12:42:41Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+    @Test
+    fun `first on GroupBy with predicate`() {
+        val grouped = df.groupBy { isHappy }
+        val reducedGrouped = grouped.first{ it["age"] as Int > 17 && it["city"] != "Moscow" }


Don't forget to lint :) I recommend using the KtLint plugin. You can also call ktLintFormat from gradle

Jolanrensen · 2025-11-10T12:43:25Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+        )[0]
+    }
+
+    @Test


how do they look when they are rendered incorrectly?

zaleslaw · 2025-11-12T14:01:36Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * ```
+ *
+ * @param T The type of the [DataFrame].
+ * @param predicate A lambda expression used to select a value


My only suggestion is to make it even more readable by adding [] brackets around the parameter names: @param [predicate] A lambda....

Allex-Nik · 2025-11-12T16:03:26Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * @throws [NoSuchElementException] if the [DataColumn] contains no element matching the [predicate]
+ * (including the case when the [DataColumn] is empty).
+ */
 public fun <T> DataColumn<T>.first(predicate: (T) -> Boolean): T = values.first(predicate)


Is there any reason we do not make this function inline? last in the same case is inline (the same goes for DataColumn<T>.firstOrNull(predicate: (T) -> Boolean))

Jolanrensen · 2025-11-13T13:43:54Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

 // region GroupBy

+/**
+ * Selects the first row from each group of the given [GroupBy]


(Now that I understand a bit more about GroupBy) maybe it makes sense to phrase it like:

"[Reduces][GroupByDocs.Reducing] the groups by taking the first from each..."

This points people a bit more in the right direction.

Jolanrensen · 2025-11-13T13:46:22Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

+ * ### Example
+ * ```kotlin
+ * // Select the first employee from each group formed by the job title
+ * employees.groupBy { jobTitle }.first()


A ReducedGroupBy is rarely the end. It may make sense to add concat(), into(), or values() after it

Jolanrensen · 2025-11-13T13:47:59Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

 public fun <T, G> GroupBy<T, G>.first(): ReducedGroupBy<T, G> = reduce { firstOrNull() }

+/**
+ * Selects from each group of the given [GroupBy] the first row satisfying the given [predicate],


I would also use "Reduces each group..." here

Allex-Nik added 2 commits November 6, 2025 14:21

Add documentation for the first and firstOrNull functions

87ca3b7

Add tests for the first and firstOrNull functions

f58f12c

Allex-Nik requested review from AndreiKingsley, Jolanrensen and zaleslaw November 6, 2025 13:38

Allex-Nik commented Nov 6, 2025

koperagen reviewed Nov 6, 2025

AndreiKingsley approved these changes Nov 7, 2025

AndreiKingsley self-requested a review November 7, 2025 14:19

AndreiKingsley requested changes Nov 7, 2025

Jolanrensen requested changes Nov 10, 2025

zaleslaw reviewed Nov 12, 2025

Allex-Nik commented Nov 12, 2025

Allex-Nik mentioned this pull request Nov 12, 2025

Add documentation and tests for the last and lastOrNull functions #1561

Open

Jolanrensen reviewed Nov 13, 2025
