feat: support nested STRUCT and ARRAY data display in anywidget mode by shuoweil · Pull Request #2359 · googleapis/python-bigquery-dataframes

shuoweil · 2025-12-29T17:59:37Z

Implements flattening and expansion for complex data types in the interactive display for anywidget mode.

Key Features:

Automatic Flattening: STRUCT columns are flattened into intuitive dot-notation columns (e.g., user.name).
Array Expansion: ARRAY columns are expanded into multiple rows with visual grouping.
Visual Continuity: Continuation rows for arrays are styled for better parent-row context.

Verified at:

VS Code notebook: screen/3ST4m9xN9w3iqD9
Colab notebook: screen/7NG4LiTEPuAC27F

Fixes #<438181139> 🦕


review-notebook-app · 2025-12-29T17:59:42Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

tswast · 2026-01-05T22:54:33Z

bigframes/display/_flatten.py

+
+def flatten_nested_data(
+    dataframe: pd.DataFrame,
+) -> tuple[pd.DataFrame, dict[str, list[int]], list[str], set[str]]:


Tuple is hard to understand. Can we use a frozen dataclass, instead?
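A rough sketch of what the suggested frozen dataclass could look like. The class name `FlattenResult` and the three field names come from docstrings quoted later in this thread; the exact field types are assumptions, and any further fields of the real class are omitted.

```python
from __future__ import annotations

import dataclasses

import pandas as pd


@dataclasses.dataclass(frozen=True)
class FlattenResult:
    """Hypothetical result container; the real class may have more fields."""

    # The flattened DataFrame.
    dataframe: pd.DataFrame
    # Rows that hold the 2nd..Nth element of an exploded array.
    continuation_rows: set[int] | None
    # Columns to render as empty on continuation rows.
    cleared_on_continuation: list[str]


result = FlattenResult(
    dataframe=pd.DataFrame({"id": [1, 1]}),
    continuation_rows={1},
    cleared_on_continuation=["id"],
)
```

Compared with a 4-tuple, callers read `result.continuation_rows` by name, and `frozen=True` rejects attribute reassignment after construction.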

tswast · 2026-01-06T22:32:01Z

bigframes/display/_flatten.py

+            )
+
+            new_cols_to_add[new_col_name] = pd.Series(
+                new_list_array.to_pylist(),


to_pylist() can be quite expensive to call. If we already have a pyarrow array, I don't think it's necessary to convert it.

Done. I've removed the .to_pylist() calls and now pass the Arrow arrays directly to pandas for better performance.


tswast · 2026-01-06T22:38:23Z

bigframes/display/_flatten.py

+
+            new_cols_to_add[new_col_name] = pd.Series(
+                new_list_array.to_pylist(),
+                dtype=pd.ArrowDtype(pa.list_(field.type)),


I'm confused. Why are we creating a list type here? Could you explain in comments what the purpose is? I thought we were flattening based on the function name.

Good point. I've added a comment to clarify that the function is transforming an array<struct<...>> into separate array columns.

tswast · 2026-01-06T22:40:38Z

bigframes/display/_flatten.py

+    for orig_idx in dataframe.index:
+        non_array_data = non_array_df.loc[orig_idx].to_dict()
+        array_values = {}
+        max_len_in_row = 0
+        non_na_array_found = False
+
+        for col_name in array_columns:
+            val = dataframe.loc[orig_idx, col_name]


This is looping through each value in Python, which is going to be very slow. Please use native code such as https://arrow.apache.org/docs/python/generated/pyarrow.compute.list_flatten.html to avoid such loops.

Thanks for the suggestion. I've refactored the array explosion logic to use a much faster vectorized approach with pandas.explode and merge, which removes the Python loops entirely.
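As an illustration of the vectorized direction, `DataFrame.explode` accepts a list of columns (pandas 1.3 or later) when the arrays in each row have matching lengths. The frame below is made up, and the sketch uses only `explode`, leaving out the merge step mentioned in the reply.

```python
import pandas as pd

# Hypothetical frame with two aligned array columns.
df = pd.DataFrame(
    {
        "order_id": [10, 11],
        "sku": [["a", "b"], ["c"]],
        "qty": [[1, 2], [3]],
    }
)

# Explode both array columns together; scalar columns and the index
# are repeated for every element.
exploded = df.explode(["sku", "qty"])
```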

tswast · 2026-01-06T22:41:18Z

bigframes/display/_flatten.py

+            continue
+
+        # Create one row per array element, up to max_len_in_row
+        for array_idx in range(max_len_in_row):


This is looping through each element of each array in Python, which is going to be even slower.

I have completely refactored _explode_array_columns to use a vectorized approach with pandas.explode and merge. This eliminated all Python loops, including the slow inner loop you pointed out, significantly improving performance.

tswast · 2026-01-13T14:36:09Z

bigframes/display/_flatten.py

+                return "struct"
+            if pa.types.is_list(pa_type):
+                return (
+                    "array_of_struct"
+                    if pa.types.is_struct(pa_type.value_type)
+                    else "array"
+                )
+        return "clear"


These magic strings worry me. Could you create an enum for category, instead?

https://docs.python.org/3/library/enum.html

Done. I've replaced the strings with a private _ColumnCategory Enum.

tswast · 2026-01-13T14:37:04Z

bigframes/display/_flatten.py

+        continuation_rows: A set of row indices that are continuation rows.
+        cleared_on_continuation: A list of column names that should be cleared on continuation rows.


It's not 100% clear to me what is meant by "continuation". I assume that it means rows post-flattening that correspond to the second element of an array and beyond? Please expand these docstrings further.

You are right. I've updated the docstrings in FlattenResult to explicitly clarify that "continuation rows" refer to the 2nd element onwards of an exploded array, and "cleared" columns are those (typically scalars) that are replicated but shouldn't be visually repeated.
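To make the terms concrete, here is a small sketch that derives continuation rows from an already exploded frame and blanks the "cleared" columns for display. It assumes that row indices are positional and that the original index labels were unique before the explosion; the PR may compute these differently.

```python
import pandas as pd

# Hypothetical exploded frame: original row 0 had a two-element array.
exploded = pd.DataFrame(
    {"name": ["x", "x", "y"], "tag": ["a", "b", "c"]},
    index=[0, 0, 1],
)

# duplicated() is False for the first row of each original index label
# and True for the rows holding the 2nd..Nth array elements.
is_continuation = exploded.index.duplicated(keep="first")
continuation_rows = {pos for pos, flag in enumerate(is_continuation) if flag}

# Blank the replicated scalar column on continuation rows only.
cleared_on_continuation = ["name"]
display_df = exploded.astype(object)
display_df.loc[is_continuation, cleared_on_continuation] = ""
```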

tswast · 2026-01-13T14:40:04Z

bigframes/display/_flatten.py

+    """The result of flattening a DataFrame.
+
+    Attributes:
+        dataframe: The flattened DataFrame.


Please add some comments about what happens to the original index columns. Based on the description of the other fields, I assume that a unique index is created post-flatten?

I've updated the docstrings and the implementation. The original index (including named Index and MultiIndex) is preserved and duplicated across the exploded rows. This serves as the visual grouping key for the table display.

tswast · 2026-01-13T14:40:53Z

bigframes/display/_flatten.py

+
+
+@dataclasses.dataclass(frozen=True)
+class ColumnClassification:


Please put a leading _ in front of class names that aren't intended to be used outside of this module.

tswast · 2026-01-13T14:43:19Z

bigframes/display/html.py

+    continuation_rows: set[int] | None,
+    clear_on_continuation: list[str],


Same here, add some more explanation to the docstrings. To keep it shorter, you could reference bigframes/display/_flatten.py so that folks can look there for the complete explanation.

Done. I updated the docstrings to reference bigframes.display._flatten.FlattenResult for the detailed definitions.

tswast · 2026-01-13T14:44:08Z

bigframes/display/table_widget.js

Neat feature!

tswast · 2026-01-13T14:46:16Z

bigframes/display/_flatten.py

Please create a test_flatten.py file with a few tests that check some of the flattening logic directly without the HTML rendering part. Specifically, let's focus on what happens to index/multiindex columns, as that's my main worry / question.

Done. I created tests/unit/display/test_flatten.py. I moved the logic-specific tests there and added dedicated test cases (test_flatten_preserves_original_index, test_flatten_preserves_multiindex) to verify that indices are correctly preserved and duplicated during the flattening process.
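A test of this shape might look like the sketch below. Since the module's internals are not shown in this thread, the sketch uses plain `DataFrame.explode` as a stand-in for the flattening call; the test name and data are hypothetical.

```python
import pandas as pd


def test_explode_preserves_multiindex():
    index = pd.MultiIndex.from_tuples(
        [("us", 1), ("eu", 2)], names=["region", "id"]
    )
    df = pd.DataFrame({"tags": [["a", "b"], ["c"]]}, index=index)

    # Stand-in for the flattening step under test.
    exploded = df.explode("tags")

    # Level names survive, and each index entry repeats once per element.
    assert list(exploded.index.names) == ["region", "id"]
    assert exploded.index.tolist() == [("us", 1), ("us", 1), ("eu", 2)]
    assert exploded["tags"].tolist() == ["a", "b", "c"]
```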

TrevorBergeron · 2026-01-23T02:09:05Z

bigframes/display/_flatten.py

+
+    classification = _classify_columns(result_df)
+
+    # Process ARRAY-of-STRUCT columns into multiple ARRAY columns (one per struct field).


Why do we need special logic for array of struct? Why can we not achieve this by just applying array logic and then struct logic? Also, might we want to just keep recursively unpacking until there are no more arrays/structs left?

You are correct that we could achieve this by applying array logic (explode) first and then struct logic, but that would require a second pass (loop) because the explosion would produce new STRUCT columns that need flattening.

The current approach (Transpose Array -> Flatten Structs -> Explode Arrays) allows us to:

Keep the pipeline linear: we resolve the nesting structure in a single pass without needing recursion or re-classification loops.

Optimize performance: we flatten the struct fields column-wise before expanding the row count via explosion.

For recursion, I agree that a recursive visitor is the correct long-term solution for arbitrary nesting depths (e.g., ARRAY<STRUCT>). For this PR, I aimed to support the common BQ ARRAY pattern within the current architecture, but we should definitely refactor to full recursion if we need to support deeper/arbitrary nesting.

TrevorBergeron · 2026-01-23T02:21:50Z

bigframes/display/_flatten.py

+        continuation_rows: A set of row indices in the flattened DataFrame that are
+            "continuation rows". These are additional rows created to display the
+            2nd to Nth elements of an array. The first row (index i-1) contains
+            the 1st element, while these rows contain subsequent elements.
+        cleared_on_continuation: A list of column names that should be "cleared"
+            (displayed as empty) on continuation rows. Typically, these are
+            scalar columns (non-array) that were replicated during the explosion
+            process but should only be visually displayed once per original row group.


Might need to individually mark continuation rows rather than take the intersection of a row set and column set

Thanks for the suggestion. Currently, we enforce synchronous explosion (all arrays align), so the "continuation" status effectively applies to the whole row. When we support independent array explosions, we will definitely need to track continuation status individually, as you suggest.

shuoweil added 4 commits December 29, 2025 16:36

refactor(display): use CSS classes in HTML tables

f20cde5

refactor(display): use CSS classes in HTML tables

19e2c4f

feat(display): support nested STRUCT and ARRAY data in interactive ta…

4b68243

…bles

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

8a7609a

shuoweil self-assigned this Dec 29, 2025

shuoweil requested review from a team as code owners December 29, 2025 17:59

shuoweil requested a review from tswast December 29, 2025 17:59

product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Dec 29, 2025

shuoweil added 2 commits December 29, 2025 18:09

chore: remove unreached code

ceca74d

refactor: code refactor

63e4a3c

product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Dec 29, 2025

shuoweil added 4 commits December 29, 2025 22:06

refactor: resue pandas struct.explode()

3affd92

refactor: revert the refactor

c53da80

Merge branch 'main' into shuowei-anywidget-ui-improve

fa37000

test: merge notebook

60785f3

shuoweil force-pushed the shuowei-anywidget-nested-strcut-array branch from f583833 to 60785f3 Compare January 2, 2026 21:28

tswast reviewed Jan 5, 2026


shuoweil added 3 commits January 6, 2026 00:32

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

0a88b10

feat: use dataclass for flatten_nested_data

f32a53f

feat: Refactor HTML rendering and document JS tests

3944249

shuoweil force-pushed the shuowei-anywidget-nested-strcut-array branch from 2bb97d3 to 3944249 Compare January 6, 2026 03:40

shuoweil requested a review from tswast January 6, 2026 03:44

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

ce59668

tswast requested changes Jan 6, 2026


Fix: Improve performance of nested data flattening

41df7b3

shuoweil requested a review from tswast January 7, 2026 00:47

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

a34802e

shuoweil marked this pull request as ready for review January 9, 2026 21:45

shuoweil added 2 commits January 12, 2026 22:11

Merge main to shuowei-anywidget-nested-strcut-array

7763818

test: rerun notebook to verify the merge

27ae231

tswast reviewed Jan 13, 2026


shuoweil added 4 commits January 13, 2026 19:27

Merge commit '798af4a30' into shuowei-anywidget-nested-strcut-array

f74f82a

refactor: replace magic strings for col categories with a private Enum

03eba5e

refactor: replace magic strings for col categories with a private Enum

eea0a87

test: rerun notebook

ca19957

shuoweil force-pushed the shuowei-anywidget-nested-strcut-array branch from 8eb7211 to ca19957 Compare January 13, 2026 19:59

shuoweil added 2 commits January 13, 2026 20:02

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

4e9eaa4

docs: rerun test

cb7ae87

shuoweil requested a review from tswast January 13, 2026 20:06

test: update year

fb2d029

shuoweil requested review from GarrettWu, TrevorBergeron, chelsea-lin, jialuoo, sycai and tswast and removed request for tswast January 20, 2026 19:37

TrevorBergeron reviewed Jan 23, 2026


shuoweil removed request for GarrettWu, chelsea-lin, jialuoo and sycai January 23, 2026 18:52

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

d2710c2

shuoweil force-pushed the shuowei-anywidget-nested-strcut-array branch from c405008 to d2710c2 Compare January 23, 2026 23:07

shuoweil requested a review from TrevorBergeron January 23, 2026 23:20

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

83b042d



3 participants
