SpatialData repr breaks when Dask auto-partitions Parquet points #1084

@enric-bazz

Hi,
I'm encountering an issue when calling repr(sdata): it fails during the self-contained check for points elements backed by a Parquet file, with the error:

AttributeError: 'list' object has no attribute 'values'

The failure originates in
_search_for_backing_files_recursively()
where the code assumes that each parquet-read task in the Dask graph has a dict in task.args[0] and therefore calls:

v.args[0].values()

However, when dask.dataframe.read_parquet() performs automatic partition aggregation, controlled by split_row_groups='infer' (default), task.args[0] can become a list of row-group dicts instead of a single dict.
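To illustrate the two shapes described above, here is a minimal sketch of a helper that accepts either a single dict or a list of dicts. The name iter_task_values and the sample dict contents are hypothetical and only for illustration; this is not SpatialData's actual code or its fix, just one way such a check could avoid calling .values() on a list.

```python
from collections.abc import Iterator, Mapping
from typing import Any


def iter_task_values(arg: Any) -> Iterator[Any]:
    """Yield the values of a dict, or of every dict inside a list/tuple.

    Hypothetical helper: it normalizes the two shapes that task.args[0]
    is reported to take (one dict, or a list of row-group dicts).
    """
    if isinstance(arg, Mapping):
        yield from arg.values()
    elif isinstance(arg, (list, tuple)):
        for item in arg:
            # Recurse so that each dict in the list is handled like the single-dict case
            yield from iter_task_values(item)
    # Any other type yields nothing instead of raising AttributeError


# Made-up stand-ins for the two shapes (keys and values are illustrative only)
single = {"piece": "part.0.parquet"}
aggregated = [{"piece": "part.0.parquet"}, {"piece": "part.1.parquet"}]

single_values = list(iter_task_values(single))
aggregated_values = list(iter_task_values(aggregated))
```

With the plain v.args[0].values() call, the second shape raises the AttributeError shown above; a normalizing step like this one returns the values from both row-group dicts.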

Reproduce

To reproduce the error, first trigger Dask’s autopartitioning by forcing a DataFrame into a single partition, writing it to a Parquet file and reading it back with default read_parquet settings. Inspecting the graph reveals the list-of-dicts structure that breaks SpatialData. To trigger the original error, parse the DataFrame with PointsModel, build a SpatialData object, write it to a Zarr store (the error does not occur if unbacked), and finally call repr(sdata). The pseudocode is the following:

import dask.dataframe as dd
import pandas as pd
from spatialdata import SpatialData
from spatialdata.models import PointsModel

# 1. Create or load a Dask DataFrame
#    (placeholder data with x/y columns, not taken from the original report)
some_pandas_df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [0.0, 1.0, 2.0, 3.0]})
df = dd.from_pandas(some_pandas_df, npartitions=4)

# 2. Force a single partition
df_one_part = df.repartition(npartitions=1)

# 3. Write single-partition Parquet
df_one_part.to_parquet("example_points.parquet")

# 4. Read Parquet back with default settings (split_row_groups='infer')
df_read = dd.read_parquet("example_points.parquet")

# 5. Inspect graph (optional)
print(df_read.dask)  # shows list-of-dicts if autopartitioning changed structure

# 6. Parse with PointsModel
points = PointsModel.parse(df_read)

# 7. Build SpatialData object (elements are passed as a name -> element dict)
sdata = SpatialData(points={"points": points})

# 8. Write to Zarr store (error triggers only when sdata.is_backed() == True)
sdata.write("example.zarr")

# 9. Trigger error with repr
print(repr(sdata))
