[Bug]: ICEBERG_CANNOT_OPEN_SPLIT #11579

@julien-alpaca

Description

What happened

It looks like Nessie is referencing files in GCS that have been deleted, either by GC or by a concurrent query on the same table. This happens during backfill, and as of now the only way to fix it is to recreate the table.

[2025-11-06, 21:14:33] INFO - [base] 21:14:33    Database Error in model daily_equity_deposits_margin (models/data_science/daily_equity_deposits_margin.sql): source="airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"

[2025-11-06, 21:14:33] INFO - [base]   TrinoExternalError(type=EXTERNAL, name=ICEBERG_CANNOT_OPEN_SPLIT, message="Error opening Iceberg split gs://xxx/public/fact_account_kpis_d252b9ee-5f55-4e12-b60b-13a8ba8841ef/data/_is_lpca=0/day=2025-10-26/20251105_235839_08692_yzwxz-383988d3-bd9b-4c8e-aee8-542d97cb511e.parquet (offset=4, length=125686177): File gs://xxx/public/fact_account_kpis_d252b9ee-5f55-4e12-b60b-13a8ba8841ef/data/_is_lpca=0/day=2025-10-26/20251105_235839_08692_yzwxz-383988d3-bd9b-4c8e-aee8-542d97cb511e.parquet not found", query_id=20251106_210829_77657_yzwxz): source="airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"
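For triage, it may help to pull the missing file path and its partition out of the error message so the affected partitions can be listed across many failed runs. The following is a minimal sketch, not part of Trino or Nessie: the names `MissingSplit`, `parse_missing_split`, and `partition_of` are hypothetical, the regex is written against the message format shown in the log above, and the partition parsing assumes `key=value` path segments like the ones in that path.

```python
import re
from typing import Dict, NamedTuple, Optional


class MissingSplit(NamedTuple):
    """Fields extracted from an ICEBERG_CANNOT_OPEN_SPLIT message (hypothetical helper type)."""
    path: str
    offset: int
    length: int


# Pattern written against the message text in the log above; other Trino
# versions may word the message differently (assumption).
_SPLIT_RE = re.compile(
    r"Error opening Iceberg split (?P<path>\S+) "
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)"
)


def parse_missing_split(message: str) -> Optional[MissingSplit]:
    """Return the split path, offset, and length, or None if the message does not match."""
    match = _SPLIT_RE.search(message)
    if match is None:
        return None
    return MissingSplit(match["path"], int(match["offset"]), int(match["length"]))


def partition_of(path: str) -> Dict[str, str]:
    """Collect key=value directory segments from a data file path (file name excluded)."""
    segments = path.split("/")[:-1]
    return dict(seg.split("=", 1) for seg in segments if "=" in seg)
```

Running `parse_missing_split` over the collected error messages and grouping the results by `partition_of(split.path)` would show whether the missing files cluster in the partitions touched by the backfill.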

How to reproduce it

Not sure how to reproduce.

Nessie server type (docker/uber-jar/built from source) and version

ghcr.io/projectnessie/nessie:0.104.2

Client type (Ex: UI/Spark/pynessie ...) and version

trinodb/trino:476

Additional information

No response
