
Memory should not blow up after Arrow IPC write-read round trip during spilling #17340

@ding-young


Describe the bug

This issue was observed in #17029: the memory size of a RecordBatch read back from a spill file (via Arrow IPC) is significantly larger than the size recorded before it was spilled.

Some increase is expected, for example from extra metadata or encoding added during the IPC write, but in many cases the difference is much larger than that would explain. We should investigate where this memory growth comes from and minimize the discrepancy as much as possible, because we rely on the maximum memory size recorded at spill time to determine how many spill files can be read back at once.
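To make that dependency concrete, here is a minimal std-only Rust sketch under stated assumptions. SpillFile, files_mergeable_at_once, and within_tolerance are hypothetical names, not DataFusion APIs. The model assumes that one batch from every merged file must be resident at the same time and that planning looks only at the per-file maximum size recorded at spill time. The 3x growth after read-back is an illustrative number, not a measurement from this issue.

```rust
/// Hypothetical summary of one spill file: the largest in-memory batch size
/// (in bytes) recorded when batches were written to it.
struct SpillFile {
    max_batch_bytes: usize,
}

/// Hypothetical planner: returns how many spill files (taken in order) can be
/// merged at once if one batch from each must be held in memory together.
/// Simplified model only, not DataFusion's actual merge logic.
fn files_mergeable_at_once(files: &[SpillFile], budget_bytes: usize) -> usize {
    let mut reserved = 0usize;
    let mut count = 0usize;
    for f in files {
        match reserved.checked_add(f.max_batch_bytes) {
            // Reserve this file's recorded maximum if it still fits the budget.
            Some(next) if next <= budget_bytes => {
                reserved = next;
                count += 1;
            }
            // Budget exceeded (or overflow): stop adding files to this merge.
            _ => break,
        }
    }
    count
}

/// Hypothetical validation: the size measured after reading a batch back may
/// exceed the recorded pre-spill size by at most `tolerance` (0.10 = 10%).
fn within_tolerance(recorded: usize, after_read: usize, tolerance: f64) -> bool {
    (after_read as f64) <= (recorded as f64) * (1.0 + tolerance)
}

fn main() {
    // Four spill files, each recorded at 4 MB per batch before spilling.
    let files: Vec<SpillFile> = [4_000_000, 4_000_000, 4_000_000, 4_000_000]
        .iter()
        .map(|&b| SpillFile { max_batch_bytes: b })
        .collect();
    let budget = 10_000_000;

    // Planning from recorded sizes: 8 MB fits in 10 MB, 12 MB does not, so 2 files.
    let planned = files_mergeable_at_once(&files, budget);
    println!("planned merge fan-in: {planned}");

    // Illustrative assumption: each batch is 3x larger after the IPC round trip.
    let actual_per_batch = 12_000_000usize;
    let actual_total = planned * actual_per_batch;
    println!("actual memory for {planned} batches: {actual_total} bytes (budget {budget})");
    println!(
        "within 10% tolerance: {}",
        within_tolerance(4_000_000, actual_per_batch, 0.10)
    );
}
```

In this toy model the plan admits 2 files, but the batches actually read back need 24 MB against a 10 MB budget, which is why the recorded size has to track the post-read size closely.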

To Reproduce

Run cargo test -p datafusion memory_limit::test_stringview_external_sort -- --exact --nocapture on the branch of the related PR above (#17029).

Expected behavior

No response

Additional context

One cause was incorrect memory accounting for StringViewArray. However, even after that fix (#17315), the validation still fails.

Labels: bug (Something isn't working)
