Description
Describe the bug
This issue was observed in #17029, where the memory size of a RecordBatch after reading from spill (via Arrow IPC) is significantly larger than the size recorded before spilling.
Some increase is expected due to additional metadata or encoding during the IPC write, but in many cases the difference is much larger than that would explain. We should investigate where this memory growth comes from and minimize the discrepancy as far as possible, because we rely on the maximum memory size recorded at spill time to decide how many spilled files can be read back at once.
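To illustrate why the discrepancy matters, here is a minimal sketch of the kind of budgeting described above. The function names and the numbers in the usage note are hypothetical and are not taken from DataFusion's code; the sketch only assumes that each spill file being read holds one batch in memory at a time.

```rust
/// Hypothetical helper (not DataFusion's API): number of spill files that can
/// be read back at once when each open file holds one batch of at most
/// `max_recorded_batch_bytes`, the size recorded before spilling.
/// Returns `None` when no size was recorded (zero).
fn max_concurrent_spill_reads(
    memory_budget_bytes: usize,
    max_recorded_batch_bytes: usize,
) -> Option<usize> {
    memory_budget_bytes.checked_div(max_recorded_batch_bytes)
}

/// Memory actually held if every read-back batch occupies
/// `readback_batch_bytes` rather than the recorded size.
fn resident_bytes(open_files: usize, readback_batch_bytes: usize) -> usize {
    open_files * readback_batch_bytes
}
```

With an assumed 100 MiB budget and a recorded maximum of 10 MiB per batch, the plan allows 10 files. If each batch came back at 15 MiB after the IPC round trip, those 10 files would hold 150 MiB, which exceeds the budget the plan was supposed to respect.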
To Reproduce
On the branch of the related PR above (#17029), run:
cargo test -p datafusion memory_limit::test_stringview_external_sort -- --exact --nocapture
Expected behavior
No response
Additional context
One cause was incorrect memory accounting for StringViewArray. However, even after that fix (#17315), validation still fails.