-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix incorrect memory accounting for sliced StringViewArray
#17315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, thank you.
I suggest to open an issue in arrow-rs
, and link the issue in the implementation comment.
// Therefore, we manually add the sum of the lengths used by all non inlined views | ||
// on top of the sliced size for views buffer. This matches the intended semantics of | ||
// "bytes needed if we materialized exactly this slice into fresh buffers". | ||
// Note: if multiple arrays share the same data buffers, we may double count each StringViewArray. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to move this 'Note' to function comment, this is important to know when using the function.
This double counting buffer issue seems not possible, inside the same StringViewArray the buffers should be distinct, sharing is only possible among different arrays. Maybe we should only keep the 'remeber to gc()' part in the note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not guaranteed that the buffers inside a StringViewArray are distinct :/
Or to be more precise: It does definitely occur that a StringViewArray holds multiple references to the same buffer due, because the concat kernel does not ensure that buffers are unique. One common case is when datafusion processes a join, and then coalesces the batches that are emitted from the join.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, for concat kernel it does't necessarily do gc()
. It's really tricky to get StringView
used correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would adjusting the comment location be sufficient in this PR, or do you think we should revise the memory calculation to account for potential buffer sharing within a single array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh we also need to revise the memory calculation. I'll bring up with the fix that considers shared buffers inside single array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, now we always gc()
on StringView columns before spilling, and it's the only use case for this function. So there is no need to account duplicated buffers inside this function. Adding a comment is enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh we also need to revise the memory calculation. I'll bring up with the fix that considers shared buffers inside single array.
I think get_record_batch_memory_size()
has already addressed such case 🤔 The only remaining issue is if there are buffer sharing among different batches, it overestimates, but this seems not to be a big issue for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would not be an issue for most cases, so I only updated the comment.
IIRC, now we always
gc()
on StringView columns before spilling
However it seems like we don't gc()
on StringView if it spills on multi level merge. Maybe we should find out whether we need to gc
for specific cases (i.e. when in-memory streams are involved) or update this calculation logic similar to get_record_batch_memory_size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However it seems like we don't gc() on StringView if it spills on multi level merge.
I remember it caused spill file size to explode in some cases, so perhaps we should move the gc()
logic inside the spill utility, and this can ensure gc()
on string views before spilling.
227bd12
to
3f33d3e
Compare
Which issue does this PR close?
Rationale for this change
When we spill
RecordBatch
on external sorting, we callbatch.get_sliced_size
to reportmax_record_batch_size
in each spill file. Thereby we can safely read the batch back from spill file with correct memory estimation. However, it is found (in #17029) thatget_sliced_size
underestimated the memory size forStringViewArray
. The diff was quite large i.e. when correct batch size was 1392730 bytes, it returned 163840 bytes, which means the actual memory usage may far exceed the configured memory_limit, especially when merging spill file.This underestimation happens because arrow api
get_slice_memory_size
counts for onlyviews
buffer forStringViewArray
while ignoring the capacity fordata
(payload) buffer.See
https://github.com/apache/arrow-rs/blob/a620957bc98b7aa14faec10635bb798932f00bf9/arrow-data/src/data.rs#L464-L476
https://github.com/apache/arrow-rs/blob/a620957bc98b7aa14faec10635bb798932f00bf9/arrow-data/src/data.rs#L1743-L1752
What changes are included in this PR?
This PR corrects the memory estimation for
batch.get_sliced_size
forStringViewArray
by manually adding up the capacity for data buffers. Since we call this function afterStringViewArray.gc
, it is safe to add the buffer capacity directly.We may need to upstream this patch if needed.
Are these changes tested?
Yes.
Are there any user-facing changes?
Some external sort queries (including those that use
StringViewArray
) may fail more frequently under tight memory limits, because we fix the underestimation. However, since this reflects actual memory usage and we're still working on the fix, so I think it's okay to proceed.