-
Notifications
You must be signed in to change notification settings - Fork 1k
Description
Describe the bug
It is found (in apache/datafusion#17029) that get_sliced_memory_size
underestimated the memory size for StringViewArray
. The diff was quite large i.e. when correct batch size was 1392730
bytes, it returned only 163840
bytes.
This is because payload buffers are not taken account of although these payload buffers are simply cloned()
when we slice()
StringViewArray
See
arrow-rs/arrow-data/src/data.rs
Lines 464 to 476 in a620957
pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError> { | |
let mut result: usize = 0; | |
let layout = layout(&self.data_type); | |
for spec in layout.buffers.iter() { | |
match spec { | |
BufferSpec::FixedWidth { byte_width, .. } => { | |
let buffer_size = self.len.checked_mul(*byte_width).ok_or_else(|| { | |
ArrowError::ComputeError( | |
"Integer overflow computing buffer size".to_string(), | |
) | |
})?; | |
result += buffer_size; |
arrow-rs/arrow-data/src/data.rs
Lines 1743 to 1752 in a620957
pub fn new_view() -> Self { | |
Self { | |
buffers: vec![BufferSpec::FixedWidth { | |
byte_width: mem::size_of::<u128>(), | |
alignment: mem::align_of::<u128>(), | |
}], | |
can_contain_null_mask: true, | |
variadic: true, | |
} | |
} |
To Reproduce
Expected behavior
get_sliced_memory_size
should include the capacity for payload buffers.
If this fix isn’t ideal, we should at least document which buffers are being counted in the calculation.
Additional context
apache/datafusion#17315
here's related pr for temporal fix in datafusion