Skip to content

get_slice_memory_size underestimates memory for StringViewArray #8230

@ding-young

Description

@ding-young

Describe the bug

It is found (in apache/datafusion#17029) that get_sliced_memory_size underestimated the memory size for StringViewArray. The diff was quite large i.e. when correct batch size was 1392730 bytes, it returned only 163840 bytes.

This is because payload buffers are not taken account of although these payload buffers are simply cloned() when we slice() StringViewArray

See

pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError> {
let mut result: usize = 0;
let layout = layout(&self.data_type);
for spec in layout.buffers.iter() {
match spec {
BufferSpec::FixedWidth { byte_width, .. } => {
let buffer_size = self.len.checked_mul(*byte_width).ok_or_else(|| {
ArrowError::ComputeError(
"Integer overflow computing buffer size".to_string(),
)
})?;
result += buffer_size;

pub fn new_view() -> Self {
Self {
buffers: vec![BufferSpec::FixedWidth {
byte_width: mem::size_of::<u128>(),
alignment: mem::align_of::<u128>(),
}],
can_contain_null_mask: true,
variadic: true,
}
}

To Reproduce

Expected behavior

get_sliced_memory_size should include the capacity for payload buffers.
If this fix isn’t ideal, we should at least document which buffers are being counted in the calculation.

Additional context

apache/datafusion#17315
here's related pr for temporal fix in datafusion

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions