Collaborator

@XanthosXanthopoulos XanthosXanthopoulos commented Jan 15, 2026

Issue and/or context: SOMA 796

Changes:
This PR introduces the following changes:

  • Enable read buffer resizing when a read query returns zero results because the buffers are too small. The read is retried up to 3 times; incomplete queries that return data do not count toward this limit.
  • Refactor the selection of the memory allocation scheme so that ArrayBuffers is generic.
  • Extend the ColumnBuffer public API to report the maximum capacity for reads.
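
The resize-and-retry behaviour described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `ReadStatus`, `ReadResult`, `read_with_resize`, and `MAX_RESIZE_RETRIES` are hypothetical names, the two callbacks stand in for query submission and `ArrayBuffers::expand_buffers()`, and how the real code counts retries is an assumption drawn only from the description above.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>

// Hypothetical status and result types; the real query types are not shown in this PR excerpt.
enum class ReadStatus { Complete, Incomplete };

struct ReadResult {
    ReadStatus status;
    std::size_t cells_read;
};

// "Up to 3 times" is taken from the PR description; the constant name is an assumption.
constexpr int MAX_RESIZE_RETRIES = 3;

// submit: runs the read against the current buffers.
// expand: grows the buffers (stand-in for ArrayBuffers::expand_buffers()).
inline ReadResult read_with_resize(
    const std::function<ReadResult()>& submit,
    const std::function<void()>& expand) {
    int retries = 0;
    while (true) {
        const ReadResult result = submit();
        // A completed read, or an incomplete read that still returned data,
        // goes back to the caller and does not consume a retry.
        if (result.status == ReadStatus::Complete || result.cells_read > 0) {
            return result;
        }
        // Incomplete with zero cells: the buffers could not hold any result.
        if (retries == MAX_RESIZE_RETRIES) {
            throw std::runtime_error("read returned no data after resizing buffers");
        }
        expand();
        ++retries;
    }
}
```

Passing the two steps as callbacks keeps the sketch independent of TileDB; in the PR they would presumably be member calls on the reader and on ArrayBuffers.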


codecov bot commented Jan 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.36%. Comparing base (b521dae) to head (743b0b5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4375      +/-   ##
==========================================
- Coverage   86.36%   86.36%   -0.01%     
==========================================
  Files         139      139              
  Lines       21093    21093              
  Branches       15       15              
==========================================
- Hits        18218    18216       -2     
- Misses       2875     2877       +2     
Flag     Coverage Δ
python   89.02% <ø> (-0.03%) ⬇️
r        84.99% <ø> (ø)

Components      Coverage Δ
python_api      89.02% <ø> (-0.03%) ⬇️
libtiledbsoma   77.24% <ø> (ø)

@XanthosXanthopoulos XanthosXanthopoulos marked this pull request as ready for review January 19, 2026 02:11
@jp-dark jp-dark changed the title [WIP][c++] Implement buffer resize and resubmission for read queries [c++] Implement buffer resize and resubmission for read queries Jan 20, 2026
@jp-dark jp-dark requested a review from alancleary January 20, 2026 17:28
Collaborator

@alancleary alancleary left a comment

This is my first SOMA review so it's pretty limited in scope.

Overall everything looks good; just a couple minor change requests and some questions about the buffer's memory budget.

*/
static bool use_memory_pool(const std::shared_ptr<tiledb::Array>& array);

void expand_buffers();

Add a docstring comment explaining what the public method does.

// Map: column name -> ColumnBuffer
std::unordered_map<std::string, std::shared_ptr<ColumnBuffer>> buffers_;

std::unique_ptr<ColumnBufferAllocationStrategy> strategy_;

Add a comment explaining this member, for consistency with the other members.

void ArrayBuffers::expand_buffers() {
    for (const auto& [name, buffer] : buffers_) {
        buffer->resize(
            buffer->max_size() * DEFAULT_BUFFER_EXPANSION_FACTOR,

Does this need to check if a memory budget is being exceeded?

Collaborator Author

This call will exceed the budget by default. The initially allocated buffers already use the entire available budget, so if the read did not return any results, the memory budget was too small to begin with.

/**
* @brief Resize the internal buffers to the given size.
*/
void resize(uint64_t size, uint64_t num_cells, bool preserve_data = false);

Parameters should be marked const when they are not modified, to facilitate compiler optimizations.

Also add @param descriptions to the docstring
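
For illustration, one possible shape for the requested docstring and const parameters is sketched below on a toy class. `ToyColumnBuffer` is hypothetical, and the meaning given to `size`, `num_cells`, and `preserve_data` is an assumption; the excerpt does not define them, so the real descriptions should come from the author.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for ColumnBuffer. Member names and semantics are assumptions for this sketch.
class ToyColumnBuffer {
public:
    /**
     * @brief Resize the internal buffers to the given size.
     *
     * @param size New size of the data buffer in bytes (assumed meaning).
     * @param num_cells New capacity in cells (assumed meaning).
     * @param preserve_data If true, keep the existing contents that fit in
     * the new size; otherwise the buffer is reset to zeros (assumed meaning).
     */
    void resize(
        const std::uint64_t size,
        const std::uint64_t num_cells,
        const bool preserve_data = false) {
        if (!preserve_data) {
            data_.clear();  // drop old contents so resize() below zero-fills everything
        }
        data_.resize(static_cast<std::size_t>(size));
        num_cells_ = num_cells;
    }

    std::vector<std::uint8_t>& data() { return data_; }
    std::uint64_t num_cells() const { return num_cells_; }

private:
    std::vector<std::uint8_t> data_;
    std::uint64_t num_cells_ = 0;
};
```

Note that top-level const on a by-value parameter is not part of the function signature in C++, so in a declaration it mainly documents intent; in the definition it prevents accidental modification of the parameter.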

virtual ~ColumnBufferAllocationStrategy() = default;

virtual std::pair<size_t, size_t> get_buffer_sizes(
std::variant<tiledb::Attribute, tiledb::Dimension> column) const = 0;

column should be const since none of the implementations seem to modify it
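
One reading of this request is to take the variant by const reference, which also avoids copying it on every call. The sketch below is self-contained and makes several assumptions: `Attribute` and `Dimension` are simplified stand-ins for the TileDB types, `FixedCellCountStrategy` is a hypothetical implementation, and treating the returned pair as (bytes, cells) is a guess rather than something the excerpt states.

```cpp
#include <cstddef>
#include <utility>
#include <variant>

// Simplified stand-ins for tiledb::Attribute and tiledb::Dimension (assumptions).
struct Attribute { std::size_t cell_size; };
struct Dimension { std::size_t cell_size; };

class AllocationStrategy {
public:
    virtual ~AllocationStrategy() = default;

    // The variant is taken by const reference: implementations cannot modify it
    // and no copy is made at the call site.
    virtual std::pair<std::size_t, std::size_t> get_buffer_sizes(
        const std::variant<Attribute, Dimension>& column) const = 0;
};

// Hypothetical strategy: reserve room for a fixed number of cells per column.
class FixedCellCountStrategy : public AllocationStrategy {
public:
    explicit FixedCellCountStrategy(std::size_t num_cells) : num_cells_(num_cells) {}

    std::pair<std::size_t, std::size_t> get_buffer_sizes(
        const std::variant<Attribute, Dimension>& column) const override {
        // Both alternatives expose cell_size, so one generic visitor covers them.
        const std::size_t cell_size = std::visit(
            [](const auto& c) { return c.cell_size; }, column);
        return {num_cells_ * cell_size, num_cells_};  // (bytes, cells) in this sketch
    }

private:
    std::size_t num_cells_;
};
```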

soma_array->close();
}

TEST_CASE("SOMAArray: Test resize") {

Is there a way to constrain the buffer's memory budget? If so, this code should test that resizing beyond the budget throws an error.
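
If a hard budget were enforced (the author's earlier reply suggests expansion currently exceeds the budget on purpose), the behaviour such a test would check might look like the sketch below. `BudgetedBuffer`, its constructor arguments, and the choice of `std::length_error` are all hypothetical; nothing here reflects the actual ColumnBuffer API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical buffer with a hard memory budget, for illustration only.
class BudgetedBuffer {
public:
    BudgetedBuffer(std::size_t initial_bytes, std::size_t budget_bytes)
        : data_(initial_bytes), budget_bytes_(budget_bytes) {
        if (initial_bytes > budget_bytes) {
            throw std::invalid_argument("initial size exceeds memory budget");
        }
    }

    // Multiply the buffer size by `factor`, refusing to grow past the budget.
    void expand(std::size_t factor) {
        const std::size_t new_size = data_.size() * factor;
        if (new_size > budget_bytes_) {
            throw std::length_error("resize would exceed memory budget");
        }
        data_.resize(new_size);
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<unsigned char> data_;
    std::size_t budget_bytes_;
};
```

In a Catch2 test case, the failing expansion could then be checked with `REQUIRE_THROWS_AS(buffer.expand(2), std::length_error)`, followed by an assertion that the buffer size is unchanged.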
