
Conversation

@bekadavis9
Contributor

The current implementation of FragmentConsolidation always allocates a very large buffer workspace (~10 GB) and performs consolidation within it. This work reduces the memory footprint and latency of fragment consolidation by using a ProducerConsumerQueue for concurrent reads and writes. At present the buffers are (somewhat arbitrarily) sized at 10 MB and the queue is capped at 10 entries. This means more allocations overall, but total memory use (and runtime) drops drastically, since small consolidations no longer need to construct and destruct the full workspace.
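
A rough sketch of the producer/consumer shape (illustrative only; the names BufferQueue, kBufferSize, and kMaxQueued are hypothetical, not the identifiers used in the PR, which relies on the existing ProducerConsumerQueue utility):

// Illustrative sketch only: a bounded queue of fixed-size buffers that lets a
// reader thread fill buffers while a writer thread drains them concurrently.
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <vector>

class BufferQueue {
 public:
  static constexpr std::size_t kBufferSize = 10 * 1024 * 1024;  // ~10 MB per buffer
  static constexpr std::size_t kMaxQueued = 10;                 // queue cap

  // Producer side: blocks while the queue is full, so memory stays bounded at
  // roughly kBufferSize * kMaxQueued rather than one ~10 GB workspace.
  void push(std::vector<uint8_t> buf) {
    std::unique_lock<std::mutex> lk(mtx_);
    not_full_.wait(lk, [&] { return queue_.size() < kMaxQueued; });
    queue_.push(std::move(buf));
    not_empty_.notify_one();
  }

  // Consumer side: blocks until a buffer is available or the producer is done.
  std::optional<std::vector<uint8_t>> pop() {
    std::unique_lock<std::mutex> lk(mtx_);
    not_empty_.wait(lk, [&] { return !queue_.empty() || done_; });
    if (queue_.empty())
      return std::nullopt;
    auto buf = std::move(queue_.front());
    queue_.pop();
    not_full_.notify_one();
    return buf;
  }

  // Producer signals that no more buffers will arrive.
  void drain() {
    std::lock_guard<std::mutex> lk(mtx_);
    done_ = true;
    not_empty_.notify_all();
  }

 private:
  std::mutex mtx_;
  std::condition_variable not_empty_, not_full_;
  std::queue<std::vector<uint8_t>> queue_;
  bool done_ = false;
};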

The following test saw a runtime improvement from 16.500s to 0.130s:
./test/tiledb_unit --durations yes --vfs=native "C++ API: Test consolidation that respects the current domain"


TYPE: IMPROVEMENT
DESC: Use a concurrent buffer deque in FragmentConsolidation.


Resolves CORE-411.

@bekadavis9 bekadavis9 force-pushed the rd/core-411 branch 2 times, most recently from e70da9b to 47ea0fc on December 5, 2025 at 18:35
@bekadavis9 bekadavis9 requested a review from rroelke December 5, 2025 19:19
Member

@rroelke rroelke left a comment


You need to add new tests for some of the edge cases of the new implementation before we can merge this.

@bekadavis9 bekadavis9 force-pushed the rd/core-411 branch 3 times, most recently from 77c355d to 2a28600 on December 10, 2025 at 18:15
Member

@rroelke rroelke left a comment


Still more to do. Aside from the comments -

Performance

As mentioned in the sync, we do need to verify that this is not worse for representative customer data. You've clearly seen that it is much faster for the unit tests, which is excellent, but that may not be reflected in production.

I would like to see some kind of repeatable benchmark program we can run, so that we can re-use it later to compare this against the new consolidation we will implement (see the sketch at the end of this section).

Add a way to configure the initial buffer size so that we can play with it a bit to see how it affects performance.
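
For example, a minimal repeatable harness could just time consolidation (and vacuum) of an existing array with many fragments, built against the C++ API. This is a sketch under assumptions: only the existing (deprecated) "sm.consolidation.buffer_size" key is shown as a knob; any new initial-buffer-size key would go in the same place.

// Sketch: time consolidation + vacuum of an existing array.
// Usage: consolidate_bench <array_uri>
#include <chrono>
#include <iostream>
#include <tiledb/tiledb>

int main(int argc, char** argv) {
  if (argc < 2) {
    std::cerr << "usage: consolidate_bench <array_uri>\n";
    return 1;
  }
  tiledb::Config cfg;
  // cfg["sm.consolidation.buffer_size"] = "50000000";  // optional override
  tiledb::Context ctx(cfg);

  auto t0 = std::chrono::steady_clock::now();
  tiledb::Array::consolidate(ctx, argv[1]);
  auto t1 = std::chrono::steady_clock::now();
  tiledb::Array::vacuum(ctx, argv[1]);
  auto t2 = std::chrono::steady_clock::now();

  auto ms = [](auto a, auto b) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
  };
  std::cout << "consolidate: " << ms(t0, t1) << " ms\n"
            << "vacuum:      " << ms(t1, t2) << " ms\n";
  return 0;
}

Pointing this at arrays derived from representative customer data, before and after this change and with different buffer sizes, would give the repeatable comparison requested above.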

Initial buffer size

We had discussed prototyping with a fixed size like 10 MB. I didn't catch that in this review, but it is probably not what we want to use anyway. The consolidator knows the average var cell size, and can compute the fixed part size of the cells. This should be used to inform the initial buffer size in some fashion. For example, if 10 MB only fits one average cell, then it is probably not a good choice of buffer size.
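
One way that heuristic could look, as a hedged sketch: the parameter names (avg_var_cell_size, fixed_cell_size) and the multiplier/floor values are stand-ins, not the consolidator's actual fields.

#include <algorithm>
#include <cstdint>

// Hypothetical sketch: size the initial buffer from per-cell size estimates,
// so a buffer always holds a reasonable number of cells while staying well
// under the overall budget. All names and constants are illustrative.
uint64_t choose_initial_buffer_size(
    uint64_t avg_var_cell_size,  // average var-length bytes per cell
    uint64_t fixed_cell_size,    // fixed-size bytes per cell
    uint64_t buffer_budget) {
  constexpr uint64_t kMinCellsPerBuffer = 1024;  // illustrative floor
  const uint64_t per_cell = avg_var_cell_size + fixed_cell_size;
  const uint64_t size = per_cell * kMinCellsPerBuffer;
  // Cap at a fraction of the budget; if even that fits only one or two cells,
  // the heuristic (or the budget) probably needs revisiting.
  return std::min(size, buffer_budget / 8);
}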

Testing

Testing mostly relies on the existing tests, plus the performance testing from above. The new tests are intended to force the reader to wait for memory to become available; we need tests (new or existing) which in some way assert that this is actually happening. See the review comments about waiting: I have some doubt about its correctness. If we merge #5725 that will help, but that's not the only thing you can do.

REQUIRE(rc == TILEDB_OK);
REQUIRE(err == nullptr);
tiledb_array_consolidate(ctx_, array_name, cfg);
tiledb_config_free(&cfg);
Member


You must check for correctness after consolidation.

Member


To be clear, what I meant by this was checking the fragment metadata and running a read query to verify that consolidation happened, i.e. there is just one fragment, and that the contents of the cells are correct.
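
A hedged illustration of that check using the C API's fragment info, reusing rc, ctx_, and array_name from the snippet above (after vacuuming; the read-back of cell contents is omitted because it depends on the schema of the array under test):

// Sketch: after consolidation and vacuum, assert a single fragment remains,
// then run a read query (not shown) and compare the cells to the expected data.
tiledb_fragment_info_t* frag_info = nullptr;
rc = tiledb_fragment_info_alloc(ctx_, array_name, &frag_info);
REQUIRE(rc == TILEDB_OK);
rc = tiledb_fragment_info_load(ctx_, frag_info);
REQUIRE(rc == TILEDB_OK);
uint32_t fragment_num = 0;
rc = tiledb_fragment_info_get_fragment_num(ctx_, frag_info, &fragment_num);
REQUIRE(rc == TILEDB_OK);
CHECK(fragment_num == 1);
tiledb_fragment_info_free(&frag_info);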

@bekadavis9 bekadavis9 requested a review from a team as a code owner January 21, 2026 22:22
// Allow use of deprecated param `config_.buffer_size_`
// Allow the buffer to grow 3 times
uint64_t initial_buffer_size = config_.buffer_size_ != 0 ?
    buffer_budget :
Member


I spent some time analyzing this and it's hard to know if it is correct. First you have to recognize that FragmentConsolidationWorkspace also uses this config parameter to override its constructor argument; then you have to follow through whether the memory accounting uses the expected or actual buffer size; and so on. This deprecated parameter makes things messy. Plus, the line immediately following this might mess it up.

Even if it is deprecated we do have to acknowledge that someone out there might be depending on it, so my current line of thinking is that copy_array should probably just do the old thing whenever this parameter is nonzero.

uint64_t initial_buffer_size = config_.buffer_size_ != 0 ?
    buffer_budget :
    config_.initial_buffer_size_;
initial_buffer_size = std::min(initial_buffer_size, buffer_budget / 8);
Member


It's probably a good idea to use the new "what level was this config set at?" facility that Agis added here. If initial_buffer_size is the configuration default, then the user did not set it, and we have some freedom to adjust it down, and/or determine a size using properties of the array schema as I had suggested. But if they did specify it in some way then we should use that (just taking the min with the budget). This is useful as an override for performance testing, for example.
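
A hypothetical sketch of that policy; user_set stands in for whatever the new config-set-level facility actually reports, and schema_derived_size for the cell-size heuristic suggested earlier:

#include <algorithm>
#include <cstdint>

// Hypothetical sketch, not the real API: pick the initial buffer size
// depending on whether the user explicitly set the config key.
uint64_t pick_initial_buffer_size(
    bool user_set,                 // did the user set the key at any level?
    uint64_t configured_size,      // the configured initial buffer size
    uint64_t schema_derived_size,  // size derived from cell-size estimates
    uint64_t buffer_budget) {
  if (user_set) {
    // Respect the explicit override (useful for performance testing),
    // but never exceed the overall budget.
    return std::min(configured_size, buffer_budget);
  }
  // Default was untouched: we are free to adjust down or derive a size from
  // the schema, capped at a fraction of the budget.
  return std::min(schema_derived_size, buffer_budget / 8);
}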
