For stupidly large files we're still seeing slow b-tree reads.
On inspection, this is partly because we do a lot of very tiny reads, which we could avoid by reading each b-tree node in one go.
The offending piece of code is this:
    def _read_node(self, offset, node_level):
        """ Return a single node in the b-tree located at a given offset. """
        node = self._read_node_header(offset, node_level)
        keys = []
        addresses = []
        for _ in range(node['entries_used']):
            chunk_size, filter_mask = struct.unpack('<II', self.fh.read(8))
            fmt = '<' + 'Q' * self.dims
            fmt_size = struct.calcsize(fmt)
            chunk_offset = struct.unpack(fmt, self.fh.read(fmt_size))
            chunk_address = struct.unpack('<Q', self.fh.read(8))[0]

There is an obvious optimisation for V1 b-trees, where we know the length of the entries.
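For V1 nodes, a rough sketch of the single-read approach might look like the following. It assumes each entry is laid out exactly as the loop above reads it (two 32-bit fields, `dims` 64-bit offsets, then a 64-bit address), so the entry size is fixed and all `entries_used` entries can be fetched with one `read` and decoded with `struct.Struct.iter_unpack`. The function name `read_chunk_entries` and the dictionary layout of each key are hypothetical, chosen only for illustration.

```python
import io
import struct


def read_chunk_entries(fh, entries_used, dims):
    """Read every (key, address) entry of a node with a single fh.read().

    Hypothetical helper: assumes the fixed-size entry layout used by the
    per-entry loop in the issue (uint32, uint32, dims * uint64, uint64).
    """
    # '<' means little-endian with no padding, so the size is 16 + 8 * dims.
    entry = struct.Struct('<II' + 'Q' * dims + 'Q')
    nbytes = entry.size * entries_used

    # One large read instead of three small reads per entry.
    buf = fh.read(nbytes)
    if len(buf) != nbytes:
        raise EOFError('b-tree node is shorter than expected')

    keys = []
    addresses = []
    for fields in entry.iter_unpack(buf):
        keys.append({
            'chunk_size': fields[0],
            'filter_mask': fields[1],
            'chunk_offset': fields[2:-1],  # tuple of dims offsets
        })
        addresses.append(fields[-1])  # chunk address
    return keys, addresses
```

Inside `_read_node` this would be called once after the header is read, for example `keys, addresses = read_chunk_entries(self.fh, node['entries_used'], self.dims)`, assuming the file position is already at the first entry.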