
b-tree read performance still not good enough #153

@bnlawrence

Description


For stupidly large files we're still seeing slow b-tree reads.

On inspection, this is partly because we issue a lot of very small reads (three per b-tree entry), which we could avoid by reading each b-tree node's entries in one go.

The offending piece of code is this:

    def _read_node(self, offset, node_level):
        """ Return a single node in the b-tree located at a given offset. """
        node = self._read_node_header(offset, node_level)
        keys = []
        addresses = []
        for _ in range(node['entries_used']):
            # Three separate tiny reads per entry: 8 bytes, 8 * dims bytes, 8 bytes.
            chunk_size, filter_mask = struct.unpack('<II', self.fh.read(8))
            fmt = '<' + 'Q' * self.dims
            fmt_size = struct.calcsize(fmt)
            chunk_offset = struct.unpack(fmt, self.fh.read(fmt_size))
            chunk_address = struct.unpack('<Q', self.fh.read(8))[0]
            keys.append({
                'chunk_size': chunk_size,
                'filter_mask': filter_mask,
                'chunk_offset': chunk_offset,
            })
            addresses.append(chunk_address)
        node['keys'] = keys
        node['addresses'] = addresses
        return node

There is an obvious optimisation for V1 b-trees, where we know the size of each entry in advance: read all of a node's entries with a single call and unpack them from the buffer in memory.
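Something like the following sketch, reusing self.fh, self.dims, _read_node_header and the module-level struct import from the code above, and keeping the same 4 + 4 + 8 * dims + 8 byte entry layout the current loop already assumes:

    def _read_node(self, offset, node_level):
        """ Return a single node in the b-tree located at a given offset. """
        node = self._read_node_header(offset, node_level)
        # Fixed-size V1 entries: chunk_size, filter_mask, a dims-dimensional
        # chunk offset, and the chunk/child address, all little-endian.
        entry_fmt = '<II' + 'Q' * self.dims + 'Q'
        entry_size = struct.calcsize(entry_fmt)
        # One read for the whole set of entries instead of three per entry.
        buf = self.fh.read(node['entries_used'] * entry_size)
        keys = []
        addresses = []
        for fields in struct.iter_unpack(entry_fmt, buf):
            keys.append({
                'chunk_size': fields[0],
                'filter_mask': fields[1],
                'chunk_offset': fields[2:2 + self.dims],
            })
            addresses.append(fields[-1])
        node['keys'] = keys
        node['addresses'] = addresses
        return node

struct.iter_unpack does the per-entry splitting over the in-memory buffer, so the only I/O per node is the header read plus one read for all the entries.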
