Create index on index column for stat data files and use in fetch #103

@taldcroft

Description

This code is really slow:

            import tables
            h5 = tables.openFile(os.path.join(*filename))
            table = h5.root.data
            times = (table.col('index') + 0.5) * dt  # <<< READ ENTIRE COLUMN
            row0, row1 = np.searchsorted(times, [tstart, tstop])
            table_rows = table[row0:row1]  # returns np.ndarray (structured array)
            h5.close()
            return (times[row0:row1], table_rows, row0, row1)

Instead, create an index on the index column of each 5min and daily h5 file using h5.root.data.cols.index.createIndex(). This is a one-time operation for existing files, but update_archive.py also needs a fix so that the index gets created on the path where it creates a stat file fresh.
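A minimal sketch of the one-time indexing step, assuming PyTables 3.x, where the camelCase calls used in this issue (openFile, createIndex) are spelled open_file and create_index. The function name ensure_index_on_index_column is hypothetical, and the only assumption about the file layout is the /data table with an index column, as in the snippet above.

```python
import tables


def ensure_index_on_index_column(h5_path):
    """Create a PyTables index on the 'index' column of /data if it lacks one.

    Safe to call repeatedly: create_index() raises if the column is
    already indexed, so the is_indexed check guards against that.
    """
    with tables.open_file(h5_path, mode="a") as h5:
        col = h5.root.data.cols.index
        if not col.is_indexed:
            col.create_index()
        return col.is_indexed
```

The same call could run once over every existing 5min and daily file, and again in update_archive.py right after a new stat file is created.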

Once the files are indexed, turn the logic above around: compute index_start and index_stop from tstart and tstop, then get the required rows with readWhere(...). This appears to reduce read times for short queries to less than 1 microsec, vs. 225 microsec now.
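The bounds computation can be sketched as follows. This is an illustration, not the eventual fetch code: index_bounds is a hypothetical helper, and it assumes the same convention as the current snippet, namely that a row's time is (index + 0.5) * dt and that rows are selected as np.searchsorted does by default (tstart <= time < tstop).

```python
import math


def index_bounds(tstart, tstop, dt):
    """Return (index_start, index_stop) for the half-open range of index values.

    A row is selected when tstart <= (index + 0.5) * dt < tstop, which is
    equivalent to index_start <= index < index_stop with the bounds below.
    """
    index_start = math.ceil(tstart / dt - 0.5)
    index_stop = math.ceil(tstop / dt - 0.5)
    return index_start, index_stop
```

With PyTables 3.x naming, these bounds could then feed a query along the lines of table.read_where('(index >= i0) & (index < i1)', condvars={'i0': index_start, 'i1': index_stop}), so that only the matching rows are read rather than the whole index column. Values of tstart or tstop that fall exactly on a bin center are subject to floating-point rounding and are worth a dedicated test.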
