Computing the default denominator should adjust the dictionary range #109

@oyvinev

I'm still trying to get to grips with how D4 works, but I think I've found an issue. Please correct me if I've misunderstood something.

The default dict range for non-sparse data is [0,64). This determines which values are stored in the primary table, which makes sense for e.g. WGS data (as stated in the article). However, when the denominator is set to e.g. 10^2, all values are multiplied by it, which basically means that (almost) no values fit within 6 bits (between 0 and 63).
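To illustrate the effect, here is a rough Python sketch. It is not D4's actual code: the function `fits_in_dict_range`, the rounding step, and the sample depths are all made up for illustration, assuming a value is stored as `round(value * denominator)` and checked against the half-open range [0,64).

```python
def fits_in_dict_range(value, denominator=1, low=0, high=64):
    """Hypothetical check: does the scaled integer fall inside [low, high)?

    Assumes the stored integer is round(value * denominator); this is an
    illustration of the issue, not D4's real encoding logic.
    """
    stored = round(value * denominator)
    return low <= stored < high


# Made-up mean depths, e.g. coverage averaged over several samples.
depths = [0.0, 0.5, 12.25, 28.37, 31.02, 45.9, 63.0]

# With denominator 1 every depth rounds to an integer inside [0, 64).
in_range_plain = [d for d in depths if fits_in_dict_range(d, denominator=1)]

# With denominator 100 only depths below 0.64 still fit.
in_range_scaled = [d for d in depths if fits_in_dict_range(d, denominator=100)]

print(len(in_range_plain), "of", len(depths), "fit with denominator 1")
print(len(in_range_scaled), "of", len(depths), "fit with denominator 100")
```

In this toy data set all seven depths fit with denominator 1, but only the two smallest fit once everything is scaled by 100.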

Wouldn't it make sense to adjust the default dict range if the default denominator is > 1? In the case of denominator == 100, the scaled upper bound is 64 × 100 = 6400, so the adjusted range should probably be [0,8192), i.e. the next power of two (13 bits).
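A minimal sketch of the adjustment I have in mind, in Python. The helper name `adjusted_dict_range` and the `base_bits` parameter are hypothetical and not part of D4; the sketch only assumes that the dict range is [0, 2^bits) and that the denominator is a positive integer.

```python
def adjusted_dict_range(denominator, base_bits=6):
    """Hypothetical helper: scale the default [0, 2**base_bits) range.

    Multiplies the default upper bound by the denominator and rounds the
    result up to the next power of two, so the range stays bit-aligned.
    Returns (bit_width, exclusive_upper_bound).
    """
    if denominator < 1:
        raise ValueError("denominator must be a positive integer")
    base_high = 1 << base_bits              # 64 for the default 6 bits
    scaled_high = base_high * denominator   # 6400 for denominator 100
    bits = (scaled_high - 1).bit_length()   # smallest width covering it
    return bits, 1 << bits


bits, high = adjusted_dict_range(100)
print(f"denominator=100 -> {bits} bits, range [0,{high})")
```

For denominator 1 this leaves the default [0,64) untouched, and for denominator 100 it gives 13 bits, i.e. [0,8192).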

With non-integer coverage values (typically averages over multiple samples), this can be a quite confusing gotcha, and it really limits the great benefits (speed and storage) of the D4 format when basically all values end up stored as 32-bit integers.
