Description
I'm still trying to come to terms with how D4 works, but I think I've found an issue. Please correct me if I've misunderstood something.
The default dict range for non-sparse data is [0,64), which determines which values are stored in the primary table. This makes sense for e.g. WGS data (as stated in the article). However, when the denominator is set to e.g. 10^2, every value is multiplied by 100, so (almost) no values fit within 6 bits (0 to 63). Only coverages below 0.64 still land inside the range.
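To make the effect concrete, here is a minimal sketch that counts how many coverage values would land in the default [0,64) range for a given denominator. It assumes the stored integer is `coverage * denominator` rounded to the nearest integer; the exact rounding D4 uses is an assumption here, and `scaled_value` / `fraction_in_primary` are hypothetical helpers, not part of the D4 API. The coverage values are made up for illustration.

```python
def scaled_value(coverage: float, denominator: float) -> int:
    """Integer stored for `coverage`.

    ASSUMPTION: the encoder rounds coverage * denominator to the nearest
    integer. This is a hypothetical helper, not D4's actual code.
    """
    return round(coverage * denominator)


def fraction_in_primary(coverages, denominator, dict_range=(0, 64)):
    """Fraction of values whose scaled integer falls in [lo, hi)."""
    lo, hi = dict_range
    hits = sum(lo <= scaled_value(c, denominator) < hi for c in coverages)
    return hits / len(coverages)


if __name__ == "__main__":
    # Made-up averaged coverages, for illustration only.
    covs = [0.0, 0.25, 0.5, 0.63, 0.7, 1.0, 12.5, 30.0]
    for d in (1, 100):
        print(d, fraction_in_primary(covs, d))
    # prints:
    # 1 1.0
    # 100 0.5
```

With a denominator of 1 every value fits, while with a denominator of 100 anything at or above 0.64x coverage (here 0.7, 1.0, 12.5 and 30.0) falls outside the primary table.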
Wouldn't it make sense to adjust the default dict range when the denominator is greater than 1? For a denominator of 100, the adjusted range should probably be [0,8192).
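The value 8192 appears to correspond to scaling the default upper bound by the denominator (64 × 100 = 6400) and rounding up to the next power of two (2^13 = 8192). A minimal sketch of that rule, assuming this is the intended derivation; `scaled_dict_range` is a hypothetical helper, not an existing D4 option:

```python
import math


def scaled_dict_range(denominator: float, base_hi: int = 64) -> tuple[tuple[int, int], int]:
    """HYPOTHETICAL helper (not part of D4).

    Scales the default [0, base_hi) dict range by `denominator` and rounds
    the upper bound up to a power of two. Returns ((lo, hi), bits_per_value).
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    # Smallest integer upper bound that still covers the scaled range.
    needed = math.ceil(base_hi * denominator)
    # ceil(log2(needed)) for needed >= 1, computed with integer arithmetic.
    bits = (needed - 1).bit_length()
    return (0, 1 << bits), bits


if __name__ == "__main__":
    print(scaled_dict_range(1))    # ((0, 64), 6)
    print(scaled_dict_range(100))  # ((0, 8192), 13)
```

Rounding up to a power of two costs nothing extra here: an exact [0,6400) range would still need 13 bits per value, which is well below the 32 bits used for values outside the primary table.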
With non-integer coverage values (typically averaged over multiple samples), this is a confusing gotcha that severely limits the benefits (speed and storage) of the D4 format, since nearly all values then fall outside the primary table and are stored as 32-bit integers.