
Revised Compression Method #1

Closed
lafncow wants to merge 4 commits into blag:master from lafncow:master

Conversation


@lafncow lafncow commented May 10, 2017

I made 2 changes to the "compress" method:

  1. it will return fewer than the target number of bytes if it is given a digest that is smaller than the target size already (instead of throwing an error)
  2. it spreads the modulo bytes around rather than dumping them all into the final byte

Why is this better?
The old method divided the input bytes into the target number of segments and, after even division, placed all remainder bytes into the final segment. This confined the effect of the remainder bytes on overall entropy to the final output byte.
In the new method, the remainder bytes are distributed evenly among the target segments, allowing them to express more entropy. The compression per input byte is also more even, since segment sizes differ by at most 1.

For example:

compress_old([123,456,789,147], 4)
# -> [123, 456, 789, 147]
compress_old([123,456,789,147,258,369,321], 4)
# -> [123, 456, 789, 417] (only the last byte has changed)

compress_new([123,456,789,147], 4)
# -> [123, 456, 789, 147]
compress_new([123,456,789,147,258,369,321], 4)
# -> [435, 902, 115, 321] (all 4 bytes have changed)
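The examples above can be reproduced with a short sketch of the revised method. This assumes, as in humanhash's compress, that each segment is reduced with XOR; the function name and structure here are illustrative, not the exact patch:

```python
from functools import reduce
from operator import xor


def compress_new(input_bytes, target):
    """Compress input_bytes down to `target` values, spreading the
    remainder across the leading segments instead of the final one."""
    length = len(input_bytes)
    # If the digest is already small enough, return it unchanged
    # instead of raising an error.
    if length <= target:
        return list(input_bytes)
    # Segment sizes differ by at most 1: the first `remainder`
    # segments each take one extra byte.
    seg_size, remainder = divmod(length, target)
    result = []
    pos = 0
    for i in range(target):
        size = seg_size + (1 if i < remainder else 0)
        segment = input_bytes[pos:pos + size]
        pos += size
        # XOR all bytes in the segment into one output value.
        result.append(reduce(xor, segment))
    return result


print(compress_new([123, 456, 789, 147], 4))
# -> [123, 456, 789, 147]
print(compress_new([123, 456, 789, 147, 258, 369, 321], 4))
# -> [435, 902, 115, 321]
```

With 7 inputs and a target of 4, the segment sizes are 2, 2, 2, 1, so the three remainder bytes land in three different output values rather than all in the last one.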

See also pull request on original repo: zacharyvoase#1

Adam Cornille and others added 4 commits April 5, 2013 16:08
Instead of throwing an error or zero-padding, "compress" now returns the
input bytes if there are fewer than or equal to "target" of them.
I think this is logical, since the goal of compress is to reduce the
complexity of the digest before making it human-consumable; in this case
the complexity is already low enough to proceed.
Excess bytes are now distributed amongst the compressed bytes, instead
of being dumped into the final byte as they were before.

coveralls commented May 10, 2017

Coverage Status

Coverage decreased (-2.2%) to 97.826% when pulling e6dc5f7 on lafncow:master into 5bed9ac on blag:master.


lafncow commented May 10, 2017

I've made a mistake in my git workflow that is causing PyPI work to bleed into my pull request on the original repo. To correct this, I am going to close this PR, branch my fork, and then create a new PR here. Sorry for the confusion.

@lafncow lafncow closed this May 10, 2017
@lafncow lafncow mentioned this pull request May 10, 2017

blag commented May 10, 2017

No worries! Do what ya gotta do.

