pmlogcompress: increase default xz compression level from 0 to 3 #2413
base: main
Raise the default xz compression level from 0 to 3 for improved compression efficiency.
For smaller archives with xz -0 level
533M total --> 26M total in 16.4695 sec
For smaller archives with xz -3 level
533M total --> 13M total in 24.1473 sec
For larger archives with xz -0 level
15G total --> 695M total in 815.4636 sec
For larger archives with xz -3 level
15G total --> 428M total in 1016.3921 sec
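To make the trade-off in these figures easier to compare, here is a minimal sketch that turns them into "compressed size as a percentage of the original" next to the elapsed time. The sizes and times are copied from the description; treating 15G as 15 * 1024 M is an assumption, and the names `RUNS`, `ratio_percent` and `summarize` are made up for illustration.

```python
# Figures copied from the PR description: (original MB, compressed MB, seconds).
# Assumption: "15G" is treated as 15 * 1024 M; the reported sizes are rounded.
RUNS = {
    "small, xz -0": (533, 26, 16.4695),
    "small, xz -3": (533, 13, 24.1473),
    "large, xz -0": (15 * 1024, 695, 815.4636),
    "large, xz -3": (15 * 1024, 428, 1016.3921),
}


def ratio_percent(original_mb, compressed_mb):
    """Return the compressed size as a percentage of the original size."""
    return 100.0 * compressed_mb / original_mb


def summarize(runs):
    """Map each label to (ratio in percent rounded to 2 places, seconds)."""
    return {
        label: (round(ratio_percent(orig, comp), 2), secs)
        for label, (orig, comp, secs) in runs.items()
    }


if __name__ == "__main__":
    for label, (pct, secs) in summarize(RUNS).items():
        print(f"{label}: {pct:.2f}% of original size in {secs:.1f} s")
```

Because the reported sizes are rounded to whole megabytes and gigabytes, the resulting percentages are only approximate.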
While this commit increases the compression ratio, "efficiency" relates to whichever resources matter most to the user. I'm not saying it should not be done, but it should be thought through well.
For reference, commit fdfa25c is the original change that introduced "xz -0" as the default. It seems to focus on comparing the -0 option to the -6 option, and -3 is not mentioned as a (possibly ideal?) midpoint. FWIW, I think we could make this change. Having a smaller on-disk footprint at the cost of an increase in the CPU required to inflate (up to a point) can make sense from an archive replay latency POV (in addition to the more obvious space savings aspects).
Some quick testing, compressing 4.36GB of pmlogger archives: I agree we should move off "xz -0"; it did surprisingly badly. I would see the sweet spot at -1 or -2 though, not -3. I was not expecting gzip to do that badly in the test. Also worth noting: xz moved in 2013 from single-threaded to multi-threaded by default, so compression now puts all cores of the system under load by default.
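A comparison like this one can be approximated with a short script. The sketch below uses Python's standard `lzma` module, which compresses on a single thread, so it is not the same as running the `xz` command line tool, and absolute numbers will differ. The function name `benchmark_presets` and the synthetic sample data are hypothetical; real pmlogger archives would need to be read in instead to get meaningful results.

```python
import lzma
import time


def benchmark_presets(data, presets=(0, 1, 2, 3)):
    """Compress data once per xz preset.

    Returns a list of (preset, compressed/original size ratio, seconds).
    """
    results = []
    for preset in presets:
        start = time.perf_counter()
        compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
        elapsed = time.perf_counter() - start
        # Sanity check: the compressed stream must round-trip.
        assert lzma.decompress(compressed) == data
        results.append((preset, len(compressed) / len(data), elapsed))
    return results


if __name__ == "__main__":
    # Synthetic, repetitive stand-in data (an assumption, not a PCP archive).
    sample = b"".join(
        b"kernel.all.cpu.user %d mem.util.free %d\n" % (i, i * 7 % 1000)
        for i in range(200_000)
    )
    for preset, ratio, secs in benchmark_presets(sample):
        print(f"xz preset {preset}: {ratio:.2%} of original in {secs:.2f} s")
```

Timing a single run per preset is noisy; repeating each measurement and taking the median would give steadier numbers.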
@myllynen wondered about zstd: zstd used 2 threads for compression, so "compress usertime" is the time spent on CPUs, summed up, and is comparable to my results above where a single thread was enforced. "wc" is wall clock time, i.e. the time the compression actually took, which is half of the "usertime" because of the 2 threads. My interpretation: it's amazing how high a compression rate we get with zstd on our data while spending just a few compute cycles. But when spending even more compute cycles, the compression ratio does not improve as much as it does for xz. "xz -2" runs for 64s and reaches a 1.19% ratio. With zstd, spending the same time, we would reach a ~1.78% ratio. So I think "xz -2" is ok. If a customer does not want to spend many CPU cycles on compression and/or also wants to optimize for quicker extraction time, they might want to consider zstd (not sure how hard we make it to switch).
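The distinction between summed user CPU time and wall clock time can be measured for any compressor command. The following is a minimal sketch, assuming a Unix-like system where `os.times()` reports CPU time of waited-for child processes; the helper name `measure` is hypothetical, and the demo command just runs a small Python workload so that the script does not depend on any compressor being installed.

```python
import os
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return (child user CPU seconds, wall clock seconds).

    With N busy threads, the user time can approach N times the wall time.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    # children_user accumulates user CPU time of finished child processes.
    user = after.children_user - before.children_user
    return user, wall


if __name__ == "__main__":
    # Stand-in workload; a real test might instead pass a compressor command
    # line such as ["zstd", "-T2", "-f", "some-archive"] (hypothetical file).
    user, wall = measure([sys.executable, "-c", "sum(range(5_000_000))"])
    print(f"user {user:.2f} s, wall {wall:.2f} s")
```

Comparing compressors on user time rather than wall time keeps single-threaded and multi-threaded runs on the same footing, which is the approach taken in the comment above.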
I also tried brotli. I don't think it changes our conclusion here, but sharing:
@natoscott I do see that -2 looks best in most cases. I have tested it and it looks good. I will update the code for it.
xz_benchmark_small_archive.log
xz_benchmark_big_archive.log