Conversation

@orasagar
Contributor

Raise the default xz compression level from 0 to 3 for improved compression efficiency.

For smaller archives:
xz -0: 533M total --> 26M total in 16.4695 sec
xz -3: 533M total --> 13M total in 24.1473 sec

For larger archives:
xz -0: 15G total --> 695M total in 815.4636 sec
xz -3: 15G total --> 428M total in 1016.3921 sec
xz_benchmark_small_archive.log
xz_benchmark_big_archive.log

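For anyone wanting to reproduce numbers like these, here is a minimal sketch of such a level comparison in Python; the archive directory path and the set of levels are assumptions, not the exact setup used for the figures above.

```python
#!/usr/bin/env python3
# Minimal sketch: compress a scratch copy of pmlogger archive files at a few
# xz levels and report total size and wall-clock time per level.
import glob
import os
import subprocess
import time

ARCHIVE_DIR = "/tmp/pmlogger-bench"  # hypothetical scratch copy of archives

def total_size(paths):
    return sum(os.path.getsize(p) for p in paths)

for level in (0, 2, 3):
    files = [p for p in glob.glob(os.path.join(ARCHIVE_DIR, "*"))
             if os.path.isfile(p) and not p.endswith(".xz")]
    before = total_size(files)
    start = time.monotonic()
    # -k keeps the originals so every level compresses the same input
    subprocess.run(["xz", "-k", "-f", f"-{level}"] + files, check=True)
    elapsed = time.monotonic() - start
    compressed = glob.glob(os.path.join(ARCHIVE_DIR, "*.xz"))
    print(f"xz -{level}: {before} -> {total_size(compressed)} bytes "
          f"in {elapsed:.1f}s")
    for p in compressed:  # clean up before the next level
        os.remove(p)
```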
@christianhorn
Collaborator

christianhorn commented Nov 21, 2025

Raise the default xz compression level from 0 to 3 for improved compression efficiency. [..]

While the commit increases the compression ratio, "efficiency" relates to whichever resources matter most to the user.
Increasing the compression level could also be seen as a waste of CPU cycles and energy for relatively little storage saving, depending on how you value those resources.

Not saying it should not be done, but it should be thought through well.

@natoscott natoscott requested a review from kmcdonell November 23, 2025 22:52
@natoscott
Member

For reference, commit fdfa25c is the original change that introduced "xz -0" as the default. It seems to focus on comparing the -0 option to the -6 option, and -3 is not mentioned as a (possibly ideal?) midpoint.

FWIW, I think we could make this change. A smaller on-disk footprint, at the cost of some increase in the CPU required to inflate (up to a point), can make sense from an archive replay latency POV, in addition to the more obvious space savings.

@christianhorn
Collaborator

Some quick testing, compressing 4.36GB of pmlogger archives:

method  | compressed size (bytes) | ratio  | compress time | uncompress time
xz -0   | 368055576               | 8.47%  | 93s           | 26s
xz -1   | 56707348                | 1.30%  | 40s           | 11s
xz -2   | 51921808                | 1.19%  | 64s           | 8s
xz -3   | 46975632                | 1.10%  | 169s          | 7s
xz -4   | 49721836                | 1.14%  | 357s          | 10s
xz -5   | 44449408                | 1.02%  | 565s          | 9s
xz -6   | 41469712                | 0.95%  | 486s          | 8s
xz -7   | 39125352                | 0.90%  | 682s          | 9s
xz -8   | 37248892                | 0.85%  | 682s          | 8s
xz -9   | 36137088                | 0.83%  | 613s          | 8s
gzip -1 | 1246678262              | 28.57% | 110s          | 27s
gzip -2 | 1265609529              | 29.01% | 60s           | 28s
gzip -4 | 1193171742              | 27.35% | 83s           | 27s
gzip -6 | 1039722752              | 23.83% | 203s          | 27s
gzip -9 | 1018464795              | 23.34% | 1227s         | 23s

I agree we should move off "xz -0"; it did surprisingly badly. I would put the sweet spot at -1 or -2, though, not -3. I was not expecting gzip to do that badly on this test.

Also worth noting: xz moved from single-threaded to multi-threaded compression by default in 2013, so by default the compression puts all cores of the system under load.
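If multi-threading matters for the comparison, one way to separate elapsed time from CPU cost is to record both wall-clock time and the CPU time charged to the child process, and to pin the thread count explicitly with xz's -T option. A rough sketch (the input file name is hypothetical):

```python
import resource
import subprocess
import time

INPUT = "20251121.0"  # hypothetical pmlogger data volume

def timed_run(cmd):
    """Run cmd and return (wall-clock seconds, child CPU seconds)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu

# -T1 forces a single compression thread; -T0 lets xz use all cores.
for threads in ("-T1", "-T0"):
    wall, cpu = timed_run(["xz", "-k", "-f", "-2", threads, INPUT])
    print(f"xz -2 {threads}: wall={wall:.1f}s cpu={cpu:.1f}s")
```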

@natoscott
Member

Thanks @chorn - agreed, -2 looks best for xz. @orasagar can you confirm with your archives, and update the PR if so?

@christianhorn
Collaborator

@myllynen wondered about zstd:

level    | compressed size (bytes) | ratio | compress user time (wc = wall clock) | uncompress time
zstd -1  | 237199142               | 5.44% | 4s (2s wc)                           | 3s
zstd -2  | 155115074               | 3.56% | 4s (2s wc)                           | 2s
zstd -3  | 106141422               | 2.43% | 5s (2s wc)                           | 2s
zstd -5  | 101441517               | 2.33% | 8s (4s wc)                           | 3s
zstd -7  | 93525759                | 2.14% | 16s (12s wc)                         | 2s
zstd -9  | 78489614                | 1.80% | 24s (12s wc)                         | 2s
zstd -11 | 78182625                | 1.79% | 43s (22s wc)                         | 2s
zstd -13 | 77520510                | 1.78% | 132s (66s wc)                        | 2s
zstd -15 | 77315212                | 1.77% | 382s (192s wc)                       | 2s
zstd -17 | 66314359                | 1.52% | 313s (217s wc)                       | 2s
zstd -19 | 62033538                | 1.42% | 751s (378s wc)                       | 2s

zstd used 2 threads for compression, so "compress user time" is the time spent on the CPUs summed up, comparable to my results above where a single thread was enforced. "wc" is wall clock time, i.e. how long the compression actually took; roughly half of the user time due to the 2 threads.

My interpretation: it's amazing what compression ratios we get with zstd on our data while spending just a few compute cycles. But when spending even more compute cycles, the compression ratio does not improve as much as it does for xz: "xz -2" runs for 64s and reaches a 1.19% ratio, while zstd spending about the same time only accomplishes a ~1.78% ratio.

So I think "xz -2" is OK. If a customer does not want to spend many CPU cycles on compression and/or wants to optimize for quicker extraction time, they might want to consider zstd (not sure how hard we make it to switch).
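As a quick sanity check, the ratio columns above can be recomputed from the byte counts and the ~4.36 GB input size; treating that as 4.36 * 10^9 bytes (an assumption about the unit) reproduces the quoted percentages:

```python
ORIGINAL = 4.36e9  # assumed: "4.36GB" taken as decimal gigabytes

samples = {
    "xz -1":   56707348,
    "xz -2":   51921808,
    "zstd -3": 106141422,
    "zstd -9": 78489614,
}
for name, size in samples.items():
    print(f"{name}: {100 * size / ORIGINAL:.2f}%")  # 1.30, 1.19, 2.43, 1.80
```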

@christianhorn
Collaborator

I also tried brotli. It does not affect our conclusion here, I think, but sharing the results:

method    | compressed size (bytes) | ratio  | compress time | uncompress time
brotli -0 | 864404557               | 19.81% | 10s           | 10s
brotli -1 | 704265093               | 16.14% | 13s           | 12s
brotli -2 | 83160772                | 1.91%  | 9s            | 3s
brotli -4 | 69778319                | 1.60%  | 20s           | 2s
brotli -5 | 58698136                | 1.35%  | 23s           | 2s
brotli -6 | 57786988                | 1.32%  | 36s           | 2s
brotli -7 | 57117606                | 1.31%  | 52s           | 2s
brotli -8 | 56712559                | 1.30%  | 95s           | 2s
brotli -9 | 56507445                | 1.30%  | 173s          | 2s

@orasagar
Contributor Author

orasagar commented Dec 1, 2025

@natoscott I do see that -2 looks best in most cases. I have tested it out and it looks good. I will update the code for it.
