[BLOG] Compression Improves Everything #289


Open
wants to merge 1 commit into main

Conversation

xdk-amz

@xdk-amz xdk-amz commented Jul 1, 2025

Description

Adds a blog detailing the benefits of compression for caching workloads using non-binary data.

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Signed-off-by: Dante Knowles <xdk@amazon.com>
Member

@stockholmux stockholmux left a comment


So, I started reviewing this blog post in detail but stopped. I did read the rest of the post, but given my comments so far, a line-by-line review didn't feel like a good use of my time.

I feel like something is missing here. I don't quite understand the Valkey relevance. The post seems to generally advocate for compression but veers off into mobile connections and web page compression and never really ties into Valkey.

If you want to move forward with a compression blog post, please re-write and make it relevant to Valkey.

featured_image = "/assets/media/featured/random-03.webp"
+++

Rising data volumes bring rising costs, and in this post we’ll explore how compression reduces your data size to “improve everything”, cutting costs and boosting performance with virtually no tradeoffs. Valkey’s cost and performance as a cache depends on several factors: network load, storage capacity, and retrieval time. As data volumes continue to grow exponentially across industries, the ability to efficiently store and retrieve information becomes increasingly critical to application performance and user experience. In this blog post, we’ll show how you can potentially reduce storage costs by 4x without compromising on client performance.
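The claimed savings are easy to sanity-check. The sketch below uses Python's stdlib `zlib` as a stand-in for Zstd (the third-party `zstandard` package exposes a similar compress/decompress API, typically with better ratios and speeds); the payload is a hypothetical repetitive JSON document of the kind commonly cached:

```python
import json
import zlib

# Hypothetical cached value: a repetitive JSON document, typical of API responses.
payload = json.dumps(
    [{"user_id": i, "status": "active", "region": "us-east-1"} for i in range(200)]
).encode("utf-8")

# Compress at the default-ish level 6; Zstd would behave analogously.
compressed = zlib.compress(payload, 6)
ratio = len(payload) / len(compressed)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.1f}x")
```

On repetitive JSON like this, even zlib comfortably exceeds a 2x ratio; the exact figure depends on your data.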
Member


"As data volumes continue to grow exponentially across industries, the ability to efficiently store and retrieve information becomes increasingly critical to application performance and user experience." <- drop this sentence. I don't think it adds anything

1. **JSON**
2. **HTML**
3. **XML**
4. **JavaScript**
Member


WTF is javascript doing in this list? It's not a "non-binary data format". I don't doubt that JS compresses well, but so does all source code.

3. **XML**
4. **JavaScript**

Assuming your data is in one of the above formats or a similar format, there are many algorithms available to effectively compress that data.
Member


This feels super restrictive and wrong. Far more formats than JSON, HTML, and XML compress well.

Let’s examine the performance of [Zstd](https://github.com/facebook/zstd), an open-source compression library released by Meta (dual-licensed under BSD/GPL-2.0). Zstd offers several advantages:

1. **High default compression ratio**: 2.896
2. **Effective multi-core scaling**: Can take advantage of unused CPU cores to accelerate the compression workload
Member


Do you have a source for this? The other two are indicated on the website; I can't find anything that precisely says what you're stating.

3. **Dictionary training**: Allows even small data to compress effectively by learning from repetition in the training set
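The idea behind dictionary compression can be illustrated with the stdlib alone: zlib accepts a preset dictionary via `zdict`, which plays the role of a Zstd trained dictionary (Zstd builds its dictionary automatically from sample data; here the shared byte string is hand-written for illustration). Small payloads that would barely compress on their own shrink noticeably when their common structure lives in the dictionary:

```python
import zlib

# Byte sequences common across our small payloads. Zstd's dictionary
# training would learn this automatically; zlib's preset dictionary
# (zdict) demonstrates the same principle.
shared = b'{"user_id": , "status": "active", "region": "us-east-1"}'

sample = b'{"user_id": 42, "status": "active", "region": "us-east-1"}'

# Without a dictionary, a payload this small barely compresses.
plain = zlib.compress(sample, 9)

# With the preset dictionary, most of the payload becomes back-references.
comp = zlib.compressobj(level=9, zdict=shared)
with_dict = comp.compress(sample) + comp.flush()

print(len(sample), len(plain), len(with_dict))

# Decompression must supply the same dictionary.
decomp = zlib.decompressobj(zdict=shared)
assert decomp.decompress(with_dict) == sample
```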


Zstd’s [benchmarks](https://github.com/facebook/zstd?tab=readme-ov-file#benchmarks) report compression speeds of **510 MB/s** and decompression speeds of **1550 MB/s** on a **Core i7-9700K CPU @ 4.9GHz** at the default **2.896** compression ratio.
Member


Mentioning the default compression ratio here is repetitive.




1. [LZ4](https://github.com/lz4/lz4): Prioritizes speed over compression ratio, making it ideal for latency-sensitive applications. It achieves compression ratios around **2.100**, **675 MB/s** compression speed, and **3850 MB/s** decompression speed.
Member


Linked source doesn't have these numbers.

3. [gzip](https://www.gzip.org/): A ubiquitous standard. While not as fast as newer algorithms, its widespread usage and compatibility make it a safe choice.
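Whichever algorithm you pick, the caching pattern is the same: compress before `SET`, decompress after `GET`, so the server only ever stores and ships the smaller payload. A minimal sketch using stdlib `gzip`, with a plain dict standing in for the Valkey client (with a real client such as valkey-py the same bytes would flow through `SET`/`GET`; the helper names here are illustrative, not any library's API):

```python
import gzip
import json

# A plain dict stands in for a Valkey client; the wrapper functions below
# are hypothetical helpers, not part of any client library.
cache = {}

def cache_set(key, value):
    # Serialize, then compress before storing: the cache only ever holds
    # (and you only ever pay for) the smaller payload.
    cache[key] = gzip.compress(json.dumps(value).encode("utf-8"))

def cache_get(key):
    raw = cache.get(key)
    if raw is None:
        return None
    return json.loads(gzip.decompress(raw).decode("utf-8"))

cache_set("user:42", {"name": "Ada", "roles": ["admin"] * 50})
assert cache_get("user:42")["name"] == "Ada"
```

Because compression happens client-side, the server needs no changes; the only requirement is that every reader knows the values are compressed.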


For servers running in a modern data center, multi-core CPUs are the norm, but even today’s budget mobile devices offer 8 core CPUs that can parallelize compression and decompression effectively. These advancements in CPU hardware and compression software vastly outpace existing improvements in networking data transmission, which is inherently limited by serial constraints. Compression algorithms like Zstd can operate at **>500 MB/s**, compared to **6.25 MB/s** for the global average internet connection. Even an algorithm like LZ4 has a default compression ratio of **2.1**, reducing network data transfer requirements by that same factor.
Member


"but even today’s budget mobile devices offer 8 core CPUs that can parallelize compression and decompression effectively" <-- how are mobile devices relevant to valkey?

I don't understand the point you're making with "compared to 6.25 MB/s for the global average internet connection". No one is running Valkey over a "global average internet connection"


Hardware acceleration for compression is becoming increasingly common in modern CPUs and specialized chips, further tilting the balance in favor of compression as a performance optimization. What might have been computationally expensive a decade ago is now trivial, while bandwidth constraints remain significant, especially in mobile contexts. Though compression/decompression add processing overhead, this is practically negligible compared to network transfer time. The math is straightforward: if your compression ratio is **>2** and your compression/decompression operates at **>2x** your network transfer speed, there will be a decrease in overall transfer times for the client in addition to the storage cost savings.
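That arithmetic can be made concrete. The sketch below uses the Zstd benchmark figures quoted earlier (510 MB/s compress, 1550 MB/s decompress, 2.896 ratio) and assumes a 125 MB/s (~1 Gbit/s) link, which is an illustrative number, not a measurement; the model is fully serial (compress, then transfer, then decompress), so real pipelined transfers would do even better:

```python
# Back-of-envelope check of the claim above. Throughputs in MB/s; the
# 125 MB/s (~1 Gbit/s) link speed is an assumption, the Zstd figures are
# the benchmark numbers quoted in the post.
size_mb = 100.0
link = 125.0
compress, decompress, ratio = 510.0, 1550.0, 2.896

# Send the payload as-is.
uncompressed_time = size_mb / link

# Compress on the writer, transfer the smaller payload, decompress on the reader.
compressed_size = size_mb / ratio
compressed_time = (
    size_mb / compress
    + compressed_size / link
    + compressed_size / decompress
)

print(f"uncompressed: {uncompressed_time:.2f} s")
print(f"compressed:   {compressed_time:.2f} s")
```

Under these assumptions the compressed path wins despite the extra CPU work, because the transfer term shrinks by the full compression ratio while the added compute terms stay small.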
Member


Provide a source for "Hardware acceleration for compression is becoming increasingly common". Also, is this relevant to (or taken advantage of by) Valkey?

Also, the mobile stuff seems off topic entirely.
