[BLOG] Compression Improves Everything #289


Open
wants to merge 1 commit into main

Conversation

xdk-amz

@xdk-amz xdk-amz commented Jul 1, 2025

Description

Adds a blog detailing the benefits of compression for caching workloads using non-binary data.

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Signed-off-by: Dante Knowles <xdk@amazon.com>
Member

@stockholmux stockholmux left a comment


So, I started reviewing this blog post in detail but stopped. I did read the rest of the post, but given my comments so far, a line-by-line review didn't feel like a good use of my time.

I feel like something is missing here. I don't quite understand the Valkey relevance. The post seems to generally advocate for compression but veers off into mobile connections and web page compression and never really ties into Valkey.

If you want to move forward with a compression blog post, please re-write and make it relevant to Valkey.

featured_image = "/assets/media/featured/random-03.webp"
+++

Rising data volumes bring rising costs, and in this post we’ll explore how compression reduces your data size to “improve everything”, cutting costs and boosting performance with virtually no tradeoffs. Valkey’s cost and performance as a cache depends on several factors: network load, storage capacity, and retrieval time. As data volumes continue to grow exponentially across industries, the ability to efficiently store and retrieve information becomes increasingly critical to application performance and user experience. In this blog post, we’ll show how you can potentially reduce storage costs by 4x without compromising on client performance.
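The claimed savings are easy to sanity-check. The sketch below uses Python's stdlib `zlib` as a stand-in for Zstd (the third-party `zstandard` package exposes a similar compress/decompress API, typically with better ratios and speeds); the payload is a hypothetical repetitive JSON document of the kind commonly cached:

```python
import json
import zlib

# Hypothetical cached value: a repetitive JSON document, typical of API responses.
payload = json.dumps(
    [{"user_id": i, "status": "active", "region": "us-east-1"} for i in range(200)]
).encode("utf-8")

# Compress at the default-ish level 6; Zstd would behave analogously.
compressed = zlib.compress(payload, 6)
ratio = len(payload) / len(compressed)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.1f}x")
```

On repetitive JSON like this, even zlib comfortably exceeds a 2x ratio; the exact figure depends on your data.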
Member


"As data volumes continue to grow exponentially across industries, the ability to efficiently store and retrieve information becomes increasingly critical to application performance and user experience." <- drop this sentence. I don't think it adds anything

1. **JSON**
2. **HTML**
3. **XML**
4. **JavaScript**
Member


WTF is javascript doing in this list? It's not a "non-binary data format". I don't doubt that JS compresses well, but so does all source code.

3. **XML**
4. **JavaScript**

Assuming your data is in one of the above formats or a similar format, there are many algorithms available to effectively compress that data.
Member


This feels super restrictive and wrong. Far more formats than JSON, HTML, and XML compress well.

Let’s examine the performance of [Zstd](https://github.com/facebook/zstd), an open-source compression library released by Meta (dual-licensed under BSD/GPL-2.0). Zstd offers several advantages:

1. **High default compression ratio**: 2.896
2. **Effective multi-core scaling**: Can take advantage of unused CPU cores to accelerate the compression workload
Member


Do you have a source for this? The other two are indicated on the website; I can't find anything that precisely says what you're stating.

3. **Dictionary training**: Allows even small data to compress effectively by learning from repetition in the training set
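The idea behind dictionary compression can be illustrated with the stdlib alone: zlib accepts a preset dictionary via `zdict`, which plays the role of a Zstd trained dictionary (Zstd builds its dictionary automatically from sample data; here the shared byte string is hand-written for illustration). Small payloads that would barely compress on their own shrink noticeably when their common structure lives in the dictionary:

```python
import zlib

# Byte sequences common across our small payloads. Zstd's dictionary
# training would learn this automatically; zlib's preset dictionary
# (zdict) demonstrates the same principle.
shared = b'{"user_id": , "status": "active", "region": "us-east-1"}'

sample = b'{"user_id": 42, "status": "active", "region": "us-east-1"}'

# Without a dictionary, a payload this small barely compresses.
plain = zlib.compress(sample, 9)

# With the preset dictionary, most of the payload becomes back-references.
comp = zlib.compressobj(level=9, zdict=shared)
with_dict = comp.compress(sample) + comp.flush()

print(len(sample), len(plain), len(with_dict))

# Decompression must supply the same dictionary.
decomp = zlib.decompressobj(zdict=shared)
assert decomp.decompress(with_dict) == sample
```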


Zstd’s [benchmarks](https://github.com/facebook/zstd?tab=readme-ov-file#benchmarks) report compression speeds of **510 MB/s** and decompression speeds of **1550 MB/s** on a **Core i7-9700K CPU @ 4.9GHz** at the default **2.896** compression ratio.
Member


Mentioning the default compression ratio here is repetitive.




1. [LZ4](https://github.com/lz4/lz4): Prioritizes speed over compression ratio, making it ideal for latency-sensitive applications. It achieves compression ratios around **2.100**, **675 MB/s** compression speed, and **3850 MB/s** decompression speed.
Member


Linked source doesn't have these numbers.

3. [gzip](https://www.gzip.org/): A ubiquitous standard. While not as fast as newer algorithms, its widespread usage and compatibility make it a safe choice.
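Whichever algorithm you pick, the caching pattern is the same: compress before `SET`, decompress after `GET`, so the server only ever stores and ships the smaller payload. A minimal sketch using stdlib `gzip`, with a plain dict standing in for the Valkey client (with a real client such as valkey-py the same bytes would flow through `SET`/`GET`; the helper names here are illustrative, not any library's API):

```python
import gzip
import json

# A plain dict stands in for a Valkey client; the wrapper functions below
# are hypothetical helpers, not part of any client library.
cache = {}

def cache_set(key, value):
    # Serialize, then compress before storing: the cache only ever holds
    # (and you only ever pay for) the smaller payload.
    cache[key] = gzip.compress(json.dumps(value).encode("utf-8"))

def cache_get(key):
    raw = cache.get(key)
    if raw is None:
        return None
    return json.loads(gzip.decompress(raw).decode("utf-8"))

cache_set("user:42", {"name": "Ada", "roles": ["admin"] * 50})
assert cache_get("user:42")["name"] == "Ada"
```

Because compression happens client-side, the server needs no changes; the only requirement is that every reader knows the values are compressed.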


For servers running in a modern data center, multi-core CPUs are the norm, but even today’s budget mobile devices offer 8 core CPUs that can parallelize compression and decompression effectively. These advancements in CPU hardware and compression software vastly outpace existing improvements in networking data transmission, which is inherently limited by serial constraints. Compression algorithms like Zstd can operate at **>500 MB/s**, compared to **6.25 MB/s** for the global average internet connection. Even an algorithm like LZ4 has a default compression ratio of **2.1**, reducing network data transfer requirements by that same factor.
Member


"but even today’s budget mobile devices offer 8 core CPUs that can parallelize compression and decompression effectively" <-- how are mobile devices relevant to valkey?

I don't understand the point you're making with "compared to 6.25 MB/s for the global average internet connection". No one is running Valkey over a "global average internet connection"


Hardware acceleration for compression is becoming increasingly common in modern CPUs and specialized chips, further tilting the balance in favor of compression as a performance optimization. What might have been computationally expensive a decade ago is now trivial, while bandwidth constraints remain significant, especially in mobile contexts. Though compression/decompression add processing overhead, this is practically negligible compared to network transfer time. The math is straightforward: if your compression ratio is **>2** and your compression/decompression operates at **>2x** your network transfer speed, there will be a decrease in overall transfer times for the client in addition to the storage cost savings.
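That arithmetic can be made concrete. The sketch below uses the Zstd benchmark figures quoted earlier (510 MB/s compress, 1550 MB/s decompress, 2.896 ratio) and assumes a 125 MB/s (~1 Gbit/s) link, which is an illustrative number, not a measurement; the model is fully serial (compress, then transfer, then decompress), so real pipelined transfers would do even better:

```python
# Back-of-envelope check of the claim above. Throughputs in MB/s; the
# 125 MB/s (~1 Gbit/s) link speed is an assumption, the Zstd figures are
# the benchmark numbers quoted in the post.
size_mb = 100.0
link = 125.0
compress, decompress, ratio = 510.0, 1550.0, 2.896

# Send the payload as-is.
uncompressed_time = size_mb / link

# Compress on the writer, transfer the smaller payload, decompress on the reader.
compressed_size = size_mb / ratio
compressed_time = (
    size_mb / compress
    + compressed_size / link
    + compressed_size / decompress
)

print(f"uncompressed: {uncompressed_time:.2f} s")
print(f"compressed:   {compressed_time:.2f} s")
```

Under these assumptions the compressed path wins despite the extra CPU work, because the transfer term shrinks by the full compression ratio while the added compute terms stay small.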
Member


Provide a source for "Hardware acceleration for compression is becoming increasingly common". Also, is this relevant to (or taken advantage of by) Valkey?

Also, the mobile stuff seems off topic entirely.
