feat: add zstd compression to local chunk store (#862)
Conversation
Summary of Changes

Hello @kalbasit, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces Zstandard compression to the local chunk storage mechanism. The primary goal is to significantly reduce the disk footprint of cached data, which is particularly beneficial for environments with large data volumes such as Nix store paths. The changes ensure that data is compressed before being written and decompressed seamlessly when read, without altering the external API of the chunk store.
Code Review
This pull request introduces zstd compression to the local chunk store, aiming to reduce disk usage by compressing data on PutChunk and decompressing it on GetChunk. However, a significant race condition exists in the PutChunk implementation due to the use of a shared, unsynchronized zstd.Encoder instance. Since klauspost/compress/zstd's zstd.Encoder is not thread-safe, concurrent calls to PutChunk could lead to data corruption or application crashes. It is recommended to implement a sync.Pool for encoders or use a mutex to serialize access. Additionally, there are a couple of suggestions for improvement: remove the unused decoder field in the localStore struct to simplify the code, and refactor localReadCloser to use composition instead of embedding for a more explicit and robust API.
Implement Zstandard (zstd) compression for the local chunk store to reduce disk usage for cached chunks.

Changes:
- Added zstd encoder to the localStore struct for efficient compression.
- Modified PutChunk to compress data using zstd before writing to disk and return the compressed size.
- Updated GetChunk to use zstd.NewReader for transparent decompression, wrapped in a localReadCloser to ensure proper resource cleanup.
- Added comprehensive tests to verify compression efficiency and data integrity through round-trip checks.

This change reduces the storage footprint of the local chunk cache, which is especially beneficial for large Nix store paths.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@ Coverage Diff @@
## main #862 +/- ##
=====================================
Coverage 3.96% 3.96%
=====================================
Files 6 6
Lines 429 429
=====================================
Hits 17 17
Misses 409 409
Partials 3 3