Clast is a modular library designed for Content-Defined Chunking (CDC). It splits data into variable-sized chunks based on content, rather than fixed offsets (Fixed-Size Chunking, FSC), making it a critical building block for data deduplication, backup systems, and efficient storage solutions.
Clast is architected to support multiple chunking algorithms, allowing developers to choose the best strategy for their specific use cases.
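To make the difference between fixed-size and content-defined chunking concrete, here is a minimal sketch in plain Rust. It does not use Clast's API; `fixed_chunks`, `toy_cdc_chunks`, `shared`, `sample_data`, and `reuse_after_insert` are hypothetical names introduced only for this illustration, and the 4-byte window hash is a toy rule, not the algorithm Clast ships. It inserts one byte at the front of a buffer and counts how many of the original chunks still appear afterwards.

```rust
use std::collections::HashSet;

/// Fixed-size chunking: cut every `size` bytes regardless of content.
fn fixed_chunks(data: &[u8], size: usize) -> Vec<&[u8]> {
    data.chunks(size).collect()
}

/// Toy content-defined chunking (illustrative only): cut after byte `i`
/// when a hash of the 4-byte window ending at `i` is divisible by 16,
/// which happens at roughly 1 position in 16 on random data.
fn toy_cdc_chunks(data: &[u8]) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for i in 3..data.len() {
        let h = data[i - 3..=i]
            .iter()
            .fold(0u32, |h, &b| h.wrapping_mul(31).wrapping_add(b as u32));
        if h % 16 == 0 {
            chunks.push(&data[start..=i]);
            start = i + 1;
        }
    }
    // Whatever remains after the last cut point forms the final chunk.
    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}

/// Count how many chunks of `before` also appear byte-for-byte in `after`.
fn shared(before: &[&[u8]], after: &[&[u8]]) -> usize {
    let seen: HashSet<&[u8]> = after.iter().copied().collect();
    before.iter().filter(|c| seen.contains(*c)).count()
}

/// Deterministic pseudo-random bytes (xorshift64), used as sample input.
fn sample_data(len: usize) -> Vec<u8> {
    let mut x: u64 = 0x9E37_79B9_7F4A_7C15;
    (0..len)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            (x >> 32) as u8
        })
        .collect()
}

/// Returns (fixed-size chunks reused, content-defined chunks reused)
/// after a single byte is inserted at the front of the data.
fn reuse_after_insert() -> (usize, usize) {
    let original = sample_data(4096);
    let mut edited = vec![0xFF];
    edited.extend_from_slice(&original);
    let fixed = shared(&fixed_chunks(&original, 64), &fixed_chunks(&edited, 64));
    let cdc = shared(&toy_cdc_chunks(&original), &toy_cdc_chunks(&edited));
    (fixed, cdc)
}
```

With fixed-size chunking the inserted byte shifts every later boundary, so the old chunks no longer line up. With the content-defined rule, each cut depends only on the bytes around it, so every chunk after the first one is found again. That property is what makes CDC useful for deduplication.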
- High Performance: Optimized for throughput and low CPU overhead.
- Modular Architecture: Designed to support various CDC algorithms.
- Async & Sync: Supports both synchronous (std::io) and asynchronous (tokio) I/O.
Clast includes an implementation of the FastCDC algorithm as described in "The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems."
It incorporates five key optimizations:
- Gear-based Rolling Hashing
- Optimized Hash Judgment
- Sub-minimum Chunk Cut-Point Skipping
- Normalized Chunking
- Rolling Two Bytes
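The sketch below shows how three of these optimizations can fit together: gear-based rolling hashing, sub-minimum cut-point skipping, and normalized chunking. It is written under stated assumptions and is not Clast's implementation or API. The names `gear_table`, `cut_point`, and `chunk_lengths`, the size constants, and the two masks are all hypothetical choices made for illustration. The masks are simple runs of high bits rather than the zero-padded masks of the paper's optimized hash judgment, and Rolling Two Bytes is omitted for brevity.

```rust
// Illustrative sizes only; real deployments typically use kilobyte-scale values.
const MIN_SIZE: usize = 64;
const NORMAL_SIZE: usize = 256;
const MAX_SIZE: usize = 1024;

// Normalized chunking uses two masks: a stricter one (10 bits) before
// NORMAL_SIZE and a looser one (6 bits) after it, which pulls chunk
// sizes toward NORMAL_SIZE.
const MASK_S: u64 = 0xFFC0_0000_0000_0000;
const MASK_L: u64 = 0xFC00_0000_0000_0000;

/// Build a 256-entry gear table of pseudo-random 64-bit values.
/// Assumption: splitmix64 is used here only to get deterministic values.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Return the length of the next chunk at the start of `data`.
fn cut_point(data: &[u8], gear: &[u64; 256]) -> usize {
    if data.len() <= MIN_SIZE {
        return data.len();
    }
    let end = data.len().min(MAX_SIZE);
    let normal = end.min(NORMAL_SIZE);
    let mut fp: u64 = 0;

    // Sub-minimum skipping: bytes before MIN_SIZE are never hashed.
    for i in MIN_SIZE..normal {
        // Gear hash: one shift, one table lookup, one add per byte.
        fp = (fp << 1).wrapping_add(gear[data[i] as usize]);
        if fp & MASK_S == 0 {
            return i + 1;
        }
    }
    // Past the normal size, switch to the looser mask.
    for i in normal..end {
        fp = (fp << 1).wrapping_add(gear[data[i] as usize]);
        if fp & MASK_L == 0 {
            return i + 1;
        }
    }
    // No cut point found: force a cut at MAX_SIZE (or end of input).
    end
}

/// Split `data` into chunks and return their lengths in order.
fn chunk_lengths(data: &[u8]) -> Vec<usize> {
    let gear = gear_table();
    let mut lengths = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let len = cut_point(&data[offset..], &gear);
        lengths.push(len);
        offset += len;
    }
    lengths
}
```

Calling `chunk_lengths(&bytes)` on any buffer yields lengths that sum to `bytes.len()`, with every chunk except possibly the last falling between `MIN_SIZE` and `MAX_SIZE`.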
Use cargo to install the package:

    cargo add clast

Clast uses feature flags to minimize the compiled binary size. You can selectively enable the features you need.
- fastcdc: Enables the FastCDC algorithm implementation. (Enabled by default)
- async: Enables asynchronous support using tokio.
Example of enabling only fastcdc (default behavior):

    cargo add clast

Example of enabling fastcdc and async support:

    cargo add clast --features async

Or in your Cargo.toml:

    [dependencies]
    clast = { version = "1.0.0", features = ["async"] }

Please refer to the Tutorials for detailed usage examples.
- FastCDC: Wen Xia et al., "The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems," IEEE Transactions on Parallel and Distributed Systems, 2020.
Contributions are welcome! Please feel free to open a Pull Request.
- Fork the repository.
- Create your feature branch (git checkout -b feature/new-feature).
- Commit your changes (git commit -m 'Add some feature').
- Push to the branch (git push origin feature/new-feature).
- Open a Pull Request.
Please make sure to run cargo fmt and cargo test before opening a Pull Request.
MIT © Arcadia Softs. See LICENSE for details.
