arcadiasofts/clast-rs


A high-performance, extensible Content-Defined Chunking (CDC) library for Rust.




Overview

Clast is a modular library for Content-Defined Chunking (CDC). CDC splits data into variable-sized chunks whose boundaries are determined by the content itself rather than by fixed offsets (Fixed-Size Chunking, FSC), so an insertion or deletion only disturbs the chunks near the edit. This makes CDC a core building block for data deduplication, backup systems, and efficient storage solutions.
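
To make the difference from fixed-size chunking concrete, here is a minimal, standard-library-only sketch. It does not use Clast's API: `fixed_chunks`, `content_chunks`, `shared`, `sample_data`, and `compare` are hypothetical names, and the boundary rule (a hash of the last 4 bytes) is a toy chosen for readability, not one of the algorithms Clast implements.

```rust
use std::collections::HashSet;

/// Fixed-size chunking: cut every `size` bytes regardless of content.
pub fn fixed_chunks(data: &[u8], size: usize) -> Vec<&[u8]> {
    data.chunks(size).collect()
}

/// Toy content-defined chunking (illustrative only): cut after byte `i` when a
/// hash of the 4-byte window ending at `i` has its top 6 bits clear. On random
/// input this gives chunks of roughly 64 bytes on average.
pub fn content_chunks(data: &[u8]) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for i in 3..data.len() {
        let window = u32::from_le_bytes([data[i - 3], data[i - 2], data[i - 1], data[i]]);
        if (window.wrapping_mul(0x9E37_79B1) >> 26) == 0 {
            chunks.push(&data[start..=i]);
            start = i + 1;
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]); // trailing bytes form the last chunk
    }
    chunks
}

/// Count how many chunks of `before` also occur, byte for byte, in `after`.
pub fn shared(before: &[&[u8]], after: &[&[u8]]) -> usize {
    let seen: HashSet<&[u8]> = after.iter().copied().collect();
    before.iter().filter(|c| seen.contains(*c)).count()
}

/// Deterministic pseudo-random bytes (xorshift64), so no extra crates are needed.
pub fn sample_data(len: usize) -> Vec<u8> {
    let mut x: u64 = 0x2545_F491_4F6C_DD1D;
    (0..len)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            (x >> 24) as u8
        })
        .collect()
}

/// Prepend one byte to the input and return `(shared, total)` chunk counts,
/// first for fixed-size chunking and then for the toy content-defined rule.
pub fn compare() -> ((usize, usize), (usize, usize)) {
    let original = sample_data(16 * 1024);
    let mut edited = vec![0xAB_u8];
    edited.extend_from_slice(&original);

    let (f0, f1) = (fixed_chunks(&original, 64), fixed_chunks(&edited, 64));
    let (c0, c1) = (content_chunks(&original), content_chunks(&edited));
    ((shared(&f0, &f1), f0.len()), (shared(&c0, &c1), c0.len()))
}
```

Calling `compare()` shows the effect of a one-byte insertion at the front. Every fixed-size boundary shifts by one byte, whereas with this toy rule the cut decision depends only on the 4 bytes before each position, so every content-defined chunk except possibly the first is reproduced unchanged.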

Clast is architected to support multiple chunking algorithms, allowing developers to choose the best strategy for their specific use cases.


Key Features

  • High Performance: Optimized for throughput and low CPU overhead.
  • Modular Architecture: Designed to support various CDC algorithms.
  • Async & Sync: Supports both synchronous I/O (std::io) and asynchronous I/O on the tokio runtime.

Supported Algorithms

FastCDC

An implementation of the FastCDC algorithm as described in "The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems" (see Reference).

It incorporates five key optimizations:

  • Gear-based Rolling Hashing
  • Optimized Hash Judgment
  • Sub-minimum Chunk Cut-Point Skipping
  • Normalized Chunking
  • Rolling Two Bytes
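
As a rough illustration of how the first four optimizations fit together, the sketch below finds chunk boundaries with a gear hash, a mask test, sub-minimum skipping, and two masks for normalized chunking. It is a simplified reading of the paper, not Clast's implementation: `Params`, `gear_table`, `mask`, `cut_point`, and `chunk_lengths` are hypothetical names, the gear table is generated with SplitMix64 rather than taken from any published table, the masks are contiguous bit runs rather than the zero-padded masks the paper recommends, `avg` is assumed to be a power of two of at least 4, and "Rolling Two Bytes" is omitted.

```rust
/// Chunk size bounds in bytes. Assumption: `avg` is a power of two >= 4
/// and `min < avg < max`.
pub struct Params {
    pub min: usize,
    pub avg: usize,
    pub max: usize,
}

/// Build a 256-entry gear table from a fixed seed using SplitMix64.
/// (Assumption: any table of well-mixed 64-bit values is adequate for a sketch.)
pub fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for slot in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *slot = z ^ (z >> 31);
    }
    table
}

/// A mask with `bits` contiguous one-bits, placed above bit 16 so that more
/// than the last few bytes influence the tested bits (a simplification).
fn mask(bits: u32) -> u64 {
    ((1u64 << bits) - 1) << 16
}

/// Return the length of the first chunk in `data`.
pub fn cut_point(data: &[u8], gear: &[u64; 256], p: &Params) -> usize {
    if data.len() <= p.min {
        return data.len();
    }
    let end = data.len().min(p.max);
    let normal = end.min(p.avg);
    let bits = p.avg.trailing_zeros();
    // Normalized chunking: a stricter mask before `avg`, a looser one after.
    let (mask_s, mask_l) = (mask(bits + 2), mask(bits - 2));

    let mut hash = 0u64;
    let mut i = p.min; // sub-minimum skipping: never hash the first `min` bytes
    while i < normal {
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]); // gear rolling hash
        if (hash & mask_s) == 0 {
            return i + 1;
        }
        i += 1;
    }
    while i < end {
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
        if (hash & mask_l) == 0 {
            return i + 1;
        }
        i += 1;
    }
    end // no boundary found: cut at `max` or at the end of the input
}

/// Split `data` into consecutive chunks and return their lengths.
pub fn chunk_lengths(data: &[u8], p: &Params) -> Vec<usize> {
    let gear = gear_table();
    let mut lengths = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let len = cut_point(&data[offset..], &gear, p);
        lengths.push(len);
        offset += len;
    }
    lengths
}
```

For example, `chunk_lengths(&bytes, &Params { min: 2048, avg: 8192, max: 65536 })` returns lengths that sum to `bytes.len()`; every chunk is at most `max` bytes, and every chunk except possibly the last is longer than `min`.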

Installation

Use cargo to install the package:

cargo add clast

Feature Flags

Clast uses feature flags to minimize the compiled binary size. You can selectively enable the features you need.

  • fastcdc: Enables the FastCDC algorithm implementation. (Enabled by default)
  • async: Enables asynchronous support using tokio.

Example of enabling only fastcdc (default behavior):

cargo add clast

Example of enabling fastcdc and async support:

cargo add clast --features async

Or in your Cargo.toml:

[dependencies]
clast = { version = "1.0.0", features = ["async"] }

Usage

Please refer to the Tutorials for detailed usage examples.


Reference

  • FastCDC: Wen Xia et al., "The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems," IEEE Transactions on Parallel and Distributed Systems, 2020.

Contributing

Contributions are welcome! Please feel free to open a Pull Request.

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/new-feature).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature/new-feature).
  5. Open a Pull Request.

Please make sure to run cargo fmt and cargo test before opening a Pull Request.


License

MIT © Arcadia Softs. See LICENSE for details.
