12 changes: 6 additions & 6 deletions doc/dev/deduplication.rst
@@ -162,16 +162,16 @@ Below we explain how to perform deduplication.

.. code:: bash

- ceph-dedup-tool --op estimate --pool $POOL --chunk-size chunk_size
+ ceph-dedup-tool --op estimate --pool POOL --chunk-size CHUNK_SIZE
--chunk-algorithm fixed|fastcdc --fingerprint-algorithm sha1|sha256|sha512
--max-thread THREAD_COUNT

This CLI command will show how much storage space can be saved when deduplication
is applied on the pool. If the amount of the saved space is higher than user's expectation,
the pool probably is worth performing deduplication.
- Users should specify $POOL where the object---the users want to perform
+ Users should specify ``POOL`` where the object---the users want to perform
deduplication---is stored. The users also need to run ceph-dedup-tool multiple time
- with varying ``chunk_size`` to find the optimal chunk size. Note that the
+ with varying ``CHUNK_SIZE`` to find the optimal chunk size. Note that the
optimal value probably differs in the content of each object in case of fastcdc
chunk algorithm (not fixed). Example output:

@@ -196,7 +196,7 @@ chunk algorithm (not fixed). Example output:

The above is an example output when executing ``estimate``. ``target_chunk_size`` is the same as
``chunk_size`` given by the user. ``dedup_bytes_ratio`` shows how many bytes are redundant from
- examined bytes. For instance, 1 - ``dedup_bytes_ratio`` means the percentage of saved storage space.
+ examined bytes. For instance, (1 - ``dedup_bytes_ratio``) * 100 means the percentage of saved storage space.
``dedup_object_ratio`` is the generated chunk objects / ``examined_objects``. ``chunk_size_average``
means that the divided chunk size on average when performing CDC---this may differnet from ``target_chunk_size``
because CDC genarates differnt chunk-boundary depending on the content. ``chunk_size_stddev``
@@ -227,13 +227,13 @@ If --loop is set, the theads will wakeup after ``WAKEUP_PERIOD``. If not, the th
.. code:: bash

ceph-dedup-tool --op object-dedup --pool POOL --object OID --chunk-pool CHUNK_POOL
- --fingerprint-algorithm sha1|sha256|sha512 --dedup-cdc-chunk-size CHUNK_SIZE
+ --fingerprint-algorithm FP_ALGO --dedup-cdc-chunk-size CHUNK_SIZE

Review comment: FP_ALGO (sha1|sha256|sha512)


The ``object-dedup`` command triggers deduplication on the RADOS object specified by ``OID``.
All parameters shown above must be specified. ``CHUNK_SIZE`` should be taken from
the results of step 1 above.
Note that when this command is executed, ``fastcdc`` will be set by default and other parameters
- such as ``FP`` and ``CHUNK_SIZE`` will be set as defaults for the pool.
+ such as ``FP_ALGO`` and ``CHUNK_SIZE`` will be set as defaults for the pool.
Deduplicated objects will appear in the chunk pool. If the object is mutated over time, user needs to re-run
``object-dedup`` because chunk-boundary should be recalculated based on updated contents.
The user needs to specify ``snap`` if the target object is snapshotted. After deduplication is done, the target
6 changes: 3 additions & 3 deletions src/tools/ceph_dedup_tool.cc
@@ -152,11 +152,11 @@ po::options_description make_usage() {
": fix mismatched references")
("op dump-chunk-refs --chunk-pool <POOL> --object <OID>",
": dump chunk object's references")
- ("op chunk-dedup --pool <POOL> --object <OID> --chunk-pool <POOL> --fingerprint-algorithm <FP> --source-off <OFFSET> --source-length <LENGTH>",
+ ("op chunk-dedup --pool <POOL> --object <OID> --chunk-pool <POOL> --fingerprint-algorithm <FP_ALGO> --source-off <OFFSET> --source-length <LENGTH>",
": perform a chunk dedup---deduplicate only a chunk, which is a part of object.")
- ("op object-dedup --pool <POOL> --object <OID> --chunk-pool <POOL> --fingerprint-algorithm <FP> --dedup-cdc-chunk-size <CHUNK_SIZE> [--snap]",
+ ("op object-dedup --pool <POOL> --object <OID> --chunk-pool <POOL> --fingerprint-algorithm <FP_ALGO> --dedup-cdc-chunk-size <CHUNK_SIZE> [--snap]",
": perform a object dedup---deduplicate the entire object, not a chunk. Related snapshots are also deduplicated if --snap is given")
- ("op sample-dedup --pool <POOL> --chunk-pool <POOL> --chunk-algorithm <ALGO> --fingerprint-algorithm <FP> --daemon --loop",
+ ("op sample-dedup --pool <POOL> --chunk-pool <POOL> --chunk-algorithm <ALGO> --fingerprint-algorithm <FP_ALGO> --daemon --loop",
": perform a sample dedup---make crawling threads which crawl objects in base pool and deduplicate them based on their deduplication efficiency")
;
po::options_description op_desc("Opational arguments");
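
Note on the ``estimate`` step in the deduplication.rst hunk: the text asks users to run ceph-dedup-tool several times with varying ``CHUNK_SIZE``. A minimal sketch of such a sweep could look like the following. It uses only the flags shown in the documented synopsis; the default pool name, the candidate sizes, the thread count, and the `DRY_RUN` switch are illustrative assumptions and not part of this patch. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: sweep candidate chunk sizes through `--op estimate`.
# The POOL default, the size list, MAX_THREAD, and DRY_RUN are
# illustrative assumptions, not values taken from the patch.
POOL="${POOL:-mypool}"
MAX_THREAD="${MAX_THREAD:-4}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Build the estimate command line for one chunk size (in bytes).
estimate_cmd() {
  printf 'ceph-dedup-tool --op estimate --pool %s --chunk-size %s --chunk-algorithm fastcdc --fingerprint-algorithm sha1 --max-thread %s\n' \
    "$POOL" "$1" "$MAX_THREAD"
}

# Print (or, with DRY_RUN=0, run) one estimate per candidate size.
sweep() {
  local size cmd
  for size in "$@"; do
    cmd="$(estimate_cmd "$size")"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$cmd"
    else
      $cmd
    fi
  done
}

sweep 8192 16384 32768
```

With `DRY_RUN=0` each command is executed, and the ``dedup_bytes_ratio`` values in the outputs can then be compared across chunk sizes, reading (1 - ``dedup_bytes_ratio``) * 100 as the percentage of saved space, as the updated documentation states.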