mds: scrub fix by sajibreadd-croit · Pull Request #6 · croit/ceph

sajibreadd-croit · 2025-12-19T11:04:04Z

Remote link damage identification with reverse parent scrubbing
- Identifying a damaged remote link becomes tricky if the inode is cached.
- Try to open the link normally; if opening fails, mark the link as damaged.
- If it opens successfully, there may still be damage that is hidden because
  the inode is cached. In that case take the opened inode and scrub its
  ancestors recursively. If any ancestor is damaged, the remote link is
  marked as damaged.
- While scrubbing, some flags are maintained in the inode,
  e.g. whether the scrub is backward, forward, or both.
- This backward scrubbing only works in a read-only scrub, that is, without
  the repair flag and with the mds_scrub_hard_link Ceph option turned on.
- A new type of damage is introduced that identifies multiple links pointing
  to the same inode, which was not possible previously.
mds_damage_log_to_file and mds_damage_log_file are used to write damage
entries persistently to a file, as it is not safe to keep them only in memory.
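
The backward (reverse parent) check described above can be illustrated with a small standalone sketch. It is not the code from this PR: `Node`, `Tree` and `backward_scrub_finds_damage` are hypothetical names, the tree is an in-memory map rather than CInode/CDentry objects, and treating a missing ancestor or a parent cycle as damage is an assumption made for the example.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for an inode: its number, its parent's number, and
// whether its on-disk state failed validation. Not the Ceph CInode class.
struct Node {
  std::uint64_t ino;
  std::uint64_t parent;  // 0 means "no parent" (root)
  bool damaged;
};

using Tree = std::unordered_map<std::uint64_t, Node>;

// Walk from `ino` up towards the root. Returns true if the inode itself or
// any ancestor is damaged. A missing ancestor or a parent cycle is also
// reported as damage (an assumption of this sketch).
bool backward_scrub_finds_damage(const Tree& tree, std::uint64_t ino) {
  std::unordered_set<std::uint64_t> seen;  // guards against parent cycles
  while (ino != 0) {
    if (!seen.insert(ino).second) {
      return true;  // already visited: cycle in the parent chain
    }
    auto it = tree.find(ino);
    if (it == tree.end()) {
      return true;  // ancestor cannot be found
    }
    if (it->second.damaged) {
      return true;  // damaged ancestor taints the remote link
    }
    ino = it->second.parent;
  }
  return false;  // reached the root without finding damage
}
```

A caller would run this only after the link target opened successfully, since a failed open already marks the link as damaged.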

sajibreadd-croit · 2025-12-19T11:08:02Z

src/mds/ScrubStack.cc

+      } else if (in->scrub_info()->forward_scrub) {
+        bool added_children = false;
+        bool done = false; // it's done, so pop it off the stack
+        scrub_dir_inode(in, &added_children, &done);
+        if (done) {
+          dout(20) << __func__ << " dir inode, done" << dendl;
+          in->set_forward_scrub(false);
+          dequeue(in);
+        }
+        if (added_children) {
+          // dirfrags were queued at top of stack
+          it = scrub_stack.begin();
+        }


@ifed01 regarding this comment #5 (comment)

Shouldn't we call scrub_dir_inode_final(in) here if (!remote_links().empty())?

If remote_links is not empty, scrub_dir_inode will itself call scrub_dir_inode_final(in) and handle the backward scrub there.

Hmm... But scrub_dir_inode_final() would then be called before forward_scrub is cleared, so it would be a no-op with regard to remote links, no?

src/mds/ScrubStack.cc

When scrubbing a dirfrag we push its children to the back of the scrub stack. Instead we can follow the same strategy as for scrubbing a directory and push children to the front of the scrub stack, and in kick_off_scrubs always start scrubbing from the front of the stack. This prevents ScrubStack from pinning a whole level of the file-system tree.
Fixes: https://tracker.ceph.com/issues/71167
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
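
To make the effect of the ordering concrete, here is a rough model, not the ScrubStack implementation. It assumes a complete tree with a fixed fanout, represents each queued item only by its depth, and pops a directory before queueing its children (the real stack keeps directories queued until they are done). `peak_stack_size` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Simulate scrubbing a complete tree with `fanout` children per directory
// and `depth` levels below the root. Work is always taken from the front;
// `push_front` selects where newly discovered children are queued.
// Returns the largest number of items held in the queue at any time.
std::size_t peak_stack_size(int fanout, int depth, bool push_front) {
  std::deque<int> stack{0};  // start with the root at level 0
  std::size_t peak = stack.size();
  while (!stack.empty()) {
    int level = stack.front();  // always scrub from the front
    stack.pop_front();
    if (level < depth) {
      for (int i = 0; i < fanout; ++i) {
        if (push_front) {
          stack.push_front(level + 1);  // depth-first: finish a subtree first
        } else {
          stack.push_back(level + 1);   // breadth-first: a whole level queues up
        }
      }
    }
    peak = std::max(peak, stack.size());
  }
  return peak;
}
```

In this model, `peak_stack_size(3, 3, false)` holds the entire deepest level (27 items) at once, while `peak_stack_size(3, 3, true)` never holds more than 7, which is the kind of reduction in pinned objects the commit message is aiming for.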

Fixes: https://tracker.ceph.com/issues/68611
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>

src/mds/CDir.cc

ifed01 · 2026-01-13T09:47:13Z

src/common/options/mds.yaml.in

+- name: mds_scrub_hard_link
+  type: bool
+  level: advanced
+  desc: force scrubbing hard link


IMO better rephrase to "force hard link scrubbing"

src/mds/DamageTable.cc

ifed01 · 2026-01-13T10:15:08Z

src/mds/DamageTable.h

+    void set_log_to_file(bool _log_to_file) {
+      log_to_file = _log_to_file;
+      if (log_to_file) {
+        open_damage_log_file(fout, log_file);


It may be better to check that log_file is non-empty before calling open_damage_log_file(), to avoid redundant errors in the log from attempting to open an empty file name.
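
One way the suggested guard could look, as a standalone sketch only: `DamageLog` is a hypothetical class, not the DamageTable from the diff, and it opens the stream directly with std::ofstream instead of calling open_damage_log_file(), whose signature is not shown in full here.

```cpp
#include <fstream>
#include <string>

// Hypothetical, simplified stand-in for the damage-table logging state.
class DamageLog {
public:
  void set_log_file(const std::string& path) { log_file = path; }

  // Enable or disable logging to a file. The open is skipped when no path
  // is configured, so no error is produced for an empty file name.
  // Returns whether the log file is open afterwards.
  bool set_log_to_file(bool enable) {
    log_to_file = enable;
    if (fout.is_open()) {
      fout.close();  // drop any previously opened file
    }
    if (log_to_file && !log_file.empty()) {
      fout.open(log_file, std::ios::out | std::ios::app);
    }
    return fout.is_open();
  }

private:
  bool log_to_file = false;
  std::string log_file;
  std::ofstream fout;
};
```

With this shape, enabling logging before a path is configured simply leaves the stream closed instead of logging an open failure.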

src/mds/CInode.cc

ifed01 · 2026-01-13T12:57:26Z

src/mds/ScrubStack.cc

+  in->scrub_reset_remote_links();
+
+  if (in->scrub_info()->forward_scrub) {
+    _enqueue(in, header, true);


I don't understand why we need to enqueue it once again here. IIUC this will effectively do nothing, as remote_links is empty and in->scrub_is_in_progress() == true.

And if this block is removed, then the scrub_reset_remote_links() call above is redundant as well.

First of all, scrubbing operations are not parallel on the MDS side; only the I/O towards the OSDs is parallelized.

void *MDSRank::ProgressThread::entry()
{
  std::unique_lock l(mds->mds_lock);
  while (true) {
    cond.wait(l, [this] {
      return (mds->stopping
              || !mds->finished_queue.empty()
              || (!mds->waiting_for_nolaggy.empty() && !mds->beacon.is_laggy()));
    });
    if (mds->stopping) {
      break;
    }
    mds->_advance_queues();
  }
  return NULL;
}

This progress thread runs every task sequentially and under the big mds_lock.

I don't understand why do we need to enqueue it once again here

Let's say forward_scrub is false; in that case it will scrub backwards. For a specific inode it sends a request to the OSD through in->validate_disk_state(&fin->result, fin);. This I/O is asynchronous. While that I/O is in flight, some directory may recurse through that inode and try to enqueue it for forward scrubbing. Because a scrub is already in progress (asynchronously), the inode is not pushed onto the queue; instead forward_scrub is set to true here.

int ScrubStack::_enqueue(
    MDSCacheObject *obj, ScrubHeaderRef &header, bool top, bool *added,
    std::vector<std::pair<std::string, inodeno_t>> &&remote_links)
{
  ceph_assert(ceph_mutex_is_locked_by_me(mdcache->mds->mds_lock));
  if (CInode *in = dynamic_cast<CInode*>(obj)) {
    if (in->scrub_is_in_progress()) {
      dout(10) << __func__ << " with {" << *in << "}" << ", already in scrubbing" << dendl;
      if (!remote_links.empty()) {
        in->scrub_add_remote_link(std::move(remote_links));
      } else {
        in->set_forward_scrub(true);
      }
      return -CEPHFS_EBUSY;
    }

You are right. We need to make scrub_is_in_progress() return false before the _enqueue call.
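
The interaction discussed in this thread can be modelled in a few lines. This is a simplified sketch under assumptions: `ScrubState`, `enqueue` and `finish_backward_scrub` are hypothetical names, remote links are reduced to plain strings, and the behaviour of the "not in progress" branch is guessed for the sake of a complete example. Only the busy branch mirrors the quoted `_enqueue` code.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-inode scrub state (not the real CInode scrub_info).
struct ScrubState {
  bool in_progress = false;
  bool forward_scrub = false;
  std::vector<std::string> remote_links;
};

enum class EnqueueResult { Queued, Busy };

// Empty `links` means a forward scrub request; non-empty means a backward
// scrub on behalf of those remote links.
EnqueueResult enqueue(ScrubState& in, std::vector<std::string> links = {}) {
  if (in.in_progress) {
    // Already being scrubbed: only record what is still owed.
    if (!links.empty()) {
      in.remote_links.insert(in.remote_links.end(), links.begin(), links.end());
    } else {
      in.forward_scrub = true;
    }
    return EnqueueResult::Busy;
  }
  in.in_progress = true;
  if (links.empty()) {
    in.forward_scrub = true;
  } else {
    in.remote_links = std::move(links);
  }
  return EnqueueResult::Queued;
}

// Called when the asynchronous backward validation completes. Returns true
// if a forward scrub was requested in the meantime and has been re-queued.
bool finish_backward_scrub(ScrubState& in) {
  in.remote_links.clear();
  const bool requeue = in.forward_scrub;
  in.in_progress = false;  // must be cleared first, or enqueue() reports Busy
  if (requeue) {
    enqueue(in);
  }
  return requeue;
}
```

The ordering in `finish_backward_scrub` is the point of the exchange above: if `in_progress` were still set when `enqueue` is called, the re-enqueue would take the busy branch and do nothing useful.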

src/mds/ScrubStack.cc

src/mds/ScrubHeader.h


github-actions bot added cephfs common config-change labels Dec 19, 2025

sajibreadd-croit requested a review from ifed01 December 19, 2025 11:11

sajibreadd-croit force-pushed the wip-washu-scrub-fix-v18.2.4 branch 5 times, most recently from 280aec3 to 3fdcc5b Compare December 19, 2025 12:44

sajibreadd-croit force-pushed the wip-washu-scrub-fix-v18.2.4 branch from 3fdcc5b to 9e16a8e Compare January 7, 2026 08:10

github-actions bot added core mon api-change build/ops documentation mgr pybind cephadm orchestrator rook bluestore crimson dashboard CI rbd nvmeof rgw ceph-volume labels Jan 7, 2026

github-actions bot added tests nfs monitoring telemetry script labels Jan 7, 2026

sajibreadd-croit added 2 commits January 7, 2026 09:22

mds: gracefully terminate missed dir object scrubbing

314eadc

Fixes: https://tracker.ceph.com/issues/68611 Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>

sajibreadd-croit force-pushed the wip-washu-scrub-fix-v18.2.4 branch from 9e16a8e to 14e62d3 Compare January 7, 2026 08:23

ifed01 mentioned this pull request Jan 13, 2026

mds scrub fix #5

Open

14 tasks

ifed01 reviewed Jan 29, 2026

sajibreadd-croit force-pushed the wip-washu-scrub-fix-v18.2.4 branch from 14e62d3 to 59e5062 Compare January 29, 2026 15:06

sajibreadd-croit force-pushed the wip-washu-scrub-fix-v18.2.4 branch from 59e5062 to de31182 Compare January 30, 2026 15:31

mds: scrub fix #6
sajibreadd-croit wants to merge 3 commits into croit-ceph-v18.2.4 from wip-washu-scrub-fix-v18.2.4