
Do orphaned inode extent freeing in chunks#231

Open
chriskirby1 wants to merge 1 commit into main from clk/truncate_breakup

Conversation

@chriskirby1
Contributor

Fix an issue where the final freeing of extents for unlinked files could trigger spurious hung task timeout warnings. Instead of holding the scoutfs inode lock for the entire duration of extent freeing, release and reacquire the lock every time the transaction sequence number changes.

This is only used in the evict and orphan inode cleanup paths; truncate and release continue to free during a single lock hold.
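
The release/reacquire pattern described above can be modeled in a small userspace C sketch. Every name here is illustrative, not the actual scoutfs code, and the "commit every 4 frees" cadence is invented for the test:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the chunked freeing described above: hold the lock
 * only while the transaction sequence number stays put, and drop and
 * reacquire it whenever the sequence advances so waiters can get in.
 */
struct model {
	uint64_t trans_seq;		/* advances as transactions commit */
	uint64_t extents_left;		/* extents still to free */
	unsigned int lock_acquires;	/* times the inode lock was taken */
};

static void model_lock(struct model *m)   { m->lock_acquires++; }
static void model_unlock(struct model *m) { (void)m; }

/* free one extent; every 4th free commits a transaction in this model */
static void free_one_extent(struct model *m)
{
	m->extents_left--;
	if ((m->extents_left % 4) == 0)
		m->trans_seq++;
}

/* free all extents, cycling the lock whenever trans_seq changes */
static unsigned int chunked_free(struct model *m)
{
	uint64_t seq;

	model_lock(m);
	seq = m->trans_seq;
	while (m->extents_left > 0) {
		if (m->trans_seq != seq) {
			/* let other lock waiters in, then resume */
			model_unlock(m);
			model_lock(m);
			seq = m->trans_seq;
		}
		free_one_extent(m);
	}
	model_unlock(m);
	return m->lock_acquires;
}
```

The point of the hand-off is that tasks blocked on the same lock get a chance to run between transactions instead of waiting for the entire free to finish.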

Remove the hung_task_timeout_secs workaround from the large-fragmented-free test script.

@chriskirby1
Contributor Author

retest

@chriskirby1 force-pushed the clk/truncate_breakup branch from 497372d to 809dcde on June 11, 2025 14:38
@chriskirby1
Contributor Author

The test failures are either hung task timeouts on debug kernels or the try_drain_data_freed commit budget issue that's fixed by another PR. The hung task complaints are gone on non-debug kernels. This change also keeps what is essentially background cleanup work from blocking access to an entire inode shard, so I'm keeping the PR open.

@chriskirby1 requested review from aversecat and zabbo on June 11, 2025 17:34
@aversecat
Contributor

I would put a blank line before any s-o-b lines, for clarity. Having them right after the last sentence of the commit message makes them look munged.

Fix an issue where the final freeing of extents for unlinked files
could trigger spurious hung task timeout warnings. Instead of holding the
scoutfs inode lock for the entire duration of extent freeing, release and
reacquire the lock every time the transaction sequence number changes.

This is only used in the evict and orphan inode cleanup paths; truncate
and release continue to free during a single lock hold.

Remove the hung_task_timeout_secs workaround from the large-fragmented-free
test script.

Signed-off-by: Chris Kirby <ckirby@versity.com>
@chriskirby1 force-pushed the clk/truncate_breakup branch from 809dcde to 417261a on June 11, 2025 18:24
@aversecat
Contributor

I'm concerned that we still have not solved the actual issue here. I've hit the hung task timeout with v1.24-21-g497372d (your version before the rebase): all 4 orphan workers were stuck for 120s, even though the total test took 453s (so not slower overall).

Even if we're going to allow hung tasks to take drastically longer, it's still a sign that something is locked semi-permanently, while the goal of this patch is to lock/unlock every second or so... right?

not ok 19 - large-fragmented-free
 # large-fragmented-free ** failed **
 #
 # dmesg:
 #
 # [37970.881552] INFO: task kworker/14:1:2464557 blocked for more than 122 seconds.
 # [37970.885547]       Tainted: G           OE     -------  ---  5.14.0-503.35.1.el9_5.x86_64+debug #1
 # [37970.888859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 # [37970.892168] task:kworker/14:1    state:D stack:22880 pid:2464557 tgid:2464557 ppid:2      flags:0x00004000
 # [37970.892185] Workqueue: events inode_orphan_scan_worker [scoutfs]
 # [37970.892257] Call Trace:
 # [37970.892261]  <TASK>
 # [37970.892272]  __schedule+0x778/0x1600
 # [37970.892287]  ? __pfx___schedule+0x10/0x10
 # [37970.892306]  ? schedule+0x24c/0x300
 # [37970.892318]  schedule+0xe8/0x300
 # [37970.892327]  lock_key_range+0x848/0x1740 [scoutfs]
 # [37970.892403]  ? __pfx_lock_key_range+0x10/0x10 [scoutfs]
 # [37970.892475]  ? __pfx_autoremove_wake_function+0x10/0x10
 # [37970.892494]  ? mark_lock.part.0+0xca/0xa50
 # [37970.892503]  ? __pfx_mark_lock.part.0+0x10/0x10
 # [37970.892515]  scoutfs_lock_ino+0x12e/0x170 [scoutfs]
 # [37970.892576]  ? __pfx_scoutfs_lock_ino+0x10/0x10 [scoutfs]
 # [37970.892646]  ? _raw_spin_unlock_irq+0x2f/0x50
 # [37970.892653]  ? __wait_for_common+0x2c6/0x5d0
 # [37970.892659]  ? __pfx_schedule_timeout+0x10/0x10
 # [37970.892672]  try_delete_inode_items.isra.0+0x123/0x480 [scoutfs]
 # [37970.892729]  ? submit_send.isra.0+0x5ae/0x940 [scoutfs]
 # [37970.892794]  ? __pfx_try_delete_inode_items.isra.0+0x10/0x10 [scoutfs]
 # [37970.892865]  ? __pfx_scoutfs_net_sync_request+0x10/0x10 [scoutfs]
 # [37970.892939]  ? __wait_for_common+0x3e3/0x5d0
 # [37970.892950]  ? scoutfs_omap_test+0x204/0x320 [scoutfs]
 # [37970.893013]  ? __pfx_scoutfs_omap_test+0x10/0x10 [scoutfs]
 # [37970.893079]  ? scoutfs_client_open_ino_map+0xf4/0x170 [scoutfs]
 # [37970.893136]  ? __pfx_scoutfs_client_open_ino_map+0x10/0x10 [scoutfs]
 # [37970.893198]  inode_orphan_scan_worker+0x4cc/0x720 [scoutfs]
 # [37970.893265]  ? __pfx_inode_orphan_scan_worker+0x10/0x10 [scoutfs]
 # [37970.893327]  ? local_clock_noinstr+0x9/0xc0
 # [37970.893394]  process_one_work+0x89e/0x1600
 # [37970.893425]  ? __lock_acquired+0x207/0x800
 # [37970.893434]  ? __pfx_process_one_work+0x10/0x10
 # [37970.893453]  ? assign_work+0x16c/0x240
 # [37970.893466]  worker_thread+0x586/0xd20
 # [37970.893485]  ? __pfx_worker_thread+0x10/0x10
 # [37970.893495]  kthread+0x2ab/0x360
 # [37970.893501]  ? _raw_spin_unlock_irq+0x24/0x50
 # [37970.893507]  ? __pfx_kthread+0x10/0x10
 # [37970.893516]  ret_from_fork+0x29/0x50
 # [37970.893540]  </TASK>
 # [37970.893547] INFO: task kworker/6:0:2593030 blocked for more than 122 seconds.
 # [37970.896847]       Tainted: G           OE     -------  ---  5.14.0-503.35.1.el9_5.x86_64+debug #1
 # [37970.900154] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 # [37970.903459] task:kworker/6:0     state:D stack:22456 pid:2593030 tgid:2593030 ppid:2      flags:0x00004000
 # [37970.903474] Workqueue: events inode_orphan_scan_worker [scoutfs]
 # [37970.903538] Call Trace:
 # [37970.903542]  <TASK>
 # [37970.903552]  __schedule+0x778/0x1600
 # [37970.903565]  ? __pfx___schedule+0x10/0x10
 # [37970.903584]  ? schedule+0x24c/0x300
 # [37970.903596]  schedule+0xe8/0x300
 # [37970.903604]  lock_key_range+0x848/0x1740 [scoutfs]
 # [37970.903679]  ? __pfx_lock_key_range+0x10/0x10 [scoutfs]
 # [37970.903741]  ? __pfx_autoremove_wake_function+0x10/0x10
 # [37970.903758]  ? mark_lock.part.0+0xca/0xa50
 # [37970.903767]  ? __pfx_mark_lock.part.0+0x10/0x10
 # [37970.903779]  scoutfs_lock_ino+0x12e/0x170 [scoutfs]
 # [37970.903840]  ? __pfx_scoutfs_lock_ino+0x10/0x10 [scoutfs]
 # [37970.903910]  ? _raw_spin_unlock_irq+0x2f/0x50
 # [37970.903917]  ? __wait_for_common+0x2c6/0x5d0
 # [37970.903923]  ? __pfx_schedule_timeout+0x10/0x10
 # [37970.903936]  try_delete_inode_items.isra.0+0x123/0x480 [scoutfs]
 # [37970.903993]  ? submit_send.isra.0+0x5ae/0x940 [scoutfs]
 # [37970.904059]  ? __pfx_try_delete_inode_items.isra.0+0x10/0x10 [scoutfs]
 # [37970.904130]  ? __pfx_scoutfs_net_sync_request+0x10/0x10 [scoutfs]
 # [37970.904205]  ? __wait_for_common+0x3e3/0x5d0
 # [37970.904215]  ? scoutfs_omap_test+0x204/0x320 [scoutfs]
 # [37970.904279]  ? __pfx_scoutfs_omap_test+0x10/0x10 [scoutfs]
 # [37970.904344]  ? scoutfs_client_open_ino_map+0xf4/0x170 [scoutfs]
 # [37970.904402]  ? __pfx_scoutfs_client_open_ino_map+0x10/0x10 [scoutfs]
 # [37970.904476]  inode_orphan_scan_worker+0x4cc/0x720 [scoutfs]
 # [37970.904544]  ? __pfx_inode_orphan_scan_worker+0x10/0x10 [scoutfs]
 # [37970.904607]  ? local_clock_noinstr+0x9/0xc0
 # [37970.904672]  process_one_work+0x89e/0x1600
 # [37970.904691]  ? __lock_acquired+0x207/0x800
 # [37970.904699]  ? __pfx_process_one_work+0x10/0x10
 # [37970.904718]  ? assign_work+0x16c/0x240
 # [37970.904730]  worker_thread+0x586/0xd20
 # [37970.904749]  ? __pfx_worker_thread+0x10/0x10
 # [37970.904759]  kthread+0x2ab/0x360
 # [37970.904765]  ? _raw_spin_unlock_irq+0x24/0x50
 # [37970.904771]  ? __pfx_kthread+0x10/0x10
 # [37970.904780]  ret_from_fork+0x29/0x50
 # [37970.904803]  </TASK>
 # [37970.904817] INFO: task kworker/2:2:2661545 blocked for more than 122 seconds.
 # [37970.908126]       Tainted: G           OE     -------  ---  5.14.0-503.35.1.el9_5.x86_64+debug #1
 # [37970.911431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 # [37970.914703] task:kworker/2:2     state:D stack:22152 pid:2661545 tgid:2661545 ppid:2      flags:0x00004000
 # [37970.914717] Workqueue: events inode_orphan_scan_worker [scoutfs]
 # [37970.914780] Call Trace:
 # [37970.914784]  <TASK>
 # [37970.914793]  __schedule+0x778/0x1600
 # [37970.914807]  ? __pfx___schedule+0x10/0x10
 # [37970.914825]  ? schedule+0x24c/0x300
 # [37970.914837]  schedule+0xe8/0x300
 # [37970.914845]  lock_key_range+0x848/0x1740 [scoutfs]
 # [37970.914920]  ? __pfx_lock_key_range+0x10/0x10 [scoutfs]
 # [37970.914981]  ? __pfx_autoremove_wake_function+0x10/0x10
 # [37970.914998]  ? mark_lock.part.0+0xca/0xa50
 # [37970.915007]  ? __pfx_mark_lock.part.0+0x10/0x10
 # [37970.915019]  scoutfs_lock_ino+0x12e/0x170 [scoutfs]
 # [37970.915079]  ? __pfx_scoutfs_lock_ino+0x10/0x10 [scoutfs]
 # [37970.915149]  ? _raw_spin_unlock_irq+0x2f/0x50
 # [37970.915156]  ? __wait_for_common+0x2c6/0x5d0
 # [37970.915162]  ? __pfx_schedule_timeout+0x10/0x10
 # [37970.915175]  try_delete_inode_items.isra.0+0x123/0x480 [scoutfs]
 # [37970.915233]  ? submit_send.isra.0+0x5ae/0x940 [scoutfs]
 # [37970.915298]  ? __pfx_try_delete_inode_items.isra.0+0x10/0x10 [scoutfs]
 # [37970.915369]  ? __pfx_scoutfs_net_sync_request+0x10/0x10 [scoutfs]
 # [37970.915454]  ? __wait_for_common+0x3e3/0x5d0
 # [37970.915465]  ? scoutfs_omap_test+0x204/0x320 [scoutfs]
 # [37970.915529]  ? __pfx_scoutfs_omap_test+0x10/0x10 [scoutfs]
 # [37970.915594]  ? scoutfs_client_open_ino_map+0xf4/0x170 [scoutfs]
 # [37970.915652]  ? __pfx_scoutfs_client_open_ino_map+0x10/0x10 [scoutfs]
 # [37970.915715]  inode_orphan_scan_worker+0x4cc/0x720 [scoutfs]
 # [37970.915782]  ? __pfx_inode_orphan_scan_worker+0x10/0x10 [scoutfs]
 # [37970.915844]  ? local_clock_noinstr+0x9/0xc0
 # [37970.915909]  process_one_work+0x89e/0x1600
 # [37970.915928]  ? __lock_acquired+0x207/0x800
 # [37970.915936]  ? __pfx_process_one_work+0x10/0x10
 # [37970.915955]  ? assign_work+0x16c/0x240
 # [37970.915967]  worker_thread+0x586/0xd20
 # [37970.915986]  ? __pfx_worker_thread+0x10/0x10
 # [37970.915996]  kthread+0x2ab/0x360
 # [37970.916002]  ? _raw_spin_unlock_irq+0x24/0x50
 # [37970.916008]  ? __pfx_kthread+0x10/0x10
 # [37970.916017]  ret_from_fork+0x29/0x50
 # [37970.916040]  </TASK>
 # [37970.916049] INFO: task kworker/0:3:2674586 blocked for more than 122 seconds.
 # [37970.919320]       Tainted: G           OE     -------  ---  5.14.0-503.35.1.el9_5.x86_64+debug #1
 # [37970.922599] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 # [37970.925880] task:kworker/0:3     state:D stack:22880 pid:2674586 tgid:2674586 ppid:2      flags:0x00004000
 # [37970.925894] Workqueue: events inode_orphan_scan_worker [scoutfs]
 # [37970.925956] Call Trace:
 # [37970.925960]  <TASK>
 # [37970.925969]  __schedule+0x778/0x1600
 # [37970.925982]  ? __pfx___schedule+0x10/0x10
 # [37970.926001]  ? schedule+0x24c/0x300
 # [37970.926012]  schedule+0xe8/0x300
 # [37970.926021]  lock_key_range+0x848/0x1740 [scoutfs]
 # [37970.926096]  ? __pfx_lock_key_range+0x10/0x10 [scoutfs]
 # [37970.926157]  ? __pfx_autoremove_wake_function+0x10/0x10
 # [37970.926174]  ? mark_lock.part.0+0xca/0xa50
 # [37970.926183]  ? __pfx_mark_lock.part.0+0x10/0x10
 # [37970.926195]  scoutfs_lock_ino+0x12e/0x170 [scoutfs]
 # [37970.926256]  ? __pfx_scoutfs_lock_ino+0x10/0x10 [scoutfs]
 # [37970.926326]  ? _raw_spin_unlock_irq+0x2f/0x50
 # [37970.926333]  ? __wait_for_common+0x2c6/0x5d0
 # [37970.926339]  ? __pfx_schedule_timeout+0x10/0x10
 # [37970.926351]  try_delete_inode_items.isra.0+0x123/0x480 [scoutfs]
 # [37970.926409]  ? submit_send.isra.0+0x5ae/0x940 [scoutfs]
 # [37970.926485]  ? __pfx_try_delete_inode_items.isra.0+0x10/0x10 [scoutfs]
 # [37970.926557]  ? __pfx_scoutfs_net_sync_request+0x10/0x10 [scoutfs]
 # [37970.926632]  ? __wait_for_common+0x3e3/0x5d0
 # [37970.926642]  ? scoutfs_omap_test+0x204/0x320 [scoutfs]
 # [37970.926706]  ? __pfx_scoutfs_omap_test+0x10/0x10 [scoutfs]
 # [37970.926772]  ? scoutfs_client_open_ino_map+0xf4/0x170 [scoutfs]
 # [37970.926830]  ? __pfx_scoutfs_client_open_ino_map+0x10/0x10 [scoutfs]
 # [37970.926893]  inode_orphan_scan_worker+0x4cc/0x720 [scoutfs]
 # [37970.926959]  ? __pfx_inode_orphan_scan_worker+0x10/0x10 [scoutfs]
 # [37970.927021]  ? local_clock_noinstr+0x9/0xc0
 # [37970.927087]  process_one_work+0x89e/0x1600
 # [37970.927106]  ? __lock_acquired+0x207/0x800
 # [37970.927114]  ? __pfx_process_one_work+0x10/0x10
 # [37970.927132]  ? assign_work+0x16c/0x240
 # [37970.927145]  worker_thread+0x586/0xd20
 # [37970.927164]  ? __pfx_worker_thread+0x10/0x10
 # [37970.927174]  kthread+0x2ab/0x360
 # [37970.927179]  ? _raw_spin_unlock_irq+0x24/0x50
 # [37970.927185]  ? __pfx_kthread+0x10/0x10
 # [37970.927194]  ret_from_fork+0x29/0x50
 # [37970.927217]  </TASK>
 # [37970.927225] 
 #                Showing all locks held in the system:
 # [37970.927239] 1 lock held by khungtaskd/146:
 # [37970.927244]  #0: ffffffffb8f37740 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.0+0x7/0x30
 # [37970.927282] 2 locks held by kworker/u65:1/1899590:
 # [37970.927287]  #0: ffff8883d2018158 ((wq_completion)scoutfs_quorum_work#4){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927310]  #1: ffffc9000253fdc0 ((work_completion)(&qinf->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927331] 2 locks held by kworker/u65:5/1901891:
 # [37970.927335]  #0: ffff888d8ee33958 ((wq_completion)scoutfs_quorum_work#5){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927357]  #1: ffffc900091d7dc0 ((work_completion)(&qinf->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927377] 2 locks held by kworker/u64:21/2357628:
 # [37970.927381]  #0: ffff8884e9b2c158 ((wq_completion)scoutfs_trans){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927401]  #1: ffffc90022d4fdc0 ((work_completion)(&(&tri->write_work)->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927432] 2 locks held by kworker/14:1/2464557:
 # [37970.927436]  #0: ffff888100063558 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927456]  #1: ffffc90024977dc0 ((work_completion)(&(&inf->orphan_scan_dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927476] 2 locks held by kworker/6:0/2593030:
 # [37970.927480]  #0: ffff888100063558 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927499]  #1: ffffc900246e7dc0 ((work_completion)(&(&inf->orphan_scan_dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927520] 2 locks held by kworker/u65:0/2659656:
 # [37970.927524]  #0: ffff88862ad5f158 ((wq_completion)scoutfs_quorum_work#6){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927546]  #1: ffffc9000246fdc0 ((work_completion)(&qinf->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927566] 2 locks held by kworker/u64:1/2659683:
 # [37970.927570]  #0: ffff888b338f7158 ((wq_completion)scoutfs_net_accepted#5){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927592]  #1: ffffc90002c9fdc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927612] 2 locks held by kworker/u64:2/2659684:
 # [37970.927615]  #0: ffff888b338f2958 ((wq_completion)scoutfs_net_accepted#3){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927637]  #1: ffffc90002ccfdc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927657] 2 locks held by kworker/u64:6/2659697:
 # [37970.927661]  #0: ffff8883d201e158 ((wq_completion)scoutfs_net_client#8){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927683]  #1: ffffc90002d8fdc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927702] 2 locks held by kworker/u64:10/2659714:
 # [37970.927706]  #0: ffff888b338f7958 ((wq_completion)scoutfs_net_accepted#2){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927728]  #1: ffffc9000368fdc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927747] 5 locks held by kworker/u64:12/2659715:
 # [37970.927752] 2 locks held by kworker/u65:4/2659747:
 # [37970.927756]  #0: ffff8882dc1de158 ((wq_completion)scoutfs_log_merge#4){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927777]  #1: ffffc90003937dc0 ((work_completion)(&(&finf->log_merge_dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927797] 2 locks held by kworker/u64:15/2660004:
 # [37970.927801]  #0: ffff888d8ee32158 ((wq_completion)scoutfs_server){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927820]  #1: ffffc90004687dc0 ((work_completion)(&server->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927840] 2 locks held by kworker/u64:19/2660200:
 # [37970.927844]  #0: ffff888df2a90158 ((wq_completion)scoutfs_net_client#7){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927866]  #1: ffffc900053f7dc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927885] 2 locks held by kworker/u64:22/2660202:
 # [37970.927889]  #0: ffff888d8ee35158 ((wq_completion)scoutfs_net_client#9){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927911]  #1: ffffc90005577dc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927931] 2 locks held by kworker/u64:24/2660283:
 # [37970.927935]  #0: ffff888b338f5158 ((wq_completion)scoutfs_net_accepted#4){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927957]  #1: ffffc90005867dc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.927976] 2 locks held by kworker/2:2/2661545:
 # [37970.927980]  #0: ffff888100063558 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.927999]  #1: ffffc9000afe7dc0 ((work_completion)(&(&inf->orphan_scan_dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.928019] 2 locks held by kworker/u64:25/2670583:
 # [37970.928023]  #0: ffff88880f551958 ((wq_completion)scoutfs_net_client#6){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.928044]  #1: ffffc90022587dc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.928064] 3 locks held by kworker/u64:26/2670584:
 # [37970.928068] 2 locks held by kworker/u64:27/2671915:
 # [37970.928072]  #0: ffff88862ad5e958 ((wq_completion)scoutfs_net_client#10){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.928094]  #1: ffffc9002617fdc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.928114] 2 locks held by kworker/0:3/2674586:
 # [37970.928118]  #0: ffff888100063558 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.928138]  #1: ffffc90006e47dc0 ((work_completion)(&(&inf->orphan_scan_dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.928157] 2 locks held by kworker/u64:28/2678228:
 # [37970.928161]  #0: ffff88818dcce958 ((wq_completion)scoutfs_net_server){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.928180]  #1: ffffc9000c1b7dc0 ((work_completion)(&conn->listen_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.928200] 2 locks held by kworker/u64:29/2678229:
 # [37970.928203]  #0: ffff888b338f3958 ((wq_completion)scoutfs_net_accepted){+.+.}-{0:0}, at: process_one_work+0xefc/0x1600
 # [37970.928223]  #1: ffffc9002698fdc0 ((work_completion)(&conn->recv_work)){+.+.}-{0:0}, at: process_one_work+0x7ea/0x1600
 # [37970.928243] 1 lock held by rm/2687643:
 # [37970.928247]  #0: ffff88837c25e480 (sb_writers#17){.+.+}-{0:0}, at: do_unlinkat+0x14e/0x550
 # 
 # [37970.928274] =============================================
 # 

@chriskirby1
Contributor Author

I'm assuming this is still a debug kernel. When I looked at how many extents we were removing during each pass, the number fluctuated wildly on a debug kernel. It was often taking 10+ seconds just to see the trans_seq change, sometimes more than 20s. And the worker thread wasn't doing a huge number of extent removals during that time; things just seemed to get very slow for a while. Sometimes we would remove only 16 extents during that interval; other times, tens of thousands.

On normal el95, it was a lot more consistent. I see the trans_seq moving forward pretty much once a second (tracing is only reporting at 1s granularity). To me that seems like we're just seeing the effects of the debug kernel, but if you think it's more than that, I'll investigate.

@zabbo
Collaborator

zabbo commented Jun 12, 2025

I'm concerned that we still have not solved the actual issue here.

There can be more than one issue that's causing hung tasks. This can be a fine solution to its part while leaving more work to do. Tracing would show what lock they were waiting on and what's happening with it.

I'd guess that the server commits are insanely slow, and they're in the path of forward progress on locks. Clients won't send invalidation responses to release write locks until they commit their dirty transaction.

The entire point of the test is to create dense client-side allocation btrees full of fragmented extent items that then cause a block of IO on the server per item in the client's trees. A few hundred items per block is a crazy iop amplification factor for a commit that can have thousands of blocks.

I'd try an experiment that limits the amount of work that try_drain_data_freed() will do before giving up and letting its caller send the commit response.
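
The suggested experiment could look something like this userspace sketch. The names, the budget, and the backlog counter are all invented; this is not the real try_drain_data_freed() interface:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the budgeted-drain experiment: instead of draining the whole
 * freed-item backlog before letting the caller send the commit response,
 * stop after a fixed budget and report whether work remains.
 */
#define DRAIN_BUDGET 128

static int items_pending;	/* stand-in for the freed-item backlog */

static bool drain_one_item(void)
{
	if (items_pending == 0)
		return false;
	items_pending--;
	return true;
}

/* drain at most DRAIN_BUDGET items; returns true if more remain */
static bool try_drain_budgeted(void)
{
	int done = 0;

	while (done < DRAIN_BUDGET && drain_one_item())
		done++;

	return items_pending > 0;
}
```

The caller would presumably send its commit response when this returns, with a later commit picking up the remainder of the backlog.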

@aversecat
Contributor

aversecat commented Jun 12, 2025 via email

zabbo previously requested changes Jun 12, 2025
Collaborator

@zabbo left a comment

Hopefully the orphan inode group binning will mean we don't have to return to this, but if we do, I wanted to get my thoughts down while they were fresh.

@zabbo self-requested a review on June 12, 2025 21:59
Collaborator

@zabbo left a comment

OK, this time I think I actually managed to click all the right buttons to provide review comments! As we've discussed, though, hopefully the orphan scanner inode sharding will mean we don't need to bother with this. We'll see.

 int scoutfs_data_truncate_items(struct super_block *sb, struct inode *inode,
-                                u64 ino, u64 iblock, u64 last, bool offline,
+                                u64 ino, u64 *iblock, u64 last, bool offline,
                                 struct scoutfs_lock *lock)
Collaborator

I think this function has gotten to the point where it's no longer helpfully shared code with some differences, but actually two different things interleaved with conditionals. Instead of also letting it make partial progress, let's split it into two functions: this one, which truncates live inodes and requires the inode pointer, and a new _data_free_items that is specifically built to remove a transaction's worth of extents and then return. It wouldn't need any of the arguments -- it'd always start at 0 and remove whatever it finds.
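
A toy model of the proposed split, with invented names and an invented "commit every 3 removals" cadence (none of this is the real scoutfs implementation): the helper takes no range arguments, frees from the start until the transaction sequence advances, and tells the caller whether anything remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the proposed _data_free_items(): no iblock/last/offline
 * arguments, always start from block 0, remove one transaction's worth
 * of extents, and report whether extents remain.
 */
struct toy_fs {
	uint64_t trans_seq;
	int extents;		/* extents remaining on the dead inode */
};

/* remove the next extent; commit (advancing trans_seq) every 3 removals */
static void remove_next_extent(struct toy_fs *fs)
{
	fs->extents--;
	if ((fs->extents % 3) == 0)
		fs->trans_seq++;
}

/* remove extents until the transaction seq advances; true if more remain */
static bool data_free_items(struct toy_fs *fs)
{
	uint64_t seq = fs->trans_seq;

	while (fs->extents > 0 && fs->trans_seq == seq)
		remove_next_extent(fs);

	return fs->extents > 0;
}
```

The evict/orphan path would presumably just call this from its existing retry machinery until it returns false.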

                                 scoutfs_ino(inode),
-                                sblock, U64_MAX,
+                                &sblock, U64_MAX,
                                 false, lock);
Collaborator

These changes make me nervous. This used to obviously always truncate (s,e) and (s, ~0). Now it probably truncates from (s, e) and (e, ~0), but we'd have to go diving into the implementation to make sure there isn't some accidental final increment before returning that means it actually truncates (e + 1, ~0), or whatever. Let's avoid this pattern as much as possible. (It's why truncate_extent_items() returns the block number instead of updating an arg. This lesson was learned over and over in the kernel with the *ppos arguments to write functions that sometimes updated write pointers, or not, and was a nightmare.)
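
A small illustration of the return-the-position pattern preferred here, with made-up names: the function reports the first block it did not process instead of updating a caller-supplied pointer, so the covered range is unambiguous even on partial progress.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical example: process blocks in [start, last] up to a budget
 * and return the first block NOT processed.  The caller composes ranges
 * from the return value, so there is never a question of whether an
 * output argument was or wasn't updated on some path.
 */
static uint64_t process_range(uint64_t start, uint64_t last, uint64_t budget)
{
	uint64_t blk = start;

	while (blk <= last && budget > 0) {
		/* "process" block blk here */
		blk++;
		budget--;
	}
	return blk;
}
```

A return value also can't be half-applied: either the caller uses it to continue, or it ignores it, but its own variables are never silently mutated.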

 	scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
 	scoutfs_unlock(sb, orph_lock, SCOUTFS_LOCK_WRITE_ONLY);
 
+	if (more)
Collaborator

Adding this loop without clearing more at the start makes me nervous. It certainly looks like it'll loop forever if any of the errors before more is cleared are returned (perhaps from shutdown or forced unmount returning blanket errors from net/io paths). I'd rather not bother with this additional loop and would instead make sure to do enough work per iteration that we don't mind returning to the caller and letting them use their existing retry mechanisms.
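
A userspace sketch of the hazard, with invented names (not the scoutfs code): if more isn't reset at the top of each pass, an error returned before it's cleared leaves the stale value in place and the retry loop never terminates. Resetting it per pass (or dropping the loop entirely, as suggested) avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* test hook: make the first N passes fail before *more is written */
static int fail_first_calls;

static int do_one_pass(bool *more)
{
	if (fail_first_calls > 0) {
		fail_first_calls--;
		return -5;	/* error before *more is ever assigned */
	}
	*more = false;		/* nothing left to do */
	return 0;
}

/*
 * Hypothetical caller: clearing "more" at the top of each pass means an
 * early error breaks out instead of spinning on a stale "more = true".
 */
static int free_with_retry(void)
{
	bool more;
	int ret;

	do {
		more = false;	/* reset every pass; errors can't loop */
		ret = do_one_pass(&more);
		if (ret < 0)
			break;	/* let the caller's retry machinery decide */
	} while (more);

	return ret;
}
```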

