Clk/cluster lock spinlock #218

Open
chriskirby1 wants to merge 7 commits into main from clk/cluster_lock_spinlock
Conversation

@chriskirby1
Contributor

No description provided.

@chriskirby1 chriskirby1 requested review from aversecat and zabbo May 5, 2025 20:41
@chriskirby1 chriskirby1 force-pushed the clk/cluster_lock_spinlock branch 3 times, most recently from 07fe04e to 4fce641 Compare May 9, 2025 18:34
Collaborator

@zabbo zabbo left a comment


First, add your Signed-off-bys for the patches you contributed to. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n409

Then just a few relatively minor things to fix. Though minor, they're pretty subtle so that gives me some confidence that the big moving pieces are strong.

@chriskirby1 chriskirby1 force-pushed the clk/cluster_lock_spinlock branch 10 times, most recently from f1c9318 to 2c9e54a Compare May 14, 2025 14:41
@chriskirby1
Contributor Author

retest

@chriskirby1 chriskirby1 force-pushed the clk/cluster_lock_spinlock branch from 2c9e54a to 0fbc144 Compare May 27, 2025 20:15
zabbo added 7 commits October 31, 2025 15:38
Cluster lock invalidation and shrinking have very similar workflows.
They rarely modify the state of locks and put them on lists for work to
process.  Today the lists and state modification are protected by the
mount-wide lock_info spinlock, which we want to break up.

This creates a little work_list struct that has a work_queue, list, and
lock.  Invalidation and shrinking use this to track locks that are
being processed and protect the list with the new spinlock in the
struct.
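A rough user-space sketch of that shape (names like `work_list_add` are illustrative, not the scoutfs ones; an `atomic_flag` spin loop stands in for the kernel's `spinlock_t`, and the kernel version would also carry the work_queue and call `queue_work()`):

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
};

struct work_list {
    atomic_flag lock;       /* stands in for the struct's spinlock_t */
    struct node *head;      /* locks queued for the work function */
    /* the real struct would also hold the work_queue/work item */
};

static void wl_lock(struct work_list *wl)
{
    while (atomic_flag_test_and_set(&wl->lock))
        ;                   /* spin */
}

static void wl_unlock(struct work_list *wl)
{
    atomic_flag_clear(&wl->lock);
}

/* Track a lock for invalidation/shrink processing; only the
 * work_list's own lock is taken, not the mount-wide one. */
static void work_list_add(struct work_list *wl, struct node *n)
{
    wl_lock(wl);
    n->next = wl->head;
    wl->head = n;
    wl_unlock(wl);
    /* kernel version would queue_work() here */
}

/* The work function grabs the whole list under the lock and can
 * then process the entries outside of it. */
static struct node *work_list_drain(struct work_list *wl)
{
    struct node *all;

    wl_lock(wl);
    all = wl->head;
    wl->head = NULL;
    wl_unlock(wl);
    return all;
}

static size_t count_nodes(struct node *n)
{
    size_t c = 0;
    for (; n; n = n->next)
        c++;
    return c;
}
```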

This leaves some awkward nesting with the lock_info spinlock because it
still protects individual lock state.  That will be fixed as we move
towards individual lock refcounting and spinlocks.

Signed-off-by: Zach Brown <zab@versity.com>
Add a spinlock to the scoutfs_lock cluster lock which protects its
state.  This replaces the use of the mount-wide lock_info spinlock.

In practice, for now, this largely just mirrors the continued use of the
lock_info spinlock because it's still needed to protect the mount-wide
structures that are used during put_lock.  That'll be fixed in future
patches as the use of global structures is reduced.

Signed-off-by: Zach Brown <zab@versity.com>
The first pass at managing the cluster lock state machine used a simple
global spinlock.  It's time to break it up.

This adds refcounting to the cluster lock struct.  Rather than managing
global data structures and individual lock state all under a global
spinlock, we use per-structure locks, a lock spinlock, and a lock
refcount.

Active users of the cluster lock hold a reference.  This primarily lets
unlock only check global structures once the refcounts say that it's
time to remove the lock from the structures.  The careful use of the
refcount to avoid locks that are being freed during lookup also paves
the way for using mostly read-only RCU lookup structures soon.
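The "active users hold a reference, lookups skip freeing locks" rule can be sketched with C11 atomics; `lock_tryget` and `lock_put` are hypothetical names mirroring the kernel's `refcount_inc_not_zero()`/`refcount_dec_and_test()` pattern, not the scoutfs functions:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct cluster_lock {
    atomic_int refcount;    /* 0 means the lock is being freed */
};

/* Try to take a reference; fail if the count already hit zero,
 * i.e. the lock is on its way out and must not be handed to users. */
static bool lock_tryget(struct cluster_lock *lck)
{
    int old = atomic_load(&lck->refcount);

    while (old != 0) {
        /* on failure, old is reloaded with the current value */
        if (atomic_compare_exchange_weak(&lck->refcount, &old, old + 1))
            return true;
    }
    return false;           /* freeing; caller retries the lookup */
}

/* Drop a reference.  Returning true means this caller hit zero and
 * is the one that removes the lock from the global structures, so
 * unlock only touches those structures once per lock lifetime. */
static bool lock_put(struct cluster_lock *lck)
{
    return atomic_fetch_sub(&lck->refcount, 1) == 1;
}
```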

The global LRU is still modified on every lock use; that will also be
removed in future work.

The linfo spinlock is now only used for the LRU and lookup structures.
Other uses are removed, which causes more careful use of the finer
grained locks that initially just mirrored the use of the linfo spinlock
to keep those introduction patches safe.

The move from a single global lock to more fine grained locks creates
nesting that needs to be managed.  Shrinking and recovery in particular
need to be careful as they transition from spinlocks used to find
cluster locks to getting the cluster lock spinlock.

The presence of freeing locks in the lookup indexes means that some
callers need to retry if they hit freeing locks.  We have to add this
protection to recovery iterating over locks by their key value, but it
wouldn't have made sense to build that around the lookup rbtree as it's
going away.  It makes sense to use the range tree that we're going to
keep using to make sure we don't accidentally introduce locks whose
ranges overlap (which would lead to item cache inconsistency).

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Chris Kirby <ckirby@versity.com>
The per-lock spinlock and refcount introduced in the previous patches
now make it easy to switch from an rbtree protected by a spinlock to a
hash table protected by RCU read critical sections.  The cluster lock
lookup fast path now only dirties fields in the scoutfs_lock struct
itself.

We have to be a little careful when inserting so that users can't get
references to locks that made it into the hash table but which then had
to be removed because they were found to overlap.
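One way to get that behavior, sketched here with a plain list standing in for the RCU hash table (all names hypothetical; this is not the scoutfs code): publish the lock with a zero refcount so lookups skip it, and only raise the count once the overlap check passes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct rlock {
    unsigned long start, end;   /* inclusive key range */
    atomic_int refcount;        /* 0: invisible or freeing, >0: usable */
    struct rlock *next;
};

static struct rlock *table;     /* stand-in for the RCU hash table */

static bool ranges_overlap(struct rlock *a, struct rlock *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* Insert invisible, validate, then make visible or back out.  Users
 * can never get a reference to a lock that is later removed as
 * overlapping, because its refcount never leaves zero. */
static bool insert_validated(struct rlock *nl)
{
    struct rlock *l;

    atomic_store(&nl->refcount, 0);     /* lookups skip it for now */
    nl->next = table;
    table = nl;

    for (l = nl->next; l; l = l->next) {
        if (ranges_overlap(nl, l)) {
            table = nl->next;           /* back out, never seen */
            return false;
        }
    }
    atomic_store(&nl->refcount, 1);     /* now findable */
    return true;
}

/* Lookup only returns locks whose refcount is visible as nonzero. */
static struct rlock *lookup(unsigned long key)
{
    struct rlock *l;

    for (l = table; l; l = l->next) {
        if (key >= l->start && key <= l->end &&
            atomic_load(&l->refcount) > 0)
            return l;
    }
    return NULL;
}
```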

Freeing is straightforward; we only have to make sure to free the
locks in RCU grace periods so that read sections can continue to
reference the memory and see the refcount that indicates that the locks
are freeing.

A few remaining places were using the lookup rbtree to walk all locks;
they're converted to using the range tree that we're keeping around to
resolve overlapping ranges but which is also handy for iteration that
isn't performance sensitive.

The LRU still does create contention on the linfo spinlock on every
lookup; fixing that is next.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Chris Kirby <ckirby@versity.com>
Currently we maintain a single LRU list of cluster locks and every time
we acquire a cluster lock we move it to the head of the LRU, creating
significant contention acquiring the spinlock that protects the LRU
list.

This moves to two LRU lists, a list of cluster locks ready to be
reclaimed and one for locks that are in active use.  We mark locks with
which list they're on and only move them to the active list if they're
on the reclaim list.  We track imbalance between the two lists so that
they're always roughly the same size.
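The touch path can be sketched as follows; `lru_touch` and the `which` field are invented names, and the actual list motion is elided to show just the contention-avoiding check:

```c
#include <stdbool.h>

enum lru_which { LRU_RECLAIM, LRU_ACTIVE };

struct lru_lock {
    enum lru_which which;   /* which of the two LRU lists we're on */
};

struct lru {
    long imbalance;         /* tracks active outgrowing reclaim */
};

/* Called on every lock acquisition.  Returns true only when the lock
 * actually moved lists and so had to take the LRU spinlock; locks
 * already on the active list cost nothing, which is what removes the
 * per-acquisition contention. */
static bool lru_touch(struct lru *lru, struct lru_lock *l)
{
    if (l->which == LRU_ACTIVE)
        return false;       /* hot lock: no spinlock, no list motion */
    /* would take the LRU spinlock and list_move() here */
    l->which = LRU_ACTIVE;
    lru->imbalance++;
    return true;
}
```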

This removes contention maintaining a precise LRU amongst a set of
active cluster locks.  It doesn't address contention creating or
removing locks, which are already very expensive operations.

It also loses strict ordering by access time.  Reclaim has to make it
through the oldest half of locks before getting to the newer half,
though there is no guaranteed ordering amongst the newest half.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Chris Kirby <ckirby@versity.com>
We had a little helper that scheduled work after testing the list,
which required holding the spinlock.  This was a little too crude: it
forced scoutfs_unlock() to acquire the invalidate work list spinlock
even though it already held the cluster lock spinlock and could see
that invalidate requests were pending and the invalidation work should
be queued.
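A minimal sketch of the behavioral change (hypothetical names; a counter stands in for `queue_work()`): the unlock path decides from per-lock state it already holds rather than taking the work-list spinlock just to test the list.

```c
#include <stdbool.h>

/* Per-lock state; in the kernel this is protected by the cluster
 * lock's spinlock, which the unlock path is already holding when it
 * makes this decision. */
struct clk {
    bool invalidate_pending;
};

static int queued;              /* counts stand-in queue_work() calls */

/* No second spinlock acquisition just to test the work list: queue
 * the invalidation work directly when pending requests are visible
 * under the lock we already hold. */
static void unlock_maybe_queue(struct clk *l)
{
    if (l->invalidate_pending)
        queued++;
}
```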

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Zach Brown <zab@versity.com>
@chriskirby1 chriskirby1 force-pushed the clk/cluster_lock_spinlock branch from 0fbc144 to 96049fe Compare November 3, 2025 13:43
3 participants