Clk/cluster lock spinlock #218

Open
chriskirby1 wants to merge 7 commits into main from clk/cluster_lock_spinlock
Conversation

@chriskirby1
Contributor

No description provided.

@chriskirby1 chriskirby1 requested review from aversecat and zabbo May 5, 2025 20:41
@chriskirby1 chriskirby1 force-pushed the clk/cluster_lock_spinlock branch 3 times, most recently from 07fe04e to 4fce641 Compare May 9, 2025 18:34
Collaborator

@zabbo zabbo left a comment


First, add your Signed-off-bys for the patches you contributed to. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n409

Then just a few relatively minor things to fix. Though minor, they're pretty subtle so that gives me some confidence that the big moving pieces are strong.

@chriskirby1 chriskirby1 force-pushed the clk/cluster_lock_spinlock branch 10 times, most recently from f1c9318 to 2c9e54a Compare May 14, 2025 14:41
@chriskirby1
Contributor Author

retest

@chriskirby1 chriskirby1 force-pushed the clk/cluster_lock_spinlock branch from 2c9e54a to 0fbc144 Compare May 27, 2025 20:15
zabbo added 7 commits October 31, 2025 15:38
Cluster lock invalidation and shrinking have very similar workflows.
They rarely modify the state of locks and put them on lists for work to
process.  Today the lists and state modification are protected by the
mount-wide lock_info spinlock, which we want to break up.

This creates a little work_list struct that has a work_queue, list, and
lock.  Invalidation and shrinking use this to track locks that are
being processed and protect the list with the new spinlock in the
struct.
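A rough user-space sketch of that shape (names like `work_list_add` are illustrative, not the scoutfs ones; an `atomic_flag` spin loop stands in for the kernel's `spinlock_t`, and the kernel version would also carry the work_queue and call `queue_work()`):

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
};

struct work_list {
    atomic_flag lock;       /* stands in for the struct's spinlock_t */
    struct node *head;      /* locks queued for the work function */
    /* the real struct would also hold the work_queue/work item */
};

static void wl_lock(struct work_list *wl)
{
    while (atomic_flag_test_and_set(&wl->lock))
        ;                   /* spin */
}

static void wl_unlock(struct work_list *wl)
{
    atomic_flag_clear(&wl->lock);
}

/* Track a lock for invalidation/shrink processing; only the
 * work_list's own lock is taken, not the mount-wide one. */
static void work_list_add(struct work_list *wl, struct node *n)
{
    wl_lock(wl);
    n->next = wl->head;
    wl->head = n;
    wl_unlock(wl);
    /* kernel version would queue_work() here */
}

/* The work function grabs the whole list under the lock and can
 * then process the entries outside of it. */
static struct node *work_list_drain(struct work_list *wl)
{
    struct node *all;

    wl_lock(wl);
    all = wl->head;
    wl->head = NULL;
    wl_unlock(wl);
    return all;
}

static size_t count_nodes(struct node *n)
{
    size_t c = 0;
    for (; n; n = n->next)
        c++;
    return c;
}
```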

This leaves some awkward nesting with the lock_info spinlock because it
still protects individual lock state.  That will be fixed as we move
towards individual lock refcounting and spinlocks.

Signed-off-by: Zach Brown <zab@versity.com>
Add a spinlock to the scoutfs_lock cluster lock which protects its
state.  This replaces the use of the mount-wide lock_info spinlock.

In practice, for now, this largely just mirrors the continued use of the
lock_info spinlock because it's still needed to protect the mount-wide
structures that are used during put_lock.  That'll be fixed in future
patches as the use of global structures is reduced.

Signed-off-by: Zach Brown <zab@versity.com>
The first pass at managing the cluster lock state machine used a simple
global spinlock.  It's time to break it up.

This adds refcounting to the cluster lock struct.  Rather than managing
global data structures and individual lock state all under a global
spinlock, we use per-structure locks, a lock spinlock, and a lock
refcount.

Active users of the cluster lock hold a reference.  This primarily lets
unlock only check global structures once the refcounts say that it's
time to remove the lock from the structures.  The careful use of the
refcount to avoid locks that are being freed during lookup also paves
the way for using mostly read-only RCU lookup structures soon.
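The "active users hold a reference, lookups skip freeing locks" rule can be sketched with C11 atomics; `lock_tryget` and `lock_put` are hypothetical names mirroring the kernel's `refcount_inc_not_zero()`/`refcount_dec_and_test()` pattern, not the scoutfs functions:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct cluster_lock {
    atomic_int refcount;    /* 0 means the lock is being freed */
};

/* Try to take a reference; fail if the count already hit zero,
 * i.e. the lock is on its way out and must not be handed to users. */
static bool lock_tryget(struct cluster_lock *lck)
{
    int old = atomic_load(&lck->refcount);

    while (old != 0) {
        /* on failure, old is reloaded with the current value */
        if (atomic_compare_exchange_weak(&lck->refcount, &old, old + 1))
            return true;
    }
    return false;           /* freeing; caller retries the lookup */
}

/* Drop a reference.  Returning true means this caller hit zero and
 * is the one that removes the lock from the global structures, so
 * unlock only touches those structures once per lock lifetime. */
static bool lock_put(struct cluster_lock *lck)
{
    return atomic_fetch_sub(&lck->refcount, 1) == 1;
}
```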

The global LRU is still modified on every lock use; that will also be
removed in future work.

The linfo spinlock is now only used for the LRU and lookup structures.
Other uses are removed, which causes more careful use of the finer
grained locks that initially just mirrored the use of the linfo spinlock
to keep those introduction patches safe.

The move from a single global lock to more fine grained locks creates
nesting that needs to be managed.  Shrinking and recovery in particular
need to be careful as they transition from spinlocks used to find
cluster locks to getting the cluster lock spinlock.

The presence of freeing locks in the lookup indexes means that some
callers need to retry if they hit freeing locks.  We have to add this
protection to recovery iterating over locks by their key value, but it
wouldn't have made sense to build that around the lookup rbtree as it's
going away.  It makes sense to use the range tree that we're going to
keep using to make sure we don't accidentally introduce locks whose
ranges overlap (which would lead to item cache inconsistency).

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Chris Kirby <ckirby@versity.com>
The per-lock spinlock and refcount introduced in the previous patches
now make it easy to switch from an rbtree protected by a spinlock to a
hash table protected by RCU read critical sections.  The cluster lock
lookup fast path now only dirties fields in the scoutfs_lock struct
itself.

We have to be a little careful when inserting so that users can't get
references to locks that made it into the hash table but which then had
to be removed because they were found to overlap.
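One way to get that behavior, sketched here with a plain list standing in for the RCU hash table (all names hypothetical; this is not the scoutfs code): publish the lock with a zero refcount so lookups skip it, and only raise the count once the overlap check passes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct rlock {
    unsigned long start, end;   /* inclusive key range */
    atomic_int refcount;        /* 0: invisible or freeing, >0: usable */
    struct rlock *next;
};

static struct rlock *table;     /* stand-in for the RCU hash table */

static bool ranges_overlap(struct rlock *a, struct rlock *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* Insert invisible, validate, then make visible or back out.  Users
 * can never get a reference to a lock that is later removed as
 * overlapping, because its refcount never leaves zero. */
static bool insert_validated(struct rlock *nl)
{
    struct rlock *l;

    atomic_store(&nl->refcount, 0);     /* lookups skip it for now */
    nl->next = table;
    table = nl;

    for (l = nl->next; l; l = l->next) {
        if (ranges_overlap(nl, l)) {
            table = nl->next;           /* back out, never seen */
            return false;
        }
    }
    atomic_store(&nl->refcount, 1);     /* now findable */
    return true;
}

/* Lookup only returns locks whose refcount is visible as nonzero. */
static struct rlock *lookup(unsigned long key)
{
    struct rlock *l;

    for (l = table; l; l = l->next) {
        if (key >= l->start && key <= l->end &&
            atomic_load(&l->refcount) > 0)
            return l;
    }
    return NULL;
}
```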

Freeing is straightforward; we only have to make sure to free the
locks in RCU grace periods so that read sections can continue to
reference the memory and see the refcount that indicates that the locks
are freeing.

A few remaining places were using the lookup rbtree to walk all locks;
they're converted to using the range tree that we're keeping around to
resolve overlapping ranges but which is also handy for iteration that
isn't performance sensitive.

The LRU still does create contention on the linfo spinlock on every
lookup; fixing that is next.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Chris Kirby <ckirby@versity.com>
Currently we maintain a single LRU list of cluster locks and every time
we acquire a cluster lock we move it to the head of the LRU, creating
significant contention acquiring the spinlock that protects the LRU
list.

This moves to two LRU lists, a list of cluster locks ready to be
reclaimed and one for locks that are in active use.  We mark locks with
which list they're on and only move them to the active list if they're
on the reclaim list.  We track imbalance between the two lists so that
they're always roughly the same size.
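The touch path can be sketched as follows; `lru_touch` and the `which` field are invented names, and the actual list motion is elided to show just the contention-avoiding check:

```c
#include <stdbool.h>

enum lru_which { LRU_RECLAIM, LRU_ACTIVE };

struct lru_lock {
    enum lru_which which;   /* which of the two LRU lists we're on */
};

struct lru {
    long imbalance;         /* tracks active outgrowing reclaim */
};

/* Called on every lock acquisition.  Returns true only when the lock
 * actually moved lists and so had to take the LRU spinlock; locks
 * already on the active list cost nothing, which is what removes the
 * per-acquisition contention. */
static bool lru_touch(struct lru *lru, struct lru_lock *l)
{
    if (l->which == LRU_ACTIVE)
        return false;       /* hot lock: no spinlock, no list motion */
    /* would take the LRU spinlock and list_move() here */
    l->which = LRU_ACTIVE;
    lru->imbalance++;
    return true;
}
```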

This removes contention maintaining a precise LRU amongst a set of
active cluster locks.  It doesn't address contention creating or
removing locks, which are already very expensive operations.

It also loses strict ordering by access time.  Reclaim has to make it
through the oldest half of locks before getting to the newer half,
though there is no guaranteed ordering amongst the newest half.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Chris Kirby <ckirby@versity.com>
We had a little helper that scheduled work after testing the list,
which required holding the spinlock.  This was a little too crude: it
forced scoutfs_unlock() to acquire the invalidate work list spinlock
even though it already held the cluster lock spinlock and could see
that invalidate requests were pending and the invalidation work should
be queued.
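A minimal sketch of the behavioral change (hypothetical names; a counter stands in for `queue_work()`): the unlock path decides from per-lock state it already holds rather than taking the work-list spinlock just to test the list.

```c
#include <stdbool.h>

/* Per-lock state; in the kernel this is protected by the cluster
 * lock's spinlock, which the unlock path is already holding when it
 * makes this decision. */
struct clk {
    bool invalidate_pending;
};

static int queued;              /* counts stand-in queue_work() calls */

/* No second spinlock acquisition just to test the work list: queue
 * the invalidation work directly when pending requests are visible
 * under the lock we already hold. */
static void unlock_maybe_queue(struct clk *l)
{
    if (l->invalidate_pending)
        queued++;
}
```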

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Zach Brown <zab@versity.com>
@chriskirby1 chriskirby1 force-pushed the clk/cluster_lock_spinlock branch from 0fbc144 to 96049fe Compare November 3, 2025 13:43
3 participants