HDDS-14592. Reduce lock contention in AbstractLayoutVersionManager#9840
HDDS-14592. Reduce lock contention in AbstractLayoutVersionManager#9840Russole wants to merge 6 commits intoapache:masterfrom
Conversation
|
Hi @adoroszlai, @yandrey, @jojochuang, |
|
@yandrey321 please take a look |
There was a problem hiding this comment.
Pull request overview
This PR refactors the layout version manager implementation to reduce lock contention by switching from explicit locking to immutable state snapshots with atomic updates, and adjusts OM initialization error handling accordingly.
Changes:
- Replaces
ReentrantReadWriteLockusage inAbstractLayoutVersionManagerwith an immutableStateheld in anAtomicReferenceand publishes feature maps as unmodifiable after init. - Adds stricter initialization validation (eg, empty feature set; MLV > SLV) and makes
finalized()lock-free and idempotent via CAS retry. - Updates
OMLayoutVersionManagererror handling to avoid reading manager state before initialization completes/fails (MLV > SLV case).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/upgrade/OMLayoutVersionManager.java | Improves initialization exception messaging without depending on possibly-uninitialized LVM state. |
| hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/upgrade/AbstractLayoutVersionManager.java | Introduces atomic, immutable state and unmodifiable feature maps to eliminate explicit locking and reduce contention. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| private State withMlv(int newMlv) { | ||
| return new State(newMlv, softwareLayoutVersion, computeStatus(newMlv, softwareLayoutVersion)); |
There was a problem hiding this comment.
State.withMlv() recomputes currentUpgradeState via computeStatus(), so each call to finalized() can overwrite FINALIZATION_IN_PROGRESS / STARTING_FINALIZATION / FINALIZATION_DONE back to FINALIZATION_REQUIRED (until MLV==SLV) or ALREADY_FINALIZED. This breaks upgrade progress reporting and can re-trigger finalize flows that key off getUpgradeState()==FINALIZATION_REQUIRED (eg datanode finalize command handler). Consider preserving the existing upgrade status when advancing MLV (eg carry over cur.currentUpgradeState), and only derive the initial status during init().
| return new State(newMlv, softwareLayoutVersion, computeStatus(newMlv, softwareLayoutVersion)); | |
| // Preserve the existing upgrade state when advancing MLV. | |
| // Initial status should be derived when the State is created (e.g. during init()). | |
| return new State(newMlv, softwareLayoutVersion, currentUpgradeState); |
| protected void init(int version, T[] lfs) throws IOException { | ||
| Preconditions.checkArgument(initialized.compareAndSet(false, true), | ||
| "LayoutVersionManager is already initialized."); | ||
| final TreeMap<Integer, LayoutFeature> localFeatures = new TreeMap<>(); | ||
| final Map<String, LayoutFeature> localFeatureMap = new HashMap<>(); |
There was a problem hiding this comment.
init() sets initialized=true before publishing state via state.set(...). If the LayoutVersionManager instance is visible to other threads during init, getters will hit requireState()==null and throw even though initialization is in progress (previously the RW lock would block readers until init completed). Consider using state itself as the initialization guard (eg compareAndSet(null, newState) at the end, or an IN_PROGRESS sentinel), or only flipping the initialized flag after state is published.
peterxcli
left a comment
There was a problem hiding this comment.
Curious which components or feature benefit from this change, and why they need such high concurrency CAS? Thanks!
| public void close() { | ||
| if (mBean != null) { | ||
| MBeans.unregister(mBean); | ||
| ObjectName bean = mBean; |
There was a problem hiding this comment.
this is not guaranteed to be correct without a synchronized close, or declare mBean as a volatile.
There was a problem hiding this comment.
Good point. Updated mBean to be volatile.
|
Thanks @peterxcli for the discussion. It may also be helpful to reference: I agree that having concrete numbers is important, and that microbenchmarks without clear evidence may add unnecessary complexity without clear benefit. Before we have benchmarking results, it seems the main benefit of this change is in simplifying and centralizing concurrency handling, rather than improving performance through higher concurrency. |
Actually I think CAS is harder to understand the logic compared to a simple rw lock and is error-prone. But it might be I don't have enough context. |
So it seems there is an opportunity to no involve any lock or atomic variable in normal path(except finalization) |
What changes were proposed in this pull request?
AbstractLayoutVersionManager.StatewithAtomicReferencefor lock-free state transitions.finalized()and retain idempotent semantics.LayoutVersionManagerstate before it is initialized (MLV > SLV case).What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14592
How was this patch tested?