HDDS-14592. Reduce lock contention in AbstractLayoutVersionManager #9840

Open
Russole wants to merge 6 commits into apache:master from Russole:HDDS-14592

@Russole
Contributor

@Russole Russole commented Feb 27, 2026

What changes were proposed in this pull request?

  • Remove explicit locking from AbstractLayoutVersionManager.
  • Introduce immutable State with AtomicReference for lock-free state transitions.
  • Make feature maps read-only after initialization to prevent accidental changes and allow safe concurrent reads.
  • Improve initialization validation and error handling (MLV > SLV case).
  • Remove locking from finalized() and retain idempotent semantics.
  • Fix OM initialization error handling to avoid accessing LayoutVersionManager state before it is initialized (MLV > SLV case).
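
As a rough illustration of the approach listed above (an immutable State held in an AtomicReference, advanced with a CAS retry loop), the following is a minimal sketch. The class `LayoutStateSketch` and its fields are hypothetical and simplified; they are not the code in this PR, and the real manager also tracks features and upgrade status.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified sketch; not the actual AbstractLayoutVersionManager. */
class LayoutStateSketch {

  /** Immutable snapshot: every field is final and the object is replaced wholesale. */
  static final class State {
    final int mlv; // metadata layout version
    final int slv; // software layout version

    State(int mlv, int slv) {
      this.mlv = mlv;
      this.slv = slv;
    }
  }

  private final AtomicReference<State> state;

  LayoutStateSketch(int mlv, int slv) {
    // Assumed validation, mirroring the "MLV > SLV" case mentioned in the description.
    if (mlv > slv) {
      throw new IllegalArgumentException("MLV " + mlv + " is greater than SLV " + slv);
    }
    state = new AtomicReference<>(new State(mlv, slv));
  }

  /** Readers take one consistent snapshot; no lock is acquired. */
  int getMetadataLayoutVersion() {
    return state.get().mlv;
  }

  boolean needsFinalization() {
    State s = state.get(); // read both fields from the same snapshot
    return s.mlv < s.slv;
  }

  /** Advances MLV by one step; calling it again for the same version is a no-op. */
  void finalized(int version) {
    while (true) {
      State cur = state.get();
      if (version <= cur.mlv) {
        return; // already finalized: idempotent
      }
      if (version != cur.mlv + 1 || version > cur.slv) {
        throw new IllegalArgumentException("Cannot finalize version " + version);
      }
      if (state.compareAndSet(cur, new State(version, cur.slv))) {
        return; // our snapshot was still current, so the update is published
      }
      // Another thread changed the state first; re-read and retry.
    }
  }
}
```

In this sketch, a reader never observes a half-updated pair of versions, because both values live in a single immutable object that is swapped atomically.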

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14592

How was this patch tested?

@adoroszlai adoroszlai changed the title from "HDDS-14592. Reduce lock contension in AbstractLayoutVersionManager" to "HDDS-14592. Reduce lock contention in AbstractLayoutVersionManager" on Feb 27, 2026
@Russole
Contributor Author

Russole commented Feb 28, 2026

Hi @adoroszlai, @yandrey, @jojochuang,
Could you please take a look at this PR when you have time?
Thanks!

@adoroszlai
Contributor

@yandrey321 please take a look

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the layout version manager implementation to reduce lock contention by switching from explicit locking to immutable state snapshots with atomic updates, and adjusts OM initialization error handling accordingly.

Changes:

  • Replaces ReentrantReadWriteLock usage in AbstractLayoutVersionManager with an immutable State held in an AtomicReference and publishes feature maps as unmodifiable after init.
  • Adds stricter initialization validation (e.g., empty feature set; MLV > SLV) and makes finalized() lock-free and idempotent via CAS retry.
  • Updates OMLayoutVersionManager error handling to avoid reading manager state before initialization completes/fails (MLV > SLV case).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/upgrade/OMLayoutVersionManager.java | Improves initialization exception messaging without depending on possibly-uninitialized LVM state.
hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/upgrade/AbstractLayoutVersionManager.java | Introduces atomic, immutable state and unmodifiable feature maps to eliminate explicit locking and reduce contention.

}

private State withMlv(int newMlv) {
  return new State(newMlv, softwareLayoutVersion, computeStatus(newMlv, softwareLayoutVersion));

Copilot AI Mar 31, 2026

State.withMlv() recomputes currentUpgradeState via computeStatus(), so each call to finalized() can overwrite FINALIZATION_IN_PROGRESS / STARTING_FINALIZATION / FINALIZATION_DONE with FINALIZATION_REQUIRED (while MLV < SLV) or ALREADY_FINALIZED (once MLV == SLV). This breaks upgrade progress reporting and can re-trigger finalize flows that key off getUpgradeState() == FINALIZATION_REQUIRED (e.g., the datanode finalize command handler). Consider preserving the existing upgrade status when advancing MLV (e.g., carry over cur.currentUpgradeState), and only derive the initial status during init().

Suggested change
- return new State(newMlv, softwareLayoutVersion, computeStatus(newMlv, softwareLayoutVersion));
+ // Preserve the existing upgrade state when advancing MLV.
+ // Initial status should be derived when the State is created (e.g. during init()).
+ return new State(newMlv, softwareLayoutVersion, currentUpgradeState);
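
To make the difference described in this comment concrete, here is a small self-contained sketch. The status names come from the comment; the class `UpgradeStatusSketch`, the two `withMlv*` variants, and the simplified `computeStatus` rule are assumptions for illustration and may not match the real implementation.

```java
/** Hypothetical sketch contrasting recomputed vs. preserved upgrade status. */
class UpgradeStatusSketch {

  enum Status {
    FINALIZATION_REQUIRED,
    FINALIZATION_IN_PROGRESS,
    ALREADY_FINALIZED
  }

  static final class State {
    final int mlv;
    final int slv;
    final Status status;

    State(int mlv, int slv, Status status) {
      this.mlv = mlv;
      this.slv = slv;
      this.status = status;
    }

    /** Assumed derivation rule, intended only for the initial state. */
    static Status computeStatus(int mlv, int slv) {
      return mlv < slv ? Status.FINALIZATION_REQUIRED : Status.ALREADY_FINALIZED;
    }

    /** Recomputes status on every step: loses an in-progress status. */
    State withMlvRecomputed(int newMlv) {
      return new State(newMlv, slv, computeStatus(newMlv, slv));
    }

    /** Carries the current status over: an in-progress status survives. */
    State withMlvPreserved(int newMlv) {
      return new State(newMlv, slv, status);
    }
  }
}
```

Starting from MLV 1, SLV 3, and FINALIZATION_IN_PROGRESS, advancing to MLV 2 with the recomputing variant yields FINALIZATION_REQUIRED, while the preserving variant keeps FINALIZATION_IN_PROGRESS.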

Comment on lines +97 to +101
protected void init(int version, T[] lfs) throws IOException {
  Preconditions.checkArgument(initialized.compareAndSet(false, true),
      "LayoutVersionManager is already initialized.");
  final TreeMap<Integer, LayoutFeature> localFeatures = new TreeMap<>();
  final Map<String, LayoutFeature> localFeatureMap = new HashMap<>();

Copilot AI Mar 31, 2026

init() sets initialized=true before publishing state via state.set(...). If the LayoutVersionManager instance is visible to other threads during init, getters will find that requireState() sees a null state and throw even though initialization is in progress (previously the RW lock would block readers until init completed). Consider using state itself as the initialization guard (e.g., compareAndSet(null, newState) at the end, or an IN_PROGRESS sentinel), or only flipping the initialized flag after state is published.
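
One way to read the first suggestion (using the state reference itself as the initialization guard) is sketched below. `InitGuardSketch` and its members are hypothetical names chosen for illustration; feature-map construction is omitted, and the validation shown is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: the published state doubles as the "initialized" flag. */
class InitGuardSketch {

  static final class State {
    final int mlv;
    final int slv;

    State(int mlv, int slv) {
      this.mlv = mlv;
      this.slv = slv;
    }
  }

  // null means "not initialized yet"; there is no separate boolean flag.
  private final AtomicReference<State> state = new AtomicReference<>();

  void init(int mlv, int slv) {
    // Build and validate everything locally before publishing anything.
    if (mlv > slv) {
      throw new IllegalArgumentException("MLV " + mlv + " is greater than SLV " + slv);
    }
    State fresh = new State(mlv, slv);
    // Publish as the final step; the null -> fresh CAS also rejects a second init().
    if (!state.compareAndSet(null, fresh)) {
      throw new IllegalStateException("LayoutVersionManager is already initialized.");
    }
  }

  boolean isInitialized() {
    return state.get() != null;
  }

  int getMetadataLayoutVersion() {
    State s = state.get();
    if (s == null) {
      throw new IllegalStateException("LayoutVersionManager is not initialized.");
    }
    return s.mlv;
  }
}
```

With this ordering there is no window in which the manager claims to be initialized while the state is still null, and a failed init() leaves the manager uninitialized so it can be retried. Unlike the previous RW lock, readers that arrive during init() still fail fast rather than block.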

Member

@peterxcli peterxcli left a comment

Curious which components or features benefit from this change, and why they need such high-concurrency CAS. Thanks!

public void close() {
if (mBean != null) {
MBeans.unregister(mBean);
ObjectName bean = mBean;
Contributor

This is not guaranteed to be correct unless close() is synchronized; alternatively, declare mBean as volatile.

Contributor Author

Good point. Updated mBean to be volatile.
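
For reference, the pattern under discussion (copy a volatile field into a local, then act on the local) might look roughly like the sketch below. `MBeanHolderSketch` is hypothetical, and the `unregister` callback is a stand-in for `MBeans.unregister` so that the example has no Hadoop dependency.

```java
import java.util.function.Consumer;
import javax.management.ObjectName;

/** Hypothetical sketch of close() with a volatile field; not the PR's code. */
class MBeanHolderSketch {

  // volatile: a write by one thread is visible to later reads by other threads.
  private volatile ObjectName mBean;

  // Stand-in for MBeans.unregister(ObjectName), injected for illustration.
  private final Consumer<ObjectName> unregister;

  MBeanHolderSketch(ObjectName bean, Consumer<ObjectName> unregister) {
    this.mBean = bean;
    this.unregister = unregister;
  }

  void close() {
    ObjectName bean = mBean; // single volatile read into a local
    if (bean != null) {
      mBean = null;          // later calls to close() see null and do nothing
      unregister.accept(bean);
    }
  }
}
```

Note that volatile gives visibility, not atomicity: two threads calling close() at the same moment could both read a non-null value and both unregister. If at-most-once unregistration matters, an AtomicReference with getAndSet(null), or a synchronized close(), would be one way to guarantee it.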

@Russole
Contributor Author

Russole commented Apr 3, 2026

Thanks @peterxcli for the discussion. It may also be helpful to reference:
#9892 (comment)

I agree that having concrete numbers is important, and that microbenchmarks without clear evidence may add unnecessary complexity without clear benefit.

Before we have benchmarking results, it seems the main benefit of this change is in simplifying and centralizing concurrency handling, rather than improving performance through higher concurrency.

@Russole Russole requested a review from jojochuang April 4, 2026 09:37
@peterxcli
Member

peterxcli commented Apr 4, 2026

> simplifying and centralizing concurrency handling

Actually, I think CAS-based logic is harder to understand than a simple RW lock, and it is error-prone. But it may be that I don't have enough context.

@peterxcli
Member

#9892 (review)

> and they all use the same AbstractVersionManager#isAllowed method, for which the only potentially expensive operation I can see is a read lock whose corresponding write lock is only held during finalization.

So it seems there is an opportunity to avoid involving any lock or atomic variable in the normal path (everything except finalization).
