
DOCTEAM-1802: Local VG on shared storage #461

Open
tahliar wants to merge 4 commits into main from DOCTEAM-1802-exclusive-vg-lv

Conversation

Collaborator

@tahliar tahliar commented Jan 30, 2026

PR creator: Description

Restructured the CLVM steps to include a better procedure for exclusive activation mode.

PDF:
cha-ha-clvm_en.pdf

PR creator: Are there any relevant issues/feature requests?

  • bsc#1241273
  • jsc#DOCTEAM-1802

PR creator: Which product versions do the changes apply to?

When opening a PR, check all versions of the documentation that your PR applies to.

  • SLE-HA 15
    • 15 SP7 (current main, no backport necessary)
    • 15 SP6
    • 15 SP5
    • 15 SP4
    • 15 SP3
  • SLE-HA 12
    • 12 SP5

PR reviewer: Checklist

The doc team member merging your PR will take care of the following tasks and will tick the boxes if done.

<listitem>
<para>
A shared storage device is available, such as Fibre Channel, FCoE, SCSI, iSCSI SAN,
or DRBD*, for example. To create a volume group, you must have at least two shared disks.
Contributor

“To create a volume group, you must have at least two shared disks.”

This wording is too strong. During administration operations, a VG might technically contain only one physical disk and only one LV, although the common use case is multiple physical disks and multiple LVs.
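
For illustration, a minimal sketch showing that LVM itself accepts a single-disk VG (the VG name and device path are examples, not taken from the PR):

# technically valid: a VG and a single LV backed by one shared disk
vgcreate vg_single /dev/sdb
lvcreate -L10G -n lv1 vg_single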

The term <quote>&clvm;</quote> indicates that LVM is being used in a cluster environment.
When managing shared storage on a cluster, every node must be informed
about changes to the storage subsystem. Logical Volume
Manager (LVM) supports transparent management of volume groups
@jiri-belka jiri-belka Feb 4, 2026

"supports transparent management of volume groups" - does it? I'm not such an expert but with 'system_id', 'tags' it is only about "blocking" activation; for shared VG, it is about "locking" over - in this context - lvmlockd over DLM. Note, there's no VG metadata sync, such data live in the shared physical block device but are protected by locks and when a lock is released, IIUC, a node "invalidates" VG "metadata".

So, maybe the wording could be different here... ?

Contributor

AFAIK, the challenging fact is that each node keeps cached LVM metadata in its memory. lvmlockd ensures the consistency of that cache, preventing conflicting views and race conditions between the nodes when volumes are created, extended or removed on one node.

I think that's also the reason why the vgcreate/lvcreate commands are run while dlm and lvmlockd are already running in the existing demonstration.
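
A minimal sketch of that order of operations, matching the lvcreate line shown later in the PR (the VG name, LV name and device path are examples):

# dlm and lvmlockd must already be running on this node
vgcreate --shared vg1 /dev/sdb
# create the LV without activating it yet (-an)
lvcreate -an -L10G -n lv1 vg1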

A volume group is a storage pool of multiple physical
disks. A logical volume belongs to a volume group, and can be seen as an
elastic volume on which you can create a file system. In a cluster environment,
shared VGs consist of shared storage and can be used concurrently by multiple nodes.


Note that "shared VG" is a specific term, a concept for VGs with lvmlockd support.

Contributor

Maybe this should instead refer to VGs that can be accessed by multiple nodes, either exclusively or concurrently?

Contributor

Emm, I see the argument here, with the historical context I mentioned yesterday in bsc#1241273#c8. Section 28 was scoped to be lvmlockd-related only in the past. That's why the last sentence says "can be used concurrently". It makes sense to me to move this last sentence elsewhere, for example under the LVM-activate conceptual overview.

Given that we want to incorporate "HA LVM" into the doc, maybe we should clarify these two concepts explicitly:

  • "HA LVM" is scoped to local LVM activation, which naturally implies exclusive use, for example for a local file system.

  • "Cluster LVM" is scoped to using lvmlockd to activate LVM. That is often used to activate LVM concurrently on multiple nodes for a cluster file system with concurrent I/O. There are also use cases for activating LVM on multiple nodes to minimize the activation time when switching over the application stack on top of it, for example KVM backstores. The "exclusive" activation mode is also provided for advanced use cases, for example to prevent LVM activation outside the lvmlockd stack.

Along with that, it brings up yet another point. Maybe it makes sense to change the title of section 28, something like:

from: "28 Cluster logical volume manager (Cluster LVM)"

to: "28 LVM High Availability"

<itemizedlist>
<listitem>
<para>
Exclusive mode should be used for local file systems like <systemitem>ext4</systemitem>.
@jiri-belka jiri-belka Feb 4, 2026

You are right, but a widely used example is an LV serving as the disk of a VM in a cluster of KVM hosts. That is, it is not only about file systems. Exclusive mode is meant for uses where no cluster-wide synchronization of access exists, for example non-clustered file systems like ext4, xfs and so on.

Ah, I see this is taken from ocf_heartbeat_LVM-activate(7)...

Contributor

Here the "exclusive" mode does implicitly refer to be 'lvmlockd' access mode only. It is not required for "HA LVM".

In regarding to KVM use causes, it can be complex. Its backstore can use "Cluster LVM" "shared" and "exclusive", or the plain "HA LVM".

<term>&dlm.long; (&dlm;)</term>
<listitem>
<para>
&dlm; coordinates access to shared resources among multiple nodes through


DLM, IIUC, doesn't coordinate access to shared resources; it coordinates locking. That is, access to e.g. files on a shared clustered file system has nothing to do with DLM.

Contributor

There seems to be some confusion about the terminology here. I believe "resources" here is not referring to cluster resources managed by Pacemaker. It has to mean the shared objects that are supposed to be protected by DLM, for instance shared LVM metadata, or the metadata/files of a cluster file system.
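
For context, a minimal sketch of the DLM/lvmlockd base stack that protects such shared objects (resource, group and clone names are examples, not taken from the PR):

# DLM and lvmlockd must run on every node, so they are grouped and cloned
crm configure primitive dlm ocf:pacemaker:controld \
  op monitor interval=60s timeout=60s
crm configure primitive lvmlockd ocf:heartbeat:lvmlockd \
  op monitor interval=30s timeout=90s
crm configure group g-storage dlm lvmlockd
crm configure clone cl-storage g-storage meta interleave=true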

<listitem>
<para>
A shared storage device is available, such as Fibre Channel, FCoE, SCSI, iSCSI SAN,
or DRBD*, for example. To create a volume group, you must have at least two shared disks.


s/storage device/block device/ - IMO. NVMe is the way forward, one could say, as it is faster than SCSI (note: LUN is a SCSI-only concept).

DRBD is irrelevant here IMO. Yes, technically it might be possible, but using DRBD here contradicts "[a] shared device", since DRBD is always local: DRBD stands for "distributed replicated block device", that is, local block devices that sync over TCP/IP.

Contributor

If we are to mention NVMe, we should be precise that it's "NVMe-oF".

Nothing wrong with listing DRBD here, since it's always meant to be a shared block device over TCP/IP.


A DRBD device could, of course, be a shared device between nodes. I'd suggest not mentioning DRBD here, to avoid the complexity.

Contributor

I agree to drop DRBD here.

For HA 16.0, I even propose dropping the current "28.5 Scenario: Cluster LVM with DRBD". That is a very old use case. We can add it back if a customer asks for it in HA 16.0. @tahliar

</listitem>
<listitem>
<para>
<xref linkend="sec-ha-clvm-config-resources-exclusive"/>. Use this mode for local


Same as above: this can also be a raw LV without a file system, e.g. an LV used as a VM disk.

<para>
<xref linkend="sec-ha-clvm-config-resources-exclusive"/>. Use this mode for local
file systems like <systemitem>ext4</systemitem>. This is useful for active/passive
clusters where only one node at a time is active.


s/is active/has this resource active/

</step>
<step>
<para>
Open the <filename>/etc/lvm/lvm.conf</filename> file.
@jiri-belka jiri-belka Feb 4, 2026

IMO this is wrong; use /etc/lvm/lvmlocal.conf, since the "global" lvm.conf might be globally managed.

jb155sapqe01:~ # grep -Pv '^\s*($|#)' /etc/lvm/lvmlocal.conf 
global {
        system_id_source="lvmlocal"
}
local {
	system_id = "uname"
}
jb155sapqe01:~ # lvmconfig --merged global/system_id_source local/system_id
system_id_source="lvmlocal"
system_id="uname"


It depends on what the user wants the VG system_id to be:

  1. If the user wants to specify the system_id to be "foo", then
global {
        system_id_source="lvmlocal"
}
local {
	system_id = "foo"
}

The value system_id in /etc/lvm/lvmlocal.conf is treated as a string

sp7-2:~ # lvmconfig --merged global/system_id_source local/system_id
system_id_source="lvmlocal"
system_id="uname"

sp7-2:~ # vgcreate test /dev/vdb
  Volume group "test" successfully created with system ID uname

sp7-2:~ # vgdisplay test
  --- Volume group ---
  VG Name               test
  System ID             uname
  Format                lvm2
...
  2. If the user wants to specify the system_id to be the output of uname, then
global {
        system_id_source="uname"
}

is enough.
e.g.

sp7-2:~ # lvmconfig --merged global/system_id_source local/system_id
system_id_source="uname"
  Configuration node local/system_id not found
sp7-2:~ # vgcreate test2 /dev/vdc
  Volume group "test2" successfully created with system ID sp7-2

</para>
<screen>&prompt.root;<command>crm cluster copy /etc/lvm/lvm.conf</command></screen>
<para>
This allows the VG to move to another node if the active node fails.


I don't understand: what "allows" this? lvm.conf?

</sect1>

<sect2 xml:id="sec-ha-clvm-scenario-drbd">
<sect1 xml:id="sec-ha-clvm-scenario-drbd">


I would drop this from here, to reduce complexity.

Contributor

Ah yes, I agree too, as said earlier.

Contributor

@gao-yan gao-yan left a comment

Thanks, @tahliar!

I made some points below.

<para>
Exclusive mode should be used for local file systems like <systemitem>ext4</systemitem>.
In this mode, the LV can only be activated on one node at a time. Access to the LV is
restricted using the <systemitem>system_id</systemitem>. For more information, see
Contributor

Based on the technical facts we talked about, maybe something like this would be more precise:

"Exclusive access to the LV can be achieved by using lvmlockd or system_id."


</para>
<screen>&prompt.root;<command>lvcreate -an -L10G -n lv1 vg1</command></screen>
</step>
</procedure>
Contributor

BE AWARE of the dangerous situation here: at this point, since the LVM metadata is not protected by lvmlockd, the other nodes likely remain completely unaware that the on-disk metadata has changed, because they are still using the old disk layout cached in their memory. So all the nodes should preferably reboot to obtain the latest metadata.

Otherwise, I'm not sure if it would make sense to suggest something like the following, to be performed on all the other nodes so that they scan and refresh the metadata. Some say even this cannot guarantee consistency:

pvscan --cache: rescans the physical volumes.

vgscan --cache: rescans the volume groups.

lvscan: refreshes the status of the logical volumes.

Collaborator Author

I think it would be useful to add those steps as a backup, in case someone can't reboot the nodes right now. But maybe I could suggest also rebooting the nodes later, at a convenient time?
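
If the fallback does end up in the docs, a sketch of how that step might read, reusing the commands suggested above (with the caveat that they may not guarantee consistency):

# on each node that did not perform the change and cannot be rebooted yet
pvscan --cache   # rescan the physical volumes and refresh the device cache
vgscan --cache   # rescan the volume groups
lvscan           # refresh the status of the logical volumes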

<para>
Exclusive mode should be used for local file systems like <systemitem>ext4</systemitem>.
In this mode, the LV can only be activated on one node at a time. Access to the LV is
restricted using the <systemitem>system_id</systemitem>. For more information, see

NIT: If the lvname parameter is not specified, all the LVs in the VG will be activated. So this should be "LV(s)".

--
Glass
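
For reference, a sketch of restricting activation to a single LV via the lvname parameter (resource, VG and LV names are examples, not taken from the PR):

crm configure primitive lv1-activate ocf:heartbeat:LVM-activate \
  params vgname=vg1 lvname=lv1 vg_access_mode=system_id \
  op monitor interval=30s timeout=90s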

<step>
<para>
Configure a <systemitem>lvmlockd</systemitem> resource as follows:
Configure an <systemitem>lvmlockd</systemitem> resource as follows:

NIT: why 'an' ?

Contributor

Pretty sure @tahliar got it right here :-)

The rule for "a" vs. "an" depends on how the first letter is pronounced ("el" for the l in lvmlockd), not on the letter itself.


params vgname=vg1 vg_access_mode=lvmlockd \
op start timeout=90s interval=0 \
op stop timeout=90s interval=0 \
op monitor interval=30s timeout=90s</command></screen>
Contributor

@gao-yan gao-yan Feb 5, 2026

I'd still prefer to also showcase how to achieve exclusive mode with lvmlockd, in case certain users want to consistently use lvmlockd for their setup with LVs in different modes, or simply favor the benefits of lvmlockd over system_id.

Of course, the title of this section could then be something like "Configuring Cluster LVM using lvmlockd".

And the next section could be "Configuring LVM using system_id". The description would then explain that system_id only supports exclusive mode.
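
For illustration, a sketch of exclusive activation on top of lvmlockd, based on the resource shown in the quoted snippet (activation_mode=exclusive should be the LVM-activate default; resource and VG names are examples):

crm configure primitive vg1 ocf:heartbeat:LVM-activate \
  params vgname=vg1 vg_access_mode=lvmlockd activation_mode=exclusive \
  op start timeout=90s interval=0 \
  op stop timeout=90s interval=0 \
  op monitor interval=30s timeout=90s

Unlike the shared case, this resource would not be cloned, so the VG is activated on only one node at a time.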
