Conversation
| <listitem>
| <para>
| A shared storage device is available, such as Fibre Channel, FCoE, SCSI, iSCSI SAN,
| or DRBD*, for example. To create a volume group, you must have at least two shared disks.
“To create a volume group, you must have at least two shared disks.”
This wording is too strong. Technically, during administration operations a VG might have only one physical disk and only one LV, although the common use case is multiple physical disks and multiple LVs.
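For illustration, a minimal sketch of that point (the device names /dev/sdb and /dev/sdc are hypothetical): a VG built from a single shared disk is technically valid, and further disks can be added later:

# vgcreate vg1 /dev/sdb
# lvcreate -L10G -n lv1 vg1
# vgextend vg1 /dev/sdc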
| The term <quote>&clvm;</quote> indicates that LVM is being used in a cluster environment.
| When managing shared storage on a cluster, every node must be informed
| about changes to the storage subsystem. Logical Volume
| Manager (LVM) supports transparent management of volume groups
"supports transparent management of volume groups" - does it? I'm not an expert, but with 'system_id' and 'tags' it is only about "blocking" activation; for a shared VG it is about "locking", in this context via lvmlockd over DLM. Note that there is no VG metadata sync: the metadata lives on the shared physical block device and is protected by locks, and when a lock is released, IIUC, a node "invalidates" its cached VG metadata.
So maybe the wording could be different here?

AFAIK, the challenging fact is that there is cached LVM metadata in each node's memory, and lvmlockd ensures exactly the consistency of that, preventing conflicting views and race conditions between the nodes when a volume is created, extended, removed and so on on one node.
I think that is also why the vgcreate/lvcreate commands are run while dlm and lvmlockd are running in the existing demonstration.
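To illustrate that last point, a minimal sketch (the device name /dev/sdb and VG name vg1 are assumptions): with the dlm and lvmlockd resources already running on the node, the VG is created as a shared VG so that lvmlockd protects its metadata from the start:

# vgcreate --shared vg1 /dev/sdb
# lvcreate -an -L10G -n lv1 vg1

The --shared flag marks the VG for lvmlockd, and the lock manager then serializes metadata updates across the nodes.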
| A volume group is a storage pool of multiple physical
| disks. A logical volume belongs to a volume group, and can be seen as an
| elastic volume on which you can create a file system. In a cluster environment,
| shared VGs consist of shared storage and can be used concurrently by multiple nodes.
Note that "shared VG" is a specific term, a concept for VGs with lvmlockd support.

Maybe here it should refer to VGs that can be reached/accessed by multiple nodes either exclusively or concurrently?

Emm, I see the argument here. With the historical context I mentioned yesterday in bsc#1241273#c8, section 28 was scoped to be lvmlockd-related only in the past. That's why the last sentence says "can be used concurrently". It makes sense to me to move this last sentence elsewhere, for example under the LVM-activate conceptual overview.
Given that, we want to incorporate "HA LVM" into the doc. Maybe we should clarify these two concepts explicitly:
- "HA LVM" is scoped to local LVM activation, which naturally implies exclusive use, for example for a local filesystem.
- "Cluster LVM" is scoped to using lvmlockd to activate LVM. That is often used to activate LVM concurrently on multiple nodes for a cluster filesystem with concurrent IO. There are also use cases for activating LVM on multiple nodes to minimize the activation time when switching over the application stack on top of it, for example as KVM backstores. The "exclusive" activation mode is also provided for advanced use cases, for example to prevent LVM activation outside the lvmlockd stack.
Along with that, this brings up yet another point. Maybe it makes sense to change the title of section 28, something like:
from: "28 Cluster logical volume manager (Cluster LVM)"
to: "28 LVM High Availability"
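To make the distinction concrete, a hedged sketch of the two resource styles based on ocf_heartbeat_LVM-activate(7) (resource IDs and the VG name vg1 are placeholders, not from the PR). "HA LVM", exclusively activated on one node and protected by system_id:

# crm configure primitive vg1-local ocf:heartbeat:LVM-activate \
    params vgname=vg1 vg_access_mode=system_id \
    op monitor interval=30s timeout=90s

"Cluster LVM", activated on all nodes via lvmlockd and cloned:

# crm configure primitive vg1-shared ocf:heartbeat:LVM-activate \
    params vgname=vg1 vg_access_mode=lvmlockd activation_mode=shared \
    op monitor interval=30s timeout=90s
# crm configure clone cl-vg1-shared vg1-shared meta interleave=true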
| <itemizedlist>
| <listitem>
| <para>
| Exclusive mode should be used for local file systems like <systemitem>ext4</systemitem>.
You are right, but a widely used example is an LV as a disk for a VM in a KVM hosts cluster. That is, it is not only about filesystems. Exclusive mode is dedicated to uses where no cluster-wide "synchronization" of access exists, for example for non-clustered filesystems like ext4, xfs, etc.
Ah, I see this is taken from ocf_heartbeat_LVM-activate(7)...

Here the "exclusive" mode implicitly refers to the 'lvmlockd' access mode only. It is not required for "HA LVM".
Regarding the KVM use cases, it can be complex. The backstore can use "Cluster LVM" in "shared" or "exclusive" mode, or plain "HA LVM".
| <term>&dlm.long; (&dlm;)</term>
| <listitem>
| <para>
| &dlm; coordinates access to shared resources among multiple nodes through
DLM, IIUC, doesn't coordinate access to shared resources; it "coordinates" locking. That is, access to e.g. files on a shared clustered filesystem has nothing to do with DLM.

There seems to be confusion about the terminology here. I believe "resources" here is not referring to "cluster resources managed by Pacemaker". It has to mean "shared objects" that are supposed to be protected by DLM, for instance shared LVM metadata, or the metadata/files of a cluster filesystem.
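For context, a minimal sketch of the usual DLM plus lvmlockd base setup that the rest of the stack depends on (resource and group IDs are placeholders); DLM only provides the cluster-wide lock manager, while the actual data access goes directly to the shared block device:

# crm configure primitive dlm ocf:pacemaker:controld \
    op monitor interval=60s timeout=60s
# crm configure primitive lvmlockd ocf:heartbeat:lvmlockd \
    op monitor interval=30s timeout=90s
# crm configure group g-storage dlm lvmlockd
# crm configure clone cl-storage g-storage meta interleave=true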
| <listitem>
| <para>
| A shared storage device is available, such as Fibre Channel, FCoE, SCSI, iSCSI SAN,
| or DRBD*, for example. To create a volume group, you must have at least two shared disks.
s/storage device/block device/ - IMO; NVMe is the way forward, one could say, as it is faster than SCSI (note: LUN is a SCSI-only concept).
DRBD is irrelevant here IMO. Yes, technically it might be possible, but using DRBD here contradicts "[a] shared device", since DRBD stands for "Distributed Replicated Block Device", that is, local block devices that sync over TCP/IP.

If we are to mention NVMe, we should be precise that it's "NVMe-oF".
Nothing wrong with listing DRBD here, since it's always meant to be a shared block device over TCP/IP.

A DRBD device, of course, could be a shared device between nodes. I'd suggest not mentioning DRBD here, to avoid the complexity.

I agree to drop DRBD here.
What's more, for HA 16.0 I propose to drop the current "28.5 Scenario: Cluster LVM with DRBD". That is a very old use case. We can add it back if a customer asks for it in HA 16.0. @tahliar
| </listitem>
| <listitem>
| <para>
| <xref linkend="sec-ha-clvm-config-resources-exclusive"/>. Use this mode for local
Same as above: it can also be a raw LV without a filesystem, e.g. an LV used as a VM disk.
| <para>
| <xref linkend="sec-ha-clvm-config-resources-exclusive"/>. Use this mode for local
| file systems like <systemitem>ext4</systemitem>. This is useful for active/passive
| clusters where only one node at a time is active.
s/is active/has this resource active/
| </step>
| <step>
| <para>
| Open the <filename>/etc/lvm/lvm.conf</filename> file.
IMO this is wrong; use /etc/lvm/lvmlocal.conf, since the "global" lvm.conf might be globally managed.

jb155sapqe01:~ # grep -Pv '^\s*($|#)' /etc/lvm/lvmlocal.conf
global {
    system_id_source="lvmlocal"
}
local {
    system_id = "uname"
}
jb155sapqe01:~ # lvmconfig --merged global/system_id_source local/system_id
system_id_source="lvmlocal"
system_id="uname"

It depends on what the user wants the VG system_id to be:
- If the user wants to specify the system_id to be "foo", then
global {
system_id_source="lvmlocal"
}
local {
system_id = "foo"
}
The value system_id in /etc/lvm/lvmlocal.conf is treated as a string
sp7-2:~ # lvmconfig --merged global/system_id_source local/system_id
system_id_source="lvmlocal"
system_id="uname"
sp7-2:~ # vgcreate test /dev/vdb
Volume group "test" successfully created with system ID uname
sp7-2:~ # vgdisplay test
--- Volume group ---
VG Name test
System ID uname
Format lvm2
...
- If the user wants to specify the system_id to be the output of
uname, then
global {
system_id_source="uname"
}
is enough.
e.g.
sp7-2:~ # lvmconfig --merged global/system_id_source local/system_id
system_id_source="uname"
Configuration node local/system_id not found
sp7-2:~ # vgcreate test2 /dev/vdc
Volume group "test2" successfully created with system ID sp7-2
| </para>
| <screen>&prompt.root;<command>crm cluster copy /etc/lvm/lvm.conf</command></screen>
| <para>
| This allows the VG to move to another node if the active node fails.
I don't understand: what "allows" this? lvm.conf?
| </sect1>
| <sect2 xml:id="sec-ha-clvm-scenario-drbd">
| <sect1 xml:id="sec-ha-clvm-scenario-drbd">
I would drop this from here, for complexity.

Ah, yes, I agree too, as said earlier.
| <para>
| Exclusive mode should be used for local file systems like <systemitem>ext4</systemitem>.
| In this mode, the LV can only be activated on one node at a time. Access to the LV is
| restricted using the <systemitem>system_id</systemitem>. For more information, see
Based on the technical facts we talked about, maybe something like this would be more precise:
"Exclusive access to the LV can be achieved by using lvmlockd or system_id."
| </para>
| <screen>&prompt.root;<command>lvcreate -an -L10G -n lv1 vg1</command></screen>
| </step>
| </procedure>
BE AWARE of the dangerous situation here: at this point, since the LVM metadata is not protected by lvmlockd, the other nodes likely remain completely unaware that the on-disk metadata has changed, because they are still using the old disk layout cached in their memory. So all the nodes should preferably reboot to obtain the latest metadata.
Otherwise, I'm not sure if it would make sense to suggest something like the following to be performed on all the other nodes so that they scan and refresh the metadata. Some say even this cannot guarantee consistency:
pvscan --cache: rescans the physical volumes.
vgscan --cache: rescans the volume groups.
lvscan: refreshes the status of the logical volumes.

I think it would be useful to add those steps as a backup, in case someone can't reboot the nodes right now. But maybe I could also suggest rebooting the nodes later, at a convenient time?
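If we do document that fallback, a minimal sketch of the commands to run on each of the other nodes, using exactly the commands listed above (as noted, this is a best-effort refresh, not a consistency guarantee):

# pvscan --cache
# vgscan --cache
# lvscan

followed by a recommendation to reboot those nodes at the next convenient maintenance window.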
| <para>
| Exclusive mode should be used for local file systems like <systemitem>ext4</systemitem>.
| In this mode, the LV can only be activated on one node at a time. Access to the LV is
| restricted using the <systemitem>system_id</systemitem>. For more information, see
NIT: If the parameter lvname is not specified, all the LVs in the VG will be activated. So this should say LV(s).
--
Glass
| <step>
| <para>
| Configure a <systemitem>lvmlockd</systemitem> resource as follows:
| Configure an <systemitem>lvmlockd</systemitem> resource as follows:
Pretty sure @tahliar made it correct here :-)
The rule for using "a" vs. "an" depends on how we pronounce the first letter ("el" for the l in lvmlockd), not on the letter itself.
| params vgname=vg1 vg_access_mode=lvmlockd \
| op start timeout=90s interval=0 \
| op stop timeout=90s interval=0 \
| op monitor interval=30s timeout=90s</command></screen>
I'd still prefer to also showcase how to achieve exclusive mode with lvmlockd, in case certain users want to consistently use lvmlockd for their setup with LVs in different modes, or favor the benefits of lvmlockd over system_id anyway.
Of course, the title of this section could be something like "Configuring Cluster LVM using lvmlockd".
And the next section could be "Configuring LVM using system_id". The description will then say that system_id only supports exclusive mode.
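For reference, a hedged sketch of what that lvmlockd-based exclusive setup could look like (the resource ID vg1-excl, the VG name vg1, and the cl-storage clone name are placeholders; it assumes the dlm/lvmlockd clone is already configured as in the existing procedure):

# crm configure primitive vg1-excl ocf:heartbeat:LVM-activate \
    params vgname=vg1 vg_access_mode=lvmlockd activation_mode=exclusive \
    op start timeout=90s interval=0 \
    op stop timeout=90s interval=0 \
    op monitor interval=30s timeout=90s
# crm configure colocation col-vg1-with-lvmlockd inf: vg1-excl cl-storage
# crm configure order o-lvmlockd-before-vg1 Mandatory: cl-storage vg1-excl

Unlike the shared variant, the resource is not cloned, so the VG is only ever active on one node while lvmlockd still guards the metadata.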
PR creator: Description
Restructured the CLVM steps to include a better procedure for exclusive activation mode.
PDF: cha-ha-clvm_en.pdf
PR creator: Are there any relevant issues/feature requests?
PR creator: Which product versions do the changes apply to?
When opening a PR, check all versions of the documentation that your PR applies to.
main (no backport necessary)
PR reviewer: Checklist
The doc team member merging your PR will take care of the following tasks and will tick the boxes if done.
<revision><date>YYYY-MM-DD</date> of chapter, part and book (if the change warrants it - for criteria, see https://documentation.suse.com/style/current/html/style-guide-db/sec-structure.html#sec-revinfo)