# etcd.openshift.io API Group

This API group contains CRDs related to etcd cluster management. It is currently used only for TNF (Two Node Fencing), to gather status updates from the nodes so that the cluster-admin is warned about unhealthy setups.

## API Versions

### v1alpha1

Contains the `PacemakerCluster` custom resource for monitoring Pacemaker cluster health in TNF (Two Node Fencing) deployments.

#### PacemakerCluster

- **Feature Gate**: None. This CRD is instead gated by cluster-etcd-operator start-up: it is created only after a TNF cluster has transitioned to external etcd.
- **Component**: `two-node-fencing`
- **Scope**: Cluster-scoped singleton resource named `cluster`
- **Resource**: `pacemakerclusters.etcd.openshift.io`

The `PacemakerCluster` resource provides visibility into the health and status of a Pacemaker-managed cluster. It is updated periodically by the cluster-etcd-operator's status collector, which runs as a privileged CronJob.

**Status Fields:**
- `lastUpdated` (required): Timestamp of the most recent status collection; used to detect stale data
- `summary`: High-level cluster health metrics
  - `pacemakerDaemonState`: Running state (enum: `Running`, `KnownNotRunning`)
  - `quorumStatus`: Quorum state (enum: `Quorate`, `NoQuorum`)
  - `nodesOnline`, `nodesTotal`: Node counts (0-2)
  - `resourcesStarted`, `resourcesTotal`: Resource counts (0-16)
- `nodes`: Detailed status of each node (1-2 nodes)
  - Name, IPv4/IPv6 addresses, online status (enum), mode (enum: `Active`, `Standby`)
- `resources`: Detailed status of each resource (1-16 resources)
  - Name, resource agent, role (enum: `Started`, `Stopped`), active status (enum), node assignment
- `nodeHistory`: Recent operation failures, for troubleshooting (up to 16 entries from the last 5 minutes)
- `fencingHistory`: Recent fencing events (up to 16 events from the last 24 hours)
  - Target node, action (enum: `reboot`, `off`, `on`), status (enum: `success`, `failed`, `pending`), completion timestamp
- `collectionError`: Any error encountered during status collection (max 2KB)
- `rawXML`: Full XML output of `pcs status xml`, for debugging (max 256KB)

**Design Principles:**
The API follows the principle "Act on Deterministic Information":
- All fields except `lastUpdated` are optional
- Missing data indicates an unknown state, not an error
- The operator acts only on definitive information
- An unknown state preserves the last known health condition
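
The principles above can be sketched as a small decision function with three rules:

- Definitive bad signals degrade health immediately.
- A healthy verdict requires every signal to be present and good.
- Anything else falls back to the previous condition.

This is a hypothetical Go sketch, not the cluster-etcd-operator's healthcheck controller. The `Health` values, the `Observation` type, and the specific rules are illustrative assumptions; for example, it treats fewer than two online nodes as degraded. It uses a generic helper, so it needs Go 1.18 or later.

```go
package main

import "fmt"

// Health is a HYPOTHETICAL condition value; the operator's real condition
// types and reasons are not documented in this README.
type Health string

const (
	Healthy  Health = "Healthy"
	Degraded Health = "Degraded"
)

// expectedNodes: a TNF cluster has two nodes.
const expectedNodes = 2

// Observation carries a few optional summary fields; nil means "unknown".
type Observation struct {
	PacemakerDaemonState *string // "Running" or "KnownNotRunning"
	QuorumStatus         *string // "Quorate" or "NoQuorum"
	NodesOnline          *int    // 0-2
}

// evaluate changes health only on definitive information and otherwise
// returns lastKnown unchanged.
func evaluate(obs Observation, lastKnown Health) Health {
	// Any single definitive bad signal is enough to report Degraded.
	if obs.PacemakerDaemonState != nil && *obs.PacemakerDaemonState == "KnownNotRunning" {
		return Degraded
	}
	if obs.QuorumStatus != nil && *obs.QuorumStatus == "NoQuorum" {
		return Degraded
	}
	if obs.NodesOnline != nil && *obs.NodesOnline < expectedNodes {
		return Degraded
	}
	// Report Healthy only when every signal is present and good.
	if obs.PacemakerDaemonState != nil && *obs.PacemakerDaemonState == "Running" &&
		obs.QuorumStatus != nil && *obs.QuorumStatus == "Quorate" &&
		obs.NodesOnline != nil && *obs.NodesOnline == expectedNodes {
		return Healthy
	}
	// Missing data is an unknown state, not an error: keep the last condition.
	return lastKnown
}

// ptr returns a pointer to a copy of v (helper for optional fields).
func ptr[T any](v T) *T { return &v }

func main() {
	fmt.Println(evaluate(Observation{QuorumStatus: ptr("NoQuorum")}, Healthy)) // Degraded
	fmt.Println(evaluate(Observation{}, Healthy))                               // Healthy
	fmt.Println(evaluate(Observation{
		PacemakerDaemonState: ptr("Running"),
		QuorumStatus:         ptr("Quorate"),
		NodesOnline:          ptr(2),
	}, Degraded)) // Healthy
}
```

The asymmetry is deliberate in this sketch. A single `NoQuorum` report is enough to act on, but an absent field never flips the condition in either direction.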

**Usage:**
The cluster-etcd-operator healthcheck controller watches this resource and updates operator conditions based on the cluster state.