From 12c889cd8cef18de7cae1270611a94a92c5ed408 Mon Sep 17 00:00:00 2001 From: Joe Lodin Date: Thu, 9 Oct 2025 13:36:36 -0400 Subject: [PATCH 1/4] Add docs for decommissioning nodes with the operator --- .../v25.2/scale-cockroachdb-operator.md | 50 +++++++++++++++++++ .../v25.3/scale-cockroachdb-operator.md | 50 +++++++++++++++++++ .../v25.4/scale-cockroachdb-operator.md | 50 +++++++++++++++++++ 3 files changed, 150 insertions(+) diff --git a/src/current/v25.2/scale-cockroachdb-operator.md b/src/current/v25.2/scale-cockroachdb-operator.md index 76de7364979..055b50c9bd0 100644 --- a/src/current/v25.2/scale-cockroachdb-operator.md +++ b/src/current/v25.2/scale-cockroachdb-operator.md @@ -104,3 +104,53 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C ~~~ shell kubectl get pods ~~~ + +## Decommission nodes + +When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. + +{{site.data.alerts.callout_info}} +The CockroachDB node begins immediately decommissioning once the annotation is applied, it is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. + +If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. 
+{{site.data.alerts.end}} + +The following prerequisites are necessary for the {{ site.data.products.cockroachdb-operator }} to be able to decommission a CockroachDB node: + +- The `--enable-k8s-node-controller=true` flag must be enabled in the operator's `.yaml` values file, for example: + {% include_cached copy-clipboard.html %} + ~~~ yaml + containers: + - name: cockroach-operator + image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }} + args: + - "-enable-k8s-node-controller=true" + ~~~ +- The role-based access control system must be configured to allow the operator to patch nodes. +- At least one replica of the operator must not be on the target node. +- There must be no under-replicated ranges on the CockroachDB cluster. + +To mark a node for decommissioning, follow these steps: + +1. Identify the name of the Kubernetes node that is to be removed. + +1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. Using `kubectl` for example: + + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true" + ~~~ + +1. Monitor the cluster: + - Confirm the decommissioned node's cordoned status: + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl describe node {example-node-name} + ~~~ + - Monitor operator events and logs for decommission start and completion messages: + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl logs {operator-pod-name} + ~~~ + +If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled. 
diff --git a/src/current/v25.3/scale-cockroachdb-operator.md b/src/current/v25.3/scale-cockroachdb-operator.md index e6b544db6da..0d5e89a0176 100644 --- a/src/current/v25.3/scale-cockroachdb-operator.md +++ b/src/current/v25.3/scale-cockroachdb-operator.md @@ -104,3 +104,53 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C ~~~ shell kubectl get pods ~~~ + +## Decommission nodes + +When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. + +{{site.data.alerts.callout_info}} +The CockroachDB node begins immediately decommissioning once the annotation is applied, it is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. + +If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. +{{site.data.alerts.end}} + +The following prerequisites are necessary for the {{ site.data.products.cockroachdb-operator }} to be able to decommission a CockroachDB node: + +- The `--enable-k8s-node-controller=true` flag must be enabled in the operator's `.yaml` values file, for example: + {% include_cached copy-clipboard.html %} + ~~~ yaml + containers: + - name: cockroach-operator + image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }} + args: + - "-enable-k8s-node-controller=true" + ~~~ +- The role-based access control system must be configured to allow the operator to patch nodes. +- At least one replica of the operator must not be on the target node. +- There must be no under-replicated ranges on the CockroachDB cluster. + +To mark a node for decommissioning, follow these steps: + +1. 
Identify the name of the Kubernetes node that is to be removed. + +1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. Using `kubectl` for example: + + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true" + ~~~ + +1. Monitor the cluster: + - Confirm the decommissioned node's cordoned status: + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl describe node {example-node-name} + ~~~ + - Monitor operator events and logs for decommission start and completion messages: + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl logs {operator-pod-name} + ~~~ + +If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled. diff --git a/src/current/v25.4/scale-cockroachdb-operator.md b/src/current/v25.4/scale-cockroachdb-operator.md index e6b544db6da..0d5e89a0176 100644 --- a/src/current/v25.4/scale-cockroachdb-operator.md +++ b/src/current/v25.4/scale-cockroachdb-operator.md @@ -104,3 +104,53 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C ~~~ shell kubectl get pods ~~~ + +## Decommission nodes + +When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. + +{{site.data.alerts.callout_info}} +The CockroachDB node begins immediately decommissioning once the annotation is applied, it is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. + +If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. 
This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. +{{site.data.alerts.end}} + +The following prerequisites are necessary for the {{ site.data.products.cockroachdb-operator }} to be able to decommission a CockroachDB node: + +- The `--enable-k8s-node-controller=true` flag must be enabled in the operator's `.yaml` values file, for example: + {% include_cached copy-clipboard.html %} + ~~~ yaml + containers: + - name: cockroach-operator + image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }} + args: + - "-enable-k8s-node-controller=true" + ~~~ +- The role-based access control system must be configured to allow the operator to patch nodes. +- At least one replica of the operator must not be on the target node. +- There must be no under-replicated ranges on the CockroachDB cluster. + +To mark a node for decommissioning, follow these steps: + +1. Identify the name of the Kubernetes node that is to be removed. + +1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. Using `kubectl` for example: + + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true" + ~~~ + +1. Monitor the cluster: + - Confirm the decommissioned node's cordoned status: + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl describe node {example-node-name} + ~~~ + - Monitor operator events and logs for decommission start and completion messages: + {% include_cached copy-clipboard.html %} + ~~~ shell + kubectl logs {operator-pod-name} + ~~~ + +If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled. 
From a67d1015948d414304a5eb796bca1ff67b78d1ff Mon Sep 17 00:00:00 2001 From: Joe Lodin Date: Thu, 9 Oct 2025 17:29:20 -0400 Subject: [PATCH 2/4] Peach comments --- src/current/v25.2/scale-cockroachdb-operator.md | 4 ++-- src/current/v25.3/scale-cockroachdb-operator.md | 4 ++-- src/current/v25.4/scale-cockroachdb-operator.md | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/current/v25.2/scale-cockroachdb-operator.md b/src/current/v25.2/scale-cockroachdb-operator.md index 055b50c9bd0..b33da93a86d 100644 --- a/src/current/v25.2/scale-cockroachdb-operator.md +++ b/src/current/v25.2/scale-cockroachdb-operator.md @@ -110,7 +110,7 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. {{site.data.alerts.callout_info}} -The CockroachDB node begins immediately decommissioning once the annotation is applied, it is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. +Annotating a CockroachDB node for decommissioning immediately begins the decommission process. The annotation is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. {{site.data.alerts.end}} @@ -134,7 +134,7 @@ To mark a node for decommissioning, follow these steps: 1. Identify the name of the Kubernetes node that is to be removed. -1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. 
Using `kubectl` for example: +1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. Using `kubectl`, for example: {% include_cached copy-clipboard.html %} ~~~ shell diff --git a/src/current/v25.3/scale-cockroachdb-operator.md b/src/current/v25.3/scale-cockroachdb-operator.md index 0d5e89a0176..a5b2c317cda 100644 --- a/src/current/v25.3/scale-cockroachdb-operator.md +++ b/src/current/v25.3/scale-cockroachdb-operator.md @@ -110,7 +110,7 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. {{site.data.alerts.callout_info}} -The CockroachDB node begins immediately decommissioning once the annotation is applied, it is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. +Annotating a CockroachDB node for decommissioning immediately begins the decommission process. The annotation is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. {{site.data.alerts.end}} @@ -134,7 +134,7 @@ To mark a node for decommissioning, follow these steps: 1. Identify the name of the Kubernetes node that is to be removed. -1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. Using `kubectl` for example: +1. 
Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. Using `kubectl`, for example: {% include_cached copy-clipboard.html %} ~~~ shell diff --git a/src/current/v25.4/scale-cockroachdb-operator.md b/src/current/v25.4/scale-cockroachdb-operator.md index 0d5e89a0176..a5b2c317cda 100644 --- a/src/current/v25.4/scale-cockroachdb-operator.md +++ b/src/current/v25.4/scale-cockroachdb-operator.md @@ -110,7 +110,7 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. {{site.data.alerts.callout_info}} -The CockroachDB node begins immediately decommissioning once the annotation is applied, it is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. +Annotating a CockroachDB node for decommissioning immediately begins the decommission process. The annotation is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. {{site.data.alerts.end}} @@ -134,7 +134,7 @@ To mark a node for decommissioning, follow these steps: 1. Identify the name of the Kubernetes node that is to be removed. -1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. Using `kubectl` for example: +1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. 
The decommissioning process begins immediately after this annotation is applied. Using `kubectl`, for example: {% include_cached copy-clipboard.html %} ~~~ shell From 6a4797901ce85864da75ea916c1f492539392625 Mon Sep 17 00:00:00 2001 From: Joe Lodin Date: Fri, 10 Oct 2025 15:47:40 -0400 Subject: [PATCH 3/4] Address comments --- src/current/v25.2/scale-cockroachdb-operator.md | 7 +++---- src/current/v25.3/scale-cockroachdb-operator.md | 7 +++---- src/current/v25.4/scale-cockroachdb-operator.md | 7 +++---- 3 files changed, 9 insertions(+), 12 deletions(-) diff --git a/src/current/v25.2/scale-cockroachdb-operator.md b/src/current/v25.2/scale-cockroachdb-operator.md index b33da93a86d..68162c89b62 100644 --- a/src/current/v25.2/scale-cockroachdb-operator.md +++ b/src/current/v25.2/scale-cockroachdb-operator.md @@ -107,12 +107,12 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C ## Decommission nodes -When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. +When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the CockroachDB nodes scheduled on this Kubernetes node. Decommissioning safely moves data and workloads away before the node goes offline. {{site.data.alerts.callout_info}} -Annotating a CockroachDB node for decommissioning immediately begins the decommission process. The annotation is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. +Once annotated, the Kubernetes node is cordoned so no further pods are scheduled on the node. The annotation is not a mark for future removal, as CockroachDB is decommissioned on the node immediately. 
-If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. +If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. {{site.data.alerts.end}} The following prerequisites are necessary for the {{ site.data.products.cockroachdb-operator }} to be able to decommission a CockroachDB node: @@ -126,7 +126,6 @@ The following prerequisites are necessary for the {{ site.data.products.cockroac args: - "-enable-k8s-node-controller=true" ~~~ -- The role-based access control system must be configured to allow the operator to patch nodes. - At least one replica of the operator must not be on the target node. - There must be no under-replicated ranges on the CockroachDB cluster. diff --git a/src/current/v25.3/scale-cockroachdb-operator.md b/src/current/v25.3/scale-cockroachdb-operator.md index a5b2c317cda..32d6f847a39 100644 --- a/src/current/v25.3/scale-cockroachdb-operator.md +++ b/src/current/v25.3/scale-cockroachdb-operator.md @@ -107,12 +107,12 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C ## Decommission nodes -When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. +When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the CockroachDB nodes scheduled on this Kubernetes node. Decommissioning safely moves data and workloads away before the node goes offline. {{site.data.alerts.callout_info}} -Annotating a CockroachDB node for decommissioning immediately begins the decommission process. The annotation is not a mark for future removal. 
Once decommissioned, the node is cordoned so no further pods are scheduled on the node. +Once annotated, the Kubernetes node is cordoned so no further pods are scheduled on the node. The annotation is not a mark for future removal, as CockroachDB is decommissioned on the node immediately. -If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. +If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. {{site.data.alerts.end}} The following prerequisites are necessary for the {{ site.data.products.cockroachdb-operator }} to be able to decommission a CockroachDB node: @@ -126,7 +126,6 @@ The following prerequisites are necessary for the {{ site.data.products.cockroac args: - "-enable-k8s-node-controller=true" ~~~ -- The role-based access control system must be configured to allow the operator to patch nodes. - At least one replica of the operator must not be on the target node. - There must be no under-replicated ranges on the CockroachDB cluster. diff --git a/src/current/v25.4/scale-cockroachdb-operator.md b/src/current/v25.4/scale-cockroachdb-operator.md index a5b2c317cda..32d6f847a39 100644 --- a/src/current/v25.4/scale-cockroachdb-operator.md +++ b/src/current/v25.4/scale-cockroachdb-operator.md @@ -107,12 +107,12 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C ## Decommission nodes -When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node which safely moves data and workloads away before it goes offline. +When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the CockroachDB nodes scheduled on this Kubernetes node. 
Decommissioning safely moves data and workloads away before the node goes offline. {{site.data.alerts.callout_info}} -Annotating a CockroachDB node for decommissioning immediately begins the decommission process. The annotation is not a mark for future removal. Once decommissioned, the node is cordoned so no further pods are scheduled on the node. +Once annotated, the Kubernetes node is cordoned so no further pods are scheduled on the node. The annotation is not a mark for future removal, as CockroachDB is decommissioned on the node immediately. -If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling. +If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. {{site.data.alerts.end}} The following prerequisites are necessary for the {{ site.data.products.cockroachdb-operator }} to be able to decommission a CockroachDB node: @@ -126,7 +126,6 @@ The following prerequisites are necessary for the {{ site.data.products.cockroac args: - "-enable-k8s-node-controller=true" ~~~ -- The role-based access control system must be configured to allow the operator to patch nodes. - At least one replica of the operator must not be on the target node. - There must be no under-replicated ranges on the CockroachDB cluster. 
From 5f16dafc312182e26c12a1243e50be9a02e6473d Mon Sep 17 00:00:00 2001 From: Mike Lewis <76072290+mikeCRL@users.noreply.github.com> Date: Mon, 20 Oct 2025 10:52:53 -0400 Subject: [PATCH 4/4] Apply suggestions from code review Co-authored-by: NishanthNalluri --- src/current/v25.2/scale-cockroachdb-operator.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/current/v25.2/scale-cockroachdb-operator.md b/src/current/v25.2/scale-cockroachdb-operator.md index 68162c89b62..cbefc4f1ba5 100644 --- a/src/current/v25.2/scale-cockroachdb-operator.md +++ b/src/current/v25.2/scale-cockroachdb-operator.md @@ -110,9 +110,9 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the CockroachDB nodes scheduled on this Kubernetes node. Decommissioning safely moves data and workloads away before the node goes offline. {{site.data.alerts.callout_info}} -Once annotated, the Kubernetes node is cordoned so no further pods are scheduled on the node. The annotation is not a mark for future removal, as CockroachDB is decommissioned on the node immediately. +Once annotated, the Kubernetes node is cordoned so no further pods are scheduled on the node, and the decommissioning process for the CockroachDB pods scheduled on this Kubernetes node begins immediately. -If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. +If cluster resources are constrained, replacement pods may remain in the `Pending` state until the Kubernetes scheduler identifies suitable nodes. {{site.data.alerts.end}} The following prerequisites are necessary for the {{ site.data.products.cockroachdb-operator }} to be able to decommission a CockroachDB node:
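The procedure documented by this patch series (annotate the Kubernetes node, then confirm that it is cordoned) can be sketched in shell, assuming `kubectl` is already configured for the target cluster. The function names and the `KUBECTL` override variable are hypothetical and are not part of the operator or the docs; only the annotation key comes from the patches. The cordon check reads the node's `.spec.unschedulable` field, which Kubernetes sets to `true` on a cordoned node.

```shell
# Hypothetical helpers, not part of the operator. KUBECTL can be overridden
# (for example with "echo") to preview the command without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Apply the decommission annotation described in the docs to one Kubernetes node.
# Decommissioning starts as soon as the annotation is applied.
decommission_k8s_node() {
  $KUBECTL annotate node "$1" crdb.cockroachlabs.com/decommission="true"
}

# Succeed (exit status 0) only if the node reports itself as unschedulable,
# which is how a cordoned node appears in its spec.
node_is_cordoned() {
  [ "$($KUBECTL get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}
```

Setting `KUBECTL=echo` before calling `decommission_k8s_node` prints the arguments instead of contacting a cluster, which may be a convenient way to review the command before running it for real.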