Commit e883033

Backport leader-leaseholder split info to v24.* (#20472)
Fixes DOC-14774. Backports non-leader-leases changes from #19755.
1 parent fe20ed6 commit e883033

10 files changed: +30 -8 lines changed

src/current/_includes/v24.1/essential-alerts.md

Lines changed: 2 additions & 2 deletions
@@ -318,9 +318,9 @@ Send an alert when the number of ranges with replication below the replication f

 - Refer to [Replication issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues).

-### Requests stuck in raft
+### Requests stuck in Raft

-Send an alert when requests are taking a very long time in replication. An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated.
+Send an alert when requests are taking a very long time in replication. An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated. This can also be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

 **Metric**
 <br>`requests.slow.raft`

src/current/_includes/v24.3/essential-alerts.md

Lines changed: 2 additions & 2 deletions
@@ -318,9 +318,9 @@ Send an alert when the number of ranges with replication below the replication f

 - Refer to [Replication issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues).

-### Requests stuck in raft
+### Requests stuck in Raft

-Send an alert when requests are taking a very long time in replication. An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated.
+Send an alert when requests are taking a very long time in replication. An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated. This can also be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

 **Metric**
 <br>`requests.slow.raft`

src/current/v24.1/architecture/replication-layer.md

Lines changed: 9 additions & 0 deletions
@@ -146,6 +146,15 @@ A table's meta and system ranges (detailed in the [distribution layer]({% link {

 However, unlike table data, system ranges cannot use epoch-based leases because that would create a circular dependency: system ranges are already being used to implement epoch-based leases for table data. Therefore, system ranges use expiration-based leases instead. Expiration-based leases expire at a particular timestamp (typically after a few seconds). However, as long as a node continues proposing Raft commands, it continues to extend the expiration of its leases. If it doesn't, the next node containing a replica of the range that tries to read from or write to the range will become the leaseholder.

+#### Leader‑leaseholder splits
+
+[Epoch-based leases](#epoch-based-leases-table-data) are vulnerable to _leader-leaseholder splits_. These can occur when a leaseholder's Raft log has fallen behind other replicas in its group and it cannot acquire Raft leadership. Coupled with a [network partition]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#network-partition), this split can cause permanent unavailability of the range if (1) the stale leaseholder continues heartbeating the [liveness range](#epoch-based-leases-table-data) to hold its lease but (2) cannot reach the leader to propose writes.
+
+Symptoms of leader-leaseholder splits include a [stalled Raft log]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#requests-stuck-in-raft) on the leaseholder and [increased disk usage]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#disks-filling-up) on follower replicas buffering pending Raft entries. Remediations include:
+
+- Restarting the affected nodes.
+- Fixing the network partition (or slow networking) between nodes.
+
 #### How leases are transferred from a dead node

 When the cluster needs to access a range on a leaseholder node that is dead, that range's lease must be transferred to a healthy node. This process is as follows:

src/current/v24.1/cluster-setup-troubleshooting.md

Lines changed: 2 additions & 0 deletions
@@ -387,6 +387,8 @@ Like any database system, if you run out of disk space the system will no longer
 - [Why is disk usage increasing despite lack of writes?]({% link {{ page.version.version }}/operational-faqs.md %}#why-is-disk-usage-increasing-despite-lack-of-writes)
 - [Can I reduce or disable the storage of timeseries data?]({% link {{ page.version.version }}/operational-faqs.md %}#can-i-reduce-or-disable-the-storage-of-time-series-data)

+In rare cases, disk usage can increase on nodes with [Raft followers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) due to a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).
+
 ###### Automatic ballast files

 CockroachDB automatically creates an emergency ballast file at [node startup]({% link {{ page.version.version }}/cockroach-start.md %}). This feature is **on** by default. Note that the [`cockroach debug ballast`]({% link {{ page.version.version }}/cockroach-debug-ballast.md %}) command is still available but deprecated.

src/current/v24.1/monitoring-and-alerting.md

Lines changed: 1 addition & 1 deletion
@@ -1205,7 +1205,7 @@ Currently, not all events listed have corresponding alert rule definitions avail

 #### Requests stuck in Raft

-- **Rule:** Send an alert when requests are taking a very long time in replication.
+- **Rule:** Send an alert when requests are taking a very long time in replication. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

 - **How to detect:** Calculate this using the `requests_slow_raft` metric in the node's `_status/vars` output.

src/current/v24.1/ui-slow-requests-dashboard.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ Hovering over the graph displays values for the following metrics:

 Metric | Description
 --------|----
-Slow Raft Proposals | The number of requests that have been stuck for longer than usual in [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as tracked by the `requests.slow.raft` metric.
+Slow Raft Proposals | The number of requests that have been stuck for longer than usual in [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as tracked by the `requests.slow.raft` metric. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

 ## Slow DistSender RPCs

src/current/v24.3/architecture/replication-layer.md

Lines changed: 9 additions & 0 deletions
@@ -148,6 +148,15 @@ A table's meta and system ranges (detailed in the [distribution layer]({% link {

 However, unlike table data, system ranges cannot use epoch-based leases because that would create a circular dependency: system ranges are already being used to implement epoch-based leases for table data. Therefore, system ranges use expiration-based leases instead. Expiration-based leases expire at a particular timestamp (typically after a few seconds). However, as long as a node continues proposing Raft commands, it continues to extend the expiration of its leases. If it doesn't, the next node containing a replica of the range that tries to read from or write to the range will become the leaseholder.

+#### Leader‑leaseholder splits
+
+[Epoch-based leases](#epoch-based-leases-table-data) are vulnerable to _leader-leaseholder splits_. These can occur when a leaseholder's Raft log has fallen behind other replicas in its group and it cannot acquire Raft leadership. Coupled with a [network partition]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#network-partition), this split can cause permanent unavailability of the range if (1) the stale leaseholder continues heartbeating the [liveness range](#epoch-based-leases-table-data) to hold its lease but (2) cannot reach the leader to propose writes.
+
+Symptoms of leader-leaseholder splits include a [stalled Raft log]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#requests-stuck-in-raft) on the leaseholder and [increased disk usage]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#disks-filling-up) on follower replicas buffering pending Raft entries. Remediations include:
+
+- Restarting the affected nodes.
+- Fixing the network partition (or slow networking) between nodes.
+
 #### How leases are transferred from a dead node

 When the cluster needs to access a range on a leaseholder node that is dead, that range's lease must be transferred to a healthy node. This process is as follows:

src/current/v24.3/cluster-setup-troubleshooting.md

Lines changed: 2 additions & 0 deletions
@@ -387,6 +387,8 @@ Like any database system, if you run out of disk space the system will no longer
 - [Why is disk usage increasing despite lack of writes?]({% link {{ page.version.version }}/operational-faqs.md %}#why-is-disk-usage-increasing-despite-lack-of-writes)
 - [Can I reduce or disable the storage of timeseries data?]({% link {{ page.version.version }}/operational-faqs.md %}#can-i-reduce-or-disable-the-storage-of-time-series-data)

+In rare cases, disk usage can increase on nodes with [Raft followers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) due to a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).
+
 ###### Automatic ballast files

 CockroachDB automatically creates an emergency ballast file at [node startup]({% link {{ page.version.version }}/cockroach-start.md %}). This feature is **on** by default. Note that the [`cockroach debug ballast`]({% link {{ page.version.version }}/cockroach-debug-ballast.md %}) command is still available but deprecated.

src/current/v24.3/monitoring-and-alerting.md

Lines changed: 1 addition & 1 deletion
@@ -1205,7 +1205,7 @@ Currently, not all events listed have corresponding alert rule definitions avail

 #### Requests stuck in Raft

-- **Rule:** Send an alert when requests are taking a very long time in replication.
+- **Rule:** Send an alert when requests are taking a very long time in replication. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

 - **How to detect:** Calculate this using the `requests_slow_raft` metric in the node's `_status/vars` output.

src/current/v24.3/ui-slow-requests-dashboard.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ Hovering over the graph displays values for the following metrics:

 Metric | Description
 --------|----
-Slow Raft Proposals | The number of requests that have been stuck for longer than usual in [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as tracked by the `requests.slow.raft` metric.
+Slow Raft Proposals | The number of requests that have been stuck for longer than usual in [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as tracked by the `requests.slow.raft` metric. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

 ## Slow DistSender RPCs
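
Note on the `monitoring-and-alerting.md` hunks: the "How to detect" bullet says to calculate the alert from the `requests_slow_raft` metric in the node's `_status/vars` output. A minimal Python sketch of that check follows, assuming the output is Prometheus text format with one sample per line. The function name `slow_raft_requests`, the `store` label, and the sample text are illustrative assumptions, not taken from this commit or from real node output.

```python
def slow_raft_requests(vars_text: str, metric: str = "requests_slow_raft") -> float:
    """Sum every sample of `metric` found in Prometheus text-format output."""
    total = 0.0
    for raw in vars_text.splitlines():
        line = raw.strip()
        # Comment lines (# HELP, # TYPE) and other metrics do not start with the name.
        if not line.startswith(metric):
            continue
        rest = line[len(metric):]
        if rest.startswith("{"):
            # Drop the label set; the sample value follows the closing brace.
            rest = rest[rest.rfind("}") + 1:]
        elif not rest.startswith((" ", "\t")):
            # A different metric that merely shares this prefix.
            continue
        fields = rest.split()
        if fields:
            total += float(fields[0])
    return total


# Hypothetical sample in Prometheus text format (made up for illustration).
SAMPLE = """\
# TYPE requests_slow_raft gauge
requests_slow_raft{store="1"} 0
requests_slow_raft{store="2"} 3
requests_slow_raft_example 7
"""

if __name__ == "__main__":
    stuck = slow_raft_requests(SAMPLE)
    if stuck > 0:
        print(f"ALERT: {stuck:g} request(s) stuck in Raft")
```

Per the alert text in this commit, any nonzero value should be investigated, so the sketch alerts on `stuck > 0` rather than on a tuned threshold.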
