Commit d0d710f

Add storage.wal.failover.write_and_sync.latency
Fixes DOC-13184

Summary of changes:

- Add a mention of the `storage.wal.failover.write_and_sync.latency` metric to the `wal-failover-metrics.md` include file, which will pull it into the 'WAL failover' and 'cockroach start' pages.
- We're also doing a cockroachdb/cockroach PR to mark this metric as 'essential', so it shows up in the list of Storage essential metrics at e.g. https://www.cockroachlabs.com/docs/v25.3/essential-metrics-self-hosted.html#storage
1 parent 20c72f4 commit d0d710f

5 files changed: +20 -0 lines changed

src/current/_includes/v24.1/wal-failover-metrics.md

Lines changed: 4 additions & 0 deletions
@@ -3,10 +3,14 @@ You can monitor WAL failover occurrences using the following metrics:
 - `storage.wal.failover.secondary.duration`: Cumulative time spent (in nanoseconds) writing to the secondary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.primary.duration`: Cumulative time spent (in nanoseconds) writing to the primary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.switch.count`: Count of the number of times WAL writing has switched from primary to secondary store, and vice versa.
+- `storage.wal.fsync.latency` monitors the latencies of WAL files. If you have WAL failover enabled and are failing over, `storage.wal.fsync.latency` will include the latency of the stalled primary.
+- `storage.wal.failover.write_and_sync.latency`: When WAL failover is configured in a cluster, the operator should monitor this metric which shows the effective latency observed by the higher layer writing to the WAL. This metric is expected to stay low in a healthy system, regardless of whether WAL files are being written to the primary or secondary.

 The `storage.wal.failover.secondary.duration` is the primary metric to monitor. You should expect this metric to be `0` unless a WAL failover occurs. If a WAL failover occurs, the rate at which it increases provides an indication of the health of the primary store.

 You can access these metrics via the following methods:

 - The [**Custom Chart** debug page]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}) in [DB Console]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}).
 - By [monitoring CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}).
+
+For more information, refer to [Essential storage metrics]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#storage)

src/current/_includes/v24.3/wal-failover-metrics.md

Lines changed: 4 additions & 0 deletions
@@ -3,10 +3,14 @@ You can monitor WAL failover occurrences using the following metrics:
 - `storage.wal.failover.secondary.duration`: Cumulative time spent (in nanoseconds) writing to the secondary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.primary.duration`: Cumulative time spent (in nanoseconds) writing to the primary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.switch.count`: Count of the number of times WAL writing has switched from primary to secondary store, and vice versa.
+- `storage.wal.fsync.latency` monitors the latencies of WAL files. If you have WAL failover enabled and are failing over, `storage.wal.fsync.latency` will include the latency of the stalled primary.
+- `storage.wal.failover.write_and_sync.latency`: When WAL failover is configured in a cluster, the operator should monitor this metric which shows the effective latency observed by the higher layer writing to the WAL. This metric is expected to stay low in a healthy system, regardless of whether WAL files are being written to the primary or secondary.

 The `storage.wal.failover.secondary.duration` is the primary metric to monitor. You should expect this metric to be `0` unless a WAL failover occurs. If a WAL failover occurs, the rate at which it increases provides an indication of the health of the primary store.

 You can access these metrics via the following methods:

 - The [**Custom Chart** debug page]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}) in [DB Console]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}).
 - By [monitoring CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}).
+
+For more information, refer to [Essential storage metrics]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#storage)

src/current/_includes/v25.2/wal-failover-metrics.md

Lines changed: 4 additions & 0 deletions
@@ -3,10 +3,14 @@ You can monitor WAL failover occurrences using the following metrics:
 - `storage.wal.failover.secondary.duration`: Cumulative time spent (in nanoseconds) writing to the secondary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.primary.duration`: Cumulative time spent (in nanoseconds) writing to the primary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.switch.count`: Count of the number of times WAL writing has switched from primary to secondary store, and vice versa.
+- `storage.wal.fsync.latency` monitors the latencies of WAL files. If you have WAL failover enabled and are failing over, `storage.wal.fsync.latency` will include the latency of the stalled primary.
+- `storage.wal.failover.write_and_sync.latency`: When WAL failover is configured in a cluster, the operator should monitor this metric which shows the effective latency observed by the higher layer writing to the WAL. This metric is expected to stay low in a healthy system, regardless of whether WAL files are being written to the primary or secondary.

 The `storage.wal.failover.secondary.duration` is the primary metric to monitor. You should expect this metric to be `0` unless a WAL failover occurs. If a WAL failover occurs, the rate at which it increases provides an indication of the health of the primary store.

 You can access these metrics via the following methods:

 - The [**Custom Chart** debug page]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}) in [DB Console]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}).
 - By [monitoring CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}).
+
+For more information, refer to [Essential storage metrics]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#storage)

src/current/_includes/v25.3/wal-failover-metrics.md

Lines changed: 4 additions & 0 deletions
@@ -3,10 +3,14 @@ You can monitor WAL failover occurrences using the following metrics:
 - `storage.wal.failover.secondary.duration`: Cumulative time spent (in nanoseconds) writing to the secondary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.primary.duration`: Cumulative time spent (in nanoseconds) writing to the primary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.switch.count`: Count of the number of times WAL writing has switched from primary to secondary store, and vice versa.
+- `storage.wal.fsync.latency` monitors the latencies of WAL files. If you have WAL failover enabled and are failing over, `storage.wal.fsync.latency` will include the latency of the stalled primary.
+- `storage.wal.failover.write_and_sync.latency`: When WAL failover is configured in a cluster, the operator should monitor this metric which shows the effective latency observed by the higher layer writing to the WAL. This metric is expected to stay low in a healthy system, regardless of whether WAL files are being written to the primary or secondary.

 The `storage.wal.failover.secondary.duration` is the primary metric to monitor. You should expect this metric to be `0` unless a WAL failover occurs. If a WAL failover occurs, the rate at which it increases provides an indication of the health of the primary store.

 You can access these metrics via the following methods:

 - The [**Custom Chart** debug page]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}) in [DB Console]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}).
 - By [monitoring CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}).
+
+For more information, refer to [Essential storage metrics]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#storage)

src/current/_includes/v25.4/wal-failover-metrics.md

Lines changed: 4 additions & 0 deletions
@@ -3,10 +3,14 @@ You can monitor WAL failover occurrences using the following metrics:
 - `storage.wal.failover.secondary.duration`: Cumulative time spent (in nanoseconds) writing to the secondary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.primary.duration`: Cumulative time spent (in nanoseconds) writing to the primary WAL directory. Only populated when WAL failover is configured.
 - `storage.wal.failover.switch.count`: Count of the number of times WAL writing has switched from primary to secondary store, and vice versa.
+- `storage.wal.fsync.latency` monitors the latencies of WAL files. If you have WAL failover enabled and are failing over, `storage.wal.fsync.latency` will include the latency of the stalled primary.
+- `storage.wal.failover.write_and_sync.latency`: When WAL failover is configured in a cluster, the operator should monitor this metric which shows the effective latency observed by the higher layer writing to the WAL. This metric is expected to stay low in a healthy system, regardless of whether WAL files are being written to the primary or secondary.

 The `storage.wal.failover.secondary.duration` is the primary metric to monitor. You should expect this metric to be `0` unless a WAL failover occurs. If a WAL failover occurs, the rate at which it increases provides an indication of the health of the primary store.

 You can access these metrics via the following methods:

 - The [**Custom Chart** debug page]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}) in [DB Console]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}).
 - By [monitoring CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}).
+
+For more information, refer to [Essential storage metrics]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#storage)
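
Reviewer note (illustrative, not part of this commit): the include file says the rate at which `storage.wal.failover.secondary.duration` increases indicates the health of the primary store. A minimal Python sketch of that calculation follows. The helper names `prometheus_name` and `secondary_fraction` are hypothetical, and the name mapping assumes the Prometheus endpoint replaces dots and hyphens with underscores; check this against your cluster's actual metrics output before relying on it.

```python
def prometheus_name(metric: str) -> str:
    """Map a dotted metric name to its assumed Prometheus form.

    Assumption: dots and hyphens are replaced with underscores.
    """
    return metric.replace(".", "_").replace("-", "_")


def secondary_fraction(prev_ns: int, curr_ns: int, interval_s: float) -> float:
    """Fraction of a scrape interval spent writing to the secondary WAL directory.

    prev_ns and curr_ns are two samples of the cumulative (nanosecond)
    storage.wal.failover.secondary.duration counter taken interval_s apart.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_ns = curr_ns - prev_ns
    if delta_ns < 0:
        # The counter went backwards, e.g. after a node restart; treat the
        # current value as the growth since the reset.
        delta_ns = curr_ns
    # Clamp to 1.0 to absorb scrape-timing jitter.
    return min(delta_ns / (interval_s * 1e9), 1.0)


if __name__ == "__main__":
    name = prometheus_name("storage.wal.failover.secondary.duration")
    # Two hypothetical samples taken 60 seconds apart.
    frac = secondary_fraction(1_000_000_000, 16_000_000_000, 60)
    print(f"{name}: {frac:.0%} of the interval on the secondary")
```

A fraction that stays at `0` matches the "no failover" expectation described above; a sustained non-zero value suggests the primary store is repeatedly stalling.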
