@rmloveland rmloveland commented Oct 14, 2025

Fixes DOC-13184

Summary of changes:

netlify bot commented Oct 14, 2025

Deploy Preview for cockroachdb-api-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 813a0fd |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-api-docs/deploys/68ff944ea2788100085dd77a |

netlify bot commented Oct 14, 2025

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 813a0fd |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/68ff944e269eed0008fafe6d |

github-actions bot commented Oct 14, 2025

Files changed:

rmloveland added a commit to rmloveland/cockroach that referenced this pull request Oct 14, 2025
This change marks the `storage.wal.failover.write_and_sync.latency`
metric as "Essential" so it gets automatically pulled into the
'Essential Metrics' documentation at e.g.,
https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage

This is necessary since we are adding some words about this metric to
the docs via cockroachdb/docs#20566

We would like to then backport this change to all supported versions of
CockroachDB which have WAL failover (i.e., v24.1 and later).

netlify bot commented Oct 14, 2025

Netlify Preview

| Name | Link |
|---|---|
| 🔨 Latest commit | 813a0fd |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-docs/deploys/68ff944e80bad3000893d2c3 |
| 😎 Deploy Preview | https://deploy-preview-20566--cockroachdb-docs.netlify.app |

@sumeerbhola sumeerbhola left a comment

@sumeerbhola reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)


src/current/_includes/v25.4/wal-failover-metrics.md line 7 at r1 (raw file):

- `storage.wal.failover.switch.count`: Count of the number of times WAL writing has switched from primary to secondary store, and vice versa.
- `storage.wal.fsync.latency` monitors the latencies of WAL files. If you have WAL failover enabled and are failing over, `storage.wal.fsync.latency` will include the latency of the stalled primary. 
- `storage.wal.failover.write_and_sync.latency` metric is up one level from `storage.wal.fsync.latency`, and during the failover will report the latency actually observed by higher levels (which should be ~equivalent to the latency of the secondary).

This applies at all times, not just during a failover. We should say something like:

When WAL failover is configured in a cluster, the operator should monitor this metric, which shows the effective latency observed by the higher layer writing to the WAL. This metric is expected to stay low in a healthy system, regardless of whether WAL files are being written to the primary or secondary.
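
For anyone who wants to spot-check the metric outside of a dashboard, here is a rough sketch of reading it from Prometheus-format text. It assumes the metric is exported as a histogram under the name `storage_wal_failover_write_and_sync_latency` (dots replaced by underscores) and that values are in nanoseconds; both are assumptions and not confirmed in this thread. The helper names and the sample lines are hypothetical.

```python
import re

# Assumed Prometheus-style name: the metric name with dots replaced by underscores.
METRIC = "storage_wal_failover_write_and_sync_latency"


def bucket_counts(text, metric=METRIC):
    """Return sorted (upper_bound, cumulative_count) pairs for a histogram.

    Counts for the same bound are summed across label sets (e.g. stores),
    which is only meaningful if every series uses the same bucket bounds.
    """
    pattern = re.compile(r"^" + re.escape(metric) + r"_bucket\{(.*)\}\s+(\S+)$")
    totals = {}
    for line in text.splitlines():
        m = pattern.match(line.strip())
        if not m:
            continue
        le = re.search(r'(?:^|,)\s*le="([^"]+)"', m.group(1))
        if not le:
            continue
        bound = float("inf") if le.group(1) == "+Inf" else float(le.group(1))
        totals[bound] = totals.get(bound, 0.0) + float(m.group(2))
    return sorted(totals.items())


def quantile_upper_bound(buckets, q):
    """Smallest bucket bound whose cumulative count covers fraction q."""
    if not buckets:
        return None
    total = buckets[-1][1]
    if total == 0:
        return None
    for bound, count in buckets:
        if count >= q * total:
            return bound
    return buckets[-1][0]


# Hypothetical sample in Prometheus text format, not real cluster output.
sample = """
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="1e+06"} 90
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="1e+07"} 99
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="+Inf"} 100
"""

p99 = quantile_upper_bound(bucket_counts(sample), 0.99)
print("p99 upper bound (assumed ns):", p99)
```

This only gives a bucket upper bound rather than an interpolated quantile, which is enough to tell "low" from "stalled" but not a substitute for a real alerting rule.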

@cockroach-teamcity

This change is Reviewable

@rmloveland

thanks @sumeerbhola, i've updated in 998c744 - PTAL

once we're happy with this change and it's had docs review I would like to backport it to the WAL failover docs for all versions where this metric is supported

which previous versions have this metric available? is it everything v24.1+ or only a subset?

craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Oct 15, 2025
155395: storage: mark add'l WAL latency metric essential r=rmloveland a=rmloveland

This change marks the `storage.wal.failover.write_and_sync.latency` metric as "Essential" so it gets automatically pulled into the 'Essential Metrics' documentation at e.g.,
https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage

This is necessary since we are adding some words about this metric to the docs via cockroachdb/docs#20566

We would like to then backport this change to all supported versions of CockroachDB which have WAL failover (i.e., v24.1 and later).

Addresses part of DOC-13184

Co-authored-by: Rich Loveland <rich@cockroachlabs.com>
rmloveland added a commit to rmloveland/cockroach that referenced this pull request Oct 16, 2025
This change marks the `storage.wal.failover.write_and_sync.latency`
metric as "Essential" so it gets automatically pulled into the
'Essential Metrics' documentation at e.g.,
https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage

This is necessary since we are adding some words about this metric to
the docs via cockroachdb/docs#20566

We would like to then backport this change to all supported versions of
CockroachDB which have WAL failover (i.e., v24.1 and later).
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Oct 20, 2025
155395: storage: mark add'l WAL latency metric essential r=rmloveland a=rmloveland

This change marks the `storage.wal.failover.write_and_sync.latency` metric as "Essential" so it gets automatically pulled into the 'Essential Metrics' documentation at e.g.,
https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage

This is necessary since we are adding some words about this metric to the docs via cockroachdb/docs#20566

We would like to then backport this change to all supported versions of CockroachDB which have WAL failover (i.e., v24.1 and later).

Addresses part of DOC-13184

Co-authored-by: Rich Loveland <rich@cockroachlabs.com>

@sumeerbhola sumeerbhola left a comment

> which previous versions have this metric available? is it everything v24.1+ or only a subset?

Since v24.1 cockroachdb/cockroach#123232

:lgtm:

@sumeerbhola reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rmloveland)

@peachdawnleach peachdawnleach left a comment

Lgtm!

@rmloveland

TFTRs! Backporting to all supported versions v24.1+ before merge

Fixes DOC-13184

Summary of changes:

- Add a mention of the `storage.wal.failover.write_and_sync.latency`
  metric to the `wal-failover-metrics.md` include file, which will pull
  it into the 'WAL failover' and 'cockroach start' pages.

- We're also doing a cockroachdb/cockroach PR to mark this metric as
  'essential', so it shows up in the list of Storage essential metrics
  at e.g.
  https://www.cockroachlabs.com/docs/v25.3/essential-metrics-self-hosted.html#storage
@rmloveland rmloveland force-pushed the 20251014-DOC-13184-update-wal-failover-with-latency-metric branch from 998c744 to d0d710f Compare October 27, 2025 15:48
@rmloveland rmloveland enabled auto-merge (squash) October 27, 2025 15:48
@rmloveland rmloveland merged commit 28e255e into main Oct 27, 2025
6 checks passed
@rmloveland rmloveland deleted the 20251014-DOC-13184-update-wal-failover-with-latency-metric branch October 27, 2025 16:22