improve logging around unhealthy clocks #60

Merged
iliana merged 1 commit into release-22.1-oxide from justin/log-unhealthy-clocks on Mar 16, 2026
@JustinAzoff
Contributor

We have seen this error:

clock synchronization error: this node is more than 500ms away from at least half of the known nodes

but when this happens it's not clear what the real issue is. Are the clocks 501ms away, or 5000ms?

This logs an additional error, including the measured offset, any time a remote node is unhealthy:

E260312 20:20:10.978114 15 2@rpc/clock_offset.go:256  [-] 3  node 3 is not healthy: clock offset is off=91ns, err=31ns, at=1970-01-01 00:00:00 +0000 UTC

@JustinAzoff force-pushed the justin/log-unhealthy-clocks branch from 42ad73b to 3ca158b on March 13, 2026 17:43
@iliana merged commit 9574dd9 into release-22.1-oxide on Mar 16, 2026
14 checks passed
@iliana deleted the justin/log-unhealthy-clocks branch on March 16, 2026 19:37
@iliana added a commit to oxidecomputer/omicron that referenced this pull request on Mar 20, 2026:
Primarily for debugging (#9427, and time sync problems).

The vast majority of changes here are build system related or updating
dependencies that are used in tests only, but there are some actual code
changes worth pointing out:

- oxidecomputer/cockroach#46 (r+ @sudomateo)
- oxidecomputer/pebble@e7c3451
  "fix lints; run tests in buildomat" (not reviewed)
- oxidecomputer/pebble@06125d2
  backport of "sstable: do a bit flip computation on a checksum mismatch
  in the Reader" (not reviewed)
- oxidecomputer/cockroach#60 (@JustinAzoff;
  r+ @iliana)
- oxidecomputer/cockroach#64 (not reviewed)
- oxidecomputer/cockroach#65 (r+ @sudomateo)

Full delta:
oxidecomputer/cockroach@367bca4...86fdbfc