feat(sync): add unified metrics interface for sync protocol observability#2007
feat(sync): add unified metrics interface for sync protocol observability#2007
Conversation
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 95% | Review time: 203.3s
🟡 4 warnings, 💡 3 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-d319aeb9
…lity Implement a unified metrics interface that works for both simulation (deterministic benchmarking) and production (Prometheus observability). New modules: - `sync::metrics` - SyncMetricsCollector trait with PhaseTimer - `sync::prometheus_metrics` - Production Prometheus implementation - `sync_sim::metrics_adapter` - Simulation adapter bridging to SimMetrics - `sync_sim::benchmarks` - Protocol benchmarks using the trait Key features: - Protocol cost metrics: messages, bytes, round trips, entities, merges - Phase timing: handshake, data_transfer, merge, sync_total - CIP invariant monitoring: I5 (snapshot_blocked), I6 (buffer_drops), I7 (verification_failures) - NoOpMetrics for zero-overhead disabled metrics - Full integration testing with simulation framework All 23 tests pass validating the trait implementation works correctly with real simulation data.
614bd9d to
e9eed76
Compare
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 225.0s
🟡 2 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-b87581e4
This comment has been minimized.
This comment has been minimized.
|
Could not push Autofix changes. The PR branch may have changed since the Autofix ran, or the Autofix commit may no longer exist. |
- Add protocol parameter to record_sync_complete and record_sync_failure to fix hardcoded 'all' label losing cardinality - Add sanitize_protocol() and sanitize_crdt_type() to prevent unbounded label cardinality from untrusted input (whitelist known values) - Fix O(n) loop in record_entities_transferred by using direct increment - Implement record_protocol_selected with a counter instead of no-op - Fix sync_successes_total never being incremented (Bugbot fix) - Fix benchmark byte totals undercount from integer division (Bugbot fix) - Fix Box::leak memory leak in scaling benchmarks (Bugbot fix)
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 92% | Review time: 174.2s
💡 4 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-55a3f1ea
|
Bugbot Autofix prepared fixes for 1 of the 1 bugs found in the latest run.
Or push these changes by commenting: Preview (0884931133)diff --git a/Cargo.lock b/Cargo.lock
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -1563,24 +1563,6 @@
]
[[package]]
-name = "bindgen"
-version = "0.72.1"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "993776b509cfb49c750f11b8f07a46fa23e0a1386ffc01fb1e7d343efc387895"
-dependencies = [
- "bitflags 2.10.0",
- "cexpr",
- "clang-sys",
- "itertools 0.13.0",
- "proc-macro2",
- "quote",
- "regex",
- "rustc-hash 2.1.1",
- "shlex",
- "syn 2.0.108",
-]
-
-[[package]]
name = "binread"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
@@ -6387,8 +6369,6 @@
"glob",
"libc",
"libz-sys",
- "lz4-sys",
- "zstd-sys",
]
[[package]]
@@ -6473,16 +6453,6 @@
checksum = "112b39cec0b298b6c1999fee3e31427f74f676e4cb9879ed1a121b43661a4154"
[[package]]
-name = "lz4-sys"
-version = "1.11.1+lz4-1.10.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "6bd8c0d6c6ed0cd30b3652886bb8711dc4bb01d637a68105a3d5158039b418e6"
-dependencies = [
- "cc",
- "libc",
-]
-
-[[package]]
name = "lz4_flex"
version = "0.11.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
@@ -12661,7 +12631,6 @@
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "91e19ebc2adc8f83e43039e79776e3fda8ca919132d68a1fed6a5faca2683748"
dependencies = [
- "bindgen 0.72.1",
"cc",
"pkg-config",
]
diff --git a/crates/node/tests/sync_sim/metrics_adapter.rs b/crates/node/tests/sync_sim/metrics_adapter.rs
--- a/crates/node/tests/sync_sim/metrics_adapter.rs
+++ b/crates/node/tests/sync_sim/metrics_adapter.rs
@@ -144,10 +144,16 @@
&self,
_context_id: &str,
_protocol: &str,
- _duration: Duration,
+ duration: Duration,
_entities: usize,
) {
- // Simulation uses convergence metrics instead
+ use super::runtime::SimTime;
+
+ let mut guard = self.metrics.lock().expect("metrics lock poisoned");
+ // Convert std::time::Duration to SimTime (both use microseconds internally)
+ let sim_time = SimTime::from_micros(duration.as_micros() as u64);
+ guard.convergence.converged = true;
+ guard.convergence.time_to_converge = Some(sim_time);
}
fn record_sync_failure(&self, _context_id: &str, _protocol: &str, _reason: &str) { |

Summary
Implement a unified metrics interface (
SyncMetricsCollectortrait) for sync protocol observability that works for both simulation (deterministic benchmarking) and production (Prometheus).Key changes:
sync::metricsmodule withSyncMetricsCollectortrait andPhaseTimersync::prometheus_metricsmodule with production Prometheus implementationsync_sim::metrics_adapterbridging the trait to simulation metricssync_sim::benchmarksusing the trait for protocol benchmarkingcrates/server/src/metrics.rsMetrics collected:
Implementations:
NoOpMetrics- Zero-overhead disabled metricsPrometheusSyncMetrics- Production Prometheus with labeled counters/histogramsSimMetricsCollector- Simulation adapter for benchmarkingTest plan
SyncMetricsCollectortrait (3 tests)PrometheusSyncMetrics(3 tests)SimMetricsCollectoradapter (9 tests)cargo fmt- passcargo clippy- no new warningsNote
Medium Risk
Mostly additive observability code, but it expands exported sync APIs and introduces Prometheus label/bucket choices that could impact metrics cardinality and performance if misused.
Overview
Introduces a unified sync observability surface via
SyncMetricsCollectorandPhaseTimer, including aNoOpMetricsimplementation and public re-exports fromsync::mod.Adds a production
PrometheusSyncMetricscollector registering labeled counters/histograms for protocol cost, phase timing, safety invariants (I5/I6/I7), and session outcomes, with label sanitization to avoid unbounded cardinality.Extends the sync simulation test suite with a
SimMetricsCollectoradapter and new deterministic benchmarks/integration tests that exercise the trait end-to-end, and updates server metrics docs to describe wiring sync metrics into the/metricsregistry.Written by Cursor Bugbot for commit 80678b5. This will update automatically on new commits. Configure here.