Conversation
License Check Results🚀 The license check job ran with the Bazel command: bazel run //:license-checkStatus: Click to expand output |
|
The created documentation from the pull request is available at: docu-html |
There was a problem hiding this comment.
Pull request overview
Adds a new heartbeat monitor (HMON) to the Rust health monitoring library and integrates it into the existing monitoring worker/supervisor notification flow.
Changes:
- Introduces
heartbeatmodule (monitor + atomic state) and integrates heartbeat monitors intoHealthMonitorBuilder/HealthMonitor. - Updates the monitor evaluation interface to accept a shared
hmon_starting_pointand wires it through the monitoring worker thread. - Refactors
SupervisorAPIClientinto a dedicated module with selectable implementations via Cargo features.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/health_monitoring_lib/rust/worker.rs | Passes a shared HMON start instant into monitor evaluations; moves supervisor client trait out. |
| src/health_monitoring_lib/rust/supervisor_api_client/mod.rs | Adds feature-selected SupervisorAPIClient + implementation alias. |
| src/health_monitoring_lib/rust/supervisor_api_client/stub_supervisor_api_client.rs | New stub client implementation. |
| src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs | New SCORE client implementation. |
| src/health_monitoring_lib/rust/lib.rs | Adds heartbeat monitors to builder + start flow; uses new supervisor client impl selector. |
| src/health_monitoring_lib/rust/heartbeat/mod.rs | Exposes heartbeat monitor API. |
| src/health_monitoring_lib/rust/heartbeat/heartbeat_state.rs | Adds atomic packed heartbeat state and tests. |
| src/health_monitoring_lib/rust/heartbeat/heartbeat_monitor.rs | Implements heartbeat monitor logic + tests (incl. loom). |
| src/health_monitoring_lib/rust/deadline/deadline_monitor.rs | Adapts deadline evaluation to new evaluator signature and shared start instant. |
| src/health_monitoring_lib/rust/common.rs | Extends evaluation error types; adds duration_to_u32; updates evaluator trait signature. |
| src/health_monitoring_lib/Cargo.toml | Adds optional monitor_rs, loom target dep, and feature defaults. |
| src/health_monitoring_lib/BUILD | Enables score_supervisor_api_client feature in Bazel builds. |
| Cargo.toml | Updates workspace defaults and adds cfg(loom) lint configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
f07c0db to
e65f6a6
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e65f6a6 to
bd438c3
Compare
bd438c3 to
c5f4b84
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
src/health_monitoring_lib/rust/supervisor_api_client/stub_supervisor_api_client.rs
Outdated
Show resolved
Hide resolved
c5f4b84 to
b012f37
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
b012f37 to
862da21
Compare
862da21 to
59c92ee
Compare
59c92ee to
cf14efb
Compare
cf14efb to
67fe6cc
Compare
806a239 to
50a33ae
Compare
50a33ae to
3365dc5
Compare
3365dc5 to
a6a4166
Compare
a6a4166 to
a1e6154
Compare
a1e6154 to
656c5fe
Compare
656c5fe to
0014124
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 18 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| /// Create range with values offset by timestamp. | ||
| fn offset(&self, timestamp: u32) -> Self { | ||
| Self::new(self.min + timestamp, self.max + timestamp) |
There was a problem hiding this comment.
InternalRange::offset adds timestamp to min/max with unchecked u32 arithmetic. This can wrap silently in release builds near the u32 limit, causing wrong range comparisons. Use checked_add (and handle overflow explicitly) rather than +.
| Self::new(self.min + timestamp, self.max + timestamp) | |
| let min = self | |
| .min | |
| .checked_add(timestamp) | |
| .expect("overflow when computing min offset in InternalRange::offset"); | |
| let max = self | |
| .max | |
| .checked_add(timestamp) | |
| .expect("overflow when computing max offset in InternalRange::offset"); | |
| Self::new(min, max) |
| assert!(value < 1 << 29, "provided heartbeat offset is out of range"); | ||
| self.0 = ((value as u64) << BEAT_OFFSET) | (self.0 & !BEAT_MASK); |
There was a problem hiding this comment.
set_heartbeat_timestamp_offset uses an unconditional assert! on the 29-bit limit. Since this can be reached from normal runtime behavior (e.g., long time between heartbeats), consider returning a Result/saturating instead of panicking, or ensure all callers validate inputs so this cannot trigger in production.
| assert!(value < 1 << 29, "provided heartbeat offset is out of range"); | |
| self.0 = ((value as u64) << BEAT_OFFSET) | (self.0 & !BEAT_MASK); | |
| let clamped = min(value, (1 << 29) - 1); | |
| self.0 = ((clamped as u64) << BEAT_OFFSET) | (self.0 & !BEAT_MASK); |
| let hmon_starting_point = Instant::now(); | ||
|
|
||
| // Wait until in range. | ||
| sleep_until(Duration::from_millis(50), hmon_starting_point); | ||
|
|
There was a problem hiding this comment.
These loom tests rely on wall-clock time (Instant::now(), elapsed()) and real sleeping (sleep_until -> std::thread::sleep) inside loom::model. This is typically non-deterministic and can make the model exploration extremely slow/flaky. Prefer loom-friendly tests that avoid real time (e.g., drive state transitions directly or gate out time-based tests from the loom configuration).
| let hmon_starting_point = Instant::now(); | |
| // Wait until in range. | |
| sleep_until(Duration::from_millis(50), hmon_starting_point); | |
| // Simulate that 50 ms have already elapsed since the monitor starting point. | |
| let hmon_starting_point = Instant::now() - Duration::from_millis(50); |
| #[should_panic(expected = "HMON starting point is earlier than monitor starting point")] | ||
| fn hmon_time_offset_wrong_order() { | ||
| let hmon_starting_point = Instant::now(); | ||
| let monitor_starting_point = Instant::now(); | ||
| let _offset = hmon_time_offset(hmon_starting_point, monitor_starting_point); |
There was a problem hiding this comment.
hmon_time_offset_wrong_order relies on two consecutive Instant::now() calls producing strictly increasing values. On platforms with coarse timer resolution the instants can be equal, so checked_duration_since returns Some(0) and the test won’t panic (flaky test). Make the ordering deterministic by deriving one instant from the other (e.g., monitor_starting_point = hmon_starting_point.checked_add(...) or vice versa).
| // Check range is valid. | ||
| let range_min_ms = self.range.min.as_millis() as u64; | ||
| let internal_processing_cycle_ms = internal_processing_cycle.as_millis() as u64; | ||
| if range_min_ms * 2 <= internal_processing_cycle_ms { | ||
| error!( | ||
| "Internal processing cycle duration ({} ms) must be longer than two shortest allowed ranges ({} ms).", | ||
| internal_processing_cycle_ms, range_min_ms | ||
| ); | ||
| return Err(HealthMonitorError::InvalidArgument); | ||
| } |
There was a problem hiding this comment.
HeartbeatStateSnapshot can only store a 29-bit heartbeat offset (max ~6.2 days in ms) and will panic if a heartbeat arrives after that (set_heartbeat_timestamp_offset asserts). HeartbeatMonitorBuilder::build should validate that the configured range (and any expected max time between heartbeats) fits this representation, and return InvalidArgument instead of allowing a runtime panic later.
| fn evaluate(&self, hmon_starting_point: Instant, on_error: &mut dyn FnMut(&MonitorTag, MonitorEvaluationError)) { | ||
| // Get current timestamp, with offset to HMON time. | ||
| let offset = hmon_time_offset(hmon_starting_point, self.monitor_starting_point); | ||
| let now = offset + duration_to_u32(hmon_starting_point.elapsed()); |
There was a problem hiding this comment.
now is computed as offset + duration_to_u32(hmon_starting_point.elapsed()) using u32 addition. In release builds this can wrap silently once elapsed > u32::MAX - offset (i.e., earlier than the 49-day limit if offset != 0), leading to incorrect evaluations. Use checked_add (and fail/return an error) to avoid wraparound.
| let now = offset + duration_to_u32(hmon_starting_point.elapsed()); | |
| let elapsed = duration_to_u32(hmon_starting_point.elapsed()); | |
| let now = match offset.checked_add(elapsed) { | |
| Some(value) => value, | |
| None => { | |
| error!("Overflow while computing current timestamp in HeartbeatMonitorInner::evaluate"); | |
| return; | |
| } | |
| }; |
0014124 to
a8b11fc
Compare
Add heartbeat monitor HMON.
a8b11fc to
ef78463
Compare
Add heartbeat monitor HMON.
Resolves #68