feat: Operator Doppelgänger Protection #692
base: unstable
Add configuration options for operator doppelgänger protection:
- --operator-dg: Enable/disable the feature (default: true)
- --operator-dg-wait-epochs: Epochs to wait in monitor mode (default: 2)
- --operator-dg-fresh-k: Freshness threshold for twin detection (default: 3)

This implements the configuration layer for issue sigp#627, allowing operators to detect if their operator ID is already active elsewhere and shut down to prevent equivocation.

Related to sigp#627
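As a rough illustration of how the three options and their stated defaults fit together, here is a minimal std-only sketch. The struct name `OperatorDgConfig`, its field names, and the `monitor_until` helper are hypothetical and are not taken from the PR; the real options are defined with clap attributes, as the diff excerpts further down show.

```rust
/// Hypothetical mirror of the three CLI options; names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OperatorDgConfig {
    /// Corresponds to --operator-dg.
    pub enabled: bool,
    /// Corresponds to --operator-dg-wait-epochs.
    pub wait_epochs: u64,
    /// Corresponds to --operator-dg-fresh-k.
    pub fresh_k: u64,
}

impl Default for OperatorDgConfig {
    fn default() -> Self {
        // Defaults as stated in the commit message: true, 2, 3.
        Self {
            enabled: true,
            wait_epochs: 2,
            fresh_k: 3,
        }
    }
}

impl OperatorDgConfig {
    /// First epoch at which monitoring would end, or `None` when the
    /// feature is disabled. This helper is an assumption for illustration.
    pub fn monitor_until(&self, start_epoch: u64) -> Option<u64> {
        self.enabled
            .then(|| start_epoch.saturating_add(self.wait_epochs))
    }
}
```

With the defaults, a node starting in epoch 10 would monitor until epoch 12 under this reading of "epochs to wait".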
Add a new service module for detecting operator doppelgängers:
- State machine with Monitor/Active modes
- Track recent max consensus height per committee
- Freshness threshold (K) to prevent false positives from replays
- Check for single-signer messages with own operator ID
- Comprehensive unit tests for state machine logic

The service detects if the operator's ID appears in fresh SSV messages during monitor mode, indicating another instance is already running.

Part of sigp#627
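The detection rule described above can be sketched as follows. This is a simplified model, not the PR's code: the `Doppelganger` type, its method names, and the use of plain `u64` operator IDs and heights are assumptions made for illustration, and it assumes that own-ID messages are ignored once the node is in Active mode.

```rust
use std::collections::HashMap;

/// Committee identifier; the PR's tests mention a [u8; 32] constructor.
pub type CommitteeId = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Monitor,
    Active,
}

/// Hypothetical, simplified detector state.
pub struct Doppelganger {
    own_operator_id: u64,
    fresh_k: u64,
    mode: Mode,
    max_height: HashMap<CommitteeId, u64>,
}

impl Doppelganger {
    pub fn new(own_operator_id: u64, fresh_k: u64) -> Self {
        Self {
            own_operator_id,
            fresh_k,
            mode: Mode::Monitor,
            max_height: HashMap::new(),
        }
    }

    /// Leave monitor mode; afterwards our own messages are expected.
    pub fn activate(&mut self) {
        self.mode = Mode::Active;
    }

    /// Returns true when the message is fresh evidence of a twin.
    pub fn is_twin(&mut self, committee: CommitteeId, height: u64, signers: &[u64]) -> bool {
        // Track the highest height seen for this committee, whoever signed.
        let recent_max = self.max_height.entry(committee).or_insert(height);
        *recent_max = (*recent_max).max(height);
        // recent_max >= height here, so the subtraction cannot underflow.
        let fresh = *recent_max - height <= self.fresh_k;
        // Only single-signer messages carrying our own ID count as evidence.
        let own_single_signer = signers.len() == 1 && signers[0] == self.own_operator_id;
        self.mode == Mode::Monitor && fresh && own_single_signer
    }
}
```

For example, with K = 3 and a recent maximum height of 100, a single-signer message from the own operator ID at height 98 would count as a twin, while the same message at height 90 would be treated as a stale replay.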
Integrates the operator doppelgänger protection with the message flow:
- Adds DoppelgangerConfig struct to message_receiver for checking messages
- NetworkMessageReceiver checks QBFT messages for doppelgänger detection
- Client initializes doppelgänger service when operator_dg is enabled
- Fatal shutdown triggered via TaskExecutor when twin operator detected
- Spawns background task to listen for shutdown signal
- Tests updated to use correct CommitteeId constructor ([u8; 32])
- Allows clippy::too_many_arguments for NetworkMessageReceiver::new

This implements the core detection and shutdown mechanism specified in sigp#627 (comment)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
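The "trigger shutdown only once" behaviour can be sketched with the standard library alone. The PR sends the signal through a `shutdown_tx` held behind a lock and listens for it in a background task; the sketch below substitutes `std::sync::mpsc` for whatever channel type the PR actually uses, and the `ShutdownTrigger` type is hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical one-shot trigger: the sender sits in an Option so that
/// it can be taken, and therefore used, at most once.
#[derive(Clone)]
pub struct ShutdownTrigger {
    tx: Arc<Mutex<Option<Sender<()>>>>,
}

impl ShutdownTrigger {
    /// Returns the trigger and the receiving end a listener would wait on.
    pub fn new() -> (Self, Receiver<()>) {
        let (tx, rx) = channel();
        let trigger = Self {
            tx: Arc::new(Mutex::new(Some(tx))),
        };
        (trigger, rx)
    }

    /// Returns true only for the call that actually sent the signal.
    pub fn fire(&self) -> bool {
        match self.tx.lock() {
            // take() leaves None behind, so later calls fall through to false.
            Ok(mut guard) => match guard.take() {
                Some(tx) => tx.send(()).is_ok(),
                None => false,
            },
            // A poisoned lock is treated as "could not fire" in this sketch.
            Err(_) => false,
        }
    }
}
```

Because the sender is dropped after the first `fire`, a listener blocked on the receiver sees exactly one signal, no matter how many message handlers detect the twin concurrently.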
let own_operator_id = operator_id
    .get()
    .ok_or_else(|| "Operator ID not yet available".to_string())?;
This will crash all nodes on first startup, as they are not synced yet
#[clap(
    long,
    value_name = "HEIGHTS",
    help = "The freshness threshold for detecting operator twins. Only messages within \
            this many consensus heights from the maximum observed height are considered \
            fresh evidence of a twin. This prevents false positives from replayed old messages.",
    display_order = 0,
    default_value_t = 3,
    requires = "operator_dg"
)]
pub operator_dg_fresh_k: u64,
I do not think this is necessary. Have you tested this and confirmed there to be an issue without this feature?
slot_clock.clone(),
config.operator_dg_wait_epochs,
config.operator_dg_fresh_k,
true, // enabled
If we only ever pass `true` here, we might as well remove the parameter
if let Some(config) = &receiver.doppelganger_config
    && (config.checker)(&signed_ssv_message, &qbft_message)
{
    error!(
        gossipsub_message_id = ?message_id,
        ssv_msg_id = ?msg_id,
        "Operator doppelgänger detected! Triggering shutdown."
    );

    // Trigger shutdown - we'll only do this once
    if let Ok(mut guard) = config.shutdown_tx.lock()
        && let Some(tx) = guard.take()
    {
        let _ = tx.send(());
    }

    return;
}
what about partial signature messages?
    default_value_t = 2,
    requires = "operator_dg"
)]
pub operator_dg_wait_epochs: u64,
we are not actually waiting anywhere
This PR is still very much a draft; it is not ready for review
Follows codebase pattern of handling slot_clock.now() returning None explicitly rather than silently falling back to Epoch 0. The current epoch is now required as a parameter to the service constructor, following the pattern used by other services in the codebase.
Add explicit error handling when slot_clock.now() returns None. If we can't read the current slot, we can't reliably determine the epoch or update the mode, so we skip the doppelgänger check and log a warning.
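A minimal sketch of this "skip when the clock is unreadable" behaviour follows. It assumes the slot clock yields an `Option<u64>` slot number and that an epoch spans 32 slots (the Ethereum mainnet value); `update_mode`, `ModeUpdate`, and the parameter names are hypothetical, and a real service would log a warning where the comment indicates.

```rust
/// Slots per epoch; 32 on Ethereum mainnet, assumed here for illustration.
pub const SLOTS_PER_EPOCH: u64 = 32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Monitor,
    Active,
}

/// Outcome of one attempt to refresh the mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModeUpdate {
    /// The slot could not be read, so no decision was made.
    Skipped,
    /// The slot was read and the mode was recomputed.
    Updated(Mode),
}

/// `current_slot` stands in for the result of `slot_clock.now()`.
pub fn update_mode(current_slot: Option<u64>, start_epoch: u64, wait_epochs: u64) -> ModeUpdate {
    let Some(slot) = current_slot else {
        // No slot means no reliable epoch: skip rather than assume epoch 0.
        // A real service would emit a warning log here.
        return ModeUpdate::Skipped;
    };
    let epoch = slot / SLOTS_PER_EPOCH;
    if epoch >= start_epoch.saturating_add(wait_epochs) {
        ModeUpdate::Updated(Mode::Active)
    } else {
        ModeUpdate::Updated(Mode::Monitor)
    }
}
```

Returning a distinct `Skipped` value keeps the caller from confusing "still monitoring" with "could not tell", which is the failure mode a silent fallback to epoch 0 would introduce.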
Replace RwLock with Mutex for DoppelgangerState since all operations are fast (HashMap lookups/updates) and the RwLock complexity isn't justified.

Changes:
- Replace Arc<RwLock<DoppelgangerState>> with Arc<Mutex<DoppelgangerState>>
- Add update_and_check_freshness() method that atomically updates max height and checks freshness in one lock acquisition
- Make update_max_height() and is_fresh() private helper methods
- Simplify check_message() logic by removing drop/re-acquire pattern

This provides cleaner API surface with better separation of concerns:
- State handles data operations
- Service handles policy decisions
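The shape of that refactor might look roughly like the sketch below. The method names `update_and_check_freshness`, `update_max_height`, and `is_fresh` come from the commit message; their signatures, the `DoppelgangerService` fields, and the `check_fresh` method are assumptions, and poisoned locks are simply recovered here for brevity.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub type CommitteeId = [u8; 32];

/// State handles data operations only.
#[derive(Default)]
pub struct DoppelgangerState {
    max_height: HashMap<CommitteeId, u64>,
}

impl DoppelgangerState {
    /// Private helper: raise the stored maximum and return it.
    fn update_max_height(&mut self, committee: CommitteeId, height: u64) -> u64 {
        let max = self.max_height.entry(committee).or_insert(height);
        *max = (*max).max(height);
        *max
    }

    /// Private helper: fresh means within `k` heights of the maximum.
    fn is_fresh(max: u64, height: u64, k: u64) -> bool {
        max.saturating_sub(height) <= k
    }

    /// Update and check in one call, so the caller holds the lock once.
    pub fn update_and_check_freshness(&mut self, committee: CommitteeId, height: u64, k: u64) -> bool {
        let max = self.update_max_height(committee, height);
        Self::is_fresh(max, height, k)
    }
}

/// Service handles policy decisions and owns the shared state.
pub struct DoppelgangerService {
    state: Arc<Mutex<DoppelgangerState>>,
    fresh_k: u64,
}

impl DoppelgangerService {
    pub fn new(fresh_k: u64) -> Self {
        Self {
            state: Arc::new(Mutex::new(DoppelgangerState::default())),
            fresh_k,
        }
    }

    /// Single lock acquisition; no drop and re-acquire between steps.
    pub fn check_fresh(&self, committee: CommitteeId, height: u64) -> bool {
        let mut state = self.state.lock().unwrap_or_else(|e| e.into_inner());
        state.update_and_check_freshness(committee, height, self.fresh_k)
    }
}
```

Doing the update and the freshness check under one guard avoids the window in which another thread could raise the maximum between the two steps.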
Stale messages during doppelgänger monitoring are expected (network delays, replays) and not actionable. Using debug level reduces noise while keeping the information available for troubleshooting.