diegomrsantos
Member

No description provided.

diegomrsantos and others added 4 commits October 15, 2025 22:18
Add configuration options for operator doppelgänger protection:
- --operator-dg: Enable/disable the feature (default: true)
- --operator-dg-wait-epochs: Epochs to wait in monitor mode (default: 2)
- --operator-dg-fresh-k: Freshness threshold for twin detection (default: 3)

This implements the configuration layer for issue sigp#627, allowing operators
to detect if their operator ID is already active elsewhere and shut down
to prevent equivocation.

Related to sigp#627

Add a new service module for detecting operator doppelgängers:
- State machine with Monitor/Active modes
- Track recent max consensus height per committee
- Freshness threshold (K) to prevent false positives from replays
- Check for single-signer messages with own operator ID
- Comprehensive unit tests for state machine logic

The service detects if the operator's ID appears in fresh SSV messages
during monitor mode, indicating another instance is already running.

Part of sigp#627
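
A minimal sketch of the Monitor/Active state machine described in this commit message, assuming the service leaves Monitor mode once `wait_epochs` epochs have elapsed since startup. The type `DoppelgangerSketch`, its fields, and the methods `update_mode` and `is_twin` are hypothetical and not taken from the PR; operator IDs and epochs are plain `u64` values here for simplicity.

```rust
/// The two modes named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Monitor,
    Active,
}

/// Hypothetical service state, reduced to what the mode logic needs.
struct DoppelgangerSketch {
    own_operator_id: u64,
    start_epoch: u64,
    wait_epochs: u64,
    mode: Mode,
}

impl DoppelgangerSketch {
    /// The service always starts in Monitor mode.
    fn new(own_operator_id: u64, start_epoch: u64, wait_epochs: u64) -> Self {
        Self {
            own_operator_id,
            start_epoch,
            wait_epochs,
            mode: Mode::Monitor,
        }
    }

    /// Assumed rule: switch to Active once `wait_epochs` epochs have passed.
    fn update_mode(&mut self, current_epoch: u64) {
        let active_from = self.start_epoch.saturating_add(self.wait_epochs);
        if self.mode == Mode::Monitor && current_epoch >= active_from {
            self.mode = Mode::Active;
        }
    }

    /// A twin is suspected only in Monitor mode, for a fresh message
    /// signed by exactly one operator whose ID equals our own.
    fn is_twin(&self, signers: &[u64], is_fresh: bool) -> bool {
        self.mode == Mode::Monitor
            && is_fresh
            && signers.len() == 1
            && signers[0] == self.own_operator_id
    }
}
```

Restricting the check to single-signer messages matters because an aggregated message can legitimately carry the operator's ID alongside others without proving that a second instance is currently signing.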
Integrates the operator doppelgänger protection with the message flow:
- Adds DoppelgangerConfig struct to message_receiver for checking messages
- NetworkMessageReceiver checks QBFT messages for doppelgänger detection
- Client initializes doppelgänger service when operator_dg is enabled
- Fatal shutdown triggered via TaskExecutor when twin operator detected
- Spawns background task to listen for shutdown signal
- Tests updated to use correct CommitteeId constructor ([u8; 32])
- Allows clippy::too_many_arguments for NetworkMessageReceiver::new

This implements the core detection and shutdown mechanism specified in
sigp#627 (comment)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
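
The "trigger shutdown only once" behaviour described in this commit can be sketched as follows. This is an illustration under assumptions, not the PR's code: it uses `std::sync::mpsc` as a stand-in for whatever channel type the PR actually uses, and `ShutdownTrigger` and `trigger` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical holder for the shutdown sender. Wrapping the sender in
/// `Option` lets the first caller take it, so later calls find `None`.
struct ShutdownTrigger {
    tx: Mutex<Option<Sender<()>>>,
}

impl ShutdownTrigger {
    /// Returns the trigger plus the receiver a background task would wait on.
    fn new() -> (Self, Receiver<()>) {
        let (tx, rx) = channel();
        let trigger = Self {
            tx: Mutex::new(Some(tx)),
        };
        (trigger, rx)
    }

    /// Returns true only for the call that actually sent the signal.
    /// A poisoned lock is treated as "cannot trigger" in this sketch.
    fn trigger(&self) -> bool {
        match self.tx.lock() {
            Ok(mut guard) => match guard.take() {
                Some(tx) => tx.send(()).is_ok(),
                None => false,
            },
            Err(_) => false,
        }
    }
}
```

Because the sender is taken out of the `Option` under the lock, concurrent detections from several message-handling tasks still produce exactly one shutdown signal.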
Comment on lines +475 to +477
let own_operator_id = operator_id
    .get()
    .ok_or_else(|| "Operator ID not yet available".to_string())?;
Member

This will crash all nodes on first startup, as they are not synced yet.

Comment on lines +512 to +522
#[clap(
    long,
    value_name = "HEIGHTS",
    help = "The freshness threshold for detecting operator twins. Only messages within \
            this many consensus heights from the maximum observed height are considered \
            fresh evidence of a twin. This prevents false positives from replayed old messages.",
    display_order = 0,
    default_value_t = 3,
    requires = "operator_dg"
)]
pub operator_dg_fresh_k: u64,
Member

I do not think this is necessary. Have you tested this and confirmed there to be an issue without this feature?

slot_clock.clone(),
config.operator_dg_wait_epochs,
config.operator_dg_fresh_k,
true, // enabled
Member

If we only ever pass true here, we might as well remove the parameter

Comment on lines +182 to +199
if let Some(config) = &receiver.doppelganger_config
    && (config.checker)(&signed_ssv_message, &qbft_message)
{
    error!(
        gossipsub_message_id = ?message_id,
        ssv_msg_id = ?msg_id,
        "Operator doppelgänger detected! Triggering shutdown."
    );

    // Trigger shutdown - we'll only do this once
    if let Ok(mut guard) = config.shutdown_tx.lock()
        && let Some(tx) = guard.take()
    {
        let _ = tx.send(());
    }

    return;
}
Member

What about partial signature messages?

default_value_t = 2,
requires = "operator_dg"
)]
pub operator_dg_wait_epochs: u64,
Member

We are not actually waiting anywhere.

@diegomrsantos
Member Author

This PR is still very much a draft; it is not ready for review.

@diegomrsantos diegomrsantos self-assigned this Oct 16, 2025
Follows codebase pattern of handling slot_clock.now() returning None
explicitly rather than silently falling back to Epoch 0. The current
epoch is now required as a parameter to the service constructor,
following the pattern used by other services in the codebase.

Add explicit error handling when slot_clock.now() returns None.
If we can't read the current slot, we can't reliably determine
the epoch or update the mode, so we skip the doppelgänger check
and log a warning.
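
The explicit `None` handling described in this commit might look roughly like the sketch below. The `SlotSource` trait, `FixedClock`, `CheckOutcome`, and `check_with_clock` are hypothetical stand-ins, not the PR's types; `SLOTS_PER_EPOCH = 32` is the Ethereum mainnet value and is assumed here, and `eprintln!` stands in for the real logging macro.

```rust
/// Hypothetical stand-in for the slot clock: `now()` yields `None`
/// when the current slot cannot be read.
trait SlotSource {
    fn now(&self) -> Option<u64>;
}

/// Trivial clock used to exercise both branches.
struct FixedClock(Option<u64>);

impl SlotSource for FixedClock {
    fn now(&self) -> Option<u64> {
        self.0
    }
}

/// Assumption: 32 slots per epoch, as on Ethereum mainnet.
const SLOTS_PER_EPOCH: u64 = 32;

#[derive(Debug, PartialEq, Eq)]
enum CheckOutcome {
    /// The slot was unreadable, so the doppelgänger check was skipped.
    Skipped,
    /// The check ran against the given epoch.
    Checked { epoch: u64 },
}

/// Skip the check instead of silently falling back to epoch 0.
fn check_with_clock(clock: &impl SlotSource) -> CheckOutcome {
    match clock.now() {
        Some(slot) => CheckOutcome::Checked {
            epoch: slot / SLOTS_PER_EPOCH,
        },
        None => {
            eprintln!("warning: cannot read current slot, skipping doppelgänger check");
            CheckOutcome::Skipped
        }
    }
}
```

Falling back to epoch 0 would make the mode calculation believe no time has passed, so skipping the check and warning is the more honest failure mode.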
Replace RwLock with Mutex for DoppelgangerState since all operations
are fast (HashMap lookups/updates) and the RwLock complexity isn't
justified.

Changes:
- Replace Arc<RwLock<DoppelgangerState>> with Arc<Mutex<DoppelgangerState>>
- Add update_and_check_freshness() method that atomically updates max
  height and checks freshness in one lock acquisition
- Make update_max_height() and is_fresh() private helper methods
- Simplify check_message() logic by removing drop/re-acquire pattern

This provides cleaner API surface with better separation of concerns:
- State handles data operations
- Service handles policy decisions
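
A rough sketch of the single-lock `update_and_check_freshness()` idea from this commit. It assumes a message counts as fresh when it is at most `fresh_k` heights below the committee's maximum observed height (whether the real comparison is inclusive is not stated in the PR). The `CommitteeId` alias, the field name `max_height`, and the free function `check_message` are illustrative only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Illustrative alias; the PR mentions a `[u8; 32]` constructor for CommitteeId.
type CommitteeId = [u8; 32];

#[derive(Default)]
struct DoppelgangerState {
    /// Highest consensus height seen so far, per committee.
    max_height: HashMap<CommitteeId, u64>,
}

impl DoppelgangerState {
    /// Private helper: record `height` and return the resulting maximum.
    fn update_max_height(&mut self, committee: CommitteeId, height: u64) -> u64 {
        let max = self.max_height.entry(committee).or_insert(height);
        if height > *max {
            *max = height;
        }
        *max
    }

    /// Private helper: assumed inclusive threshold of `fresh_k` heights.
    fn is_fresh(max: u64, height: u64, fresh_k: u64) -> bool {
        max.saturating_sub(height) <= fresh_k
    }

    /// Update and check in one step, so callers need only one lock acquisition.
    fn update_and_check_freshness(
        &mut self,
        committee: CommitteeId,
        height: u64,
        fresh_k: u64,
    ) -> bool {
        let max = self.update_max_height(committee, height);
        Self::is_fresh(max, height, fresh_k)
    }
}

/// Service-side caller: lock once, then delegate the data operation to the state.
fn check_message(
    state: &Arc<Mutex<DoppelgangerState>>,
    committee: CommitteeId,
    height: u64,
    fresh_k: u64,
) -> bool {
    let mut guard = state.lock().expect("state mutex poisoned");
    guard.update_and_check_freshness(committee, height, fresh_k)
}
```

Doing both steps under one guard avoids the window in which another thread could raise the maximum between the update and the freshness check.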
Stale messages during doppelgänger monitoring are expected (network
delays, replays) and not actionable. Using debug level reduces noise
while keeping the information available for troubleshooting.
2 participants