Problem
If a signature generator times out, it takes a long time for the next posit round to succeed.
Background
When a signature generator times out, we try to generate the same signature with a new set of participants. This timeout case cannot be avoided, as we cannot guarantee that participants stay online.
A single SignTask is used across all retries. The phase can switch back from SignPhase::Generating to SignPhase::Organizing due to retries. See the loop below (mpc/chain-signatures/node/src/protocol/signature.rs, lines 1051 to 1083 at 5f2bbb9): in the last line, the phase is reassigned to whatever `advance` returns, so it can move backwards.
```rust
loop {
    // Check if we should abort due to resharing or epoch change
    if let Some(contract_state) = self.contract.state() {
        match contract_state {
            crate::protocol::ProtocolState::Resharing(_) => {
                tracing::info!(
                    ?sign_id,
                    epoch = task_epoch,
                    "signature task interrupted: contract is resharing"
                );
                return Err(SignError::Aborted);
            }
            crate::protocol::ProtocolState::Running(running)
                if running.epoch != task_epoch =>
            {
                tracing::info!(
                    ?sign_id,
                    old_epoch = task_epoch,
                    new_epoch = running.epoch,
                    "signature task interrupted: epoch changed"
                );
                return Err(SignError::Aborted);
            }
            _ => {}
        }
    }

    phase = match phase.advance(&self, &mut state, &mut task_rx).await {
        SignPhase::Complete(result) => return result,
        other => other,
    }
}
```
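The phase regression described above can be modeled with a minimal, self-contained sketch. The enum mirrors the real `SignPhase` names, but the `advance` signature and logic here are hypothetical simplifications: a generation timeout sends the task back to `Organizing` so a new participant set can be posited.

```rust
// Hypothetical, simplified model of the retry loop; not the node's code.
#[derive(Debug, PartialEq)]
enum SignPhase {
    Organizing,
    Generating,
    Complete(u64),
}

impl SignPhase {
    // Sketch of `advance`: on timeout, Generating falls back to Organizing.
    fn advance(self, timed_out: bool) -> SignPhase {
        match self {
            SignPhase::Organizing => SignPhase::Generating,
            SignPhase::Generating if timed_out => SignPhase::Organizing,
            SignPhase::Generating => SignPhase::Complete(42),
            done => done,
        }
    }
}

// Drive the loop with a scripted sequence of timeout outcomes and report
// the final phase plus how many advance steps it took.
fn run(timeouts: &[bool]) -> (SignPhase, usize) {
    let mut phase = SignPhase::Organizing;
    let mut steps = 0;
    for &timed_out in timeouts {
        phase = phase.advance(timed_out);
        steps += 1;
        if matches!(phase, SignPhase::Complete(_)) {
            break;
        }
    }
    (phase, steps)
}
```

The key property this illustrates: one timeout in the middle of the sequence costs two extra round trips through the state machine, because the task must re-organize before generating again.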
We preserve the SignState across retries but clean it up / refresh it as necessary (mpc/chain-signatures/node/src/protocol/signature.rs, lines 70 to 87 at 5f2bbb9):
```rust
struct SignState {
    round: usize,
    indexed: IndexedSignRequest,
    mesh_state: watch::Receiver<MeshState>,
    /// Budget for the current organizing+posit attempt.
    budget: TimeoutBudget,
    /// The highest round sent by a peer.
    highest_seen_round: usize,
    /// Posit messages for the `highest_seen_round` round.
    ///
    /// These are processed later, if the task reaches `highest_seen_round`
    /// as a deliberator. Proposers do not reprocess old messages; a valid
    /// peer would not send a posit message before the proposer proposes.
    ///
    /// INVARIANT: All messages stored here are for `highest_seen_round`. Must
    /// be cleared when `highest_seen_round` changes.
    buffered_messages: VecDeque<SignTaskMessage>,
}
```
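The invariant on `buffered_messages` can be sketched as a small, self-contained helper. The `PositBuffer` struct and `observe` method below are hypothetical stand-ins (as is this simplified `SignTaskMessage`), showing only the clearing discipline: everything buffered belongs to `highest_seen_round`, so a newer round wipes the buffer and stale rounds are dropped.

```rust
use std::collections::VecDeque;

// Hypothetical simplification of the real message type.
#[derive(Debug, PartialEq, Clone)]
struct SignTaskMessage {
    round: usize,
}

#[derive(Default)]
struct PositBuffer {
    highest_seen_round: usize,
    buffered_messages: VecDeque<SignTaskMessage>,
}

impl PositBuffer {
    // Buffer a peer message while upholding the invariant: all stored
    // messages are for `highest_seen_round`. A newer round clears the
    // buffer; messages for older rounds are dropped outright.
    fn observe(&mut self, msg: SignTaskMessage) {
        if msg.round > self.highest_seen_round {
            self.highest_seen_round = msg.round;
            self.buffered_messages.clear();
        }
        if msg.round == self.highest_seen_round {
            self.buffered_messages.push_back(msg);
        }
    }
}
```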
New incoming cait-sith messages are identified by the signature request id + presignature id, which avoids conflicts between retries.
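As a sketch of that keying scheme: routing inbound messages by (signature request id, presignature id) means a retry, which uses a fresh presignature, lands in a different bucket than messages from the aborted attempt. The `MsgKey` type and `route` helper below are illustrative assumptions, not the node's actual types.

```rust
use std::collections::HashMap;

// Hypothetical composite key: sign request id plus presignature id.
#[derive(Hash, PartialEq, Eq, Debug, Clone, Copy)]
struct MsgKey {
    sign_id: [u8; 32],
    presignature_id: u64,
}

// Append a raw payload to the bucket for its key; retries with a new
// presignature id never collide with an aborted attempt's messages.
fn route(inbox: &mut HashMap<MsgKey, Vec<Vec<u8>>>, key: MsgKey, payload: Vec<u8>) {
    inbox.entry(key).or_default().push(payload);
}
```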
However, there seems to be an issue that makes subsequent posit rounds fail after a generator is aborted. For example, when running test_sign_contention_5_nodes:
```
# (a new posit proposer timeout every 20s)
2026-03-05T10:05:24.559855Z WARN mpc_node::protocol::signature: proposer posit deadline reached, expiring round sign_id=SignId("0101010101010101010101010101010101010101010101010101010101010101") accepts=1 threshold=4
2026-03-05T10:05:44.663955Z WARN mpc_node::protocol::signature: proposer posit deadline reached, expiring round sign_id=SignId("0101010101010101010101010101010101010101010101010101010101010101") accepts=1 threshold=4
2026-03-05T10:06:04.666168Z WARN mpc_node::protocol::signature: proposer posit deadline reached, expiring round sign_id=SignId("0101010101010101010101010101010101010101010101010101010101010101") accepts=1 threshold=4
2026-03-05T10:06:24.667963Z WARN mpc_node::protocol::signature: proposer posit deadline reached, expiring round sign_id=SignId("0101010101010101010101010101010101010101010101010101010101010101") accepts=2 threshold=4
2026-03-05T10:06:36.718891Z WARN mpc_node::protocol::signature: proposer posit deadline reached, expiring round sign_id=SignId("0101010101010101010101010101010101010101010101010101010101010101") accepts=2 threshold=4
2026-03-05T10:07:04.670150Z WARN mpc_node::protocol::signature: proposer posit deadline reached, expiring round sign_id=SignId("0101010101010101010101010101010101010101010101010101010101010101") accepts=3 threshold=4
# (finally we have another established posit)
2026-03-05T10:07:12.762125Z INFO mpc_node::protocol::signature: proposer broadcasting Start sign_id=SignId("0101010101010101010101010101010101010101010101010101010101010101") round=7 me=Participant(3) participants=[Participant(0), Participant(2), Participant(1), Participant(3)]
```
Notice the low number of accepts and how it only slowly increases with each new attempt. This is unexpected.
Task
Identify why posits are misaligned after a generator is aborted, and resolve the issue.
We also need a test that replicates this specific case consistently. (test_sign_contention_5_nodes runs into it by accident due to other issues.)