Limit concurrent SignTasks

(Continuing a discussion I had with @volovyks today.)

## Problem

Too many concurrent active sign tasks can overwhelm the system. In practice, this is most likely to happen right after a reboot, when recovering from the backlog and catching up.

However, limiting signature tasks (bidirectional or not) is more challenging than limiting Presignature and Triple generating tasks.

The current architecture guarantees:

- Every node always has an active `SignTask` for a request that has been indexed, until the response has been indexed.
- A `SignTask` keeps timing out and retrying until a delivered signature is confirmed. Proposer / deliberator roles may switch between retries.

This is great to ensure we don't drop requests. But it assumes infinite capacity for handling concurrent tasks.

In this architecture, if we decide to only have a fixed number of active `SignTask`s, it is unclear to me how we would coordinate among nodes which requests are currently active. If we don't coordinate that, it will be up to chance to have a threshold of nodes ready to work on the same request.


## Brainstorming

A possible resolution could be to limit only the number of `SignTask`s that are active with the node as the proposer, and to allow an unlimited number of tasks where we are deliberator. Messages meant for deliberators can thus always be processed and can reactivate a sleeping task. The total number of active tasks would still be limited, assuming all nodes behave well.
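To make the idea concrete, here is a minimal sketch of the bookkeeping, not a proposal for the actual code. `ActiveTasks`, `Activation`, `RequestId`, and `max_proposer` are all made-up names for illustration; the real `SignTask` state is more involved.

```rust
use std::collections::HashSet;

/// Hypothetical request identifier, only for this sketch.
type RequestId = u64;

#[derive(Debug, PartialEq, Eq)]
enum Activation {
    Started,
    AlreadyActive,
    AtCapacity,
}

/// Tracks which requests this node is actively working on, per role.
struct ActiveTasks {
    /// Assumed local cap on tasks where we are the proposer.
    max_proposer: usize,
    proposer: HashSet<RequestId>,
    deliberator: HashSet<RequestId>,
}

impl ActiveTasks {
    fn new(max_proposer: usize) -> Self {
        Self {
            max_proposer,
            proposer: HashSet::new(),
            deliberator: HashSet::new(),
        }
    }

    /// Proposer activations are refused once the cap is reached.
    fn try_propose(&mut self, id: RequestId) -> Activation {
        if self.proposer.contains(&id) {
            return Activation::AlreadyActive;
        }
        if self.proposer.len() >= self.max_proposer {
            return Activation::AtCapacity;
        }
        self.proposer.insert(id);
        Activation::Started
    }

    /// Deliberator activations are never refused, so an incoming
    /// message can always wake a sleeping task.
    fn wake_as_deliberator(&mut self, id: RequestId) -> Activation {
        if self.deliberator.insert(id) {
            Activation::Started
        } else {
            Activation::AlreadyActive
        }
    }

    /// Drop the request from both roles, e.g. once the response is indexed.
    fn finish(&mut self, id: RequestId) {
        self.proposer.remove(&id);
        self.deliberator.remove(&id);
    }
}
```

With a cap of `k` per node and `n` well-behaved nodes, at most `n * k` proposals are in flight, which is what bounds the deliberator side too.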

But just slapping that on top of what we have makes our posit handling even more complex... I don't like the direction we are going here. We have too many ifs inside the `SignTask` already.


Maybe there is a design that simplifies the current setup and allows limiting the concurrent tasks. This is really just brainstorming:

- Separate posit and generator tasks: `SignatureSpawner` maintains `SignProposeTask` and `SignGeneratorTask` instead of `SignTask`.
- Separate deliberator / proposer handling logic: A `SignProposeTask` only contains the posit logic we need to do when acting as proposer. (should simplify it quite a bit compared to what is now in `SignTask`)
- Incoming messages where we are deliberator can be handled statelessly, directly inside the `SignatureSpawner`. To decide on accept / reject, directly read the db / backlog and active tasks. This should be a side-effect-free function that doesn't need to maintain any state for sent posit messages. We might accept multiple proposals for the same signature but that shouldn't be a problem.
- The `SignatureSpawner` spawns a `SignGeneratorTask` when it receives a START message from another node. We may even have multiple parallel signatures ongoing for the same request. Not ideal but better than failing to produce a signature.
- The `SignatureSpawner` decides when it is time for a node to try (or retry) signing as a proposer. It can look at ongoing protocols where we are deliberators to make this decision. When proposing succeeds, it also spawns a `SignGeneratorTask`.
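A rough sketch of the deliberator side under this split, assuming the accept / reject decision only needs to know whether the request is still pending. The fields, the `on_posit` / `on_start` method names, and the `RequestId` / `NodeId` types are placeholders I made up; only `SignatureSpawner` and `SignGeneratorTask` are names from the list above.

```rust
use std::collections::HashSet;

/// Hypothetical identifiers, only for this sketch.
type RequestId = u64;
type NodeId = u32;

#[derive(Debug, PartialEq, Eq)]
enum PositReply {
    Accept,
    Reject,
}

/// Side-effect-free decision: it only reads the pending set and keeps
/// no record of which posits were answered.
fn decide_posit(pending: &HashSet<RequestId>, id: RequestId) -> PositReply {
    if pending.contains(&id) {
        PositReply::Accept
    } else {
        PositReply::Reject
    }
}

#[derive(Debug, PartialEq, Eq)]
struct SignGeneratorTask {
    request: RequestId,
    proposer: NodeId,
}

struct SignatureSpawner {
    /// Assumed stand-in for the db / backlog of unanswered requests.
    pending: HashSet<RequestId>,
    generators: Vec<SignGeneratorTask>,
}

impl SignatureSpawner {
    /// Takes `&self`: answering a posit does not mutate the spawner.
    fn on_posit(&self, id: RequestId) -> PositReply {
        decide_posit(&self.pending, id)
    }

    /// A START from another node spawns a generator. Several generators
    /// for the same request are allowed to run in parallel.
    fn on_start(&mut self, id: RequestId, proposer: NodeId) -> bool {
        if !self.pending.contains(&id) {
            return false;
        }
        self.generators.push(SignGeneratorTask { request: id, proposer });
        true
    }
}
```

Because `on_posit` keeps no state, accepting two proposals for the same request is just two independent reads, and the duplicates only show up as extra entries in `generators`.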

I believe this would reduce the overall complexity and avoid many of the posit problems we face today.

Scheduling (= deciding which tasks we should be proposer for) becomes a local decision of the `SignatureSpawner` that does not need to be synchronized with other nodes.
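One way such a local scheduling decision could look, purely as an illustration: pick the lowest pending request ids that we are neither proposing nor already generating as deliberator, up to the free proposer slots. The function name `pick_proposals`, the "lowest id first" policy, and the `max_proposer` cap are my assumptions, not something we have agreed on.

```rust
use std::collections::BTreeSet;

/// Hypothetical request identifier; lower ids are assumed to be older.
type RequestId = u64;

/// Decide locally which requests to start proposing for.
/// Only reads local state; no coordination with other nodes.
fn pick_proposals(
    pending: &BTreeSet<RequestId>,
    proposing: &BTreeSet<RequestId>,
    generating: &BTreeSet<RequestId>,
    max_proposer: usize,
) -> Vec<RequestId> {
    // Free slots under the local proposer cap.
    let free = max_proposer.saturating_sub(proposing.len());
    pending
        .iter()
        .copied()
        // Skip requests we already propose for, and requests where a
        // protocol is already running with us as deliberator.
        .filter(|id| !proposing.contains(id) && !generating.contains(id))
        .take(free)
        .collect()
}
```

Different nodes may well pick the same request. That is fine in this design, since overlapping attempts run in parallel instead of blocking each other.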

Overlapping attempts from different proposers are now handled in parallel. This increases the chance that we produce too many signatures. But it completely removes the race conditions that we have seen on devnet, which stopped posits from going through. We can even remove the per-round message buffer that I added in an attempt to deal with overlapping rounds in a brute-force way.

Limit concurrent SignTasks #731