
Bidirectional Sync (Part 2) #671

Merged
volovyks merged 23 commits into develop from serhii/sync-to-active-fix on Mar 12, 2026
Conversation

@volovyks (Contributor) commented Feb 26, 2026

  • Added bidirectional state sync. Now, when node A calls node B, B returns a list of Ids that were not found in node B's storage. Node A then removes node B from the holders of these not_found artifacts.
  • We now distinguish participants from holders. Participants are those who took part in the generation, and holders are those who still have the shares. The list of participants is not used anywhere, but I decided to keep it; it can be useful for debugging, etc.
  • Holders are not part of the serialized artifact, for efficiency.
  • fetch_owned now returns a Result, so we can get an empty list of owned artifacts and send it to other nodes.
  • While processing the sync response, node A prunes artifacts whose number of holders is < T. If any artifacts were pruned, we may want to run the sync again to remove them from the remaining holders' lists, but I avoided that complexity for now.
  • The sync process is no longer considered complete if any of its steps fail.
  • I've removed artifact reinsertion and added a check on the number of active participants when stockpiling presignatures, to prevent wasting them (such a check already existed for triples).

We need to decide whether we want to include generating/reserved/used in the state sync. Details: #671 (comment) Can be addressed separately.
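The not_found handling and the pruning rule described above can be sketched roughly as follows. This is illustrative only: the `Artifact` struct, the `apply_not_found` name, and the flat `Vec` storage are assumptions for the sketch, not the actual API.

```rust
use std::collections::HashSet;

type Participant = u32;

struct Artifact {
    id: u64,
    holders: HashSet<Participant>,
}

/// Remove `peer` from the holders of every artifact it reported as
/// not found, then prune artifacts whose holder count is below the
/// threshold `t`. Returns the ids of the pruned artifacts.
fn apply_not_found(
    artifacts: &mut Vec<Artifact>,
    peer: Participant,
    not_found: &HashSet<u64>,
    t: usize,
) -> Vec<u64> {
    let mut pruned = Vec::new();
    artifacts.retain_mut(|a| {
        if not_found.contains(&a.id) {
            a.holders.remove(&peer);
        }
        if a.holders.len() < t {
            pruned.push(a.id);
            false // drop from local storage
        } else {
            true
        }
    });
    pruned
}
```

As the PR description notes, pruned ids are not re-propagated to the remaining holders in this version; that would require another sync round.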

.insert(dummy_pair(id), node)
.await;
#[test_log::test(tokio::test)]
async fn test_state_sync_e2e() {
Contributor Author:

For now, I'm using a simple integration test for state sync. I may work on the component-layer implementation later.

volovyks marked this pull request as ready for review March 9, 2026 17:43
@volovyks (Contributor, Author) commented Mar 9, 2026

After I removed the union of owned and reserved in State Sync, the tests pass successfully (except for cases::mpc::test_sign_contention_5_nodes).
That is expected, since reserved includes Ids that are not owned by this node.

Overall, the purpose of used, reserved, and ArtifactSlot is not fully clear to me. We should at least fully document them.

Each T or P goes through a "generating" -> "stored" -> "used" lifecycle, and reserved or any other addition brings complexity that may not be required.
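As a sketch of that lifecycle (illustrative only, not the actual types in the codebase), each artifact would move forward through exactly these three states:

```rust
/// Hypothetical model of the artifact lifecycle described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ArtifactState {
    Generating,
    Stored,
    Used,
}

impl ArtifactState {
    /// Advance one step along the lifecycle; `Used` is terminal.
    fn advance(self) -> ArtifactState {
        match self {
            ArtifactState::Generating => ArtifactState::Stored,
            ArtifactState::Stored => ArtifactState::Used,
            ArtifactState::Used => ArtifactState::Used,
        }
    }
}
```

In this framing, reserved would be an extra bookkeeping set layered on top of the `Generating` state rather than a state of its own.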

const SYNC_RESPONSE_TIMEOUT: Duration = Duration::from_secs(5);

/// Timeout for the entire broadcast operation (waiting for all peers to respond)
const BROADCAST_TIMEOUT: Duration = Duration::from_secs(10);
Contributor:

10s feels a bit short; sync operations can require many DB reads for a large state.
But we can try it and increase it if we run into the limit.

Comment on lines +60 to +63
/// Original protocol participants
pub participants: Vec<Participant>,
/// Nodes still holding their share of the artifact
pub holders: Option<Vec<Participant>>,
Contributor:

Why do we need to track these separately? Could we not just remove non-holders from the participants list?

Contributor Author:

We absolutely can. But I wanted to distinguish holders from participants, mostly for debugging and storage analysis. Also, holders are not part of the serialized object, to avoid deserializing/serializing each artifact.

Contributor:

Okay, it just seemed like a lot of extra complexity with the separate tracking in Redis. But if you see enough value in having it, I have no problem with it.

@jakmeier (Contributor) commented:

> After I removed the union of owned and reserved in State Sync, the tests are passing successfully (except for cases::mpc::test_sign_contention_5_nodes). That is expected, since reserved includes Ids that are not owned by this node.
>
> Overall, the purpose of used, reserved, and ArtifactSlot is not fully clear to me. We need to at least fully document it.
>
> Each T or P is going through "generating" -> "stored" -> "used" lifecycle, and reserved or any other additions adds complexity that may not be required.

IIRC, reserved is to track ids that are in the generating state. I suggest renaming or removing it if we can track "generating" in other ways.

@volovyks (Contributor, Author) commented:

Here is the implementation of reserve():

pub async fn reserve(&self, id: A::Id) -> Option<ArtifactSlot<A>> {
        let used = self.used.read().await;
        if used.contains(&id) {
            return None;
        }
        if !self.reserved.write().await.insert(id) {
            return None;
        }
        drop(used);

        let start = Instant::now();
        let Some(mut conn) = self.connect().await else {
            self.reserved.write().await.remove(&id);
            return None;
        };

        // Check directly whether the artifact is already stored in Redis.
        let artifact_exists: Result<bool, _> = conn.hexists(&self.artifact_key, id).await;
        let elapsed = start.elapsed();
        crate::metrics::storage::REDIS_LATENCY
            .with_label_values(&[A::METRIC_LABEL, "reserve"])
            .observe(elapsed.as_millis() as f64);

        match artifact_exists {
            Ok(true) => {
                // artifact already stored, reserve cannot be done, remove reservation
                self.reserved.write().await.remove(&id);
                None
            }
            // artifact does not exist, reservation successful
            Ok(false) => Some(ArtifactSlot {
                id,
                storage: self.clone(),
                stored: false,
            }),
            Err(err) => {
                self.reserved.write().await.remove(&id);
                tracing::warn!(id, ?err, ?elapsed, "failed to reserve artifact");
                None
            }
        }
    }

I'm afraid it is much more complicated than just "generating". I'm looking into it now, but I want to address it separately.

volovyks requested a review from jakmeier March 12, 2026 15:00
})?;

-        owned.union(&*self.reserved.read().await).copied().collect()
+        Ok(owned.into_iter().collect())
Contributor:

Hm, so if we move ahead with this change, we open the door for the race condition @ChaoticTempest described here: #649 (comment)

But I guess since all tests pass, it is not too common. I say we can merge this as-is and address the race condition in a follow-up PR.

Contributor Author:

Yes, reserved does not represent "owned by me now". For triples that are still generating, that is not even possible. I'm looking into that now; I'm not sure it is a real concern.

@jakmeier (Contributor) commented:

> I'm afraid it is much more complicated than just "generating". I'm looking into it now, but I want to address it separately.

Yes, it manages exclusive access to a Redis entry. We should probably keep that as it is. Even if we can simplify it, I wouldn't do that together with these changes.

However, with respect to state sync, what we have marked as "reserved" can be treated the same as "Generating" or "Available". (See this table: https://github.com/sig-net/mpc/blob/develop/doc/mpc_node_specification.md#non-owner-action-on-state-sync)

So, the existing union kind of makes sense. But the problem is (as you pointed out here) that it also includes non-owned entries.

For Ts, until generation is done, we simply don't know who will be the owner. Maybe instead of a union, we should sync reserved ids in a separate list. The peer will then know not to delete local Ts but otherwise can ignore the list of reserved ids.
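That suggestion could look roughly like this. The `SyncRequest` shape and the `should_keep` helper are hypothetical, just to show the intent of sending reserved ids in their own list:

```rust
use std::collections::HashSet;

/// Hypothetical sync payload: owned ids and still-generating (reserved)
/// ids travel in separate lists instead of one union.
struct SyncRequest {
    owned: HashSet<u64>,
    reserved: HashSet<u64>,
}

/// A peer keeps a local artifact if the sender owns it or is still
/// generating it; beyond that, the reserved list is ignored.
fn should_keep(req: &SyncRequest, local_id: u64) -> bool {
    req.owned.contains(&local_id) || req.reserved.contains(&local_id)
}
```

This keeps "don't delete in-flight Ts" separate from ownership claims, so the non-owned-entries problem with the union goes away.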

volovyks merged commit 94a22a7 into develop Mar 12, 2026
3 of 4 checks passed
volovyks deleted the serhii/sync-to-active-fix branch March 12, 2026 16:08