Conversation
- Extract shared dummy triple/presignature helpers
- Move triple insertion/assert helper usage into the shared module
- Move the participants test helper into shared helpers
```rust
.insert(dummy_pair(id), node)
.await;

#[test_log::test(tokio::test)]
async fn test_state_sync_e2e() {
```
For now, I'm using a simple integration test for state sync. I may work on the component-layer implementation later.
After I removed the union of owned and reserved ids: overall, each T or P goes through a "generating" -> "stored" -> "used" lifecycle.
```rust
const SYNC_RESPONSE_TIMEOUT: Duration = Duration::from_secs(5);

/// Timeout for the entire broadcast operation (waiting for all peers to respond)
const BROADCAST_TIMEOUT: Duration = Duration::from_secs(10);
```
10s feels a bit short; sync operations can require many DB reads for large state. But we can try it and increase it if we run into problems.
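The per-response timeout can be illustrated with a small, self-contained sketch. This is not the node's actual networking code (which is async); a std `mpsc` channel stands in for a peer connection, and `SYNC_RESPONSE_TIMEOUT` mirrors the constant from the diff:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Mirrors the constant from the diff; the real one lives in the node code.
const SYNC_RESPONSE_TIMEOUT: Duration = Duration::from_secs(5);

/// Wait for a single peer's sync response, giving up after the timeout.
/// Returns None on timeout so the caller can proceed with partial state.
fn await_sync_response(rx: mpsc::Receiver<Vec<u64>>) -> Option<Vec<u64>> {
    rx.recv_timeout(SYNC_RESPONSE_TIMEOUT).ok()
}

fn main() {
    // Simulate a peer that responds quickly with its owned artifact ids.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        tx.send(vec![1, 2, 3]).unwrap();
    });
    assert_eq!(await_sync_response(rx), Some(vec![1, 2, 3]));

    // A peer that never responds: we hit the timeout instead of hanging.
    // (Short timeout here just to keep the example fast.)
    let (_tx2, rx2) = mpsc::channel::<Vec<u64>>();
    assert_eq!(rx2.recv_timeout(Duration::from_millis(50)).ok(), None);
}
```

The same shape applies per peer under `BROADCAST_TIMEOUT` for the whole broadcast; in the real async code this would be a `timeout(...)` wrapper rather than a blocking `recv_timeout`.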
```rust
/// Original protocol participants
pub participants: Vec<Participant>,
/// Nodes still holding their share of the artifact
pub holders: Option<Vec<Participant>>,
```
Why do we need to track these separately? Could we not just remove non-holders from the participants list?

We absolutely can, but I wanted to distinguish holders from participants, mostly for debugging and storage analysis. Also, holders are not part of the serialized object, to avoid deserialization/serialization of each artifact.

Okay, it just seemed like a lot of extra complexity with the separate tracking in Redis. But if you see enough value in having it, I have no problem with it.
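A minimal sketch of the participants/holders distinction, using a simplified, hypothetical `ArtifactMeta` type (not the actual struct from the codebase). The point is that dropping a holder never touches the serialized `participants` list:

```rust
// Hypothetical, simplified mirror of the metadata discussed above:
// `participants` is part of the serialized artifact, while `holders`
// is tracked separately so dropping a holder does not require
// re-serializing the artifact itself.
#[derive(Debug)]
struct ArtifactMeta {
    participants: Vec<u32>,    // original protocol participants
    holders: Option<Vec<u32>>, // nodes still holding their share
}

impl ArtifactMeta {
    /// Nodes we can still reach for this artifact: the tracked holders
    /// if present, otherwise fall back to all original participants.
    fn effective_holders(&self) -> Vec<u32> {
        self.holders
            .clone()
            .unwrap_or_else(|| self.participants.clone())
    }

    /// Record that a node dropped its share, leaving `participants` intact.
    fn drop_holder(&mut self, node: u32) {
        let mut current = self.effective_holders();
        current.retain(|n| *n != node);
        self.holders = Some(current);
    }
}

fn main() {
    let mut meta = ArtifactMeta { participants: vec![0, 1, 2], holders: None };
    meta.drop_holder(1);
    assert_eq!(meta.effective_holders(), vec![0, 2]);
    // The serialized participant list is untouched:
    assert_eq!(meta.participants, vec![0, 1, 2]);
}
```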
IIRC, `reserved` is to track ids that are in the generating state. I suggest renaming or removing it if we can track "generating" in other ways.
Here is the implementation of `reserve`:

```rust
pub async fn reserve(&self, id: A::Id) -> Option<ArtifactSlot<A>> {
    let used = self.used.read().await;
    if used.contains(&id) {
        return None;
    }
    if !self.reserved.write().await.insert(id) {
        return None;
    }
    drop(used);
    let start = Instant::now();
    let Some(mut conn) = self.connect().await else {
        self.reserved.write().await.remove(&id);
        return None;
    };
    // Check directly whether the artifact is already stored in Redis.
    let artifact_exists: Result<bool, _> = conn.hexists(&self.artifact_key, id).await;
    let elapsed = start.elapsed();
    crate::metrics::storage::REDIS_LATENCY
        .with_label_values(&[A::METRIC_LABEL, "reserve"])
        .observe(elapsed.as_millis() as f64);
    match artifact_exists {
        Ok(true) => {
            // Artifact already stored, so it cannot be reserved; remove the reservation.
            self.reserved.write().await.remove(&id);
            None
        }
        // Artifact does not exist; reservation successful.
        Ok(false) => Some(ArtifactSlot {
            id,
            storage: self.clone(),
            stored: false,
        }),
        Err(err) => {
            self.reserved.write().await.remove(&id);
            tracing::warn!(id, ?err, ?elapsed, "failed to reserve artifact");
            None
        }
    }
}
```

I'm afraid it is much more complicated than just "generating". I'm looking into it now, but I want to address it separately.
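The control flow of `reserve` can be mirrored with a synchronous, in-memory sketch: a `HashMap` stands in for the Redis hash and blocking `Mutex`es for the async locks. The names (`Storage`, `reserve`) are illustrative here, not the actual types:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

// A synchronous, in-memory sketch of the reservation flow above.
// The real code checks Redis (`hexists`) asynchronously; here a
// HashMap stands in for the backing store.
struct Storage {
    used: Mutex<HashSet<u64>>,
    reserved: Mutex<HashSet<u64>>,
    stored: Mutex<HashMap<u64, Vec<u8>>>, // stand-in for the Redis hash
}

impl Storage {
    fn new() -> Self {
        Self {
            used: Mutex::new(HashSet::new()),
            reserved: Mutex::new(HashSet::new()),
            stored: Mutex::new(HashMap::new()),
        }
    }

    /// Mirrors `reserve`: refuse ids that are used, already reserved,
    /// or already present in the backing store.
    fn reserve(&self, id: u64) -> Option<u64> {
        if self.used.lock().unwrap().contains(&id) {
            return None;
        }
        if !self.reserved.lock().unwrap().insert(id) {
            return None; // someone else holds the reservation
        }
        if self.stored.lock().unwrap().contains_key(&id) {
            // Already stored: roll back the reservation, like the Ok(true) arm.
            self.reserved.lock().unwrap().remove(&id);
            return None;
        }
        Some(id) // stands in for ArtifactSlot { id, .. }
    }
}

fn main() {
    let storage = Storage::new();
    assert_eq!(storage.reserve(1), Some(1));
    // A second reservation of the same id fails.
    assert_eq!(storage.reserve(1), None);
}
```

Note the rollback discipline: every failure path after the `reserved` insert removes the id again, otherwise the id would be stuck as "reserved" forever.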
```rust
})?;

owned.union(&*self.reserved.read().await).copied().collect()
Ok(owned.into_iter().collect())
```
Hm, so if we move ahead with this change, we open the door for the race condition @ChaoticTempest described here: #649 (comment). But I guess since all tests pass, it is not too common. I say we can merge this as-is and address the race condition in a follow-up PR.
Yes, `reserved` does not represent "owned by me now". For triples that are generating, that is not even possible. I'm looking into that now; I'm not sure it is a real concern.

Yes, it manages exclusive access to a Redis entry. We should probably keep that as it is; even if we can simplify it, I wouldn't do that together with these changes. However, with respect to state sync, what we have marked as "reserved" can be treated the same as "Generating" or "Available" (see this table: https://github.com/sig-net/mpc/blob/develop/doc/mpc_node_specification.md#non-owner-action-on-state-sync). So the existing union kind of makes sense. But the problem is (as you pointed out here) that it also includes non-owned entries. For Ts, until generation is done, we simply don't know who will be the owner. Maybe instead of a union, we should sync reserved ids in a separate list. The peer will then know not to delete local Ts, but can otherwise ignore the list of reserved ids.
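A sketch of the "separate reserved list" idea, under assumed types (`SyncUpdate` and `ids_to_delete` are hypothetical, not from the codebase): the receiving peer deletes ids the sender no longer owns, except those merely reserved, since those may still be generating:

```rust
use std::collections::HashSet;

// Hypothetical sync payload following the suggestion above: owned ids
// and reserved ids travel as separate lists instead of a union.
struct SyncUpdate {
    owned: Vec<u64>,
    reserved: Vec<u64>,
}

/// Decide which locally held artifact ids to delete after receiving a
/// peer's update: drop ids the peer no longer owns, but never drop ids
/// the peer has merely reserved (they may still be generating).
fn ids_to_delete(local: &[u64], update: &SyncUpdate) -> Vec<u64> {
    let keep: HashSet<u64> = update
        .owned
        .iter()
        .chain(update.reserved.iter())
        .copied()
        .collect();
    local.iter().copied().filter(|id| !keep.contains(id)).collect()
}

fn main() {
    let update = SyncUpdate { owned: vec![1], reserved: vec![2] };
    // Id 3 is gone from the peer; id 2 is only reserved, so keep it.
    assert_eq!(ids_to_delete(&[1, 2, 3], &update), vec![3]);
}
```

This keeps the deletion decision identical to the union for safety, while letting the peer tell owned and reserved ids apart for everything else.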
- `not_found`
- Artifacts track `participants` and `holders`. Participants are those who participated in the generation, and holders are those who still have the shares. The list of participants is not used anywhere, but I decided to keep it; it can be useful for debugging, etc.
- `fetch_owned` now returns `Result`, so we can get an empty list of owned artifacts and send it to other nodes.
- We need to decide whether we want to include generating/reserved/used in the state sync. Details: #671 (comment). Can be addressed separately.