fix(framework): Make message delivery safe for multiple LinkState replicas#6785
fix(framework): Make message delivery safe for multiple LinkState replicas#6785
Conversation
Signed-off-by: Patrick Foley <patrick@flower.ai>
There was a problem hiding this comment.
Pull request overview
This PR updates the SQL-backed LinkState message retrieval logic to prevent duplicate delivery when multiple SuperLink/LinkState replicas concurrently pull the same instruction or reply from a shared SQLite database.
Changes:
- Make
get_message_insclaim instructions more safely by only updating rows that are still undelivered (delivered_at = '') and relying onRETURNINGto fetch the claimed rows. - Make
get_message_resatomically claim eligible reply messages via a singleUPDATE ... RETURNINGinstead ofSELECTthenUPDATE. - Add concurrency tests to validate that concurrent replicas cannot both claim the same instruction/reply.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
framework/py/flwr/server/superlink/linkstate/sql_linkstate.py |
Tightens delivery semantics by atomically claiming replies and preventing double-claim of instructions via delivered_at guards. |
framework/py/flwr/server/superlink/linkstate/linkstate_test.py |
Adds multi-replica concurrency tests for unique claiming of instruction and reply messages. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Signed-off-by: Patrick Foley <patrick@flower.ai>
Signed-off-by: Patrick Foley <patrick@flower.ai>
Signed-off-by: Patrick Foley <patrick@flower.ai>
There was a problem hiding this comment.
Pull request overview
This PR makes SQL-backed message delivery safe under contention when multiple SuperLink LinkState replicas share the same SQLite DB, by turning “read then mark delivered” into an atomic “claim and return” operation.
Changes:
- Atomically “claim” instruction messages in
get_message_insviaUPDATE ... SET delivered_at ... RETURNING(including a deterministic subquery whenlimitis set). - Atomically “claim” reply messages in
get_message_resviaUPDATE ... SET delivered_at ... RETURNING. - Add concurrency tests validating uniqueness of claims across two
SqlLinkStateinstances sharing the same database file.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
framework/py/flwr/server/superlink/linkstate/sql_linkstate.py |
Switches message fetching to atomic claim-and-return updates to prevent duplicate delivery across replicas. |
framework/py/flwr/server/superlink/linkstate/linkstate_test.py |
Adds multi-threaded tests exercising concurrent message claiming across two SQL linkstate replicas. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Signed-off-by: Patrick Foley <patrick@flower.ai>
There was a problem hiding this comment.
Pull request overview
This PR updates the SQLite-backed SqlLinkState message retrieval logic to make instruction/reply delivery safe under concurrent access from multiple LinkState replicas, and adds regression tests to validate unique claiming behavior under contention.
Changes:
- Make
get_message_insatomically “claim” messages byUPDATE ... SET delivered_at ... RETURNING *(with a follow-up re-read after validity checks). - Make
get_message_resatomically “claim” reply messages viaUPDATE ... RETURNING *, removing the non-atomic select-then-update flow. - Add multi-threaded tests using a shared SQLite file to ensure unique claiming and work distribution across two replicas.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
framework/py/flwr/server/superlink/linkstate/sql_linkstate.py |
Switches message retrieval to atomic claim semantics using UPDATE ... RETURNING for safe multi-replica delivery. |
framework/py/flwr/server/superlink/linkstate/linkstate_test.py |
Adds concurrent replica tests to validate unique claiming and contention behavior against a shared SQLite DB. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Patrick Foley <patrick@flower.ai>
|
@danieljanes @psfoley , this implementation looks correct to me. I'm in favor of merging it. Once we have this and the running state issue addressed, we should run another analysis on concurrency risks. |
Issue
Description
Related issues/PRs
Proposal
Explanation
Checklist
#contributions)Any other comments?