Skip to content

fix(framework): Make message delivery safe for multiple LinkState replicas#6785

Open
psfoley wants to merge 14 commits intomainfrom
atomic-message-delivery
Open

fix(framework): Make message delivery safe for multiple LinkState replicas#6785
psfoley wants to merge 14 commits intomainfrom
atomic-message-delivery

Conversation

@psfoley
Copy link
Member

@psfoley psfoley commented Mar 17, 2026

Issue

Description

Related issues/PRs

Proposal

Explanation

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

Any other comments?

Signed-off-by: Patrick Foley <patrick@flower.ai>
Copilot AI review requested due to automatic review settings March 17, 2026 21:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the SQL-backed LinkState message retrieval logic to prevent duplicate delivery when multiple SuperLink/LinkState replicas concurrently pull the same instruction or reply from a shared SQLite database.

Changes:

  • Make get_message_ins claim instructions more safely by only updating rows that are still undelivered (delivered_at = '') and relying on RETURNING to fetch the claimed rows.
  • Make get_message_res atomically claim eligible reply messages via a single UPDATE ... RETURNING instead of SELECT then UPDATE.
  • Add concurrency tests to validate that concurrent replicas cannot both claim the same instruction/reply.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
framework/py/flwr/server/superlink/linkstate/sql_linkstate.py Tightens delivery semantics by atomically claiming replies and preventing double-claim of instructions via delivered_at guards.
framework/py/flwr/server/superlink/linkstate/linkstate_test.py Adds multi-replica concurrency tests for unique claiming of instruction and reply messages.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Signed-off-by: Patrick Foley <patrick@flower.ai>
@github-actions github-actions bot added the Maintainer Used to determine what PRs (mainly) come from Flower maintainers. label Mar 17, 2026
Signed-off-by: Patrick Foley <patrick@flower.ai>
Signed-off-by: Patrick Foley <patrick@flower.ai>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes SQL-backed message delivery safe under contention when multiple SuperLink LinkState replicas share the same SQLite DB, by turning “read then mark delivered” into an atomic “claim and return” operation.

Changes:

  • Atomically “claim” instruction messages in get_message_ins via UPDATE ... SET delivered_at ... RETURNING (including a deterministic subquery when limit is set).
  • Atomically “claim” reply messages in get_message_res via UPDATE ... SET delivered_at ... RETURNING.
  • Add concurrency tests validating uniqueness of claims across two SqlLinkState instances sharing the same database file.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
framework/py/flwr/server/superlink/linkstate/sql_linkstate.py Switches message fetching to atomic claim-and-return updates to prevent duplicate delivery across replicas.
framework/py/flwr/server/superlink/linkstate/linkstate_test.py Adds multi-threaded tests exercising concurrent message claiming across two SQL linkstate replicas.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Signed-off-by: Patrick Foley <patrick@flower.ai>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the SQLite-backed SqlLinkState message retrieval logic to make instruction/reply delivery safe under concurrent access from multiple LinkState replicas, and adds regression tests to validate unique claiming behavior under contention.

Changes:

  • Make get_message_ins atomically “claim” messages by UPDATE ... SET delivered_at ... RETURNING * (with a follow-up re-read after validity checks).
  • Make get_message_res atomically “claim” reply messages via UPDATE ... RETURNING *, removing the non-atomic select-then-update flow.
  • Add multi-threaded tests using a shared SQLite file to ensure unique claiming and work distribution across two replicas.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
framework/py/flwr/server/superlink/linkstate/sql_linkstate.py Switches message retrieval to atomic claim semantics using UPDATE ... RETURNING for safe multi-replica delivery.
framework/py/flwr/server/superlink/linkstate/linkstate_test.py Adds concurrent replica tests to validate unique claiming and contention behavior against a shared SQLite DB.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

psfoley and others added 8 commits March 17, 2026 17:34
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Patrick Foley <patrick@flower.ai>
@msheller
Copy link
Member

@danieljanes @psfoley , this implementation looks correct to me. I'm in favor of merging it. Once we have this and the running state issue addressed, we should run another analysis on concurrency risks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Maintainer Used to determine what PRs (mainly) come from Flower maintainers.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants