[Workflow API] Enable participants in FederatedRuntime to review and approve plan #5

Open
nguptax1987 wants to merge 23 commits into develop from review_experiment

@nguptax1987

SUMMARY

This proposal introduces a structured mechanism for reviewing and agreeing on plans for executing FL experiments in FederatedRuntime within the Workflow API.
This feature enables all participants, including the Director and Envoys, to review and approve an experiment plan before execution.

The Director, acting as the orchestrator, manages this review process to ensure consistency, trust, and readiness across the federation.
Additionally, the User/Data Scientist who submitted the experiment plan will be notified of the outcome (approval or rejection) on completion of the review process.

MOTIVATION

Currently in FederatedRuntime, users can initiate and execute Federated Machine Learning experiments without explicit approval from all participants (including Directors or Envoys).
While this approach allows swift action, the absence of a formal agreement or review process may lead to the following challenges:

  • Trust Concerns: Directors and Envoy admins, as owners of sensitive data, may seek greater control over and insight into the experiments being conducted.
    A formal review process lets them inspect the plan before it runs, which enhances trust.

  • Risk of Execution Failures: An Envoy that is unprepared for the proposed plan, or disagrees with it, can cause execution failures.
    A structured agreement process can help ensure readiness and consensus among all parties involved.

SCOPE

  • This proposal applies only to FederatedRuntime.
  • Dynamically adding Envoys to an ongoing experiment is out of scope of the current design.
  • Support for a configurable review policy that allows the user to control the outcome of the review process is not included in the current proposal.
    The current design is flexible and can be extended in the future if this feature is needed.

KEY REQUIREMENTS

Feature Configuration

  • Each participant in the Federation shall be able to configure this feature individually.
    The feature can be enabled or disabled by including an optional review_experiment flag in each participant's configuration (YAML) file.
    The behavior of review_experiment in each participant's configuration file shall be as follows:

    • review_experiment: false → plan is auto-approved
    • review_experiment: true → node-admin review and approval is required
  • Default: If the review_experiment setting is not present in the configuration, the participant defaults to requiring manual review and approval of the experiment
    (i.e., as if review_experiment: true).

    • This ensures stricter control over experiment execution in case this setting has been overlooked by the node admin.
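
The flag semantics above can be sketched in a few lines of Python. This is only an illustration: `requires_manual_review` is a hypothetical helper, not an existing OpenFL function, and `settings` is assumed to be the already-parsed `settings` section of a participant's YAML file.

```python
from typing import Any, Mapping


def requires_manual_review(settings: Mapping[str, Any]) -> bool:
    """Return True when the node admin must review the plan manually.

    Hypothetical helper: `settings` stands for the parsed `settings`
    section of a participant's configuration file.
    """
    # A missing flag falls back to True, the stricter behaviour.
    value = settings.get("review_experiment", True)
    if not isinstance(value, bool):
        raise ValueError(f"review_experiment must be a boolean, got {value!r}")
    return value


print(requires_manual_review({"review_experiment": False}))  # False: auto-approved
print(requires_manual_review({"listen_port": 50050}))        # True: flag missing
```

Rejecting non-boolean values is a choice made for this sketch, so that a mistyped flag is reported instead of silently disabling the review.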

CONFIGURATION FILE INTERFACE

  • To enable the plan review workflow, the node admin shall set the review_experiment flag to true in the configuration file as shown below:
settings:
  listen_host: localhost
  listen_port: 50050
  envoy_health_check_period: 5  # in seconds
  review_experiment: True

PERSISTENCE OF CONFIGURATION

  • Configuration is read by the participant at startup and applies to all the experiments conducted after the node is started.
    Changes to the configuration require a participant node to be restarted.

DIRECTOR AND ENVOY INTERFACE FOR REVIEWING AND APPROVING THE PLAN

  • If the review_experiment flag is enabled, the plan for the received experiment shall be printed on the console.
    The following interface shall then be presented to the Director and Envoys for reviewing and responding to the experiment plan.
[Image: review interface presented to the Director and Envoys]
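
A console review step of this kind could look roughly like the sketch below. The prompt wording, the accepted answers, and the `review_plan` name are all assumptions made for illustration; the actual interface is the one shown in the image above.

```python
from typing import Callable


def review_plan(plan_text: str, ask: Callable[[str], str] = input) -> bool:
    """Print the plan and return True if the node admin approves it.

    Hypothetical sketch: `ask` defaults to the built-in `input`, and can
    be replaced by any callable, which keeps the function testable.
    """
    print("Experiment plan received for review:")
    print(plan_text)
    while True:
        answer = ask("Approve this plan? [y/n]: ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        # Anything else is not a decision, so ask again.
        print("Please answer 'y' or 'n'.")
```

For example, `review_plan(plan_text, ask=lambda _prompt: "y")` approves without reading from the console.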

REVIEW TIMELINE

  • If the review_experiment flag is set to true, the Director and each Envoy shall wait until its node admin manually approves or rejects the plan.

NOTIFY PLAN REVIEW OUTCOME TO USER/DATA SCIENTIST

  • On completion of the review process, the User shall be provided a notification including:
    • Outcome of the review: Accepted / Rejected
    • Individual participants' approval/rejection status including timestamps (e.g., outcome as shown below)
[Image: example review outcome notification]
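
One possible shape for this notification is sketched below. The `ParticipantReview` class, the `summarize` function, and the text layout are hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ParticipantReview:
    """One participant's decision (hypothetical record type)."""
    participant: str       # e.g. "director" or an Envoy name
    approved: bool
    reviewed_at: datetime  # when the decision was made


def summarize(reviews: List[ParticipantReview]) -> str:
    """Build the text sent to the User once the review completes."""
    # The plan is accepted only if every participant approved it.
    accepted = bool(reviews) and all(r.approved for r in reviews)
    lines = [f"Review outcome: {'Accepted' if accepted else 'Rejected'}"]
    for r in reviews:
        status = "approved" if r.approved else "rejected"
        lines.append(f"  {r.participant}: {status} at {r.reviewed_at.isoformat()}")
    return "\n".join(lines)


reviews = [
    ParticipantReview("director", True, datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)),
    ParticipantReview("envoy_1", False, datetime(2025, 1, 1, 12, 5, tzinfo=timezone.utc)),
]
print(summarize(reviews))
```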

EXPERIMENT EXECUTION ON CONSENSUS

  • The experiment shall only be executed when all participants have reviewed and accepted the plan.

PROPOSED APPROACH
The mechanism consists of two phases: Review Phase and Execution Phase.

Phase 1: Review Phase

Director Review

  • Director reviews the plan.
    • If the Director rejects the plan: The experiment is aborted immediately, and the User is notified of the outcome (Rejected) along with the Director's review status and timestamp.
    • If the Director approves the plan: The experiment is forwarded to all authorized Envoys for further review.

Envoy Review

  • Each Envoy receives the experiment plan for review and provides their response to the Director.

Director Consensus

  • Director waits for responses from all Envoys:
    • If all Envoys approve: The User is notified that the experiment was approved, including timestamps of responses by the Director and all Envoys. The experiment moves to the execution phase.
    • If any Envoy rejects: The User is notified that the plan was rejected, including timestamps of responses by the Director and all Envoys. The experiment is aborted, and cleanup is triggered on all participants.
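
The review phase described above can be summarized as the following sketch. The `Reviewer` type and the `run_review_phase` function are hypothetical. Timestamps, user notification, and cleanup are left out, and the sketch assumes no Envoy is named "director".

```python
from typing import Callable, Dict, Tuple

# A reviewer takes the plan text and returns True (approve) or False (reject).
Reviewer = Callable[[str], bool]


def run_review_phase(
    plan: str, director: Reviewer, envoys: Dict[str, Reviewer]
) -> Tuple[bool, Dict[str, bool]]:
    """Return (approved, per-participant responses) for a plan."""
    responses = {"director": director(plan)}
    if not responses["director"]:
        # Director rejected: abort at once, the Envoys never see the plan.
        return False, responses
    # Director approved: collect a response from every Envoy before deciding,
    # so the User can be told how each participant responded.
    for name, review in envoys.items():
        responses[name] = review(plan)
    return all(responses.values()), responses


def approve(_plan: str) -> bool:
    return True


def reject(_plan: str) -> bool:
    return False


print(run_review_phase("plan", approve, {"envoy_1": approve, "envoy_2": reject}))
```

Only a result of `True` would let the experiment move to the execution phase.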

Phase 2: Execution Phase

If the Director and all Envoys approve the experiment:

  • Director starts the Aggregator.
  • All Envoys are notified to run the Collaborator.

SCENARIOS

Scenario A: Director rejects the plan

[Image: Scenario A flow]

Scenario B: Director approves the plan and (any) Envoy rejects the plan

[Image: Scenario B flow]

Scenario C: ALL participants approve the plan

[Image: Scenario C flow]

RISKS AND MITIGATION

Different combinations of participant approvals and rejections shall be validated thoroughly to ensure that an experiment proceeds only when all participants have approved the plan.
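
Although validation is planned to be manual, the consensus rule itself is small enough to check exhaustively. The sketch below uses a hypothetical `may_execute` rule and enumerates every approve/reject combination for a Director and a given number of Envoys. It illustrates the check and is not part of the proposed implementation.

```python
from itertools import product
from typing import List, Sequence, Tuple


def may_execute(director_approves: bool, envoy_approvals: Sequence[bool]) -> bool:
    """Consensus rule: the Director and every Envoy must approve."""
    return director_approves and all(envoy_approvals)


def approving_combinations(num_envoys: int) -> List[Tuple[bool, ...]]:
    """List every combination of decisions that lets the experiment run.

    Each combination is (director, envoy_1, ..., envoy_n).
    """
    return [
        combo
        for combo in product([True, False], repeat=num_envoys + 1)
        if may_execute(combo[0], combo[1:])
    ]


# With 3 Envoys there are 16 combinations; only the all-approve one may run.
print(approving_combinations(3))  # [(True, True, True, True)]
```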


VALIDATION

Manual


DEMONSTRATION

The feature shall be demonstrated by modifying the existing openfl-tutorials/experimental/workflow/FederatedRuntime/101_MNIST tutorial.
