feat: add dynamic data seed option for flowertune examples #6831

xiaoyanshen799 wants to merge 2 commits into flwrlabs:main
Conversation
Pull request overview
Adds an opt-in configuration to vary the per-round data sampling seed in FlowerTune LLM example clients, addressing repeated training on the same early subset when max_steps is used.
Changes:
- Introduces `train.dynamic-data-seed = false` (default off) in each FlowerTune LLM example `pyproject.toml`.
- Updates each example `client_app.py` to (optionally) derive a per-round `TrainingArguments.data_seed` based on `server-round`.
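As a rough sketch, the new toggle would sit alongside the other training keys in each example's `pyproject.toml`; the `[tool.flwr.app.config]` section path is assumed from Flower's usual app-config convention:

```toml
[tool.flwr.app.config]
# Opt-in: vary the data sampling seed per round (default off for
# backward compatibility with existing runs).
train.dynamic-data-seed = false
```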
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| examples/flowertune-llm-medical/pyproject.toml | Adds train.dynamic-data-seed config toggle. |
| examples/flowertune-llm-medical/flowertune_medical/client_app.py | Optionally sets data_seed per round and reuses server_round variable. |
| examples/flowertune-llm-general-nlp/pyproject.toml | Adds train.dynamic-data-seed config toggle. |
| examples/flowertune-llm-general-nlp/flowertune_generalnlp/client_app.py | Optionally sets data_seed per round and reuses server_round variable. |
| examples/flowertune-llm-finance/pyproject.toml | Adds train.dynamic-data-seed config toggle. |
| examples/flowertune-llm-finance/flowertune_finance/client_app.py | Optionally sets data_seed per round and reuses server_round variable. |
| examples/flowertune-llm-code/pyproject.toml | Adds train.dynamic-data-seed config toggle. |
| examples/flowertune-llm-code/flowertune_code/client_app.py | Optionally sets data_seed per round and reuses server_round variable. |
```python
training_arguments = TrainingArguments(**cfg.train.training_arguments)

if cfg.train.get("dynamic_data_seed", False):
    base_seed = int(training_arguments.data_seed or training_arguments.seed)
```
`base_seed = int(training_arguments.data_seed or training_arguments.seed)` treats `data_seed=0` as falsy and will incorrectly fall back to `seed`. Use an explicit `is not None` check (or a `getattr` default) so `0` remains a valid seed value.
Suggested change:

```diff
- base_seed = int(training_arguments.data_seed or training_arguments.seed)
+ if training_arguments.data_seed is not None:
+     base_seed = int(training_arguments.data_seed)
+ elif training_arguments.seed is not None:
+     base_seed = int(training_arguments.seed)
+ else:
+     raise ValueError("Neither data_seed nor seed is set in TrainingArguments.")
```
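The pitfall flagged in this comment can be reproduced in isolation; a minimal sketch, independent of `TrainingArguments`:

```python
# `x or y` substitutes y whenever x is falsy, and 0 is falsy in Python,
# so an explicitly configured data_seed of 0 is silently replaced.
data_seed = 0
seed = 42

buggy = data_seed or seed                              # -> 42 (the 0 is lost)
fixed = data_seed if data_seed is not None else seed   # -> 0 (the 0 is kept)

print(buggy, fixed)
```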
```python
training_arguments = TrainingArguments(**cfg.train.training_arguments)

if cfg.train.get("dynamic_data_seed", False):
    base_seed = int(training_arguments.data_seed or training_arguments.seed)
```
`base_seed = int(training_arguments.data_seed or training_arguments.seed)` treats `data_seed=0` as falsy and will incorrectly fall back to `seed`. Use an explicit `is not None` check (or a `getattr` default) so `0` remains a valid seed value.
Suggested change:

```diff
- base_seed = int(training_arguments.data_seed or training_arguments.seed)
+ base_seed = int(
+     training_arguments.data_seed
+     if training_arguments.data_seed is not None
+     else training_arguments.seed
+ )
```
```python
training_arguments = TrainingArguments(**cfg.train.training_arguments)

if cfg.train.get("dynamic_data_seed", False):
    base_seed = int(training_arguments.data_seed or training_arguments.seed)
```
`base_seed = int(training_arguments.data_seed or training_arguments.seed)` treats `data_seed=0` as falsy and will incorrectly fall back to `seed`. Use an explicit `is not None` check (or a `getattr` default) so `0` remains a valid seed value.
Suggested change:

```diff
- base_seed = int(training_arguments.data_seed or training_arguments.seed)
+ base_seed_source = (
+     training_arguments.data_seed
+     if training_arguments.data_seed is not None
+     else training_arguments.seed
+ )
+ base_seed = int(base_seed_source)
```
```python
training_arguments = TrainingArguments(**cfg.train.training_arguments)

if cfg.train.get("dynamic_data_seed", False):
    base_seed = int(training_arguments.data_seed or training_arguments.seed)
```
`base_seed = int(training_arguments.data_seed or training_arguments.seed)` treats `data_seed=0` as falsy and will incorrectly fall back to `seed`. Use an explicit `is not None` check (or a `getattr` default) so `0` remains a valid seed value.
Suggested change:

```diff
- base_seed = int(training_arguments.data_seed or training_arguments.seed)
+ base_seed = (
+     int(training_arguments.data_seed)
+     if getattr(training_arguments, "data_seed", None) is not None
+     else int(training_arguments.seed)
+ )
```
Issue
Description
When `max_steps` is used in FlowerTune LLM examples, clients repeatedly train on the same data subset every round due to a fixed default seed, so the remaining data in each client's partition is never seen during training.

Related issues/PRs
#6808
Proposal
Explanation
- Add `train.dynamic-data-seed` to `pyproject.toml` to allow clients to cover more training data across rounds. Default is `false` to maintain backward compatibility.
- Update `client_app.py` in each example to dynamically set `data_seed` when `train.dynamic-data-seed = true`:

  ```python
  base_seed = int(training_arguments.data_seed or training_arguments.seed)
  training_arguments.data_seed = base_seed + server_round - 1
  ```

Closes #6808
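The per-round derivation above can be checked in isolation with a small helper; the `derive_data_seed` name is illustrative, not part of the PR:

```python
def derive_data_seed(base_seed: int, server_round: int) -> int:
    """Shift the data seed by the (1-indexed) round number so each round
    samples a different subset when max_steps truncates training."""
    return base_seed + server_round - 1

# Round 1 keeps the configured seed; each later round advances it by one.
print([derive_data_seed(17, r) for r in (1, 2, 3)])  # -> [17, 18, 19]
```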
Checklist

Any other comments?