feat(examples): Add top-k and per-class eval metrics to quickstart-pytorch #6638

SalimELMARDI wants to merge 6 commits into flwrlabs:main
Conversation
Pull request overview
This PR enhances the examples/quickstart-pytorch evaluation reporting to include top-3 accuracy and per-class (CIFAR-10) top-1 accuracies, exposing these metrics from both client-side evaluation and centralized server-side evaluation while keeping existing metric keys.
Changes:
- Extended `test(...)` to compute top-3 accuracy and per-class top-1 accuracies alongside the existing loss/top-1 accuracy.
- Updated the client and server apps to emit the additional metrics under new, backward-compatible metric keys.
- Documented the newly reported metrics in the example README.
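The actual `test(...)` diff is not shown in this conversation, but the metrics it describes can be sketched from the summary above. The following is a hypothetical implementation (the function name and signature are my own, not from the PR), assuming logits of shape `(N, num_classes)` as in a CIFAR-10 batch:

```python
import torch


def topk_and_per_class_accuracy(logits, labels, k=3, num_classes=10):
    """Top-k accuracy plus per-class top-1 accuracy for one batch of logits."""
    # Top-k: a sample counts as correct if the true label appears among
    # the k highest-scoring classes.
    topk_preds = logits.topk(k, dim=1).indices  # shape (N, k)
    topk_acc = (topk_preds == labels.unsqueeze(1)).any(dim=1).float().mean().item()

    # Per-class top-1: accuracy restricted to the samples of each class.
    top1_preds = logits.argmax(dim=1)
    per_class = {}
    for c in range(num_classes):
        mask = labels == c
        # NaN marks a class with no samples in this evaluation set.
        per_class[c] = (
            (top1_preds[mask] == c).float().mean().item()
            if mask.any()
            else float("nan")
        )
    return topk_acc, per_class
```

In a real `test()` loop these counts would be accumulated across batches rather than computed on a single one, but the per-sample logic is the same.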
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| examples/quickstart-pytorch/pytorchexample/task.py | Computes top-3 and per-class accuracies and returns a metrics dict from test(). |
| examples/quickstart-pytorch/pytorchexample/client_app.py | Adds client-side metric keys for top-3 and per-class accuracies. |
| examples/quickstart-pytorch/pytorchexample/server_app.py | Adds centralized metric keys for top-3 and per-class accuracies. |
| examples/quickstart-pytorch/README.md | Documents the expanded set of reported metrics (but front matter formatting changed). |
Hello @SalimELMARDI, thanks for opening this PR. I agree that it's a useful enhancement to our example PyTorch app, but I'm unsure we should merge it in this form because we intentionally keep our quickstart apps simple: just basic training and eval. Maybe it's better to implement this in our
@chongshenng Thanks for the feedback, that makes sense. I'll open a new PR with this enhancement in
Superseded by #6713
Issue
Description
The `quickstart-pytorch` example only reported basic evaluation metrics (loss and top-1 accuracy). That made it harder to inspect ranking quality and class-level performance during federated runs.
Related issues/PRs
N/A
Proposal
Explanation
This PR extends evaluation reporting in `examples/quickstart-pytorch` while keeping existing metrics backward-compatible.

Changes:
- `pytorchexample/task.py`: extends `test(...)` to compute top-3 accuracy and per-class top-1 accuracies (`class_accuracy_0` ... `class_accuracy_9`).
- `pytorchexample/client_app.py`: reports `eval_acc_top3` and `eval_acc_class_0` ... `eval_acc_class_9` alongside the existing `eval_loss` and `eval_acc`.
- `pytorchexample/server_app.py`: reports `accuracy_top3` and `accuracy_class_0` ... `accuracy_class_9` alongside the existing `loss` and `accuracy`.
- `examples/quickstart-pytorch/README.md`: documents the newly reported metrics.

Validation:
```shell
flwr run . --stream --run-config "num-server-rounds=1 batch-size=128 fraction-evaluate=0.1"
```

Checklist

(#contributions)

Any other comments?
No API-breaking changes. Existing metric keys were kept for compatibility.
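As a sketch of that backward-compatible key layout (the key names come from the PR description; the helper function itself is hypothetical), the client-side metrics dict might be assembled like this:

```python
def build_client_metrics(eval_loss, eval_acc, eval_acc_top3, per_class_acc):
    """Assemble the client metrics dict: existing keys unchanged, new keys added."""
    metrics = {
        "eval_loss": eval_loss,  # pre-existing key, kept as-is
        "eval_acc": eval_acc,    # pre-existing key, kept as-is
        "eval_acc_top3": eval_acc_top3,
    }
    # One flat scalar key per class, since Flower metrics are flat key-value pairs.
    for c, acc in sorted(per_class_acc.items()):
        metrics[f"eval_acc_class_{c}"] = acc
    return metrics
```

Because the original keys keep their names and meanings, any existing aggregation or logging that reads `eval_loss`/`eval_acc` continues to work unchanged.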