
feat(examples): Add top-k and per-class eval metrics to quickstart-pytorch#6638

Closed
SalimELMARDI wants to merge 6 commits into flwrlabs:main from SalimELMARDI:feat/quickstart-pytorch-rich-metrics

Conversation

@SalimELMARDI

Issue

Description

The quickstart-pytorch example only reported basic evaluation metrics (loss and top-1 accuracy).
That made it harder to inspect ranking quality and class-level performance during federated runs.

Related issues/PRs

N/A

Proposal

Explanation

This PR extends evaluation reporting in examples/quickstart-pytorch while keeping existing metrics backward-compatible.

Changes:

  • Updated pytorchexample/task.py:
    • Extended test(...) to compute:
      • top-1 accuracy (existing behavior)
      • top-3 accuracy
      • per-class top-1 accuracy for CIFAR-10 (class_accuracy_0 ... class_accuracy_9)
  • Updated pytorchexample/client_app.py:
    • Kept existing eval_loss and eval_acc
    • Added eval_acc_top3
    • Added eval_acc_class_0 ... eval_acc_class_9
  • Updated pytorchexample/server_app.py:
    • Kept existing loss and accuracy
    • Added accuracy_top3
    • Added accuracy_class_0 ... accuracy_class_9
  • Updated examples/quickstart-pytorch/README.md:
    • Documented the new reported metrics
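The extended `test(...)` described above can be sketched roughly as follows. This is a hedged illustration, not the actual PR diff: the function name `topk_and_per_class_accuracy` and its signature are hypothetical, and the real `test(...)` in `task.py` also computes the loss over a DataLoader.

```python
import torch

def topk_and_per_class_accuracy(logits, labels, num_classes=10, k=3):
    """Compute top-1, top-k, and per-class top-1 accuracy from raw logits.

    Hypothetical helper mirroring the metrics this PR adds to test(...).
    """
    # Top-1 predictions and correctness mask
    top1 = logits.argmax(dim=1)
    correct_top1 = top1 == labels

    # Top-k: a sample counts as correct if its label appears among
    # the k highest-scoring classes
    topk = logits.topk(k, dim=1).indices
    correct_topk = (topk == labels.unsqueeze(1)).any(dim=1)

    acc_top1 = correct_top1.float().mean().item()
    acc_topk = correct_topk.float().mean().item()

    # Per-class top-1 accuracy (0.0 when a class is absent from the batch)
    per_class = {}
    for c in range(num_classes):
        mask = labels == c
        per_class[f"class_accuracy_{c}"] = (
            correct_top1[mask].float().mean().item() if mask.any() else 0.0
        )
    return acc_top1, acc_topk, per_class
```

For CIFAR-10 this yields the `class_accuracy_0` ... `class_accuracy_9` keys mentioned above; the top-1 value matches the example's pre-existing accuracy metric.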

Validation:

  • Ran a 1-round simulation locally:
    • flwr run . --stream --run-config "num-server-rounds=1 batch-size=128 fraction-evaluate=0.1"
  • Confirmed new client and server metrics are present in logs.

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

Any other comments?

No API-breaking changes. Existing metric keys were kept for compatibility.
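The backward-compatible key layout on the client side can be sketched like this. Assumptions are flagged in the comments: `build_eval_metrics` is a hypothetical helper (the real `client_app.py` may build the dict inline), and the input `metrics` dict is assumed to use the key names computed by `test(...)` in this PR.

```python
def build_eval_metrics(loss, metrics):
    """Assemble the client's evaluation metrics dict.

    Hypothetical helper; `metrics` is assumed to hold "accuracy",
    "accuracy_top3", and "class_accuracy_0".."class_accuracy_9".
    """
    out = {
        "eval_loss": loss,                          # existing key, unchanged
        "eval_acc": metrics["accuracy"],            # existing key, unchanged
        "eval_acc_top3": metrics["accuracy_top3"],  # new key
    }
    # New per-class keys: eval_acc_class_0 .. eval_acc_class_9
    for c in range(10):
        out[f"eval_acc_class_{c}"] = metrics[f"class_accuracy_{c}"]
    return out
```

Because the pre-existing `eval_loss` and `eval_acc` keys are emitted unchanged, any tooling that reads the old metrics keeps working.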

Contributor

Copilot AI left a comment


Pull request overview

This PR enhances the examples/quickstart-pytorch evaluation reporting to include top-3 accuracy and per-class (CIFAR-10) top-1 accuracies, exposing these metrics from both client-side evaluation and centralized server-side evaluation while keeping existing metric keys.

Changes:

  • Extended test(...) to compute top-3 accuracy and per-class top-1 accuracies alongside existing loss/top-1 accuracy.
  • Updated client and server apps to emit the additional metrics under new, backward-compatible metric keys.
  • Documented the newly reported metrics in the example README.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

  • examples/quickstart-pytorch/pytorchexample/task.py: Computes top-3 and per-class accuracies and returns a metrics dict from test().
  • examples/quickstart-pytorch/pytorchexample/client_app.py: Adds client-side metric keys for top-3 and per-class accuracies.
  • examples/quickstart-pytorch/pytorchexample/server_app.py: Adds centralized metric keys for top-3 and per-class accuracies.
  • examples/quickstart-pytorch/README.md: Documents the expanded set of reported metrics (but front matter formatting changed).
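For the server-side metrics mentioned above, Flower strategies typically combine per-client results with an example-weighted average. The sketch below shows that aggregation pattern under stated assumptions: the function name `weighted_average` and its wiring into `server_app.py` are illustrative, not taken from this PR's diff.

```python
def weighted_average(results):
    """Aggregate client metrics, weighting each client by its example count.

    `results` is a list of (num_examples, metrics_dict) pairs, the shape
    Flower passes to a metrics-aggregation function. All clients are
    assumed to report the same metric keys.
    """
    total_examples = sum(num for num, _ in results)
    aggregated = {}
    for key in results[0][1]:
        aggregated[key] = (
            sum(num * metrics[key] for num, metrics in results) / total_examples
        )
    return aggregated
```

Because the aggregation iterates over whatever keys the clients report, the new `accuracy_top3` and `accuracy_class_0` ... `accuracy_class_9` values flow through without any per-key special-casing.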


@github-actions github-actions bot added the Contributor Used to determine what PRs (mainly) come from external contributors. label Feb 27, 2026
@chongshenng
Member

Hello @SalimELMARDI, thanks for opening this PR. I do agree that it's a useful enhancement to our example PyTorch app. But I'm unsure if we should merge this in this form because we intentionally keep our quickstart apps simple - just run basic training and eval.

Maybe it's better to implement this in our advanced-pytorch example? Wdyt?

@SalimELMARDI
Author

> Hello @SalimELMARDI, thanks for opening this PR. I do agree that it's a useful enhancement to our example PyTorch app. But I'm unsure if we should merge this in this form because we intentionally keep our quickstart apps simple - just run basic training and eval.
>
> Maybe it's better to implement this in our advanced-pytorch example? Wdyt?

@chongshenng Thanks for the feedback, that makes sense. I’ll open a new PR with this enhancement in advanced-pytorch, then close this one as superseded.

@SalimELMARDI
Author

Superseded by #6713
