Skip to content

send evaluation table to wandb when enabled#182

Open
lfoppiano wants to merge 1 commit intomasterfrom
feature/eval-with-wandb
Open

send evaluation table to wandb when enabled#182
lfoppiano wants to merge 1 commit intomasterfrom
feature/eval-with-wandb

Conversation

@lfoppiano
Copy link
Collaborator

This PR allow to send the results of the evaluation to wandb. When running train_eval, it will be attached to the run, but with the eval command, it will be attached to a separate run in wandb. Nothing we can do about that.

@lfoppiano lfoppiano requested a review from Copilot June 6, 2025 09:06
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enables logging of evaluation metrics to Weights & Biases by propagating a new report_to_wandb flag and formatting results into a wandb.Table.

  • Adds a report_to_wandb parameter through the evaluation and training pipeline.
  • Implements to_wandb_table utility to convert metric maps into table format.
  • Updates the Grobid tagger CLI to accept and forward the report_to_wandb flag.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
delft/sequenceLabelling/wrapper.py Propagate report_to_wandb into evaluation and log tables to wandb
delft/sequenceLabelling/trainer.py Rename wandb flag, implement report_to_wandb in Scorer, add helper
delft/applications/grobidTagger.py Update eval_ signature and forwarding of report_to_wandb flag
Comments suppressed due to low confidence (3)

delft/sequenceLabelling/trainer.py:33

  • The constructor parameter is now report_to_wandb but the instance attribute remains self.enable_wandb. Rename the attribute to self.report_to_wandb (or the parameter back to enable_wandb) for consistency.
report_to_wandb = False

delft/sequenceLabelling/trainer.py:497

  • This new utility function isn’t covered by tests. Consider adding unit tests for to_wandb_table to verify correct table structure for various input maps.
def to_wandb_table(report_as_map):

delft/applications/grobidTagger.py:521

  • Passing the wandb module here will lead to a NameError and is semantically incorrect—this parameter expects a boolean. Replace report_to_wandb=wandb with report_to_wandb=True (or the appropriate flag) and ensure you import wandb if needed.
report_to_wandb=wandb



def to_wandb_table(report_as_map):
columns = ["", "precision", "recall", "f1-score", "support"]
Copy link

Copilot AI Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Using an empty string for the first column header is unclear. Consider renaming it to "label" or "class" to improve readability in the wandb table.

Suggested change
columns = ["", "precision", "recall", "f1-score", "support"]
columns = ["label", "precision", "recall", "f1-score", "support"]

Copilot uses AI. Check for mistakes.
Comment on lines +525 to +533
micro = report_as_map['macro']
micro_row = [
"all (macro avg.)",
round(micro['precision'], 4),
round(micro['recall'], 4),
round(micro['f1'], 4),
int(micro['support'])
]
data.append(micro_row)
Copy link

Copilot AI Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Reusing the variable micro for macro-averaged metrics can be confusing. Rename it to macro_metrics (or similar) when handling the 'macro' key.

Suggested change
micro = report_as_map['macro']
micro_row = [
"all (macro avg.)",
round(micro['precision'], 4),
round(micro['recall'], 4),
round(micro['f1'], 4),
int(micro['support'])
]
data.append(micro_row)
macro_metrics = report_as_map['macro']
macro_row = [
"all (macro avg.)",
round(macro_metrics['precision'], 4),
round(macro_metrics['recall'], 4),
round(macro_metrics['f1'], 4),
int(macro_metrics['support'])
]
data.append(macro_row)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants