Conversation
Pull Request Overview
This PR enables logging of evaluation metrics to Weights & Biases by propagating a new `report_to_wandb` flag and formatting results into a `wandb.Table`.
- Adds a `report_to_wandb` parameter through the evaluation and training pipeline.
- Implements a `to_wandb_table` utility to convert metric maps into table format.
- Updates the Grobid tagger CLI to accept and forward the `report_to_wandb` flag.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `delft/sequenceLabelling/wrapper.py` | Propagate `report_to_wandb` into evaluation and log tables to wandb |
| `delft/sequenceLabelling/trainer.py` | Rename wandb flag, implement `report_to_wandb` in `Scorer`, add helper |
| `delft/applications/grobidTagger.py` | Update `eval_` signature and forwarding of the `report_to_wandb` flag |
Comments suppressed due to low confidence (3)
`delft/sequenceLabelling/trainer.py:33`

- The constructor parameter is now `report_to_wandb`, but the instance attribute remains `self.enable_wandb`. Rename the attribute to `self.report_to_wandb` (or the parameter back to `enable_wandb`) for consistency.

```python
report_to_wandb = False
```
`delft/sequenceLabelling/trainer.py:497`

- This new utility function isn't covered by tests. Consider adding unit tests for `to_wandb_table` to verify correct table structure for various input maps.

```python
def to_wandb_table(report_as_map):
```
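A test along the suggested lines could pin down the row layout and rounding. The sketch below exercises a minimal pure-Python row builder mirroring the macro-row construction shown later in this review; the helper name and the input map shape are assumptions, since the real `to_wandb_table` returns a `wandb.Table` and its full signature isn't visible here.

```python
def build_macro_row(report_as_map):
    # Hypothetical helper mirroring the macro-row construction in the
    # review's diff; the real code builds this inside to_wandb_table.
    metrics = report_as_map["macro"]
    return [
        "all (macro avg.)",
        round(metrics["precision"], 4),
        round(metrics["recall"], 4),
        round(metrics["f1"], 4),
        int(metrics["support"]),
    ]

def test_macro_row():
    # Support may arrive as a float; precision/recall/f1 get 4 decimals.
    report = {"macro": {"precision": 0.91237, "recall": 0.8,
                        "f1": 0.853333, "support": 42.0}}
    assert build_macro_row(report) == \
        ["all (macro avg.)", 0.9124, 0.8, 0.8533, 42]

test_macro_row()
```

A real test for `to_wandb_table` itself would additionally need wandb installed (or `wandb.Table` mocked) to inspect the resulting table's columns and data.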
`delft/applications/grobidTagger.py:521`

- Passing the `wandb` module here will lead to a `NameError` and is semantically incorrect; this parameter expects a boolean. Replace `report_to_wandb=wandb` with `report_to_wandb=True` (or the appropriate flag) and ensure you `import wandb` if needed.

```python
report_to_wandb=wandb
```
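A minimal sketch of the fix, assuming an argparse-style boolean CLI flag (the actual grobidTagger argument name and the downstream call are hypothetical):

```python
import argparse

# Hypothetical CLI flag; the real grobidTagger option name may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--report-to-wandb", action="store_true")
args = parser.parse_args(["--report-to-wandb"])

# Forward a plain bool to the evaluation call, not the wandb module:
#   eval_(..., report_to_wandb=args.report_to_wandb)
report_to_wandb = args.report_to_wandb
```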
```python
def to_wandb_table(report_as_map):
    columns = ["", "precision", "recall", "f1-score", "support"]
```
[nitpick] Using an empty string for the first column header is unclear. Consider renaming it to `"label"` or `"class"` to improve readability in the wandb table.
Suggested change:

```diff
-columns = ["", "precision", "recall", "f1-score", "support"]
+columns = ["label", "precision", "recall", "f1-score", "support"]
```
```python
micro = report_as_map['macro']
micro_row = [
    "all (macro avg.)",
    round(micro['precision'], 4),
    round(micro['recall'], 4),
    round(micro['f1'], 4),
    int(micro['support'])
]
data.append(micro_row)
```
[nitpick] Reusing the variable `micro` for macro-averaged metrics can be confusing. Rename it to `macro_metrics` (or similar) when handling the `'macro'` key.
Suggested change:

```diff
-micro = report_as_map['macro']
-micro_row = [
-    "all (macro avg.)",
-    round(micro['precision'], 4),
-    round(micro['recall'], 4),
-    round(micro['f1'], 4),
-    int(micro['support'])
-]
-data.append(micro_row)
+macro_metrics = report_as_map['macro']
+macro_row = [
+    "all (macro avg.)",
+    round(macro_metrics['precision'], 4),
+    round(macro_metrics['recall'], 4),
+    round(macro_metrics['f1'], 4),
+    int(macro_metrics['support'])
+]
+data.append(macro_row)
```
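Pieced together from the snippets in this review, the full helper might look roughly like the sketch below. The per-label layout of `report_as_map` is an assumption, and the sketch returns `(columns, data)` so it stays testable without wandb; the real helper presumably wraps these in `wandb.Table(columns=columns, data=data)`.

```python
def to_wandb_table(report_as_map):
    """Build (columns, data) rows for a wandb.Table from an
    evaluation report map. A sketch; the real helper in
    delft/sequenceLabelling/trainer.py may differ."""
    columns = ["label", "precision", "recall", "f1-score", "support"]
    data = []
    # Per-label rows; the 'labels' key and metric names are assumptions.
    for label, metrics in report_as_map.get("labels", {}).items():
        data.append([
            label,
            round(metrics["precision"], 4),
            round(metrics["recall"], 4),
            round(metrics["f1"], 4),
            int(metrics["support"]),
        ])
    # Macro-averaged summary row, as in the suggested change above.
    macro_metrics = report_as_map["macro"]
    data.append([
        "all (macro avg.)",
        round(macro_metrics["precision"], 4),
        round(macro_metrics["recall"], 4),
        round(macro_metrics["f1"], 4),
        int(macro_metrics["support"]),
    ])
    return columns, data
```

The caller would then log it with something like `wandb.log({"eval": wandb.Table(columns=columns, data=data)})`.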
This PR allows sending the evaluation results to wandb. When running `train_eval`, the results are attached to the training run, but with the `eval` command they are attached to a separate run in wandb. Nothing we can do about that.