experiments/run_finetuning.py (17 changes: 16 additions & 1 deletion)
@@ -53,7 +53,7 @@ def get_optimizer_and_scheduler(model, train_dataset, config):
class CustomTrainer(Trainer):
    def __init__(self, *args, train_loader=None, test_loader=None, **kwargs):
        super().__init__(*args, **kwargs)
-        self.loss_fn = torch.nn.CrossEntropyLoss(ignore_index=self.model.config.pad_token_id)
+        self.loss_fn = torch.nn.CrossEntropyLoss()
        self.train_loader = train_loader
        self.test_loader = test_loader

@@ -62,6 +62,21 @@ def get_train_dataloader(self) -> DataLoader:

    def get_eval_dataloader(self, _) -> DataLoader:
        return self.test_loader

+    def compute_loss(self, model, inputs, return_outputs=False):
Collaborator:
As this is most of the calculation for ppl, can we make this one and our gpu_utils implementation consistent, and add a test? Better still would be to call a common subroutine from here and from the ppl calc implementation. NB Pashmina's ppl refactoring PR should go in first.

Contributor Author:
Yes, the perplexity calculation should have the same changes; good idea to combine them.
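
As a rough illustration of the common-subroutine idea discussed above, a masked token-level cross-entropy helper that both compute_loss and the perplexity calculation could call might look like the sketch below. The name masked_lm_loss, the label masking via ignore_index=-100, and the assumed shapes are illustrative only, not the actual gpu_utils implementation.

import torch
import torch.nn.functional as F

def masked_lm_loss(logits, labels, attention_mask):
    # Hypothetical shared helper: mean next-token cross-entropy over non-padding targets.
    # logits: (batch, seq_len, vocab); labels, attention_mask: (batch, seq_len).
    # Shift so that the logits at position t predict the token at position t + 1.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    shift_mask = attention_mask[..., 1:].contiguous()

    # Exclude padding by pointing ignored targets at ignore_index instead of zeroing logits.
    shift_labels = shift_labels.masked_fill(shift_mask == 0, -100)

    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )

With a helper like this, compute_loss would return masked_lm_loss(outputs.logits, labels, attention_mask), and the ppl code path could report torch.exp() of the same value, which would keep the two implementations consistent as suggested.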

+        labels = inputs.pop('labels')
+        attention_mask = inputs["attention_mask"]
+        outputs = model(**inputs)
+        labels = labels[..., 1:].contiguous()
Collaborator:

Nit: we should double check whether contiguous is required here, as we don't use it in our ppl implementation.
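
On the contiguous nit: after slicing along the sequence dimension the tensors become non-contiguous views, and .view() on a non-contiguous tensor raises a RuntimeError whenever the batch dimension is greater than 1, so either .contiguous() or .reshape() is needed before flattening for the loss. A quick standalone check (the shapes here are arbitrary):

import torch

logits = torch.randn(4, 10, 32)       # (batch, seq_len, vocab)
sliced = logits[..., :-1, :]          # drop last position -> non-contiguous view
print(sliced.is_contiguous())         # False

try:
    sliced.view(-1, sliced.size(-1))  # raises: size/stride incompatible with view
except RuntimeError as err:
    print("view failed:", err)

flat = sliced.contiguous().view(-1, sliced.size(-1))  # works after .contiguous()
flat = sliced.reshape(-1, sliced.size(-1))            # .reshape() copies only when needed

If the ppl implementation flattens with .reshape() (or never flattens at all), that would explain why it gets away without .contiguous().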

+        logits = outputs.logits[..., :-1, :].contiguous()
+        attention_mask = attention_mask[..., :-1].contiguous()
+
+        # ignore padding tokens when computing the loss
+        logits = logits * attention_mask.unsqueeze(-1)
+
+        loss = self.loss_fn(logits.view(-1, logits.shape[-1]), labels.view(-1))
+
+        return (loss, outputs) if return_outputs else loss


def argparser():