
Fix batch collator padding for training with batch size > 1#36

Open
stepan-omelka wants to merge 2 commits into antoniorv6:master from stepan-omelka:my-selected-changes

Conversation

@stepan-omelka
Contributor

@stepan-omelka stepan-omelka commented Mar 10, 2026

[bug]

When running the training script with a batch size greater than 1,
the process crashed due to mismatched tensor lengths between the decoder
input and the ground-truth targets.

This change ensures that all sequence tensors within a batch are dynamically
padded to the batch's maximum sequence length using the dataset's padding token.
As a result, the model can safely process batches larger than 1 without
tensor dimension conflicts during training or validation.

  • Integrates BatchCollator for dynamic sequence padding.
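The padding behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class name `BatchCollator` comes from the PR description, but the constructor signature, `pad_token_id` parameter, and internals are assumptions.

```python
import torch


class BatchCollator:
    """Illustrative sketch: pad variable-length token sequences to the
    longest sequence in the batch, using the dataset's padding token."""

    def __init__(self, pad_token_id: int):
        self.pad_token_id = pad_token_id

    def __call__(self, sequences):
        # sequences: list of 1-D LongTensors of differing lengths
        max_len = max(seq.size(0) for seq in sequences)
        # Start from a tensor filled entirely with the padding token
        out = torch.full(
            (len(sequences), max_len), self.pad_token_id, dtype=torch.long
        )
        # Copy each sequence into its row; the tail stays padded
        for i, seq in enumerate(sequences):
            out[i, : seq.size(0)] = seq
        return out
```

In a typical PyTorch setup, an object like this would be passed to the `DataLoader` via `collate_fn=BatchCollator(pad_token_id)`, so every batch comes out rectangular regardless of the individual sequence lengths.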

Pretraining run with batch size 4 == orange
[image: training curve comparison]

@stepan-omelka stepan-omelka changed the title fix: batching and validation batching Fix batch collator padding for training with batch size > 1 Mar 10, 2026
@stepan-omelka stepan-omelka marked this pull request as ready for review March 10, 2026 11:52
@antoniorv6 antoniorv6 self-assigned this Mar 11, 2026
@antoniorv6 antoniorv6 added the enhancement New feature or request label Mar 11, 2026
@antoniorv6 antoniorv6 self-requested a review March 12, 2026 12:33
@antoniorv6
Owner

All the changes look good to me. However, have you tested the full-page scenario? Note that this program currently covers both cases (system-level and full-page). Could you send results for these other scenarios so we can merge?

Owner

@antoniorv6 antoniorv6 left a comment


Waiting until full-page results are presented.

@antoniorv6 antoniorv6 requested a review from eric-ayllon March 12, 2026 12:38
@stepan-omelka
Contributor Author

stepan-omelka commented Mar 24, 2026

Hi, I tried to run the fine-tuning, but I repeatedly run into an error (I created an issue for it). I also tried running the pretraining and fine-tuning on the master branch and hit the same error, so it is most likely not caused by the changes in this PR.

Until the issue is solved, I am unable to actually test the fine-tuning on increased batch size.


Labels

enhancement New feature or request
