Alternative Losses #54

@ClashLuke

Description

Currently, we use only the softmax-classification (cross-entropy) loss as the language-modeling loss for next-token prediction. However, other works such as T-Few showed that adding auxiliary losses, such as explicit length penalties during training, can improve downstream-task performance. Additionally, works like DCL and InfoLOOB demonstrated that changing the fundamental structure of the loss from softmax classification to something different can speed up convergence. A similar approach could therefore be beneficial for us.
In this issue, we'll explore whether InfoLOOB's classification loss for the language-modeling objective helps or if we should change the entire objective.
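To make the comparison concrete, here is a minimal NumPy sketch of the two candidate objectives. The standard cross-entropy loss normalizes the positive (target-token) logit against the sum over *all* tokens, while an InfoLOOB-style loss leaves the positive out of the denominator, normalizing only against the negatives. Note this is a hedged sketch: the function names, the temperature parameter `tau`, and the exact adaptation of InfoLOOB (originally a contrastive objective) to next-token prediction are assumptions, not a settled design.

```python
import numpy as np

def log_softmax(z):
    # numerically stable log-softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, targets):
    # standard softmax cross-entropy: -log p(target | context)
    logp = log_softmax(logits)
    return -np.take_along_axis(logp, targets[:, None], axis=-1).mean()

def infoloob_loss(logits, targets, tau=1.0):
    # InfoLOOB-style variant (assumption: direct port of the contrastive
    # form to token classification): the positive logit is compared against
    # the sum of exponentials of the *negatives only* ("leave one out").
    z = logits / tau
    pos = np.take_along_axis(z, targets[:, None], axis=-1)[:, 0]
    zmax = z.max(axis=-1)  # subtract the max for numerical stability
    exp = np.exp(z - zmax[:, None])
    neg_sum = exp.sum(axis=-1) - np.exp(pos - zmax)  # leave the positive out
    return (-(pos - zmax) + np.log(neg_sum)).mean()
```

One consequence visible even in this sketch: because the positive is excluded from the denominator, the InfoLOOB-style loss is unbounded below as the positive logit grows, so in practice it is typically paired with a temperature and/or regularization, which is part of what this issue would need to evaluate.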

Metadata


Assignees

No one assigned

    Labels

    ML: Requires machine-learning knowledge (can be built up on the fly)
    core: Improves core model while keeping core idea intact
    research: Creative project that might fail but could give high returns

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
