Labels
ML: Requires machine-learning knowledge (can be built up on the fly)
core: Improves core model while keeping the core idea intact
research: Creative project that might fail but could give high returns
Description
Currently, we use only the softmax classification/cross-entropy loss as the language-modeling loss for next-token prediction. However, other works such as T-Few showed that adding auxiliary losses during training, for example explicit length penalties, can improve downstream-task performance. Additionally, works like DCL and InfoLOOB demonstrated that changing the fundamental structure of the loss from softmax classification to an alternative formulation can speed up convergence. A similar approach could therefore be beneficial for us.
In this issue, we'll explore whether InfoLOOB's classification loss helps as a drop-in replacement within the language-modeling objective, or whether we should change the entire objective.
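For concreteness, here is a minimal sketch of how an InfoLOOB-style variant of the next-token loss could look next to the current cross-entropy loss. The PyTorch setup, the temperature hyperparameter `tau`, and the exact adaptation to token-level language modeling are my assumptions, not the formulation from the paper; the only structural change illustrated is that the target-token logit is excluded from the normalizer, following the leave-one-out idea.

```python
import torch
import torch.nn.functional as F

def cross_entropy_lm_loss(logits, targets):
    """Standard softmax cross-entropy over the vocabulary (current objective).

    logits: (batch, seq_len, vocab_size), targets: (batch, seq_len)
    """
    return F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))

def infoloob_lm_loss(logits, targets, tau=1.0):
    """InfoLOOB-style sketch: the positive (target-token) logit is left out of
    the denominator, i.e. loss = -log( exp(z_pos/tau) / sum_{neg} exp(z_neg/tau) ).
    Illustrative adaptation only, not the exact objective from the InfoLOOB paper.
    """
    flat_logits = logits.view(-1, logits.size(-1)) / tau   # (N, vocab)
    flat_targets = targets.view(-1)                        # (N,)

    # Logit of the correct next token for each position.
    pos = flat_logits.gather(1, flat_targets.unsqueeze(1)).squeeze(1)

    # Log-sum-exp over the negatives only: mask the positive out before reducing.
    pos_mask = torch.zeros_like(flat_logits, dtype=torch.bool)
    pos_mask.scatter_(1, flat_targets.unsqueeze(1), True)
    neg_lse = torch.logsumexp(flat_logits.masked_fill(pos_mask, float("-inf")), dim=1)

    return (neg_lse - pos).mean()
```

Whether this variant actually speeds up convergence for next-token prediction, or whether the objective needs a more fundamental change, is exactly what this issue should determine.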