Open
Labels
ML: Requires machine-learning knowledge (can be built up on the fly)
core: Improves core model while keeping core idea intact
research: Creative project that might fail but could give high returns
Description
Currently, our model can be either an encoder or a decoder, but the two cannot be combined as in T5. The closest approximation available at the moment is to expand the context of our decoder, but a decoder-only model doesn't perform as well. Ideally, we could run full "attention" over one part of the sequence and sample autoregressively for the other.
This issue discusses ideas for implementing such a scheme and benchmarking it against the baseline fully-autoregressive model.
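One possible starting point, sketched here as an assumption rather than a settled design, is a prefix-LM style attention mask: positions in a prefix attend to each other freely, while the remaining positions attend to the whole prefix plus earlier positions only. This is not the same as T5's separate encoder and decoder stacks with cross-attention, but it fits in a single model. The function name `prefix_lm_mask` and its parameters are hypothetical and do not refer to anything in the codebase.

```python
def prefix_lm_mask(seq_len: int, prefix_len: int) -> list[list[bool]]:
    """Build a boolean attention mask for a prefix-LM layout (illustrative only).

    mask[q][k] is True when query position q may attend to key position k.
    Positions [0, prefix_len) form the fully attended part; the rest is causal.
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    return [
        # A key is visible if it lies in the prefix (full attention)
        # or if it is not in the future of the query (causal attention).
        [k < prefix_len or k <= q for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the mask with "x" for allowed and "." for blocked positions.
    for row in prefix_lm_mask(seq_len=5, prefix_len=2):
        print("".join("x" if allowed else "." for allowed in row))
```

With `prefix_len=0` this reduces to the ordinary causal mask of the fully-autoregressive baseline, and with `prefix_len=seq_len` it becomes full attention, so the same code path could cover both ends of a benchmark comparison.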