it seems that the transformer implemented is just a binary classification model as other models. The implementation doesn't seem to follow the original paper in which positional encoding is employed.
it seems that the transformer implemented is just a binary classification model as other models.
The implementation doesn't seem to follow the original paper in which positional encoding is employed.