It seems that the implemented transformer model does not work in an unsupervised way.

The implemented transformer appears to be just a binary classification model, like the other models.

The implementation also doesn't seem to follow the original paper, which employs positional encoding.
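
For reference, here is a minimal sketch of what adding positional encoding could look like. It assumes the paper uses the common fixed sinusoidal scheme (sine on even dimensions, cosine on odd ones), which may not match the paper exactly. The function names `sinusoidal_positional_encoding` and `add_positional_encoding` are hypothetical and not taken from this repository.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Build a seq_len x d_model table of sinusoidal position encodings.

    Assumption: even dimension i uses sin(pos / 10000^(i / d_model)) and
    the following odd dimension uses the cosine of the same angle.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000.0 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:  # guard for odd d_model
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(embeddings):
    """Add the encoding element-wise to a list of embedding vectors."""
    seq_len = len(embeddings)
    d_model = len(embeddings[0])
    pe = sinusoidal_positional_encoding(seq_len, d_model)
    return [
        [x + p for x, p in zip(row, pe_row)]
        for row, pe_row in zip(embeddings, pe)
    ]
```

In a real model the same table would be added to the input embeddings before the first attention layer, so that the otherwise order-agnostic attention can distinguish positions in the sequence.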