Hi, I find that as the training goes (beyond 20 epochs), the loss will gradually become negative. May I ask if this is harmful to downstream tasks? Thank you!