When I use your pre-trained model for training, the initial loss is high (200+). What could be causing this? The loss also stays high for the first several epochs. The batch_size is 1; the other parameters are unchanged. Looking forward to your reply. Thanks!