Hi, one other issue I wanted to point out: training seemed to terminate about 27 epochs in because the loss diverged. Thanks!

<img width="1288" alt="Screenshot 2020-11-23 at 08 46 08" src="https://user-images.githubusercontent.com/38363539/100001813-ea9aa800-2d80-11eb-8376-6f7ac75a9970.png">