I believe the model was trained on 95M sentences, not 95K. Also, I couldn't find where you describe how the model's parameters were initialized.