Since the LR scheduler state and the grad scaler state are not stored with the rest of the training run data, they are re-initialized on resume, which leads to significantly different results than if training had never been interrupted.
NB: There is also a typo: "optimizer_state_dit" should probably be "optimizer_state_dict".
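A minimal sketch of what a complete checkpoint could look like, assuming a typical PyTorch setup (the model, optimizer, scheduler, and file name here are placeholders, not the project's actual code): the scheduler and scaler state dicts are saved alongside the model and optimizer, so resuming does not re-initialize them.

```python
import torch

# Hypothetical stand-ins for the real training objects.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
scaler = torch.cuda.amp.GradScaler(enabled=False)  # enabled=False so it runs on CPU

# Simulate a few training steps that advance the scheduler state.
for _ in range(3):
    optimizer.step()
    scheduler.step()

# Save ALL mutable training state, not just model/optimizer.
checkpoint = {
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),  # note the correct spelling
    "scheduler_state_dict": scheduler.state_dict(),
    "scaler_state_dict": scaler.state_dict(),
}
torch.save(checkpoint, "ckpt.pt")

# On resume: rebuild the objects, then restore their saved state.
ckpt = torch.load("ckpt.pt")
model.load_state_dict(ckpt["model_state_dict"])
optimizer.load_state_dict(ckpt["optimizer_state_dict"])
scheduler.load_state_dict(ckpt["scheduler_state_dict"])
scaler.load_state_dict(ckpt["scaler_state_dict"])
```

With this roundtrip the scheduler resumes at the step count it had reached (and the learning rate it had decayed to), rather than restarting its schedule from zero.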