Hi, first of all, thanks for the magnificent reproduction.
Could you please provide a summary of the differences between this reproduction and the official implementation?
Are they exactly the same in terms of model, optimizer, and scheduler?
I want to use your reproduction on my own dataset, so I would like to know how it differs. Thanks!