Thank you for the code and the nice paper.
I'm using your code from https://github.com/chaoyuaw/incubator-mxnet/tree/master/example/gluon/embedding_learning.
Have you tried training networks other than Resnet50_V2 with the proposed loss?
I'm trying to train VGG11_bn, and it barely improves over random initialization, even when I tune the learning rate.
I assume some other hyperparameters need to be changed as well. Could you give me a hint about which hyperparameters of your loss function I should tune first (i.e., which ones are the most sensitive)?
Thanks in advance,
Artsiom