In the way the master receives the "update" from the workers, the BN running mean (a running statistic, not a trained parameter) is treated like any other weight: the master interprets the diff as a gradient and applies its optimizer step to it, instead of applying something dedicated to a value that is not updated by gradient descent.
The gamma/beta weights of the BN are fine in this respect, since those are genuinely gradient-trained.
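To illustrate the distinction (a minimal sketch, not mpi-learn's actual API — `master_update` and its arguments are hypothetical): gradient-trained parameters like gamma/beta can take an optimizer-style step, but an EMA statistic like the running mean should just absorb the workers' diff directly.

```python
import numpy as np

def master_update(weights, diffs, lr, is_trainable):
    """Hypothetical master-side update distinguishing parameter kinds.

    Treating every diff as a gradient (i.e., taking a gradient-descent
    step on it) is wrong for batch-norm running statistics, which are
    EMA state rather than gradient-trained parameters.
    """
    new_weights = []
    for w, d, trainable in zip(weights, diffs, is_trainable):
        if trainable:
            # gamma/beta: a gradient-style step is appropriate
            new_weights.append(w - lr * d)
        else:
            # running mean/var: the diff already encodes the workers'
            # EMA movement, so apply it as-is, without lr scaling
            new_weights.append(w + d)
    return new_weights

# toy BN layer state: [gamma, beta, running_mean]
weights = [np.ones(3), np.zeros(3), np.full(3, 0.5)]
diffs = [np.full(3, 0.1), np.full(3, 0.1), np.full(3, 0.2)]
updated = master_update(weights, diffs, lr=0.01,
                        is_trainable=[True, True, False])
```

With these numbers the running mean moves by the full diff (0.5 → 0.7), while gamma only takes the small lr-scaled step (1.0 → 0.999); a master that treated all three uniformly as gradients would barely move the running mean at all.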
I fear that this is interacting badly with svalleco#3.
@duanders if you have any insight into how to modify the mpi-learn optimizers to take this into account, please do tell.