Excellent work! I'm quite interested in it, and I have a question about where to apply CWD in the network. Do you mean applying CWD only to the final output channels? For example, for Cityscapes the output has 19 channels, one per class, so we would apply a channel-wise softmax to each of these 19 channels separately and then compute the loss with KL divergence. Or do you mean we also need to insert the CWD module in the middle of the network? Could you share more details about this? Thanks!
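To make the first option concrete, here is a minimal sketch of what I have in mind, in plain Python. It is only my reading, not your implementation: the names `spatial_log_softmax` and `cwd_loss`, the temperature `tau`, the toy shapes, and the assumption that the softmax in each channel runs over the spatial positions are all mine, and I left out any scaling or averaging of the loss.

```python
import math

def spatial_log_softmax(channel, tau=1.0):
    """Log-softmax over the flattened spatial positions (H*W) of one channel."""
    scaled = [v / tau for v in channel]
    m = max(scaled)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_norm for v in scaled]

def cwd_loss(student, teacher, tau=1.0):
    """Sum over channels of KL(teacher || student).

    Both inputs are logits shaped [C][H*W], e.g. C = 19 for Cityscapes.
    """
    loss = 0.0
    for s_ch, t_ch in zip(student, teacher):
        log_s = spatial_log_softmax(s_ch, tau)
        log_t = spatial_log_softmax(t_ch, tau)
        loss += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return loss

# Toy example: 19 class channels, 4 spatial positions each.
teacher_logits = [[0.0, 1.0, 2.0, 3.0] for _ in range(19)]
student_logits = [[3.0, 2.0, 1.0, 0.0] for _ in range(19)]
same = cwd_loss(teacher_logits, teacher_logits)
diff = cwd_loss(student_logits, teacher_logits)
```

With identical student and teacher maps the loss is zero, and it grows as the per-channel spatial distributions diverge. If CWD is also applied to intermediate features, I assume the same function would be used there with C equal to the number of feature channels.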