Hello! I found your paper very inspiring, but there is one point I don't fully understand and I hope you can clarify it. Regarding the method described in Section 3.4, "Gradient descent on the dilation rates": I examined your code and found that the dilation rates are hardcoded. Based on the description in the paper, is my understanding correct that you first train the model with deformable convolutions to find the optimal offsets, and then use those values as fixed dilation rates to retrain the network?
