In robot control, an action sent by the policy often cannot be fully executed within a single control step, which violates the MDP assumption (the problem becomes more like a POMDP). In your paper, the control frequency is 10 Hz.
So I'd like to ask how you handle this problem, or whether you simply ignore it.
If you do handle it, do you clip the action range to what the robot can execute in one step? Or do you use some other approach?
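To make the clipping idea concrete, here is a minimal sketch of what I have in mind. It is not taken from your paper. It assumes position-controlled joints with a per-joint speed limit; the function name `clip_to_reachable`, the parameter `max_speed`, and the 2.0 rad/s value are hypothetical and only for illustration.

```python
def clip_to_reachable(current, target, max_speed, dt):
    """Limit target joint positions to what can be reached in one control step.

    current, target: lists of joint positions (rad)
    max_speed: assumed per-joint speed limit (rad/s), a hypothetical value
    dt: duration of one control step (s)
    """
    max_delta = max_speed * dt  # largest position change possible in one step
    clipped = []
    for q, q_target in zip(current, target):
        delta = q_target - q
        # Clamp the requested change to [-max_delta, +max_delta].
        delta = max(-max_delta, min(max_delta, delta))
        clipped.append(q + delta)
    return clipped


CONTROL_HZ = 10          # control frequency from the paper
dt = 1.0 / CONTROL_HZ    # 0.1 s per control step

current = [0.0, 0.5]
target = [1.0, 0.45]     # first joint asks for more than one step allows
print(clip_to_reachable(current, target, max_speed=2.0, dt=dt))
```

With this kind of clipping, the first joint would only move by `max_speed * dt` per step, while the second joint's small request would pass through unchanged, so the executed action would match the commanded one at every step.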
Thanks in advance for your reply! I think this is a basic but important problem for deploying RL, and I would appreciate any response.