Description
Hi,
Thanks for providing this interesting package.
I am trying to test DRC on a simple setup, and I notice that the current implementation does not seem to work. When I try it on a simple partially observable linear system with
```python
A = np.array([[1.0, 0.95], [0.0, -0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
```
with Q = R = I, Gaussian process noise, and zero observation noise,
which is open-loop stable, the controller behaves like a zero controller. I tried to get a different response by varying the hyperparameters, but the results are mostly the same.
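For reference, this is roughly the test harness I used. It is only a sketch of my setup, not deluca code: the rollout below plays the zero control I observed, and all names are my own.

```python
import numpy as np

# The partially observed linear system from above.
A = np.array([[1.0, 0.95], [0.0, -0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(1), np.eye(1)  # cost on the observation y and control u

# A is upper triangular, so its eigenvalues are 1.0 and -0.9.
rho = max(abs(np.linalg.eigvals(A)))

# Roll out with Gaussian process noise, zero observation noise,
# and the zero control that the learned controller effectively plays.
rng = np.random.default_rng(0)
x = np.zeros(2)
ys = []
for _ in range(50):
    y = C @ x                 # zero observation noise
    u = np.zeros(1)           # the "zero controller" behaviour I see
    x = A @ x + B @ u + rng.normal(size=2)
    ys.append(y)
```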
Then I looked at the implementation in the deluca GitHub repository, and I noticed that the counterfactual cost does not seem to be defined correctly (unless I am mistaken). According to Algorithm 1 in [1], we need to use M_t to compute y_t, including for the previous controls u that y_t depends on, but in the implementation the previous controls are computed with M_{t-i}. In any case, I implemented the algorithm using M_t, and what I get after the simulation is either close-to-zero control or an unstable one.
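To make the point concrete, here is how I read the counterfactual cost in Algorithm 1: the same M_t is used both for the current control and for the past controls entering y_t. This is my own sketch under my notation (G[i] standing for the Markov operator C A^i B), not deluca's API:

```python
import numpy as np

def counterfactual_cost(M, G, y_nat_hist, Q, R):
    """Counterfactual cost c_t(M), as I read Algorithm 1 in [1].

    M          : (m, du, dy) array, the current parameter M_t; used for
                 *all* past controls (not M_{t-i} as in the current code).
    G          : (h, dy, du) array, Markov operator, G[i] ~ C A^i B.
    y_nat_hist : list of nature's y's, newest last; needs m + h entries.
    """
    m = M.shape[0]
    h = G.shape[0]

    def u_of(k):
        # Control the *current* M would have played k steps ago:
        # u_{t-k}(M) = sum_j M[j] @ y_nat_{t-k-j}
        return sum(M[j] @ y_nat_hist[-1 - k - j] for j in range(m))

    # Counterfactual observation:
    # y_t(M) = y_nat_t + sum_i G[i] @ u_{t-1-i}(M)
    y = y_nat_hist[-1] + sum(G[i] @ u_of(1 + i) for i in range(h))
    u = u_of(0)
    return float(y @ Q @ y + u @ R @ u)
```

With M = 0 this reduces to the cost of the natural observation alone, which matches the near-zero control I observe.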
I was wondering if you have any working code example for the DRC algorithm?
[1] Simchowitz, Max and Singh, Karan and Hazan, Elad, "Improper learning for non-stochastic control", COLT 2020.
Thanks a lot,
Sincerely,
Farnaz