Description
Hi,
Thanks for providing this interesting package.
I am trying to test DRC on a simple setup, and I notice that the current implementation does not seem to work. When I try it on a simple partially observable linear system with
```python
A = np.array([[1.0, 0.95], [0.0, -0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
```
with Q = R = I, Gaussian process noise, and zero observation noise,
which is open-loop stable, the controller behaves like a zero controller. I tried to get a different response by varying the hyperparameters, but the results are mostly the same.
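For reference, this is roughly the test harness I used. It is only a sketch of my setup, not deluca code: the rollout below plays the zero control I observed, and all names are my own.

```python
import numpy as np

# The partially observed linear system from above.
A = np.array([[1.0, 0.95], [0.0, -0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(1), np.eye(1)  # cost on the observation y and control u

# A is upper triangular, so its eigenvalues are 1.0 and -0.9.
rho = max(abs(np.linalg.eigvals(A)))

# Roll out with Gaussian process noise, zero observation noise,
# and the zero control that the learned controller effectively plays.
rng = np.random.default_rng(0)
x = np.zeros(2)
ys = []
for _ in range(50):
    y = C @ x                 # zero observation noise
    u = np.zeros(1)           # the "zero controller" behaviour I see
    x = A @ x + B @ u + rng.normal(size=2)
    ys.append(y)
```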
Then I looked at the implementation in the deluca GitHub repository, and I noticed that the counterfactual cost does not seem to be defined correctly (unless I am mistaken). According to Algorithm 1 in [1], we need to use M_t to compute y_t, including for the previous controls u that y_t depends on, but in the implementation the previous controls are computed with M_{t-i}. In any case, I implemented the algorithm using M_t, and what I get after the simulation is either close-to-zero control or an unstable one.
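To make the point concrete, here is how I read the counterfactual cost in Algorithm 1: the same M_t is used both for the current control and for the past controls entering y_t. This is my own sketch under my notation (G[i] standing for the Markov operator C A^i B), not deluca's API:

```python
import numpy as np

def counterfactual_cost(M, G, y_nat_hist, Q, R):
    """Counterfactual cost c_t(M), as I read Algorithm 1 in [1].

    M          : (m, du, dy) array, the current parameter M_t; used for
                 *all* past controls (not M_{t-i} as in the current code).
    G          : (h, dy, du) array, Markov operator, G[i] ~ C A^i B.
    y_nat_hist : list of nature's y's, newest last; needs m + h entries.
    """
    m = M.shape[0]
    h = G.shape[0]

    def u_of(k):
        # Control the *current* M would have played k steps ago:
        # u_{t-k}(M) = sum_j M[j] @ y_nat_{t-k-j}
        return sum(M[j] @ y_nat_hist[-1 - k - j] for j in range(m))

    # Counterfactual observation:
    # y_t(M) = y_nat_t + sum_i G[i] @ u_{t-1-i}(M)
    y = y_nat_hist[-1] + sum(G[i] @ u_of(1 + i) for i in range(h))
    u = u_of(0)
    return float(y @ Q @ y + u @ R @ u)
```

With M = 0 this reduces to the cost of the natural observation alone, which matches the near-zero control I observe.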
I was wondering if you have any working code example for the DRC algorithm?
[1] Simchowitz, Max and Singh, Karan and Hazan, Elad, "Improper learning for non-stochastic control", COLT 2020.
Thanks a lot,
Sincerely,
Farnaz