
Critic q loss in PPO agent seems to be wrong #6

@Asad-Shahid

Description

In line 282 of ppo_agent.py, the critic is trained using:

```python
value_loss = self._config.value_loss_coeff * (ret - value_pred).pow(2).mean()
```

where `ret` is computed as `ret = adv + vpred[:-1]`.

This way of calculating the return actually gives a q-loss: since A(s, a) = Q(s, a) - V(s), the target `ret = adv + vpred[:-1]` is an estimate of Q(s, a). However, the critic network only predicts the state value v.

So it seems like the critic is trained with a q-loss while it is used to predict only state values. Could you clarify this?
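To make the question concrete, here is a minimal sketch of the computation I am referring to (the function name and signature are mine for illustration, not from the repo), assuming the advantage comes from standard GAE:

```python
import torch

def gae_return_target(rewards, vpred, gamma=0.99, lam=0.95):
    """Illustrative sketch, not the repo's code.

    rewards: [T] tensor of rewards.
    vpred:   [T+1] tensor of value predictions (last entry is the
             bootstrap value for the final state).
    """
    T = rewards.shape[0]
    adv = torch.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * vpred[t + 1] - vpred[t]
        last = delta + gamma * lam * last
        adv[t] = last
    # The line I am asking about: adding V back onto the advantage,
    # so ret_t = adv_t + V(s_t), which reads like an estimate of Q(s_t, a_t).
    ret = adv + vpred[:-1]
    return adv, ret
```

Under this computation, ret_t = adv_t + V(s_t), which is what leads to my reading of the value loss above.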

Thanks
