Extracting total loss, PPO loss, per-step rewards, and per-step returns in the RLHF-PPO implementation

### 📚 The doc issue

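I need help extracting the following quantities from the RLHF-PPO implementation: the total loss, the PPO loss, the rewards per step, and the returns per step. The documentation does not make clear where these values are computed or how to access them during training.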
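
To make the request concrete, here is a minimal plain-Python sketch of the four quantities I mean. It assumes the textbook PPO clipped-surrogate formulation. The function names, the `gamma`, `clip_eps`, and `vf_coef` parameters, and the toy numbers are all hypothetical and are not taken from the RLHF-PPO implementation, which may compute or name these values differently.

```python
import math

def discounted_returns(rewards, gamma=1.0):
    """Per-step return-to-go: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped-surrogate policy loss (the negated PPO objective)."""
    terms = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        terms.append(-min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)

def value_loss(values, returns):
    """Mean squared error between value predictions and returns."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)

def total_loss(policy_loss, v_loss, vf_coef=0.5):
    """Assumed combination: policy loss plus a weighted value loss."""
    return policy_loss + vf_coef * v_loss

# Toy episode with hypothetical numbers (not from any real run).
rewards = [0.0, 0.0, 1.0]                  # rewards per step
returns = discounted_returns(rewards, 0.5)  # returns per step
```

What I am looking for is the place in the implementation where the equivalents of `rewards`, `returns`, the policy (PPO) loss, and the combined total loss are available, so that I can log them at each training step.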

### Suggest a potential alternative/fix

_No response_