TensorFlow Boot DQN agent loses performance after the first iteration #46

Hi,

I am observing strange behavior from the default TensorFlow Boot DQN agent that has me baffled.
When running a sweep over multiple environments, the agent behaves as expected only in the first iteration; in later iterations it does not seem to explore. I have spent some time debugging but have not found the cause.

Code for reproduction (double-checked in a newly installed env):

```
import bsuite
from bsuite.baselines.tf import boot_dqn
from bsuite import sweep
from bsuite.baselines import experiment

bsuite_id = "DEEP_SEA"
log_dir = "./logs/"
# Take the first three environment ids of the DEEP_SEA sweep.
bsuite_sweep = getattr(sweep, bsuite_id)[:3]

for sweep_id in bsuite_sweep:  # renamed from `id` to avoid shadowing the builtin
    env = bsuite.load_and_record(sweep_id, save_path=log_dir, overwrite=True)
    # A fresh default agent is built for every environment.
    agent = boot_dqn.default_agent(
        obs_spec=env.observation_spec(),
        action_spec=env.action_spec(),
    )

    experiment.run(agent, env, num_episodes=300)
```
Iterations 2 and 3 do not reach the end of the chain within 300 episodes, nor over much longer training horizons (see the Colab link below for results).

In contrast, the JAX agent reliably produces the expected results in this loop (i.e., after replacing `bsuite.baselines.tf` with `bsuite.baselines.jax`).

The same can be observed in Colab:
https://colab.research.google.com/drive/1hnJMDLG-aXCKKsjFqVd6YWGY4luz29ku?usp=sharing
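For a side-by-side comparison, the loop above can be parameterized by backend. This is only a minimal sketch: `boot_dqn_module_name` and `run_sweep` are hypothetical helper names (not part of bsuite), and writing each backend's logs to its own subdirectory is my own assumption so that the two runs do not overwrite each other.

```python
import importlib


def boot_dqn_module_name(backend: str) -> str:
    """Return the dotted module path of the boot_dqn baseline ('tf' or 'jax')."""
    if backend not in ("tf", "jax"):
        raise ValueError(f"unknown backend: {backend!r}")
    return f"bsuite.baselines.{backend}.boot_dqn"


def run_sweep(backend: str, num_envs: int = 3, num_episodes: int = 300,
              log_dir: str = "./logs/") -> None:
    """Run the same DEEP_SEA loop as in the report with the chosen backend."""
    # Imported lazily so the helper above works without bsuite installed.
    import bsuite
    from bsuite import sweep
    from bsuite.baselines import experiment

    boot_dqn = importlib.import_module(boot_dqn_module_name(backend))

    for sweep_id in sweep.DEEP_SEA[:num_envs]:
        # Assumption: one log subdirectory per backend, e.g. ./logs/tf/.
        env = bsuite.load_and_record(
            sweep_id, save_path=f"{log_dir}{backend}/", overwrite=True)
        # Same agent construction as in the original loop.
        agent = boot_dqn.default_agent(
            obs_spec=env.observation_spec(),
            action_spec=env.action_spec(),
        )
        experiment.run(agent, env, num_episodes=num_episodes)
```

Calling `run_sweep("tf")` followed by `run_sweep("jax")` should then reproduce the difference described above, with the only change between the two runs being the imported baseline module.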

best,
anyboby
