Description
Hi, I set eval_period to 400, and these are my run_actor.sh and run_learner.sh:
run_actor.sh
export XLA_PYTHON_CLIENT_PREALLOCATE=false && \
export XLA_PYTHON_CLIENT_MEM_FRACTION=.1 && \
python async_drq_sim.py "$@" \
    --actor \
    --render \
    --exp_name=serl_dev_drq_sim_test_resnet \
    --seed 0 \
    --random_steps 1000 \
    --encoder_type resnet-pretrained
run_learner.sh
export XLA_PYTHON_CLIENT_PREALLOCATE=false && \
export XLA_PYTHON_CLIENT_MEM_FRACTION=.2 && \
python async_drq_sim.py "$@" \
    --learner \
    --exp_name=serl_dev_drq_sim_test_resnet \
    --seed 0 \
    --training_starts 1000 \
    --critic_actor_ratio 4 \
    --encoder_type resnet-pretrained \
    --demo_path franka_lift_cube_image_20_trajs.pkl \
    --checkpoint_period 150 \
    --checkpoint_path /home/serl/examples/async_drq_sim/checkpoints/drq_resnet/
All other files are left at their defaults. I found that the wandb logger does not record the reward curve at exact 400-step intervals. Does anyone know why?