From da66d0d5118721a3d9d85a21880f227ea1c56897 Mon Sep 17 00:00:00 2001
From: atlevesque <45173896+atlevesque@users.noreply.github.com>
Date: Sat, 25 Apr 2020 18:11:43 -0400
Subject: [PATCH] Update ddqn_agent.py to prevent RuntimeError with newer
 PyTorch versions

When running the DDQN agent on PyTorch v1.5.0 I get the following RuntimeError:

    RuntimeError: range.second - range.first == t.size() INTERNAL ASSERT FAILED at
    ..\torch\csrc\autograd\generated\Functions.cpp:57, please report a bug to PyTorch.
    inconsistent range for TensorList output (copy_range at
    ..\torch\csrc\autograd\generated\Functions.cpp:57) (no backtrace available)

My guess is that there is a diamond-shaped dependency when running the backward
method, since the `self.q_eval` network parameters affect the loss both via
`q_pred` and via `q_eval`. I fixed the issue by explicitly detaching the
`max_actions` tensor from the computational graph: it holds discrete values, and
small changes in the `self.q_eval` network parameters should not change the
actions selected. The derivative of the loss with respect to the `self.q_eval`
network parameters therefore comes only from the `q_pred` calculation.

I tested this change on my machine and got good performance and (more
importantly) no RuntimeError.
---
 DDQN/ddqn_agent.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DDQN/ddqn_agent.py b/DDQN/ddqn_agent.py
index 66a15fe..a357e18 100644
--- a/DDQN/ddqn_agent.py
+++ b/DDQN/ddqn_agent.py
@@ -83,7 +83,7 @@ def learn(self):
         q_next = self.q_next.forward(states_)
         q_eval = self.q_eval.forward(states_)
 
-        max_actions = T.argmax(q_eval, dim=1)
+        max_actions = T.argmax(q_eval, dim=1).detach()
         q_next[dones] = 0.0
 
         q_target = rewards + self.gamma*q_next[indices, max_actions]
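
For reference, below is a small, self-contained sketch (not taken from the
repository) of the computation around the patched line. Random tensors stand in
for the network outputs, and names such as batch_size, actions, and F.mse_loss
are illustrative assumptions rather than ddqn_agent.py's actual code:

    import torch as T
    import torch.nn.functional as F

    batch_size, n_actions, gamma = 4, 2, 0.99

    # Stand-ins for the tensors used in learn(); shapes are illustrative only.
    q_pred = T.randn(batch_size, n_actions, requires_grad=True)  # online net on states
    q_next = T.randn(batch_size, n_actions)                      # target net on states_
    q_eval = T.randn(batch_size, n_actions, requires_grad=True)  # online net on states_
    rewards = T.randn(batch_size)
    actions = T.randint(0, n_actions, (batch_size,))
    dones = T.zeros(batch_size, dtype=T.bool)
    indices = T.arange(batch_size)

    # The online network selects the greedy next action; .detach() removes the
    # argmax result from the autograd graph, which is safe because it only
    # holds discrete action indices.
    max_actions = T.argmax(q_eval, dim=1).detach()
    q_next[dones] = 0.0
    q_target = rewards + gamma * q_next[indices, max_actions]

    loss = F.mse_loss(q_target, q_pred[indices, actions])
    loss.backward()  # only q_pred receives a gradient; q_eval.grad stays None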