From da66d0d5118721a3d9d85a21880f227ea1c56897 Mon Sep 17 00:00:00 2001
From: atlevesque <45173896+atlevesque@users.noreply.github.com>
Date: Sat, 25 Apr 2020 18:11:43 -0400
Subject: [PATCH] Update ddqn_agent.py to prevent RuntimeError with newer
 PyTorch versions

When running the DDQN agent on PyTorch v1.5.0 I get the following RuntimeError:

    RuntimeError: range.second - range.first == t.size() INTERNAL ASSERT FAILED at
    ..\torch\csrc\autograd\generated\Functions.cpp:57, please report a bug to PyTorch.
    inconsistent range for TensorList output (copy_range at
    ..\torch\csrc\autograd\generated\Functions.cpp:57) (no backtrace available)

My guess is that there is a diamond-shaped dependency when running the backward
method, since the `self.q_eval` network parameters affect the loss both via
`q_pred` and via `q_eval`. I fixed the issue by explicitly detaching the
`max_actions` tensor from the computational graph: it holds discrete values, and
small changes in the `self.q_eval` network parameters should not change the
actions selected. The derivative of the loss with respect to the `self.q_eval`
network parameters therefore comes only from the `q_pred` calculation.

I tested this change on my machine and got good performance and (more
importantly) no RuntimeError.
---
 DDQN/ddqn_agent.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DDQN/ddqn_agent.py b/DDQN/ddqn_agent.py
index 66a15fe..a357e18 100644
--- a/DDQN/ddqn_agent.py
+++ b/DDQN/ddqn_agent.py
@@ -83,7 +83,7 @@ def learn(self):
         q_next = self.q_next.forward(states_)
         q_eval = self.q_eval.forward(states_)
 
-        max_actions = T.argmax(q_eval, dim=1)
+        max_actions = T.argmax(q_eval, dim=1).detach()
         q_next[dones] = 0.0
 
         q_target = rewards + self.gamma*q_next[indices, max_actions]
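
For reference, below is a small, self-contained sketch (not taken from the
repository) of the computation around the patched line. Random tensors stand in
for the network outputs, and names such as batch_size, actions, and F.mse_loss
are illustrative assumptions rather than ddqn_agent.py's actual code:

    import torch as T
    import torch.nn.functional as F

    batch_size, n_actions, gamma = 4, 2, 0.99

    # Stand-ins for the tensors used in learn(); shapes are illustrative only.
    q_pred = T.randn(batch_size, n_actions, requires_grad=True)  # online net on states
    q_next = T.randn(batch_size, n_actions)                      # target net on states_
    q_eval = T.randn(batch_size, n_actions, requires_grad=True)  # online net on states_
    rewards = T.randn(batch_size)
    actions = T.randint(0, n_actions, (batch_size,))
    dones = T.zeros(batch_size, dtype=T.bool)
    indices = T.arange(batch_size)

    # The online network selects the greedy next action; .detach() removes the
    # argmax result from the autograd graph, which is safe because it only
    # holds discrete action indices.
    max_actions = T.argmax(q_eval, dim=1).detach()
    q_next[dones] = 0.0
    q_target = rewards + gamma * q_next[indices, max_actions]

    loss = F.mse_loss(q_target, q_pred[indices, actions])
    loss.backward()  # only q_pred receives a gradient; q_eval.grad stays None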