I think the reset method of the Dyna-Q+ class should not include the command self.time = 0, because it produces NaN values in the Q-function. If the model still holds timestamps _time recorded before the reset, then after the reset we get self.time - _time < 0, and this negative value is then passed to a square root.
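To illustrate the mechanism, here is a minimal sketch, not the repository's code. The class TinyTimeModel, the names kappa, last_visit, feed, bonus and run, and the helper sqrt_or_nan are all hypothetical; the helper only mimics a square root that returns NaN for negative input, which is the behaviour described above.

```python
import math


def sqrt_or_nan(x: float) -> float:
    """Assumed behaviour: a negative argument yields NaN instead of raising."""
    return math.sqrt(x) if x >= 0 else float("nan")


class TinyTimeModel:
    """Hypothetical stand-in for a Dyna-Q+ model; not the actual class."""

    def __init__(self, kappa: float = 1e-3):
        self.kappa = kappa      # weight of the exploration bonus
        self.time = 0           # global step counter
        self.last_visit = {}    # (state, action) -> time of last real visit

    def feed(self, state, action):
        """Record a real experience and stamp it with the current time."""
        self.time += 1
        self.last_visit[(state, action)] = self.time

    def bonus(self, state, action):
        """Exploration bonus kappa * sqrt(elapsed time since last visit)."""
        elapsed = self.time - self.last_visit[(state, action)]
        return self.kappa * sqrt_or_nan(elapsed)

    def reset(self, reset_time: bool):
        """Stored timestamps are kept; only the counter is optionally zeroed."""
        if reset_time:
            self.time = 0


def run(reset_time: bool) -> float:
    model = TinyTimeModel()
    for _ in range(5):
        model.feed("s0", "right")   # last visit of (s0, right) stamped at 5
    model.reset(reset_time)
    model.feed("s1", "left")        # one more step after the reset
    return model.bonus("s0", "right")


with_reset = run(reset_time=True)      # elapsed = 1 - 5 = -4, so NaN
without_reset = run(reset_time=False)  # elapsed = 6 - 5 = 1, finite bonus
print(with_reset, without_reset)
```

In this sketch the NaN disappears as soon as the counter is left alone in reset. Clearing the stored timestamps together with the counter would also avoid it, but that changes what the model remembers.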
The relevant line is: