Many of the unexpected float64-promotion issues reported in hw2 can be traced to problematic op implementations in hw1. Since the autograd framework provides no interface for casting a Tensor's dtype, dtypes must be kept consistent from the start. Tests for `gradient()` methods should therefore check that each returned `grad.dtype` is "small" enough that combining it with `inputs[i].dtype` will not cause promotion.
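A minimal sketch of such a check, using NumPy's `np.result_type` to predict whether mixing a gradient with its input would promote the input's dtype. The `check_gradient_dtypes` helper and the toy gradients below are hypothetical, not the actual hw1 test code:

```python
import numpy as np

def check_gradient_dtypes(grads, inputs):
    """Assert each grad's dtype won't promote the matching input's dtype.

    np.result_type gives the dtype NumPy would produce when combining the
    two; it must equal the input's dtype for promotion to be avoided.
    """
    for g, x in zip(grads, inputs):
        assert np.result_type(g.dtype, x.dtype) == x.dtype, (
            f"grad dtype {g.dtype} would promote input dtype {x.dtype}"
        )

x = np.ones(3, dtype=np.float32)

# A common hw1 bug: a constant created without an explicit dtype defaults
# to float64, so the gradient silently becomes float64.
bad_grad = x * np.full(3, 2.0)            # float64 array -> float64 result
good_grad = x * np.float32(2.0)           # stays float32

check_gradient_dtypes([good_grad], [x])   # passes
# check_gradient_dtypes([bad_grad], [x])  # would raise AssertionError
```

The same idea extends to the real test suite: after calling an op's `gradient()`, run every returned gradient through a check like this against the corresponding `inputs[i]`.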