Deploy each agent on its own GPU and keep the reward computation on the CPU. Follow-up to #3.
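A minimal sketch of the intended placement plan. It assumes one dedicated GPU per agent and uses PyTorch-style device strings (`cuda:N`, `cpu`) purely as an illustration. `Placement` and `plan_placement` are hypothetical names, not existing code from this repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    # Hypothetical container: maps each agent name to its device string.
    agent_devices: dict[str, str]
    # The reward computation is pinned to the CPU.
    reward_device: str


def plan_placement(agent_names: list[str], num_gpus: int) -> Placement:
    """Assign each agent its own GPU (assumed one GPU per agent); reward stays on CPU."""
    if num_gpus < len(agent_names):
        raise ValueError(
            f"need {len(agent_names)} GPUs for {len(agent_names)} agents, got {num_gpus}"
        )
    # Agent i goes to GPU i (PyTorch-style device string, an assumption).
    agent_devices = {name: f"cuda:{i}" for i, name in enumerate(agent_names)}
    return Placement(agent_devices=agent_devices, reward_device="cpu")
```

Failing fast when there are fewer GPUs than agents keeps the "one GPU per agent" requirement explicit rather than silently sharing devices.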