Add Tests for Evaluation Output

Add test cases to verify that every evaluation method we support produces the outputs we expect.

This can be done simply by simulating a few short episodes on a very small MDP, both with and without skills, and comparing the evaluation output against values worked out by hand. See the existing run_agent test cases for inspiration.
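
As a rough starting point, such a test might look like the sketch below. Everything in it is hypothetical and stands in for the project's real code: `ChainMDP`, `evaluate`, the `skills` dictionary format, and the metric names are assumptions made for illustration, not the existing evaluation API. The idea is only that a deterministic MDP makes the expected return, step count, and decision count easy to compute by hand.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

Choice = Union[int, str]  # a primitive action (int) or a skill name (str)


class ChainMDP:
    """Hypothetical deterministic chain: action 1 moves right, anything else moves left.

    The episode ends with reward 1.0 when the last state is reached.
    """

    def __init__(self, n_states: int = 4) -> None:
        self.n_states = n_states
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


def evaluate(
    env: ChainMDP,
    policy: Callable[[int], Choice],
    skills: Optional[Dict[str, List[int]]] = None,
    n_episodes: int = 3,
    max_steps: int = 20,
) -> Dict[str, float]:
    """Stand-in for an evaluation method: averages return, primitive steps, and decisions."""
    skills = skills or {}
    returns, steps, decisions = [], [], []
    for _ in range(n_episodes):
        state, done = env.reset(), False
        ep_return, ep_steps, ep_decisions = 0.0, 0, 0
        while not done and ep_steps < max_steps:
            choice = policy(state)
            ep_decisions += 1
            # A skill expands to a fixed list of primitive actions;
            # a primitive action is treated as a one-element list.
            for action in skills.get(choice, [choice]):
                state, reward, done = env.step(action)
                ep_return += reward
                ep_steps += 1
                if done or ep_steps >= max_steps:
                    break  # stop mid-skill when the episode ends
        returns.append(ep_return)
        steps.append(ep_steps)
        decisions.append(ep_decisions)
    return {
        "mean_return": sum(returns) / n_episodes,
        "mean_steps": sum(steps) / n_episodes,
        "mean_decisions": sum(decisions) / n_episodes,
    }


def test_evaluate_without_skills() -> None:
    # Always moving right reaches the goal in 3 primitive steps, one decision each.
    result = evaluate(ChainMDP(), policy=lambda state: 1)
    assert result == {"mean_return": 1.0, "mean_steps": 3.0, "mean_decisions": 3.0}


def test_evaluate_with_skills() -> None:
    # The skill moves right twice; the second invocation is cut short at the goal.
    skills = {"right_x2": [1, 1]}
    result = evaluate(ChainMDP(), policy=lambda state: "right_x2", skills=skills)
    assert result == {"mean_return": 1.0, "mean_steps": 3.0, "mean_decisions": 2.0}
```

The two test functions follow pytest naming conventions, which is an assumption about the test runner. In the real tests, the stand-in `evaluate` would be replaced by each supported evaluation method, with one hand-computed expectation per method.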