I ran ppo_run.py and got a .pkl file for HopperRandParamsEnv, with an average reward of about 200.
But when I ran meta_test.py with the ProMP-trained policy, the average reward dropped to around 10.
I don't understand where the problem is.
Can somebody help me?