Maybe we could use a Eureka-like method (https://eureka-research.github.io/) to generate the reward function? Thanks!
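
To make the suggestion a bit more concrete, here is a rough sketch of the kind of loop I have in mind: propose several candidate reward functions, score each one, keep the best, and condition the next round of proposals on it. Everything here is a toy assumption. `propose_candidates` stands in for the LLM call that would actually write reward code, `evaluate_fitness` stands in for training a policy and measuring task success, and the single-weight reward shape is purely illustrative. None of these names come from the Eureka codebase.

```python
import random
from typing import Callable, List, Optional, Tuple

RewardFn = Callable[[float], float]  # maps a scalar state to a reward


def propose_candidates(
    best_weight: Optional[float], k: int, rng: random.Random
) -> List[Tuple[float, RewardFn]]:
    """Hypothetical stand-in for the LLM proposal step.

    A real Eureka-like setup would ask an LLM to write reward code, using
    feedback on the previous best candidate. Here we only perturb one weight.
    """
    center = 0.0 if best_weight is None else best_weight
    candidates = []
    for _ in range(k):
        w = center + rng.uniform(-1.0, 1.0)
        # Bind w as a default argument so each closure keeps its own weight.
        candidates.append((w, lambda s, w=w: -w * abs(s)))
    return candidates


def evaluate_fitness(weight: float) -> float:
    """Hypothetical stand-in for 'train a policy, then measure task success'.

    Toy score that peaks when the weight equals 2.0.
    """
    return -((weight - 2.0) ** 2)


def eureka_like_search(iterations: int = 5, k: int = 4, seed: int = 0):
    """Keep the best-scoring candidate across rounds of proposals."""
    rng = random.Random(seed)
    best_weight, best_fn, best_score = None, None, float("-inf")
    for _ in range(iterations):
        for w, fn in propose_candidates(best_weight, k, rng):
            score = evaluate_fitness(w)
            if score > best_score:
                best_weight, best_fn, best_score = w, fn, score
    return best_weight, best_fn, best_score


if __name__ == "__main__":
    weight, reward_fn, score = eureka_like_search()
    print(f"best weight={weight:.3f}, fitness={score:.3f}")
```

In a real setup the expensive part would be `evaluate_fitness`, since every candidate needs its own training run, so the number of candidates per round and the number of rounds would have to be budgeted accordingly.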