Can we ensure that the standard deviation is never negative? Right now a negative value throws an error on line 63 of reward_model.py.
This is an edge case: the reward should never be negative, but the optimization problem sometimes produces a flow of -0.00, and we want to be robust to that.
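One way to handle this is to clamp the value before it reaches the check, treating tiny negatives as solver noise. This is a minimal sketch, assuming the standard deviation is a plain float derived from the flow. `safe_std` and its `tol` threshold are hypothetical names and do not come from reward_model.py. A flow that prints as -0.00 is probably a tiny negative such as -1e-12 rather than an exact -0.0, because `-0.0 < 0` evaluates to False in Python and would not trip a `< 0` check.

```python
import math


def safe_std(value: float, tol: float = 1e-9) -> float:
    """Return a non-negative standard deviation (hypothetical helper).

    Values in [-tol, 0] are treated as numerical noise from the optimizer
    and mapped to 0.0. Values more negative than -tol, or NaN, are treated
    as real bugs and raise instead of being silently hidden.
    """
    if math.isnan(value):
        raise ValueError("standard deviation is NaN")
    if value < -tol:
        raise ValueError(
            f"standard deviation {value} is negative beyond tolerance {tol}"
        )
    # Use an explicit comparison instead of max(value, 0.0): max(-0.0, 0.0)
    # returns -0.0, whereas this always returns +0.0 for non-positive input.
    return value if value > 0.0 else 0.0


# Example: a solver result of -1e-12 (displayed as -0.00) becomes 0.0.
print(safe_std(-1e-12))
```

Raising for values below `-tol` keeps the guard from masking a genuinely negative reward, which would point to a bug elsewhere rather than rounding noise.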