Std deviation in reward computation can sometimes be negative

Can we ensure that the std deviation is never negative? A negative value currently throws an error at line 63 of `reward_model.py`.

This is an edge case: the reward should never be negative, but the optimization problem can sometimes produce a flow of -0.00, and we want to be robust to that.
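
One possible fix, as a rough sketch only: clamp tiny negative values to zero before they reach the std deviation. The helper name `clamp_nonnegative` and the tolerance `tol` are hypothetical and not taken from `reward_model.py`. The sketch also assumes the -0.00 is either a negative zero or a very small negative number from solver round-off.

```python
import math


def clamp_nonnegative(value: float, tol: float = 1e-6) -> float:
    """Map -0.0 and tiny negative round-off to 0.0 (hypothetical helper).

    Values below -tol are treated as real errors and raise instead of
    being silently hidden. The default tol is an assumed placeholder.
    """
    if math.isnan(value):
        raise ValueError("value is NaN")
    if value < -tol:
        raise ValueError(f"value {value} is negative beyond tolerance {tol}")
    # max() removes small negatives; adding 0.0 turns -0.0 into +0.0.
    return max(value, 0.0) + 0.0


# Example: sanitize a flow value before it is used as a std deviation.
flow = -1e-9  # prints as -0.00 at two decimal places
std = clamp_nonnegative(flow)
```

Raising on values well below zero keeps genuine bugs visible, while round-off from the solver no longer breaks the reward computation. The right tolerance depends on the solver's precision.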