subtract the dataset's mean embedding from every embedding as a preprocessing step

maybe this will slightly reduce overfitting to the stock video distribution
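
rough sketch of the mean-subtraction step, assuming the embeddings sit in a NumPy array of shape (num_items, dim). `fit_mean` and `center` are made-up names for illustration, not existing code. the mean should probably be computed once on the training set and reused at eval.

```python
import numpy as np

def fit_mean(train_embeddings):
    """Mean embedding over the training set, shape (dim,). Hypothetical helper."""
    return train_embeddings.mean(axis=0)

def center(embeddings, mean):
    """Subtract the stored mean from every embedding (broadcasts over leading axes)."""
    return embeddings - mean

# toy data standing in for real embeddings (assumption: shape (num_items, dim))
train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mean = fit_mean(train)
centered = center(train, mean)
```

reusing the training mean at eval keeps both splits in the same shifted space.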

need to find an efficient way to take zero_masks in training and eval, since the way I tried is super slow
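
one possible direction, under two assumptions that are not confirmed anywhere in these notes: (1) zero_masks means a boolean mask flagging embedding vectors that are entirely zero (e.g. padding), and (2) the slow version looped over items in Python. if so, the mask can be built in one vectorized call and then used to keep those positions at zero after mean subtraction. `zero_mask` and `center_keep_zeros` are hypothetical names.

```python
import numpy as np

def zero_mask(embeddings):
    """True where an embedding vector is all zeros; shape is embeddings.shape[:-1]."""
    return ~embeddings.any(axis=-1)

def center_keep_zeros(embeddings, mean):
    """Subtract the mean, then reset all-zero (assumed padding) vectors back to zero."""
    mask = zero_mask(embeddings)
    centered = embeddings - mean   # new array, the input is left untouched
    centered[mask] = 0.0           # boolean indexing over the leading axes
    return centered, mask

# toy batch, assumed shape (batch, frames, dim); zero rows stand in for padding
batch = np.array([[[1.0, 2.0], [0.0, 0.0]],
                  [[0.0, 0.0], [3.0, 4.0]]])
mean = np.array([1.0, 1.0])
centered, mask = center_keep_zeros(batch, mean)
```

if the masks don't change between epochs, computing them once and caching them next to the embeddings would avoid redoing this in both training and eval.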