Originally: #177 (comment)
I think we want to track the estimator state (whatever we use for estimating behaviors/fingerprints per second, currently "inputs since new coverage") independently for the blackbox and mutator generators, and run a multi-armed bandit optimization between the two to derive p_mutate. This would cleanly inherit any Thompson sampling improvements.
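To make the idea concrete, here is a minimal sketch of a two-armed Thompson sampler. Everything in it is an assumption for illustration: `ArmState`, `record`, `sample_rate`, and `estimate_p_mutate` are hypothetical names, not existing code, and it swaps the current "inputs since new coverage" estimator for a simpler Beta posterior over "this input found new coverage", counted per input rather than per second.

```python
import random
from dataclasses import dataclass


@dataclass
class ArmState:
    """Hypothetical per-generator estimator state, tracked separately per arm."""

    inputs: int = 0        # total inputs produced by this generator
    new_coverage: int = 0  # inputs that discovered new coverage

    def record(self, found_new_coverage: bool) -> None:
        self.inputs += 1
        self.new_coverage += int(found_new_coverage)

    def sample_rate(self, rng: random.Random) -> float:
        # Draw from the Beta posterior under a uniform Beta(1, 1) prior.
        failures = self.inputs - self.new_coverage
        return rng.betavariate(1 + self.new_coverage, 1 + failures)


def estimate_p_mutate(
    blackbox: ArmState,
    mutator: ArmState,
    rng: random.Random,
    draws: int = 1000,
) -> float:
    """Monte Carlo estimate of P(mutator's sampled rate > blackbox's)."""
    wins = sum(
        mutator.sample_rate(rng) > blackbox.sample_rate(rng)
        for _ in range(draws)
    )
    return wins / draws
```

With this shape, choosing a generator for the next input is just `rng.random() < p_mutate`, and any improvement to the per-arm estimator would only need to change `sample_rate`.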