
Safer bayesian optimization #341

Open
roussel-ryan wants to merge 8 commits into main from safe_bo

Conversation


@roussel-ryan roussel-ryan commented Jul 3, 2025

This pull request introduces support for nonlinear inequality constraints in numerical optimization of the acquisition function. The changes include enhancements to the LBFGSOptimizer and GridOptimizer classes, updates to acquisition functions, and new testing capabilities. These additions enable using the probability of feasibility as a nonlinear constraint when optimizing the acquisition function.

Currently, generating initial points and optimizing the acquisition function is relatively slow for LBFGSOptimizer, so this functionality is not recommended for high-dimensional parameter spaces.

Enhancements to Numerical Optimization:

  • Support for nonlinear inequality constraints in LBFGSOptimizer: Added logic to handle nonlinear constraints, including a random initial condition generator for sampling feasible points. If candidate generation fails, the optimizer falls back to random valid samples. (xopt/numerical_optimizer.py) [1] [2]
  • Support for nonlinear inequality constraints in GridOptimizer: Integrated constraint handling by filtering grid points based on feasibility. Raises an error if no feasible points are found. (xopt/numerical_optimizer.py)
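The GridOptimizer-side filtering described above can be sketched roughly as follows. This is a hypothetical helper, not the actual xopt implementation; it assumes the BoTorch convention that a constraint callable returns a value >= 0 where the point is feasible:

```python
import torch

def filter_feasible(grid_pts, constraints):
    # Keep only grid points where every constraint c(x) >= 0.
    mask = torch.ones(grid_pts.shape[0], dtype=torch.bool)
    for c in constraints:
        mask &= c(grid_pts) >= 0
    if not mask.any():
        # Mirrors the PR behavior: error out if no feasible points exist.
        raise RuntimeError("no feasible points found on the grid")
    return grid_pts[mask]

# 11 x 11 grid on [0, 1]^2, constrained to x0 + x1 <= 0.55
grid = torch.stack(
    torch.meshgrid(
        torch.linspace(0, 1, 11), torch.linspace(0, 1, 11), indexing="ij"
    ),
    dim=-1,
).reshape(-1, 2)
feasible = filter_feasible(grid, [lambda x: 0.55 - x.sum(dim=-1)])
```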

Updates to Bayesian Generator:

  • Feasibility tolerance parameter: Added a new feasibility_tolerance field to BayesianGenerator to enable constrained acquisition function optimization based on predicted probability of feasibility. (xopt/generators/bayesian/bayesian_generator.py)
  • Log feasibility acquisition function: Introduced _get_log_feasibility method to calculate feasibility constraints for optimization. (xopt/generators/bayesian/bayesian_generator.py)
  • Enhanced candidate proposal: Modified propose_candidates to include nonlinear inequality constraints when feasibility_tolerance is set. (xopt/generators/bayesian/bayesian_generator.py)
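The feasibility_tolerance mechanism above can be sketched as follows; the names log_pof and feasibility_constraint are hypothetical stand-ins (the real code uses a model-based qLogProbabilityOfFeasibility), but they show how a tolerance on the probability of feasibility becomes a nonlinear inequality constraint c(x) >= 0:

```python
import math
import torch

feasibility_tolerance = 0.5  # assumed value for illustration

def log_pof(x):
    # Stand-in for the model's log probability of feasibility.
    return torch.log(torch.sigmoid(-x.sum(dim=-1)))

def feasibility_constraint(x):
    # c(x) >= 0  <=>  PF(x) >= feasibility_tolerance
    return log_pof(x) - math.log(feasibility_tolerance)

# At the origin PF = 0.5, i.e. exactly on the tolerance boundary.
c0 = feasibility_constraint(torch.zeros(1, 2))
```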

Testing Improvements:

  • Tests for nonlinear constraints: Added unit tests for LBFGSOptimizer and GridOptimizer to validate handling of nonlinear constraints, including edge cases where no feasible points exist. (xopt/tests/test_numerical_optimizer.py) [1] [2]
  • Random initial condition generator tests: Verified functionality of the random initial condition generator with nonlinear constraints. (xopt/tests/test_numerical_optimizer.py)

@roussel-ryan roussel-ryan requested a review from nikitakuklev July 3, 2025 19:52
@roussel-ryan roussel-ryan changed the title Safe bayesian optimization Safer bayesian optimization Jul 3, 2025

@nikitakuklev nikitakuklev left a comment


To my understanding, what this PR does is

  1. compute the constraints-only acquisition function multiplier, a.k.a. the 'probability of feasibility'
  2. instead of multiplying the base acquisition function as in regular constrained BO, it then uses (1) as a nonlinear inequality constraint on the inputs

Question 1:
Are you sure that the same effect cannot be accomplished with a sharper edge function for constraints, multiplying the base acquisition the usual way?

Question 2:
Doesn't this effectively double-penalize the acquisition function if constraints are applied both to the base acquisition and as inequalities? Can the base acquisition function be used directly in this case?

Question 3:
For better performance, is there a neat way to cache the constraint call results to avoid evaluating them twice? It seems that would require some surgery on the acquisition functions to store the result of apply_constraints, so probably not worth it.

Overall though, no objections. Once this PR is merged, I will be adding variance limits/constraints. We found them very useful to ensure sampling near known locations for machines like the Booster, where absolutely no bad shots can be permitted during global BO. So far I have treated them as just additional fatmoid-clipped constraints with special processing, and will explore your inequality approach.


# Apply nonlinear constraints -- remove f_value point where constraints(x) does not satisfy the constraints
if nonlinear_inequality_constraints is not None:
mask = torch.ones(f_values.shape, dtype=torch.bool, device=f_values.device)

@nikitakuklev nikitakuklev Jul 8, 2025


always force to cpu to be same as mesh_pts (or send all tensors to specific device)?
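The reviewer's device suggestion, sketched with stand-in CPU tensors (mesh_pts and f_values here are placeholders for the variables in the diff): build the mask on the same device as the mesh points rather than inheriting the device of f_values.

```python
import torch

mesh_pts = torch.rand(5, 2)   # stand-in grid points (CPU in this sketch)
f_values = torch.rand(5)      # stand-in function values

# Create the mask on mesh_pts' device so later indexing cannot
# fail with a cross-device mismatch.
mask = torch.ones(f_values.shape, dtype=torch.bool, device=mesh_pts.device)
```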

logger.debug("getting random initial conditions")
start = time.time()
lower, upper = bounds[0], bounds[1]
rand = torch.rand(

In BoTorch's gen_batch_initial_conditions, they use Sobol quasi-random sampling - should we copy that approach?
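The suggested approach can be sketched with torch's built-in Sobol engine; sobol_initial_conditions is a hypothetical helper name, scaling quasi-random unit-cube draws into the optimization box:

```python
import torch

def sobol_initial_conditions(bounds, n):
    # bounds: [2, ndim] tensor with lower/upper rows
    engine = torch.quasirandom.SobolEngine(
        dimension=bounds.shape[1], scramble=True
    )
    samples = engine.draw(n)  # quasi-random points in [0, 1]^ndim
    return bounds[0] + (bounds[1] - bounds[0]) * samples

bounds = torch.tensor([[0.0, -1.0], [2.0, 1.0]])
pts = sobol_initial_conditions(bounds, 16)
```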

assert candidates.shape == torch.Size([ncandidate, ndim])

# test nonlinear constraints
def constraint1(X):

these are linear
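An illustration of the reviewer's point (constraint names here are illustrative, not the test's actual functions): a constraint like 1 - X.sum() is affine in X, with a constant gradient, while a genuinely nonlinear inequality constraint involves a nonlinear map of X:

```python
import torch

def linear_constraint(X):
    # Affine in X: gradient w.r.t. X is constant.
    return 1.0 - X.sum(dim=-1)

def nonlinear_constraint(X):
    # Genuinely nonlinear: >= 0 inside the unit sphere.
    return 1.0 - (X ** 2).sum(dim=-1)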

function: Callable,
bounds: Tensor,
n_candidates: int = 1,
nonlinear_inequality_constraints: (list[tuple[Callable, bool]] | None) = None,

add to docstring

A tensor specifying the bounds for the optimization. It must have the shape [2, ndim].
n_candidates : int, optional
The number of candidates to generate (default is 1).
nonlinear_inequality_constraints : Optional[list[Callable]]

wrong docstring


warnings.warn(
"Nonlinear inequality constraints are provided for LBFGS numerical optimization, "
"using a random initial condition generator which may take a long time to sample enough points.",

Supplying nonlinear_inequality_constraints also switches algo to SLSQP (see here). That might explain why convergence slows down.
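A standalone SciPy illustration of this note (not the BoTorch internals themselves): SLSQP supports nonlinear inequality constraints, which L-BFGS-B does not, typically at extra cost per iteration.

```python
import numpy as np
from scipy.optimize import minimize

# Minimize x0^2 + x1^2 subject to x0 + x1 >= 1 inside [-1, 1]^2;
# the optimum sits on the constraint boundary at (0.5, 0.5).
res = minimize(
    fun=lambda x: float((x ** 2).sum()),
    x0=np.array([0.8, 0.8]),
    method="SLSQP",
    bounds=[(-1.0, 1.0), (-1.0, 1.0)],
    constraints=[{"type": "ineq", "fun": lambda x: x[0] + x[1] - 1.0}],
)
```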


sampler = self._get_sampler(model)

log_feasibility = qLogProbabilityOfFeasibility(

can analytic version be used for better perf?

@roussel-ryan (Collaborator, Author) replied:

probably, will need to check if it can be used for a given model

@roussel-ryan

Thanks for the comments @nikitakuklev. In its current implementation, it actually both weights the acquisition function with the feasibility multiplier and restricts the numerical optimization of the acquisition function with a nonlinear feasibility constraint. Now that I think of it, this might be redundant and adds computational complexity.
To address your questions:

  1. My concern is maintaining differentiability + preventing vanishing gradients. If you have an alternative function to try, I'm all for it.
  2. You are correct that it is a double penalty, which could be removed to improve computational efficiency.
  3. Yes, we will have to see if this can be used in some way.

To clarify: you add a constraint to acquisition function optimization that measures the model uncertainty? So in this case the acquisition function will only be optimized in regions where the model has a high degree of confidence?

I will take a look at making changes to improve the computational efficiency based on your suggestions in the next few weeks. If you want it sooner, we can identify a subset of the changes you proposed and merge these features over two PRs.


nikitakuklev commented Jul 9, 2025

Good point on the gradients - making the cutoff sharper = changing eta for the sigmoid; no new functions come to mind. One could try min-clipping the feasibility to something like 1e-6 - that is yet another transformation, however. Log space does help a bit here.
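The min-clipping idea above in one line (a sketch, with clipped_log_pof as a hypothetical name): floor the probability of feasibility before taking the log, so the result stays finite and the gradient never fully vanishes as PF approaches 0.

```python
import torch

def clipped_log_pof(pof, floor=1e-6):
    # Clamp PF away from zero before the log transform.
    return torch.log(pof.clamp(min=floor))

vals = clipped_log_pof(torch.tensor([0.0, 0.5, 1.0]))
```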

Thinking more, the smooth constraint acquisition function penalty helps ensure the search stays away from the edge, whereas the hard inequality limit will result in all samples being right along the edge (assuming true optimum is in that direction). Here is a 'highly advanced drawing' - black is objective, green the constraint, black dashes is where constraint is violated, red the constraint acq multiplier, blue the sample with constrained acq, purple the sample with inequality only.

For noise constraints, yes, a constraint on the variance of the MC samples - this formulates tasks like "optimize in areas where efficiency uncertainty < 0.1%". Setting a higher min_noise_prior helps to control the model fit. A second version works in normalized space, using output_transform.stdvs to convert. A third version uses the trained kernel noise distribution to compute the threshold, making it a relative 'better-known regions' bias. I need to look more into which noise prior is best, since BoTorch changed defaults recently. (Edit: the noise constraint can be applied on the objective and/or constraint models.)

Coming back to the PR, since the features are optional and interesting, let's merge as is after the minor comments are fixed. After noise constraints PR, we'll do some 'violation count' benchmarking to determine the recommended defaults for safety-critical problems.
