
Conversation

@till-m
Member

@till-m till-m commented Nov 18, 2025

This introduces a user-facing .predict() method to the optimizer to simplify making predictions with the GP. I think that with the support for optimization over non-float parameters, calling .predict() on the GP now requires a bit too much parameter transformation and mangling. Rather than forcing users to handle this complexity themselves, this change provides an interface that handles all necessary conversions internally.
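
For context, a minimal usage sketch of the new interface (the objective function and bounds below are placeholders for illustration, not part of this changeset):

```python
from bayes_opt import BayesianOptimization


def objective(p1: float, p2: float) -> float:
    # Placeholder objective for illustration only.
    return -((p1 - 2.0) ** 2) - (p2 - 3.0) ** 2


optimizer = BayesianOptimization(
    f=objective, pbounds={"p1": (0.0, 10.0), "p2": (0.0, 10.0)}, random_state=1, verbose=0
)
optimizer.register(params={"p1": 1.0, "p2": 2.0}, target=objective(1.0, 2.0))
optimizer.register(params={"p1": 4.0, "p2": 5.0}, target=objective(4.0, 5.0))

# A single dict yields a scalar mean; a list of dicts yields 1D arrays (per the shape handling below).
mean = optimizer.predict({"p1": 2.0, "p2": 3.0})
means, stds = optimizer.predict(
    [{"p1": 2.0, "p2": 3.0}, {"p1": 5.0, "p2": 6.0}], return_std=True
)
```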

Summary by CodeRabbit

  • New Features

    • Added a predict() method to generate predictions for specified parameters with optional standard deviation or covariance outputs.
  • Documentation

    • Updated maximize() documentation to reference the new prediction capability.

@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.77%. Comparing base (c410d51) to head (1e51b28).

Files with missing lines Patch % Lines
bayes_opt/bayesian_optimization.py 92.59% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #593      +/-   ##
==========================================
- Coverage   97.89%   97.77%   -0.12%     
==========================================
  Files          10       10              
  Lines        1185     1212      +27     
==========================================
+ Hits         1160     1185      +25     
- Misses         25       27       +2     


@till-m
Member Author

till-m commented Nov 19, 2025

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Nov 19, 2025

✅ Actions performed

Full review triggered.

@coderabbitai

coderabbitai bot commented Nov 19, 2025

Walkthrough

A new predict() method was added to BayesianOptimization for computing predictions on given parameters. It accepts single or multiple parameter inputs, optionally fits the GP, and returns mean with optional uncertainty estimates. Documentation in maximize() was updated. Test utilities including a custom parameter class and triangle area function were added.

Changes

Cohort / File(s) Summary
Core Feature
bayes_opt/bayesian_optimization.py
Added public method predict(params, return_std=False, return_cov=False, fit_gp=True) that accepts single dict or list of dicts, optionally fits GP, normalizes inputs to 2D array, and returns predictions with optional std/covariance. Updated maximize() documentation to reference the new prediction capability.
Test Utilities
tests/test_bayesian_optimization.py
Added helper class FixedPerimeterTriangleParameter (subclass of BayesParameter) for continuous sampling via Dirichlet distributions with perimeter constraints. Added standalone function area_of_triangle(sides) computing area via Heron's formula.

Sequence Diagram

sequenceDiagram
    participant User
    participant BO as BayesianOptimization
    participant GP as GaussianProcess
    
    User->>BO: predict(params, fit_gp=True)
    
    alt fit_gp is True
        BO->>BO: Check if observations exist
        alt No observations
            BO-->>User: RuntimeError
        else Observations exist
            BO->>GP: Fit GP via _fit_gp()
            GP-->>BO: GP fitted
        end
    end
    
    BO->>BO: Normalize params to 2D array
    BO->>GP: Predict mean (and std/cov if requested)
    GP-->>BO: Predictions returned
    
    BO->>BO: Denormalize output shape
    
    alt return_std or return_cov
        BO-->>User: (mean, uncertainty)
    else Neither requested
        BO-->>User: mean only
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

  • Input validation & normalization logic: Verify parameter conversion from single dict/list of dicts to 2D array and shape consistency
  • GP fitting pathway: Ensure fit_gp conditional logic correctly delegates to _fit_gp() and handles edge cases (zero observations)
  • Return shape handling: Validate that output shapes differ correctly between single vs. multiple inputs and with/without uncertainty estimates
  • Test coverage: Review test utilities and their integration with the new prediction functionality

Poem

🐰 A prediction method hops in with grace,
Parameters fed, predictions embrace,
Uncertainty whispered with std or cov,
Triangles tested with Heron's beloved,
The Bayesian warren grows strong, here's the trove! 🥕✨

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the main change: adding a new .predict() method to the BayesianOptimization optimizer class.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/test_bayesian_optimization.py (2)

19-64: Custom triangle parameter and area helper: clarify intent and edge conditions

  • Triangle sampling vs. perimeter and bounds.
    FixedPerimeterTriangleParameter.random_sample draws Dirichlet samples scaled by self.perimeter and only enforces self.bounds. For the current tests (perimeter=1.0, bounds [0, 1] for each side) this is fine, but for inconsistent combinations (e.g. perimeter > 3 * upper_bound) the while loop could run indefinitely because no samples satisfy the bounds. If this helper is ever reused with different parameters, consider either:

    • Asserting a compatible relationship between perimeter and bounds at construction time, or
    • Adding a max-tries safeguard with a clear error message (see the sketch after this list).
  • Heron’s formula vs. current area_of_triangle implementation.
    Standard Heron’s formula uses the semiperimeter s = (a + b + c) / 2, whereas here s is the full perimeter and the comment explicitly says # perimeter. This function is only used as a test objective, so correctness up to a constant factor may be acceptable, but the name area_of_triangle is misleading as‑is. You might want to either:

    • Implement the usual formula:

      ```diff
      -    s = np.sum(sides, axis=-1) # perimeter
      -    return np.sqrt(s * (s - a) * (s - b) * (s - c))
      +    s = 0.5 * np.sum(sides, axis=-1) # semiperimeter
      +    return np.sqrt(s * (s - a) * (s - b) * (s - c))
      ```

  • Or rename/comment this as a synthetic triangle-like objective to avoid confusion.
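
For illustration, a standalone sketch of the max-tries safeguard suggested above; the function name and signature here are hypothetical, not part of the PR:

```python
import numpy as np


def sample_fixed_perimeter_sides(
    perimeter: float, bounds: np.ndarray, rng: np.random.Generator, max_tries: int = 1000
) -> np.ndarray:
    """Rejection-sample three side lengths summing to `perimeter` within per-side [low, high] bounds."""
    for _ in range(max_tries):
        sides = rng.dirichlet(np.ones(3)) * perimeter
        if np.all((sides >= bounds[:, 0]) & (sides <= bounds[:, 1])):
            return sides
    msg = f"No sample satisfied the bounds after {max_tries} tries; check that perimeter and bounds are compatible."
    raise ValueError(msg)


# e.g. sample_fixed_perimeter_sides(1.0, np.array([[0.0, 1.0]] * 3), np.random.default_rng(1))
```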

537-590: Custom-parameter save/load test: avoid debug prints and private GP calls

  • The print statements for pbounds (Lines 546–548) look like leftover debugging and add noise to test output. They can safely be removed without losing coverage.

  • Forcing a GP fit via optimizer._gp.fit(...) (Lines 557–558) reaches into a private attribute. With the new public predict(..., fit_gp=True) hook, you could instead trigger a GP fit in a more idiomatic way (e.g., call optimizer.predict(...) once) or rely on the existing logic that fits the GP inside maximize(). If the extra fit is really needed for this scenario, a brief comment explaining why would help future maintainers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c410d51 and 1e51b28.

📒 Files selected for processing (2)
  • bayes_opt/bayesian_optimization.py (2 hunks)
  • tests/test_bayesian_optimization.py (2 hunks)
🔇 Additional comments (1)
bayes_opt/bayesian_optimization.py (1)

369-376: maximize() warning update looks good

The updated warning that points users to predict(..., fit_gp=True) after maximize() is accurate given the new API and clarifies GP fitting expectations without changing behavior.

Comment on lines +178 to +245
    def predict(
        self, params: dict[str, Any] | list[dict[str, Any]], return_std=False, return_cov=False, fit_gp=True
    ) -> tuple[float | NDArray[Float], float | NDArray[Float]]:
        """Predict the target function value at given parameters.
        Parameters
        ---------
        params: dict or list
            The parameters where the prediction is made.
        return_std: bool, optional(default=True)
            If True, the standard deviation of the prediction is returned.
        return_cov: bool, optional(default=False)
            If True, the covariance of the prediction is returned.
        fit_gp: bool, optional(default=True)
            If True, the internal Gaussian Process model is fitted before
            making the prediction.
        Returns
        -------
        mean: float or np.ndarray
            The predicted mean of the target function at the given parameters.
        std_or_cov: float or np.ndarray
            The predicted standard deviation or covariance of the target function
            at the given parameters.
        """
        if isinstance(params, list):
            # convert list of dicts to 2D array
            params_array = np.array([self._space.params_to_array(p) for p in params])
            single_param = False
        elif isinstance(params, dict):
            params_array = self._space.params_to_array(params).reshape(1, -1)
            single_param = True

        if fit_gp:
            if len(self._space) == 0:
                msg = (
                    "The Gaussian Process model cannot be fitted with zero observations. To use predict(), "
                    "without fitting the GP, set fit_gp=False. The predictions will then be made using the "
                    "GP prior."
                )
                raise RuntimeError(msg)
            self.acquisition_function._fit_gp(self._gp, self._space)

        res = self._gp.predict(params_array, return_std=return_std, return_cov=return_cov)

        if return_std or return_cov:
            mean, std_or_cov = res
        else:
            mean = res

        if not single_param and mean.ndim == 0:
            mean = np.atleast_1d(mean)
        # ruff complains when nesting conditionals, so this three-way split is necessary
        if not single_param and (return_std or return_cov) and std_or_cov.ndim == 0:
            std_or_cov = np.atleast_1d(std_or_cov)

        if single_param and mean.ndim > 0:
            mean = mean[0]
        if single_param and (return_std or return_cov) and std_or_cov.ndim > 0:
            std_or_cov = std_or_cov[0]

        if return_std or return_cov:
            return mean, std_or_cov
        return mean


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

predict(): tighten input validation, clarify typing/docs, and guard std vs cov

A few points here:

  • Invalid params types lead to UnboundLocalError.
    If params is neither dict nor list, params_array / single_param are never set and you’ll get an unhelpful runtime error. Consider explicitly validating and raising a TypeError:

            if isinstance(params, list):
                # convert list of dicts to 2D array
                params_array = np.array([self._space.params_to_array(p) for p in params])
                single_param = False
            elif isinstance(params, dict):
                params_array = self._space.params_to_array(params).reshape(1, -1)
                single_param = True
    +       else:
    +           msg = "params must be a dict or a list of dicts."
    +           raise TypeError(msg)

  • Return type annotation doesn’t match behavior.
    The function sometimes returns just mean (float / NDArray) and sometimes a (mean, std_or_cov) tuple. The annotation -> tuple[...] is misleading for callers and type checkers. Consider something like:

    def predict(...) -> float | NDArray[Float] | tuple[float | NDArray[Float], float | NDArray[Float]]:
        ...

    or using typing.overload for the main common combinations (see the sketch after this list).

  • Docstring default for return_std is wrong.
    The signature uses return_std=False, but the docstring says optional(default=True). That will confuse users. Please fix the docstring to optional(default=False).

  • return_std and return_cov should be mutually exclusive.
    GaussianProcessRegressor.predict only allows one of return_std or return_cov to be True at a time; otherwise it raises.
    It would be clearer to check upfront and raise a user-facing ValueError:

    +       if return_std and return_cov:
    +           msg = "At most one of return_std and return_cov can be True."
    +           raise ValueError(msg)
            if fit_gp:
                if len(self._space) == 0:
                    ...

  • Shape handling vs. list-of-one semantics.
    For params given as a single dict you correctly return scalars (or scalar-like) when possible via single_param=True. For params as a list with a single element (e.g. [{"p1": 0, "p2": 0}]), single_param is False so mean/std_or_cov stay 1D arrays. Several new tests call predict([point]) and then do scalar-style checks like np.isclose(means, 0, ...) or stds < 0.02, which will raise ValueError if means/stds are arrays. You probably want either:

    • API choice A (simpler): Always treat list inputs as “multi-point”, even of length 1, and adjust the tests to index or call .item(), or
    • API choice B (more user-friendly): If params is a list of length 1, treat it like a single param (single_param=True) so predict([point]) behaves like predict(point) in terms of scalar vs array output.

    Whichever you decide, aligning implementation, tests, and docstring explicitly around “single dict vs list, including length-1 lists” will avoid confusing shape bugs.

  • Minor: handling of single_param with return_cov.
    When return_cov=True and params is a single dict, std_or_cov starts as a (1, 1) covariance matrix and becomes (1,) after std_or_cov = std_or_cov[0]. If you intend a scalar variance for a single point, consider also unwrapping the remaining dimension to a float for the single-parameter case, or documenting that you return a 1‑element array for covariance.
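
As a rough sketch of the typing.overload option mentioned above (the Float alias and the exact overload split are assumptions, not the PR's implementation; a third overload for return_cov=True would mirror the second):

```python
from typing import Any, Literal, overload

import numpy as np
from numpy.typing import NDArray

Float = np.float64  # stand-in for the project's Float alias


class PredictTypingSketch:
    @overload
    def predict(
        self,
        params: dict[str, Any] | list[dict[str, Any]],
        return_std: Literal[False] = ...,
        return_cov: Literal[False] = ...,
        fit_gp: bool = ...,
    ) -> float | NDArray[Float]: ...

    @overload
    def predict(
        self,
        params: dict[str, Any] | list[dict[str, Any]],
        return_std: Literal[True],
        return_cov: Literal[False] = ...,
        fit_gp: bool = ...,
    ) -> tuple[float | NDArray[Float], float | NDArray[Float]]: ...

    def predict(self, params, return_std=False, return_cov=False, fit_gp=True):
        ...  # implementation omitted in this sketch
```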


I need to verify the concerns in the review comment against the actual codebase. Let me gather information:


---

🏁 Script executed:

```shell
#!/bin/bash
# Find the predict() method definition in full to see if validation exists
rg -n -A 80 "def predict\(" bayes_opt/bayesian_optimization.py | head -120
```

Length of output: 3434


🏁 Script executed:

#!/bin/bash
# Check the full type annotation for the predict method
rg -n "def predict" bayes_opt/bayesian_optimization.py -A 5 --type python

Length of output: 30


🏁 Script executed:

#!/bin/bash
# Find test files that call predict() to see edge cases and assertions
fd -e "test_*.py" -o -e "*_test.py" | xargs rg -l "\.predict\(" 2>/dev/null | head -5

Length of output: 279


🏁 Script executed:

#!/bin/bash
# Search for any validation of return_std and return_cov together
rg -n "return_std.*return_cov|return_cov.*return_std" bayes_opt/bayesian_optimization.py

Length of output: 462


🏁 Script executed:

#!/bin/bash
# Look for the else clause handling invalid params types
rg -n -B 5 -A 5 "else:" bayes_opt/bayesian_optimization.py | grep -A 10 "params"

Length of output: 779


I need to check the test file and verify the GP behavior:


🏁 Script executed:

#!/bin/bash
# Find predict() calls in test file with single list elements or list inputs
rg -n "\.predict\(" tests/test_bayesian_optimization.py -B 2 -A 3 | head -80

Length of output: 3252


🏁 Script executed:

#!/bin/bash
# Look for assertions after predict() calls to see what output shape is expected
rg -n "\.predict\(\[" tests/test_bayesian_optimization.py -A 5 | head -60

Length of output: 1468


🏁 Script executed:

#!/bin/bash
# Check what happens when calling sklearn GP predict with both return_std and return_cov
grep -r "return_std.*return_cov\|return_cov.*return_std" . --include="*.py" | head -10

Length of output: 660


🏁 Script executed:

#!/bin/bash
# Search for any documentation or error handling about mutual exclusivity in GP wrapper
rg -n "return_std\|return_cov" bayes_opt/ --type py -B 2 -A 2 | grep -E "(ValueError|TypeError|raise|either|mutual|exclusive)" -C 3

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if GaussianProcessRegressor docs mention std/cov constraints
rg -n "class.*GaussianProcessRegressor" bayes_opt/ --type py -A 30

Length of output: 0


I need to verify the mutual exclusivity behavior with sklearn's GaussianProcessRegressor:


🌐 Web query:

scikit-learn GaussianProcessRegressor predict method return_std return_cov mutual exclusive

💡 Result:

Yes — they are mutually exclusive. The docs state the predict method may return the mean and optionally either the standard deviation (return_std=True) or the joint covariance (return_cov=True), and "at most one of the two can be requested." [1][2]

Sources:

  • scikit-learn GaussianProcessRegressor.predict (stable docs). [1]
  • scikit-learn GaussianProcessRegressor.predict (v1.3 docs). [2]

predict(): fix input validation, return type annotation, docstring, and std/cov mutual exclusivity; resolve shape handling ambiguity

Several issues require fixes:

  • Invalid params types lead to UnboundLocalError.
    If params is neither dict nor list, params_array and single_param are never set. Add an else clause:

    else:
        msg = "params must be a dict or a list of dicts."
        raise TypeError(msg)
  • Return type annotation is incorrect.
    Line 180 declares -> tuple[...] but line 245 returns just mean (non-tuple) when return_std and return_cov are both False. Fix to:

    def predict(...) -> float | NDArray[Float] | tuple[float | NDArray[Float], float | NDArray[Float]]:
  • Docstring default for return_std contradicts signature.
    Line 188 says optional(default=True) but line 179 has return_std=False. Fix docstring to optional(default=False).

  • return_std and return_cov are mutually exclusive but not validated.
    sklearn's GaussianProcessRegressor allows at most one of return_std or return_cov to be requested. Add upfront validation:

    if return_std and return_cov:
        msg = "At most one of return_std and return_cov can be True."
        raise ValueError(msg)
  • Shape handling for single-element lists is ambiguous.
    Tests at lines 628–642 call predict([single_point]) with scalar assertions like np.isclose(means, 0) and stds < 0.02, which fail if the output is an array. Yet lines 664–667 treat the same call as returning an array with len(means) == 1 and means[0] indexing. Clarify whether predict([point]) should:

    • Always return arrays (adjust tests at 628–642 to index or .item()), or
    • Return scalars when the list has one element (adjust implementation to match).
      Align implementation, docstring, and tests once decided.
🤖 Prompt for AI Agents
In bayes_opt/bayesian_optimization.py around lines 178–245, the predict() method
lacks validation for params types (causing UnboundLocalError), has an incorrect
return type annotation, a wrong docstring default for return_std, doesn't
enforce mutual exclusivity of return_std and return_cov, and handles
single-element list shapes ambiguously; fix by (1) adding an else branch that
raises TypeError if params is neither dict nor list[dict], (2) updating the
function signature return annotation to allow either a scalar/NDArray or a tuple
of scalar/NDArray, (3) correcting the docstring to state return_std default is
False, (4) validating up-front that not both return_std and return_cov are True
and raising ValueError if they are, and (5) making shape semantics
deterministic: treat dict input as returning scalars and any list input (even
length 1) as returning arrays (adjust the existing shape-conversion code
accordingly so single-element lists remain 1D arrays and single dict inputs are
reduced to scalars), then update tests/docstrings to match this behavior.

Comment on lines +592 to +803
def test_predict():
    """Test the predict method of the optimizer."""
    optimizer = BayesianOptimization(f=target_func, pbounds=PBOUNDS, random_state=1, verbose=0)

    # Register some points
    optimizer.register(params={"p1": 1, "p2": 2}, target=3)
    optimizer.register(params={"p1": 4, "p2": 5}, target=9)
    optimizer.register(params={"p1": 7, "p2": 8}, target=15)

    # Points to predict
    test_points = [{"p1": 2, "p2": 3}, {"p1": 5, "p2": 6}, {"p1": 8, "p2": 9}]

    # Perform predictions
    means = optimizer.predict(test_points)

    # Check that means have correct length
    assert len(means) == len(test_points)

    # Also test with return_std=True to get std
    means, stds = optimizer.predict(test_points, return_std=True)
    assert len(means) == len(test_points)
    assert len(stds) == len(test_points)


def test_predict_example():
    """Test the predict method with known outputs."""
    optimizer = BayesianOptimization(f=target_func, pbounds=PBOUNDS, random_state=1, verbose=0)

    # Register some points
    optimizer.register(params={"p1": 0, "p2": 0}, target=0)
    optimizer.register(params={"p1": 10, "p2": 10}, target=20)

    # Point to predict
    test_point = {"p1": 0, "p2": 0}

    # Perform prediction
    means = optimizer.predict([test_point])
    assert np.isclose(means, 0, atol=1e-3)

    # Test with return_std=True
    means, stds = optimizer.predict([test_point], return_std=True)
    assert np.isclose(means, 0, atol=1e-3)
    assert stds < 0.02  # std should be small but not tiny due to GP uncertainty

    test_point = {"p1": 10, "p2": 10}
    means = optimizer.predict([test_point])
    assert np.isclose(means, 20, atol=1e-3)

    means, stds = optimizer.predict([test_point], return_std=True)
    assert np.isclose(means, 20, atol=1e-3)
    assert stds < 0.02


def test_predict_no_fit():
    """Test the predict method when GP has not been fitted."""
    optimizer = BayesianOptimization(f=target_func, pbounds=PBOUNDS, random_state=1, verbose=0)

    # Perform prediction with fit_gp=True should raise error when no data
    with pytest.raises(RuntimeError):
        optimizer.predict({"p1": 5, "p2": 5}, fit_gp=True)

    # Predict without fitting GP using single dict - get scalar mean by default
    mean = optimizer.predict({"p1": 5, "p2": 5}, fit_gp=False)
    # Since GP is not fitted, mean should be close to 0
    assert np.isclose(mean, 0, atol=1e-4)

    # Get std when not fitting GP
    mean, std = optimizer.predict({"p1": 5, "p2": 5}, fit_gp=False, return_std=True)
    # Since GP is not fitted, std should be large
    assert std > 1e-2

    # Test with list - returns array
    means = optimizer.predict([{"p1": 5, "p2": 5}], fit_gp=False)
    # With a list, even single point returns array
    assert len(means) == 1
    assert np.isclose(means[0], 0, atol=1e-4)


def test_predict_return_cov():
    """Test the predict method with return_cov=True."""
    optimizer = BayesianOptimization(f=target_func, pbounds=PBOUNDS, random_state=1, verbose=0)

    optimizer.register(params={"p1": 1, "p2": 2}, target=3)
    optimizer.register(params={"p1": 4, "p2": 5}, target=9)

    test_points = [{"p1": 2, "p2": 3}, {"p1": 5, "p2": 6}]

    means, cov = optimizer.predict(test_points, return_cov=True)

    assert len(means) == len(test_points)
    assert cov.shape == (len(test_points), len(test_points))


def test_predict_integer_params():
    """Test the predict method with integer parameters."""
    int_pbounds = {"p1": (0, 10, int), "p2": (0, 10, int)}
    optimizer = BayesianOptimization(f=target_func, pbounds=int_pbounds, random_state=1, verbose=0)
    optimizer.register(params={"p1": 1, "p2": 2}, target=3)
    optimizer.register(params={"p1": 4, "p2": 5}, target=9)
    test_points = [{"p1": 2, "p2": 3}, {"p1": 5, "p2": 6}]
    means = optimizer.predict(test_points)
    assert len(means) == len(test_points)

    # Test with return_std
    means, stds = optimizer.predict(test_points, return_std=True)
    assert len(means) == len(test_points)
    assert len(stds) == len(test_points)

    float_points = [{"p1": 2.7, "p2": 3.3}, {"p1": 5.9, "p2": 6.1}]
    means_float = optimizer.predict(float_points)
    assert len(means_float) == len(float_points)

    means_float, stds_float = optimizer.predict(float_points, return_std=True)
    assert len(means_float) == len(float_points)
    assert len(stds_float) == len(float_points)
    # Check that rounding float inputs gives similar predictions as integer inputs

    for i in range(len(test_points)):
        rounded_point = {k: round(v) for k, v in float_points[i].items()}
        mean_rounded = optimizer.predict([rounded_point])
        assert np.isclose(means_float[i], mean_rounded, atol=1e-1)

    # Also check with std
    for i in range(len(test_points)):
        rounded_point = {k: round(v) for k, v in float_points[i].items()}
        mean_rounded, std_rounded = optimizer.predict([rounded_point], return_std=True)
        assert np.isclose(means_float[i], mean_rounded, atol=1e-1)
        assert np.isclose(stds_float[i], std_rounded, atol=1e-1)


def test_predict_categorical_params():
    """Test the predict method with categorical parameters."""

    def cat_target_func(param1: str, param2: str) -> float:
        value_map = {"low": 1.0, "medium": 2.0, "high": 3.0}
        return value_map[param1] + value_map[param2]

    cat_pbounds = {"param1": ["low", "medium", "high"], "param2": ["low", "medium", "high"]}

    optimizer = BayesianOptimization(f=cat_target_func, pbounds=cat_pbounds, random_state=1, verbose=0)

    optimizer.register(params={"param1": "low", "param2": "low"}, target=2.0)
    optimizer.register(params={"param1": "high", "param2": "high"}, target=6.0)

    test_points = [{"param1": "medium", "param2": "medium"}, {"param1": "low", "param2": "high"}]

    means = optimizer.predict(test_points)

    assert len(means) == len(test_points)
    assert np.isclose(means[0], 4.0, atol=1.0)
    assert np.isclose(means[1], 4.0, atol=1.0)

    # Test with return_std
    means, stds = optimizer.predict(test_points, return_std=True)
    assert len(means) == len(test_points)
    assert len(stds) == len(test_points)
    assert stds[0] > 0.0
    assert stds[1] > 0.0


def test_predict_no_points_registered():
    """Test the predict method when no points have been registered."""
    optimizer = BayesianOptimization(f=target_func, pbounds=PBOUNDS, random_state=1, verbose=0)

    test_points = [{"p1": 2, "p2": 3}, {"p1": 5, "p2": 6}]

    means = optimizer.predict(test_points, fit_gp=False)

    assert len(means) == len(test_points)
    for mean in means:
        assert np.isclose(mean, 0, atol=1e-4)

    # Test with return_std to get uncertainty
    means, stds = optimizer.predict(test_points, fit_gp=False, return_std=True)
    assert len(means) == len(test_points)
    assert len(stds) == len(test_points)
    for std in stds:
        assert std > 1e-2


def test_predict_custom_parameter():
    """Test the predict method with a custom parameter type."""

    param = FixedPerimeterTriangleParameter(
        name="sides", bounds=np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]), perimeter=1.0
    )
    pbounds = {"sides": param}

    optimizer = BayesianOptimization(f=area_of_triangle, pbounds=pbounds, random_state=1, verbose=0)

    optimizer.register(
        params={"sides": np.array([0.3, 0.4, 0.3])}, target=area_of_triangle(np.array([0.3, 0.4, 0.3]))
    )
    optimizer.register(
        params={"sides": np.array([0.2, 0.5, 0.3])}, target=area_of_triangle(np.array([0.2, 0.5, 0.3]))
    )

    test_points = [{"sides": np.array([0.25, 0.5, 0.25])}, {"sides": np.array([0.4, 0.4, 0.2])}]

    means = optimizer.predict(test_points)

    assert len(means) == len(test_points)
    for i, point in enumerate(test_points):
        expected_area = area_of_triangle(point["sides"])
        assert np.isclose(means[i], expected_area, atol=0.1)

    # Test with return_std
    means, stds = optimizer.predict(test_points, return_std=True)
    assert len(means) == len(test_points)
    assert len(stds) == len(test_points)
    for i in range(len(test_points)):
        assert stds[i] > 0.0


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Prediction tests largely cover the new API, but shape assumptions need to match implementation

Overall these tests do a good job exercising:

  • Prior vs posterior behavior (fit_gp=True/False, zero vs nonzero observations).
  • return_std and return_cov paths.
  • Integer, categorical, and custom parameter handling.
  • Behavior when no points are registered.

A few focused issues/suggestions:

  • Scalar vs array ambiguity when calling predict([point]).
    Several tests treat predict([point]) as returning scalars:

    • test_predict_example: means = optimizer.predict([test_point]) followed by assert np.isclose(means, 0, ...) and later means, stds = optimizer.predict([test_point], return_std=True) with assert stds < 0.02.
    • test_predict_integer_params: inside loops, mean_rounded = optimizer.predict([rounded_point]) and mean_rounded, std_rounded = optimizer.predict([rounded_point], return_std=True) both compared via np.isclose to scalar values.

    With the current implementation of predict, a list input (even of length 1) yields a 1D array for mean/std, so these assertions will hit the “ambiguous truth value of array” error or silently assume array-scalar behavior that doesn’t match the docstring. You should either:

    • Update tests to treat predict([point]) as returning arrays (means[0], stds[0], or .item()), or
    • Adjust predict so that a list of length 1 is treated as a single point (see earlier comment on single_param) and document that behavior.
  • Mutual exclusivity of return_std and return_cov.
    There are no tests that exercise the invalid case where both flags are True. Once predict enforces this (see previous file comment), consider adding a small negative test, e.g.:

    with pytest.raises(ValueError):
        optimizer.predict(test_points, return_std=True, return_cov=True)

    to lock in the contract.

  • Integer, categorical, and custom parameter tests look solid but are a bit brittle.

    • test_predict_example and several others assert that posterior means at training points are very close to the observed targets and that std is < 0.02. This is reasonable with the current GaussianProcessRegressor defaults (Matern kernel, alpha=1e-6, normalize_y=True), but could become brittle if GP hyperparameters or scikit-learn defaults change. Loosening tolerances slightly or asserting monotonic/relative behavior instead of exact thresholds would make the suite more robust over time.

    • test_predict_custom_parameter uses absolute closeness atol=0.1 between the GP mean and area_of_triangle with only two training points. That’s probably okay, but again somewhat tied to current kernel behavior; a brief comment about this being a smoke test (not strict accuracy validation) would clarify intent.

  • No test for fit_gp=False with existing observations.
    You test:

    • fit_gp=True with data, and
    • fit_gp=False with no data (prior-only).

    It may be worth adding a small test where you register a few points, call predict(..., fit_gp=True) once to fit, and then call predict(..., fit_gp=False) again to ensure that using the “already-fitted GP without re-fitting” path behaves as expected. This would exercise the “don’t touch GP state” branch explicitly.

Overall, once the shape expectations for predict([point]) are aligned with the implementation, this test suite will provide good coverage for the new .predict API and its interaction with non-float parameters.


I'll verify the review comment claims by examining the actual implementation of the predict method.



🏁 Script executed:

#!/bin/bash
# Find the predict method implementation
fd -t f -e py | xargs rg -l "def predict" | head -20

Length of output: 84


🏁 Script executed:

#!/bin/bash
# Search for predict method with context to understand implementation
rg -n -A 50 "def predict\(" --type py | head -150

Length of output: 10940


🏁 Script executed:

#!/bin/bash
# Look for the BayesianOptimization class and predict method
ast-grep --pattern $'class BayesianOptimization {
  $$$
  predict($_, $_) {
    $$$
  }
  $$$
}'

Length of output: 0


I need to see the complete predict method implementation to verify the return value handling.



🏁 Script executed:

#!/bin/bash
# Get complete predict method implementation
rg -n -A 100 "def predict\(" bayes_opt/bayesian_optimization.py | head -200

Length of output: 4585


🏁 Script executed:

#!/bin/bash
# Check if there's validation for return_std and return_cov together
rg -n "return_std.*return_cov\|return_cov.*return_std" bayes_opt/bayesian_optimization.py

Length of output: 0


Now let me examine the actual test code to confirm the specific assertions mentioned in the review:



🏁 Script executed:

#!/bin/bash
# Check test_predict_example assertions - lines around 615-640
sed -n '615,640p' tests/test_bayesian_optimization.py

Length of output: 921


🏁 Script executed:

#!/bin/bash
# Check test_predict_integer_params - lines around 690-720
sed -n '690,720p' tests/test_bayesian_optimization.py

Length of output: 1406


🏁 Script executed:

#!/bin/bash
# Check if tests actually pass or have issues with array comparisons
rg -n "predict\(\[" tests/test_bayesian_optimization.py | head -20

Length of output: 461


🏁 Script executed:

#!/bin/bash
# Look for fit_gp=False patterns in tests to see if that case is covered
rg -n "fit_gp=False" tests/test_bayesian_optimization.py

Length of output: 370


Fix array/scalar mismatches in predict() assertions—list inputs return arrays, not scalars.

Several test assertions incorrectly treat predict([point]) output as scalars when the implementation returns arrays for list inputs:

  • test_predict_example (lines 628, 632, 637, 640): means = optimizer.predict([test_point]) followed by assert np.isclose(means, 0, ...) and assert stds < 0.02 will fail or produce misleading results since predict() returns 1D arrays for list inputs.
  • test_predict_integer_params (lines 711, 717): Inside the loop, mean_rounded = optimizer.predict([rounded_point]) and subsequent assertions comparing against scalars have the same issue.

Update these to either extract the scalar (e.g., means[0] or .item()) or wrap the point in a dict instead: optimizer.predict(rounded_point) returns a scalar directly.

Additionally, add a test for the case where return_std=True and return_cov=True are both passed to verify mutual exclusivity is enforced once the implementation validates this. Include a test where points are registered, predict(..., fit_gp=True) is called to fit the model, and then predict(..., fit_gp=False) is called to exercise the "use already-fitted GP" path without re-fitting.
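
A hedged sketch of the two suggested additions, reusing the module's existing target_func/PBOUNDS fixtures and assuming the mutual-exclusivity ValueError proposed above is implemented:

```python
def test_predict_rejects_std_and_cov_together():
    """Requesting std and cov at once should be rejected (assumes the proposed ValueError check)."""
    optimizer = BayesianOptimization(f=target_func, pbounds=PBOUNDS, random_state=1, verbose=0)
    optimizer.register(params={"p1": 1, "p2": 2}, target=3)
    with pytest.raises(ValueError):
        optimizer.predict({"p1": 2, "p2": 3}, return_std=True, return_cov=True)


def test_predict_with_already_fitted_gp():
    """Fit once via fit_gp=True, then predict again with fit_gp=False on the already-fitted GP."""
    optimizer = BayesianOptimization(f=target_func, pbounds=PBOUNDS, random_state=1, verbose=0)
    optimizer.register(params={"p1": 0, "p2": 0}, target=0)
    optimizer.register(params={"p1": 10, "p2": 10}, target=20)

    fitted_mean = optimizer.predict({"p1": 0, "p2": 0}, fit_gp=True)
    cached_mean = optimizer.predict({"p1": 0, "p2": 0}, fit_gp=False)
    # The no-refit path should reuse the previously fitted GP, so the two predictions agree.
    assert np.isclose(fitted_mean, cached_mean, atol=1e-6)
```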

🤖 Prompt for AI Agents
In tests/test_bayesian_optimization.py around lines 592 to 803, several
assertions treat the output of optimizer.predict([point]) as scalars but the
implementation returns 1D arrays for list inputs; change those assertions to
index the first element (e.g., means[0] or stds[0] or .item()) or call predict
with a single dict (predict(point)) when a scalar is expected (update
test_predict_example lines ~628/632/637/640 and test_predict_integer_params
loops around ~711/717 accordingly). Also add a small test verifying that passing
both return_std=True and return_cov=True raises or is rejected (mutual
exclusivity), and add a test that registers points then calls predict(...,
fit_gp=True) followed by predict(..., fit_gp=False) to exercise the "use
already-fitted GP" path. Ensure all assertions compare scalars to scalars and
array outputs are asserted on length/shape before indexing.
