Detailed model training control and batch modelling #365

Draft
nikitakuklev wants to merge 29 commits into xopt-org:main from nikitakuklev:model_training

Conversation


@nikitakuklev (Collaborator) commented on Sep 29, 2025

This PR introduces batch GP models and finer control over model training. The former is useful for scalarized objectives, while the latter is necessary to speed up BO in operational contexts. As recently demonstrated in a NAPAC25 talk, fitting tolerances can be relaxed significantly to meet real-time requirements without impacting convergence, especially with scalarized objectives. There is also a physical motivation - we cannot set the physical devices precisely enough for exact fitting to matter.

Changes:

  • New model constructor parameters to control training (passed through to the optimizer), plus Pydantic classes wrapping the key Adam/LBFGS knobs (see the sketch after this list).
  • New option to use Adam/torch as the fitting optimizer, with appropriate defaults.
  • New batched GP model constructor. It is not used by default for now, but once tooling such as the visualizer supports it, we should probably make it the default for large datasets.
  • Complete rework of the benchmarking scripts, moving them into resources for easier import. Profiling and benchmarking of snippets can now be run with either a fixed run count or a fixed time budget.
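
To make the training-control changes concrete, here is a minimal sketch of relaxed, Adam-based GP fitting using plain BoTorch/GPyTorch (not the new Xopt constructor API added in this PR); the data shapes, learning rate, and step budget are illustrative placeholders:

```python
# Illustration only: plain BoTorch/GPyTorch, not the new Xopt constructor options.
# Shows the idea of "relaxed" fitting: a small, fixed Adam step budget instead of
# running LBFGS to tight convergence.
import torch
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

# placeholder data; shapes and values are illustrative
train_X = torch.rand(200, 12, dtype=torch.double)
train_Y = torch.sin(train_X.sum(dim=-1, keepdim=True))

model = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)

model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(100):  # relaxed budget: fixed number of Adam steps
    optimizer.zero_grad()
    output = model(train_X)
    # fit against the model's own (possibly transformed) training targets
    loss = -mll(output, model.train_targets)
    loss.backward()
    optimizer.step()
model.eval()
```

The new constructor parameters are meant to expose this kind of control (optimizer choice, iteration limits, tolerances) through Pydantic options rather than a hand-written loop.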

Caveats:

  • Batched model hyperparameters do not exactly match those of the list model, unless the data is identical on all outputs. This appears to be an issue somewhere in botorch, since bare gpytorch works as expected (see the sketch after this list for how the two layouts are built).
  • The batched model is slower on small problems on GPU, and sometimes on CPU. Several variables determine the crossover point, such as the switch from Cholesky to CG-based solvers; as a rough guideline, it is slower for n <= 100. For large problems there are significant gains on GPU, which makes batch modelling worthwhile.
  • The default behavior with cached hyperparameters has changed - the model is now trained in all cases, so with cached hyperparameters it is fine-tuned from the previous state. To disable this, set train_model=False.
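
For reference, a minimal sketch of the two model layouts being compared in the first caveat, written with plain BoTorch calls rather than the new constructor; the synthetic data below only mirrors the benchmark shape (n=500, 12 variables, 5 outputs):

```python
# Illustration only: batched multi-output GP vs. ModelListGP of independent GPs.
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import ModelListGP, SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood, SumMarginalLogLikelihood

# placeholder data mirroring the benchmark shape: 500 points, 12 variables, 5 outputs
train_X = torch.rand(500, 12, dtype=torch.double)
train_Y = torch.stack(
    [torch.sin((i + 1) * train_X.sum(dim=-1)) for i in range(5)], dim=-1
)

# batched layout: one GP object whose batch dimension covers all 5 outputs
batched = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(batched.likelihood, batched))

# list layout: 5 independent single-output GPs, fit one at a time
listed = ModelListGP(
    *[SingleTaskGP(train_X, train_Y[:, i : i + 1]) for i in range(5)]
)
fit_gpytorch_mll(SumMarginalLogLikelihood(listed.likelihood, listed))
```

After fitting, the per-output hyperparameters of the two layouts can be compared directly; per the caveat above, they only agree when all outputs share identical training data.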

Benchmarks for n_vars=12, n_obj=5, n_constr=2, n=500
CPU:

| f | n | t_avg | t_med | t_max | t_min | t_tot | std |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| bench_build_standard | 10 | 1.266 | 1.220 | 1.684 | 1.214 | 12.660 | 0.139 |
| bench_build_batched | 10 | 1.563 | 1.561 | 1.580 | 1.553 | 15.630 | 0.009 |
| bench_build_standard_adam | 10 | 2.689 | 2.607 | 3.407 | 2.572 | 26.892 | 0.241 |
| bench_build_batched_adam | 10 | 2.977 | 2.953 | 3.051 | 2.943 | 29.772 | 0.040 |
| bench_build_standard_gpytorch | 10 | 15.057 | 15.042 | 15.415 | 14.909 | 150.569 | 0.144 |
| bench_build_batched_gpytorch | 10 | 14.926 | 14.867 | 15.100 | 14.820 | 149.261 | 0.100 |

GPU (RTX 3070, H100 todo):

| f | n | t_avg | t_med | t_max | t_min | t_tot | std |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| bench_build_standard | 10 | 0.924 | 0.858 | 1.493 | 0.844 | 9.238 | 0.190 |
| bench_build_batched | 10 | 0.726 | 0.716 | 0.826 | 0.666 | 7.258 | 0.041 |
| bench_build_standard_adam | 10 | 1.918 | 1.884 | 2.505 | 1.690 | 19.177 | 0.218 |
| bench_build_batched_adam | 10 | 1.148 | 1.136 | 1.314 | 1.063 | 11.475 | 0.087 |
| bench_build_standard_gpytorch | 10 | 7.773 | 7.762 | 7.890 | 7.706 | 77.725 | 0.061 |
| bench_build_batched_gpytorch | 10 | 5.884 | 5.861 | 6.171 | 5.746 | 58.838 | 0.116 |

To reproduce:
python bench_runner.py bench_build_standard bench_build_batched bench_build_standard_adam bench_build_batched_adam bench_build_standard_gpytorch bench_build_batched_gpytorch -n 10 -device cpu

@roussel-ryan (Collaborator)

Looking good! LMK when this is ready for review and we can have a short discussion to go over it

@nikitakuklev added the enhancement label on Oct 13, 2025