Bayesian generators

BayesianGenerator

Bases: Generator, ABC

Bayesian Generator for Bayesian Optimization.

Attributes:

name : str The name of the Bayesian Generator.

model : Optional[Model] The BoTorch model used by the generator to perform optimization.

n_monte_carlo_samples : int The number of Monte Carlo samples to use in the optimization process.

turbo_controller : SerializeAsAny[Optional[TurboController]] The Turbo Controller for trust-region Bayesian Optimization.

use_cuda : bool A flag to enable or disable CUDA usage if available.

gp_constructor : SerializeAsAny[ModelConstructor] The constructor used to generate the model for Bayesian Optimization.

numerical_optimizer : SerializeAsAny[NumericalOptimizer] The optimizer used to optimize the acquisition function in Bayesian Optimization.

max_travel_distances : Optional[List[float]] The limits for travel distances between points in normalized space.

fixed_features : Optional[Dict[str, float]] The fixed features used in Bayesian Optimization.

computation_time : Optional[pd.DataFrame] A data frame tracking computation time in seconds.

log_transform_acquisition_function : Optional[bool] Flag to determine whether the final acquisition function value should be log-transformed before optimization.

n_interpolate_points : Optional[PositiveInt] Number of interpolation points to generate between the last observation and the next observation; requires n_candidates to be 1.

n_candidates : int The number of candidates to generate in each optimization step.

Methods:

generate(self, n_candidates: int) -> List[Dict]: Generate candidates for Bayesian Optimization.

add_data(self, new_data: pd.DataFrame): Add new data to the generator for Bayesian Optimization.

train_model(self, data: pd.DataFrame = None, update_internal=True) -> Module: Train a Bayesian model for Bayesian Optimization.

propose_candidates(self, model, n_candidates=1) -> Tensor: Propose candidates for Bayesian Optimization.

get_input_data(self, data: pd.DataFrame) -> torch.Tensor: Get input data in torch.Tensor format.

get_acquisition(self, model) -> AcquisitionFunction: Get the acquisition function for Bayesian Optimization.
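
For orientation, here is a minimal usage sketch of the lifecycle shared by all concrete subclasses: construct a generator with a VOCS, add observed data, then generate candidates. It assumes the UpperConfidenceBoundGenerator subclass documented later on this page (BayesianGenerator itself is abstract); import paths may vary between Xopt versions.

import pandas as pd

from xopt import VOCS
from xopt.generators.bayesian import UpperConfidenceBoundGenerator

# define the optimization space and objective
vocs = VOCS(variables={"x": [0.0, 1.0]}, objectives={"y": "MINIMIZE"})
generator = UpperConfidenceBoundGenerator(vocs=vocs)

# generators cannot propose candidates without data
generator.add_data(pd.DataFrame({"x": [0.1, 0.5, 0.9], "y": [1.0, 0.2, 0.7]}))

# returns a list of dicts, e.g. [{"x": 0.42}]
candidates = generator.generate(1)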

Source code in xopt/generators/bayesian/bayesian_generator.py
class BayesianGenerator(Generator, ABC):
    """Bayesian Generator for Bayesian Optimization.

    Attributes:
    -----------
    name : str
        The name of the Bayesian Generator.

    model : Optional[Model]
        The BoTorch model used by the generator to perform optimization.

    n_monte_carlo_samples : int
        The number of Monte Carlo samples to use in the optimization process.

    turbo_controller : SerializeAsAny[Optional[TurboController]]
        The Turbo Controller for trust-region Bayesian Optimization.

    use_cuda : bool
        A flag to enable or disable CUDA usage if available.

    gp_constructor : SerializeAsAny[ModelConstructor]
        The constructor used to generate the model for Bayesian Optimization.

    numerical_optimizer : SerializeAsAny[NumericalOptimizer]
        The optimizer used to optimize the acquisition function in Bayesian Optimization.

    max_travel_distances : Optional[List[float]]
        The limits for travel distances between points in normalized space.

    fixed_features : Optional[Dict[str, float]]
        The fixed features used in Bayesian Optimization.

    computation_time : Optional[pd.DataFrame]
        A data frame tracking computation time in seconds.

    log_transform_acquisition_function : Optional[bool]
        Flag to determine whether the final acquisition function value should be
        log-transformed before optimization.

    n_interpolate_points : Optional[PositiveInt]
        Number of interpolation points to generate between the last observation and
        the next observation; requires n_candidates to be 1.

    n_candidates : int
        The number of candidates to generate in each optimization step.

    Methods:
    --------
    generate(self, n_candidates: int) -> List[Dict]:
        Generate candidates for Bayesian Optimization.

    add_data(self, new_data: pd.DataFrame):
        Add new data to the generator for Bayesian Optimization.

    train_model(self, data: pd.DataFrame = None, update_internal=True) -> Module:
        Train a Bayesian model for Bayesian Optimization.

    propose_candidates(self, model, n_candidates=1) -> Tensor:
        Propose candidates for Bayesian Optimization.

    get_input_data(self, data: pd.DataFrame) -> torch.Tensor:
        Get input data in torch.Tensor format.

    get_acquisition(self, model) -> AcquisitionFunction:
        Get the acquisition function for Bayesian Optimization.

    """

    name = "base_bayesian_generator"
    model: Optional[Model] = Field(
        None, description="botorch model used by the generator to perform optimization"
    )
    n_monte_carlo_samples: int = Field(
        128, description="number of monte carlo samples to use"
    )
    turbo_controller: SerializeAsAny[Optional[TurboController]] = Field(
        default=None, description="turbo controller for trust-region BO"
    )
    use_cuda: bool = Field(False, description="flag to enable cuda usage if available")
    gp_constructor: SerializeAsAny[ModelConstructor] = Field(
        StandardModelConstructor(), description="constructor used to generate model"
    )
    numerical_optimizer: SerializeAsAny[NumericalOptimizer] = Field(
        LBFGSOptimizer(),
        description="optimizer used to optimize the acquisition " "function",
    )
    max_travel_distances: Optional[List[float]] = Field(
        None,
        description="limits for travel distance between points in normalized space",
    )
    fixed_features: Optional[Dict[str, float]] = Field(
        None, description="fixed features used in Bayesian optimization"
    )
    computation_time: Optional[pd.DataFrame] = Field(
        None,
        description="data frame tracking computation time in seconds",
    )
    log_transform_acquisition_function: Optional[bool] = Field(
        False,
        description="flag to log transform the acquisition function before optimization",
    )
    n_interpolate_points: Optional[PositiveInt] = None

    n_candidates: int = 1

    @field_validator("model", mode="before")
    def validate_torch_modules(cls, v):
        if isinstance(v, str):
            if v.startswith("base64:"):
                v = decode_torch_module(v)
            elif os.path.exists(v):
                v = torch.load(v)
        return v

    @field_validator("gp_constructor", mode="before")
    def validate_gp_constructor(cls, value):
        constructor_dict = {"standard": StandardModelConstructor}
        if value is None:
            value = StandardModelConstructor()
        elif isinstance(value, ModelConstructor):
            value = value
        elif isinstance(value, str):
            if value in constructor_dict:
                value = constructor_dict[value]()
            else:
                raise ValueError(f"{value} not found")
        elif isinstance(value, dict):
            name = value.pop("name")
            if name in constructor_dict:
                value = constructor_dict[name](**value)
            else:
                raise ValueError(f"{value} not found")

        return value

    @field_validator("numerical_optimizer", mode="before")
    def validate_numerical_optimizer(cls, value):
        optimizer_dict = {"grid": GridOptimizer, "LBFGS": LBFGSOptimizer}
        if value is None:
            value = LBFGSOptimizer()
        elif isinstance(value, NumericalOptimizer):
            pass
        elif isinstance(value, str):
            if value in optimizer_dict:
                value = optimizer_dict[value]()
            else:
                raise ValueError(f"{value} not found")
        elif isinstance(value, dict):
            name = value.pop("name")
            if name in optimizer_dict:
                value = optimizer_dict[name](**value)
            else:
                raise ValueError(f"{value} not found")
        return value

    @field_validator("turbo_controller", mode="before")
    def validate_turbo_controller(cls, value, info: ValidationInfo):
        """note default behavior is no use of turbo"""
        optimizer_dict = {
            "optimize": OptimizeTurboController,
            "safety": SafetyTurboController,
        }
        if isinstance(value, TurboController):
            pass
        elif isinstance(value, str):
            # create turbo controller from string input
            if value in optimizer_dict:
                value = optimizer_dict[value](info.data["vocs"])
            else:
                raise ValueError(
                    f"{value} not found, available values are "
                    f"{optimizer_dict.keys()}"
                )
        elif isinstance(value, dict):
            # create turbo controller from dict input
            if "name" not in value:
                raise ValueError("turbo input dict needs to have a `name` attribute")
            name = value.pop("name")
            if name in optimizer_dict:
                # pop unnecessary elements
                for ele in ["dim"]:
                    value.pop(ele, None)

                value = optimizer_dict[name](vocs=info.data["vocs"], **value)
            else:
                raise ValueError(
                    f"{value} not found, available values are "
                    f"{optimizer_dict.keys()}"
                )
        return value

    @field_validator("computation_time", mode="before")
    def validate_computation_time(cls, value):
        if isinstance(value, dict):
            value = pd.DataFrame(value)

        return value

    def add_data(self, new_data: pd.DataFrame):
        self.data = pd.concat([self.data, new_data], axis=0)

    def generate(self, n_candidates: int) -> list[dict]:
        """
        Generate candidates using Bayesian Optimization.

        Parameters:
        -----------
        n_candidates : int
            The number of candidates to generate in each optimization step.

        Returns:
        --------
        List[Dict]
            A list of dictionaries containing the generated candidates.

        Raises:
        -------
        NotImplementedError
            If the number of candidates is greater than 1, and the generator does not
            support batch candidate generation.

        RuntimeError
            If the generator contains no data. Call the 'add_data' method to add
            data before generating candidates.

        Notes:
        ------
        This method generates candidates for Bayesian Optimization based on the
        provided number of candidates. It updates the internal model with the current
        data and calculates the candidates by optimizing the acquisition function.
        The method returns the generated candidates in the form of a list of dictionaries.
        """

        self.n_candidates = n_candidates
        if n_candidates > 1 and not self.supports_batch_generation:
            raise NotImplementedError(
                "This Bayesian algorithm does not currently support parallel candidate "
                "generation"
            )

        # if no data exists raise error
        if self.data is None:
            raise RuntimeError(
                "no data contained in generator, call `add_data` "
                "method to add data, see also `Xopt.random_evaluate()`"
            )

        else:
            # dict to track runtimes
            timing_results = {}

            # update internal model with internal data
            start_time = time.perf_counter()
            model = self.train_model(self.data)
            timing_results["training"] = time.perf_counter() - start_time

            # propose candidates given model
            start_time = time.perf_counter()
            candidates = self.propose_candidates(model, n_candidates=n_candidates)
            timing_results["acquisition_optimization"] = (
                time.perf_counter() - start_time
            )

            # post process candidates
            result = self._process_candidates(candidates)

            # append timing results to dataframe (if it exists)
            if self.computation_time is not None:
                self.computation_time = pd.concat(
                    (
                        self.computation_time,
                        pd.DataFrame(timing_results, index=[0]),
                    ),
                    ignore_index=True,
                )
            else:
                self.computation_time = pd.DataFrame(timing_results, index=[0])

            if self.n_interpolate_points is not None:
                if self.n_candidates > 1:
                    raise RuntimeError(
                        "cannot generate interpolated points for "
                        "multiple candidate generation"
                    )
                else:
                    assert len(result) == 1
                    result = interpolate_points(
                        pd.concat(
                            (self.data.iloc[-1:][self.vocs.variable_names], result),
                            axis=0,
                            ignore_index=True,
                        ),
                        num_points=self.n_interpolate_points,
                    )

            return result.to_dict("records")

    def train_model(self, data: pd.DataFrame = None, update_internal=True) -> Module:
        """
        Returns a ModelListGP containing independent models for the objectives and
        constraints

        """
        if data is None:
            data = self.data
        if data.empty:
            raise ValueError("no data available to build model")

        # get input bounds
        variable_bounds = deepcopy(self.vocs.variables)

        # add fixed feature bounds if requested
        if self.fixed_features is not None:
            # get bounds for each fixed_feature (vocs bounds take precedence)
            for key in self.fixed_features:
                if key not in variable_bounds:
                    if key not in data:
                        raise KeyError(
                            "generator data needs to contain fixed feature "
                            f"column name `{key}`"
                        )
                    f_data = data[key]
                    bounds = [f_data.min(), f_data.max()]
                    if bounds[1] - bounds[0] < 1e-8:
                        bounds[1] = bounds[0] + 1e-8
                    variable_bounds[key] = bounds

        _model = self.gp_constructor.build_model(
            self.model_input_names,
            self.vocs.output_names,
            data,
            {name: variable_bounds[name] for name in self.model_input_names},
            **self._tkwargs,
        )

        if update_internal:
            self.model = _model
        return _model

    def propose_candidates(self, model, n_candidates=1):
        """
        given a GP model, propose candidates by numerically optimizing the
        acquisition function

        """
        # update TurBO state if used with the last `n_candidates` points
        if self.turbo_controller is not None:
            self.turbo_controller.update_state(self.data, n_candidates)

        # calculate optimization bounds
        bounds = self._get_optimization_bounds()

        # get acquisition function
        acq_funct = self.get_acquisition(model)

        # get candidates
        candidates = self.numerical_optimizer.optimize(acq_funct, bounds, n_candidates)
        return candidates

    def get_input_data(self, data: pd.DataFrame) -> torch.Tensor:
        """
        Convert input data to a torch tensor.

        Parameters:
        -----------
        data : pd.DataFrame
            The input data in the form of a pandas DataFrame.

        Returns:
        --------
        torch.Tensor
            A torch tensor containing the input data.

        Notes:
        ------
        This method takes a pandas DataFrame as input data and converts it into a
        torch tensor. It specifically selects columns corresponding to the model's
        input names (variables), and the resulting tensor is configured with the data
        type and device settings from the generator.
        """
        return torch.tensor(data[self.model_input_names].to_numpy(), **self._tkwargs)

    def get_acquisition(self, model):
        """
        Define the acquisition function based on the given GP model.

        Parameters:
        -----------
        model : Model
            The BoTorch model to be used for generating the acquisition function.

        Returns:
        --------
        acquisition_function : AcquisitionFunction

        Raises:
        -------
        ValueError
            If the provided 'model' is None. A valid model is required to create the
            acquisition function.
        """
        if model is None:
            raise ValueError("model cannot be None")

        # get base acquisition function
        acq = self._get_acquisition(model)

        # apply constraints if specified in vocs
        # TODO: replace with direct constrained acquisition function calls
        # see SampleReducingMCAcquisitionFunction in botorch for rationale
        if len(self.vocs.constraints):
            try:
                sampler = acq.sampler
            except AttributeError:
                sampler = self._get_sampler(model)

            acq = ConstrainedMCAcquisitionFunction(
                model, acq, self._get_constraint_callables(), sampler=sampler
            )

        # apply fixed features if specified in the generator
        if self.fixed_features is not None:
            # get input dim
            dim = len(self.model_input_names)
            columns = []
            values = []
            for name, value in self.fixed_features.items():
                columns += [self.model_input_names.index(name)]
                values += [value]

            acq = FixedFeatureAcquisitionFunction(
                acq_function=acq, d=dim, columns=columns, values=values
            )

        if self.log_transform_acquisition_function:
            acq = LogAcquisitionFunction(acq)

        return acq

    def get_optimum(self):
        """select the best point(s) given by the
        model using the Posterior mean"""
        c_posterior_mean = ConstrainedMCAcquisitionFunction(
            self.model,
            qUpperConfidenceBound(
                model=self.model, beta=0.0, objective=self._get_objective()
            ),
            self._get_constraint_callables(),
        )

        result = self.numerical_optimizer.optimize(
            c_posterior_mean, self._get_bounds(), 1
        )

        return self._process_candidates(result)

    def visualize_model(self, **kwargs):
        """displays the GP models"""
        return visualize_generator_model(self, **kwargs)

    def _process_candidates(self, candidates: Tensor):
        """process pytorch candidates from optimizing the acquisition function"""
        logger.debug("Best candidate from optimize: %s", candidates)

        if self.fixed_features is not None:
            results = pd.DataFrame(
                candidates.detach().cpu().numpy(), columns=self._candidate_names
            )
            for name, val in self.fixed_features.items():
                results[name] = val

        else:
            results = self.vocs.convert_numpy_to_inputs(
                candidates.detach().cpu().numpy(), include_constants=False
            )

        return results

    def _get_sampler(self, model):
        input_data = self.get_input_data(self.data)
        sampler = get_sampler(
            model.posterior(input_data),
            sample_shape=torch.Size([self.n_monte_carlo_samples]),
        )
        return sampler

    @abstractmethod
    def _get_acquisition(self, model):
        pass

    def _get_objective(self):
        """return default objective (scalar objective) determined by vocs"""
        return create_mc_objective(self.vocs, self._tkwargs)

    def _get_constraint_callables(self):
        """return default objective (scalar objective) determined by vocs"""
        constraint_callables = create_constraint_callables(self.vocs)
        if len(constraint_callables) == 0:
            constraint_callables = None
        return constraint_callables

    @property
    def _tkwargs(self):
        # set device and data type for generator
        device = "cpu"
        if self.use_cuda:
            if torch.cuda.is_available():
                device = "cuda"
            else:
                warnings.warn(
                    "Cuda requested in generator options but not found on "
                    "machine! Using CPU instead"
                )

        return {"dtype": torch.double, "device": device}

    @property
    def model_input_names(self):
        """variable names corresponding to trained model"""
        variable_names = self.vocs.variable_names
        if self.fixed_features is not None:
            for name, val in self.fixed_features.items():
                if name not in variable_names:
                    variable_names += [name]
        return variable_names

    @property
    def _candidate_names(self):
        """variable names corresponding to generated candidates"""
        variable_names = self.vocs.variable_names
        if self.fixed_features is not None:
            for name in self.fixed_features:
                if name in variable_names:
                    variable_names.remove(name)
        return variable_names

    def _get_bounds(self):
        """convert bounds from vocs to torch tensors"""
        return torch.tensor(self.vocs.bounds, **self._tkwargs)

    def _get_optimization_bounds(self):
        """
        Get optimization bounds based on the union of several domains.

        Returns:
        --------
        torch.Tensor
            Tensor containing the optimized bounds.

        Notes:
        ------
        This method calculates the optimization bounds based on several factors:

        - If 'max_travel_distances' is specified, the bounds are modified to limit
            the maximum travel distances between points in normalized space.
        - If 'turbo_controller' is not None, the bounds are updated according to the
            trust region specified by the controller.
        - If 'fixed_features' are included in the variable names from the VOCS,
            the bounds associated with those features are removed.

        """
        bounds = deepcopy(self._get_bounds())

        # if specified modify bounds to limit maximum travel distances
        if self.max_travel_distances is not None:
            max_travel_bounds = self._get_max_travel_distances_region(bounds)
            bounds = rectilinear_domain_union(bounds, max_travel_bounds)

        # if using turbo, update turbo state and set bounds according to turbo state
        if self.turbo_controller is not None:
            # set the best value
            turbo_bounds = self.turbo_controller.get_trust_region(self.model)
            bounds = rectilinear_domain_union(bounds, turbo_bounds)

        # if fixed features key is in vocs then we need to remove the bounds
        # associated with that key
        if self.fixed_features is not None:
            # grab variable name indices that are NOT in fixed features
            indices = []
            for idx, name in enumerate(self.vocs.variable_names):
                if name not in self.fixed_features:
                    indices += [idx]

            # grab indexed bounds
            bounds = bounds.T[indices].T

        return bounds

    def _get_max_travel_distances_region(self, bounds):
        """
        Calculate the region for maximum travel distances based on the current bounds
        and the last observation.

        Parameters:
        -----------
        bounds : torch.Tensor
            The optimization bounds based on the union of several domains.

        Returns:
        --------
        torch.Tensor
            The bounds for the maximum travel distances region.

        Raises:
        -------
        ValueError
            If the length of max_travel_distances does not match the number of
            variables in bounds.

        Notes:
        ------
        This method calculates the region in which the next candidates for
        optimization should be generated based on the maximum travel distances
        specified. The region is centered around the last observation in the
        optimization space. The `max_travel_distances` parameter should be a list of
        maximum travel distances for each variable.

        """
        if len(self.max_travel_distances) != bounds.shape[-1]:
            raise ValueError(
                f"length of max_travel_distances must match the number of "
                f"variables {bounds.shape[-1]}"
            )

        # get last point
        if self.data is None:
            raise ValueError(
                "No data exists to specify max_travel_distances "
                "from, add data first to use during BO"
            )
        last_point = torch.tensor(
            self.data[self.vocs.variable_names].iloc[-1].to_numpy(), **self._tkwargs
        )

        # bound lengths based on vocs for normalization
        lengths = self.vocs.bounds[1, :] - self.vocs.bounds[0, :]

        # get maximum travel distances
        max_travel_distances = torch.tensor(
            self.max_travel_distances, **self._tkwargs
        ) * torch.tensor(lengths, **self._tkwargs)
        max_travel_bounds = torch.stack(
            (last_point - max_travel_distances, last_point + max_travel_distances)
        )

        return max_travel_bounds

model_input_names property

variable names corresponding to trained model

generate(n_candidates)

Generate candidates using Bayesian Optimization.

Parameters:

n_candidates : int The number of candidates to generate in each optimization step.

Returns:

List[Dict] A list of dictionaries containing the generated candidates.

Raises:

NotImplementedError If the number of candidates is greater than 1, and the generator does not support batch candidate generation.

RuntimeError If the generator contains no data. Call the 'add_data' method to add data before generating candidates.

Notes:

This method generates candidates for Bayesian Optimization based on the provided number of candidates. It updates the internal model with the current data and calculates the candidates by optimizing the acquisition function. The method returns the generated candidates in the form of a list of dictionaries.
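
As a continuation of the quick-start sketch near the top of this page, the snippet below generates a small batch of candidates (supported here because UpperConfidenceBoundGenerator sets supports_batch_generation) and inspects the timing bookkeeping that generate maintains.

# one dict per candidate point
candidates = generator.generate(4)
for point in candidates:
    print(point)  # e.g. {"x": 0.42}

# per-call training and acquisition-optimization runtimes in seconds
print(generator.computation_time)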

Source code in xopt/generators/bayesian/bayesian_generator.py
def generate(self, n_candidates: int) -> list[dict]:
    """
    Generate candidates using Bayesian Optimization.

    Parameters:
    -----------
    n_candidates : int
        The number of candidates to generate in each optimization step.

    Returns:
    --------
    List[Dict]
        A list of dictionaries containing the generated candidates.

    Raises:
    -------
    NotImplementedError
        If the number of candidates is greater than 1, and the generator does not
        support batch candidate generation.

    RuntimeError
        If the generator contains no data. Call the 'add_data' method to add
        data before generating candidates.

    Notes:
    ------
    This method generates candidates for Bayesian Optimization based on the
    provided number of candidates. It updates the internal model with the current
    data and calculates the candidates by optimizing the acquisition function.
    The method returns the generated candidates in the form of a list of dictionaries.
    """

    self.n_candidates = n_candidates
    if n_candidates > 1 and not self.supports_batch_generation:
        raise NotImplementedError(
            "This Bayesian algorithm does not currently support parallel candidate "
            "generation"
        )

    # if no data exists raise error
    if self.data is None:
        raise RuntimeError(
            "no data contained in generator, call `add_data` "
            "method to add data, see also `Xopt.random_evaluate()`"
        )

    else:
        # dict to track runtimes
        timing_results = {}

        # update internal model with internal data
        start_time = time.perf_counter()
        model = self.train_model(self.data)
        timing_results["training"] = time.perf_counter() - start_time

        # propose candidates given model
        start_time = time.perf_counter()
        candidates = self.propose_candidates(model, n_candidates=n_candidates)
        timing_results["acquisition_optimization"] = (
            time.perf_counter() - start_time
        )

        # post process candidates
        result = self._process_candidates(candidates)

        # append timing results to dataframe (if it exists)
        if self.computation_time is not None:
            self.computation_time = pd.concat(
                (
                    self.computation_time,
                    pd.DataFrame(timing_results, index=[0]),
                ),
                ignore_index=True,
            )
        else:
            self.computation_time = pd.DataFrame(timing_results, index=[0])

        if self.n_interpolate_points is not None:
            if self.n_candidates > 1:
                raise RuntimeError(
                    "cannot generate interpolated points for "
                    "multiple candidate generation"
                )
            else:
                assert len(result) == 1
                result = interpolate_points(
                    pd.concat(
                        (self.data.iloc[-1:][self.vocs.variable_names], result),
                        axis=0,
                        ignore_index=True,
                    ),
                    num_points=self.n_interpolate_points,
                )

        return result.to_dict("records")

get_acquisition(model)

Define the acquisition function based on the given GP model.

Parameters:

model : Model The BoTorch model to be used for generating the acquisition function.

Returns:

acquisition_function : AcquisitionFunction

Raises:

ValueError If the provided 'model' is None. A valid model is required to create the acquisition function.
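
For illustration, here is a sketch of evaluating the assembled acquisition function on a coarse grid. It reuses the one-variable generator from the quick-start sketch and follows the BoTorch convention of batch x q x d input shapes.

import torch

model = generator.train_model()
acq = generator.get_acquisition(model)

# 11 grid points, each a q=1 batch in d=1 dimensions
x = torch.linspace(0.0, 1.0, 11, dtype=torch.double).reshape(-1, 1, 1)
with torch.no_grad():
    values = acq(x)  # one acquisition value per grid point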

Source code in xopt/generators/bayesian/bayesian_generator.py
def get_acquisition(self, model):
    """
    Define the acquisition function based on the given GP model.

    Parameters:
    -----------
    model : Model
        The BoTorch model to be used for generating the acquisition function.

    Returns:
    --------
    acquisition_function : AcquisitionFunction

    Raises:
    -------
    ValueError
        If the provided 'model' is None. A valid model is required to create the
        acquisition function.
    """
    if model is None:
        raise ValueError("model cannot be None")

    # get base acquisition function
    acq = self._get_acquisition(model)

    # apply constraints if specified in vocs
    # TODO: replace with direct constrained acquisition function calls
    # see SampleReducingMCAcquisitionFunction in botorch for rationale
    if len(self.vocs.constraints):
        try:
            sampler = acq.sampler
        except AttributeError:
            sampler = self._get_sampler(model)

        acq = ConstrainedMCAcquisitionFunction(
            model, acq, self._get_constraint_callables(), sampler=sampler
        )

    # apply fixed features if specified in the generator
    if self.fixed_features is not None:
        # get input dim
        dim = len(self.model_input_names)
        columns = []
        values = []
        for name, value in self.fixed_features.items():
            columns += [self.model_input_names.index(name)]
            values += [value]

        acq = FixedFeatureAcquisitionFunction(
            acq_function=acq, d=dim, columns=columns, values=values
        )

    if self.log_transform_acquisition_function:
        acq = LogAcquisitionFunction(acq)

    return acq

get_input_data(data)

Convert input data to a torch tensor.

Parameters:

data : pd.DataFrame The input data in the form of a pandas DataFrame.

Returns:

torch.Tensor A torch tensor containing the input data.

Notes:

This method takes a pandas DataFrame as input data and converts it into a torch tensor. It specifically selects columns corresponding to the model's input names (variables), and the resulting tensor is configured with the data type and device settings from the generator.
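
A small sketch of the conversion, reusing the one-variable generator from above; the tensor has one column per entry in model_input_names and uses the generator's dtype/device settings (torch.double on CPU by default).

import pandas as pd

df = pd.DataFrame({"x": [0.1, 0.5], "y": [1.0, 0.2]})
inputs = generator.get_input_data(df)
# inputs.shape == (2, 1) for the single variable "x"; inputs.dtype == torch.float64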

Source code in xopt/generators/bayesian/bayesian_generator.py
def get_input_data(self, data: pd.DataFrame) -> torch.Tensor:
    """
    Convert input data to a torch tensor.

    Parameters:
    -----------
    data : pd.DataFrame
        The input data in the form of a pandas DataFrame.

    Returns:
    --------
    torch.Tensor
        A torch tensor containing the input data.

    Notes:
    ------
    This method takes a pandas DataFrame as input data and converts it into a
    torch tensor. It specifically selects columns corresponding to the model's
    input names (variables), and the resulting tensor is configured with the data
    type and device settings from the generator.
    """
    return torch.tensor(data[self.model_input_names].to_numpy(), **self._tkwargs)

get_optimum()

Select the best point(s) given by the model using the posterior mean.
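
A brief sketch, reusing the generator from the earlier examples; the method requires a trained internal model and returns a one-row DataFrame of variable values.

# ensure an internal model exists before querying the optimum
generator.train_model()
best = generator.get_optimum()  # one-row DataFrame, e.g. a single "x" column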

Source code in xopt/generators/bayesian/bayesian_generator.py
def get_optimum(self):
    """select the best point(s) given by the
    model using the Posterior mean"""
    c_posterior_mean = ConstrainedMCAcquisitionFunction(
        self.model,
        qUpperConfidenceBound(
            model=self.model, beta=0.0, objective=self._get_objective()
        ),
        self._get_constraint_callables(),
    )

    result = self.numerical_optimizer.optimize(
        c_posterior_mean, self._get_bounds(), 1
    )

    return self._process_candidates(result)

propose_candidates(model, n_candidates=1)

given a GP model, propose candidates by numerically optimizing the acquisition function
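
For context, a sketch of the two lower-level steps that generate performs internally; _process_candidates is the private helper shown in the class source above.

model = generator.train_model()  # fit a GP model to generator.data
candidates = generator.propose_candidates(model, n_candidates=1)  # torch.Tensor
result = generator._process_candidates(candidates)  # back to a DataFrame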

Source code in xopt/generators/bayesian/bayesian_generator.py
def propose_candidates(self, model, n_candidates=1):
    """
    given a GP model, propose candidates by numerically optimizing the
    acquisition function

    """
    # update TurBO state if used with the last `n_candidates` points
    if self.turbo_controller is not None:
        self.turbo_controller.update_state(self.data, n_candidates)

    # calculate optimization bounds
    bounds = self._get_optimization_bounds()

    # get acquisition function
    acq_funct = self.get_acquisition(model)

    # get candidates
    candidates = self.numerical_optimizer.optimize(acq_funct, bounds, n_candidates)
    return candidates

train_model(data=None, update_internal=True)

Returns a ModelListGP containing independent models for the objectives and constraints
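
A sketch of training on an external DataFrame without replacing the generator's internal model; training_data here is a hypothetical DataFrame whose columns match the VOCS variable and output names.

# training_data: hypothetical DataFrame with "x" and "y" columns
# update_internal=False leaves generator.model untouched
model = generator.train_model(data=training_data, update_internal=False)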

Source code in xopt/generators/bayesian/bayesian_generator.py
def train_model(self, data: pd.DataFrame = None, update_internal=True) -> Module:
    """
    Returns a ModelListGP containing independent models for the objectives and
    constraints

    """
    if data is None:
        data = self.data
    if data.empty:
        raise ValueError("no data available to build model")

    # get input bounds
    variable_bounds = deepcopy(self.vocs.variables)

    # add fixed feature bounds if requested
    if self.fixed_features is not None:
        # get bounds for each fixed_feature (vocs bounds take precedence)
        for key in self.fixed_features:
            if key not in variable_bounds:
                if key not in data:
                    raise KeyError(
                        "generator data needs to contain fixed feature "
                        f"column name `{key}`"
                    )
                f_data = data[key]
                bounds = [f_data.min(), f_data.max()]
                if bounds[1] - bounds[0] < 1e-8:
                    bounds[1] = bounds[0] + 1e-8
                variable_bounds[key] = bounds

    _model = self.gp_constructor.build_model(
        self.model_input_names,
        self.vocs.output_names,
        data,
        {name: variable_bounds[name] for name in self.model_input_names},
        **self._tkwargs,
    )

    if update_internal:
        self.model = _model
    return _model

validate_turbo_controller(value, info)

Note: by default no turbo controller is used.
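
A sketch of the three input forms this validator accepts; "optimize" and "safety" are the registered controller names, remaining dict keys are forwarded to the controller constructor, and the import path for a direct instance is assumed here.

# as a registered string name
gen = UpperConfidenceBoundGenerator(vocs=vocs, turbo_controller="optimize")

# as a dict with a required `name` key
gen = UpperConfidenceBoundGenerator(vocs=vocs, turbo_controller={"name": "safety"})

# or as a controller instance constructed directly
from xopt.generators.bayesian.turbo import OptimizeTurboController

gen = UpperConfidenceBoundGenerator(
    vocs=vocs, turbo_controller=OptimizeTurboController(vocs)
)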

Source code in xopt/generators/bayesian/bayesian_generator.py
@field_validator("turbo_controller", mode="before")
def validate_turbo_controller(cls, value, info: ValidationInfo):
    """note default behavior is no use of turbo"""
    optimizer_dict = {
        "optimize": OptimizeTurboController,
        "safety": SafetyTurboController,
    }
    if isinstance(value, TurboController):
        pass
    elif isinstance(value, str):
        # create turbo controller from string input
        if value in optimizer_dict:
            value = optimizer_dict[value](info.data["vocs"])
        else:
            raise ValueError(
                f"{value} not found, available values are "
                f"{optimizer_dict.keys()}"
            )
    elif isinstance(value, dict):
        # create turbo controller from dict input
        if "name" not in value:
            raise ValueError("turbo input dict needs to have a `name` attribute")
        name = value.pop("name")
        if name in optimizer_dict:
            # pop unnecessary elements
            for ele in ["dim"]:
                value.pop(ele, None)

            value = optimizer_dict[name](vocs=info.data["vocs"], **value)
        else:
            raise ValueError(
                f"{value} not found, available values are "
                f"{optimizer_dict.keys()}"
            )
    return value

visualize_model(**kwargs)

displays the GP models

Source code in xopt/generators/bayesian/bayesian_generator.py
def visualize_model(self, **kwargs):
    """displays the GP models"""
    return visualize_generator_model(self, **kwargs)

BayesianExplorationGenerator

Bases: BayesianGenerator

Source code in xopt/generators/bayesian/bayesian_exploration.py
class BayesianExplorationGenerator(BayesianGenerator):
    name = "bayesian_exploration"
    supports_batch_generation: bool = True

    __doc__ = "Bayesian exploration generator\n" + formatted_base_docstring()

    @field_validator("vocs", mode="after")
    def validate_vocs(cls, v, info: ValidationInfo):
        if v.n_objectives != 0:
            raise ValueError("this generator only supports observables")
        return v

    def _get_acquisition(self, model):
        sampler = self._get_sampler(model)
        qPV = qPosteriorVariance(
            model,
            sampler=sampler,
            objective=self._get_objective(),
        )

        return qPV

    def _get_objective(self):
        """return exploration objective, which only captures the output of the first
        model output"""

        return create_exploration_objective(self.vocs, self._tkwargs)
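
A usage sketch: the vocs validator above rejects objectives, so exploration is configured with observables only.

from xopt import VOCS
from xopt.generators.bayesian import BayesianExplorationGenerator

# observables only; the validator raises if objectives are present
vocs = VOCS(variables={"x": [0.0, 1.0]}, observables=["y"])
explorer = BayesianExplorationGenerator(vocs=vocs)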

MOBOGenerator

Bases: MultiObjectiveBayesianGenerator

Source code in xopt/generators/bayesian/mobo.py
class MOBOGenerator(MultiObjectiveBayesianGenerator):
    name = "mobo"
    __doc__ = """Implements Multi-Objective Bayesian Optimization using the Expected
            Hypervolume Improvement acquisition function"""

    def _get_objective(self):
        return create_mobo_objective(self.vocs, self._tkwargs)

    def get_acquisition(self, model):
        """
        Returns a function that can be used to evaluate the acquisition function
        """
        if model is None:
            raise ValueError("model cannot be None")

        # get base acquisition function
        acq = self._get_acquisition(model)

        # apply fixed features if specified in the generator
        if self.fixed_features is not None:
            # get input dim
            dim = len(self.model_input_names)
            columns = []
            values = []
            for name, value in self.fixed_features.items():
                columns += [self.model_input_names.index(name)]
                values += [value]

            acq = FixedFeatureAcquisitionFunction(
                acq_function=acq, d=dim, columns=columns, values=values
            )

        return acq

    def _get_acquisition(self, model):
        inputs = self.get_input_data(self.data)
        sampler = self._get_sampler(model)

        if self.log_transform_acquisition_function:
            acqclass = qLogNoisyExpectedHypervolumeImprovement
        else:
            acqclass = qNoisyExpectedHypervolumeImprovement

        acq = acqclass(
            model,
            X_baseline=inputs,
            constraints=self._get_constraint_callables(),
            ref_point=self.torch_reference_point,
            sampler=sampler,
            objective=self._get_objective(),
            cache_root=False,
            prune_baseline=True,
        )

        return acq
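
A usage sketch, assuming the reference_point field inherited from MultiObjectiveBayesianGenerator (a dict keyed by objective name):

from xopt import VOCS
from xopt.generators.bayesian import MOBOGenerator

vocs = VOCS(
    variables={"x": [0.0, 1.0]},
    objectives={"y1": "MINIMIZE", "y2": "MINIMIZE"},
)
mobo = MOBOGenerator(vocs=vocs, reference_point={"y1": 10.0, "y2": 10.0})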

get_acquisition(model)

Returns a function that can be used to evaluate the acquisition function

Source code in xopt/generators/bayesian/mobo.py
def get_acquisition(self, model):
    """
    Returns a function that can be used to evaluate the acquisition function
    """
    if model is None:
        raise ValueError("model cannot be None")

    # get base acquisition function
    acq = self._get_acquisition(model)

    # apply fixed features if specified in the generator
    if self.fixed_features is not None:
        # get input dim
        dim = len(self.model_input_names)
        columns = []
        values = []
        for name, value in self.fixed_features.items():
            columns += [self.model_input_names.index(name)]
            values += [value]

        acq = FixedFeatureAcquisitionFunction(
            acq_function=acq, d=dim, columns=columns, values=values
        )

    return acq

UpperConfidenceBoundGenerator

Bases: BayesianGenerator

Source code in xopt/generators/bayesian/upper_confidence_bound.py
class UpperConfidenceBoundGenerator(BayesianGenerator):
    name = "upper_confidence_bound"
    beta: float = Field(2.0, description="Beta parameter for UCB optimization")
    supports_batch_generation: bool = True
    __doc__ = (
        """Bayesian optimization generator using Upper Confidence Bound

Attributes
----------
beta : float, default 2.0
    Beta parameter for UCB optimization, controlling the trade-off between exploration
    and exploitation. Higher values of beta prioritize exploration.

    """
        + formatted_base_docstring()
    )

    @field_validator("vocs")
    def validate_vocs_without_constraints(cls, v):
        if v.constraints:
            warnings.warn(
                f"Using {cls.__name__} with constraints may lead to numerical issues if the base acquisition "
                f"function has negative values."
            )
        return v

    @field_validator("log_transform_acquisition_function")
    def validate_log_transform_acquisition_function(cls, v):
        if v:
            raise ValueError(
                "Log transform cannot be applied to potentially negative UCB "
                "acquisition function."
            )

    def _get_acquisition(self, model):
        if self.n_candidates > 1:
            # MC sampling for generating multiple candidate points
            sampler = self._get_sampler(model)
            acq = qUpperConfidenceBound(
                model,
                sampler=sampler,
                objective=self._get_objective(),
                beta=self.beta,
            )
        else:
            # analytic acquisition function for single candidate generation
            weights = torch.zeros(self.vocs.n_outputs).to(**self._tkwargs)
            weights = set_botorch_weights(weights, self.vocs)
            posterior_transform = ScalarizedPosteriorTransform(weights)
            acq = UpperConfidenceBound(
                model, beta=self.beta, posterior_transform=posterior_transform
            )

        return acq
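
A short note on beta: larger values weight the posterior standard deviation more heavily (exploration), while beta = 0 reduces the acquisition function to the posterior mean, as used by get_optimum above. A usage sketch, reusing the single-objective vocs from the quick-start example:

from xopt.generators.bayesian import UpperConfidenceBoundGenerator

# favor exploration over exploitation
ucb = UpperConfidenceBoundGenerator(vocs=vocs, beta=4.0)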