Bayesian generators

BayesianGenerator

Bases: Generator, ABC

Bayesian Generator for Bayesian Optimization.

Attributes:

name : str The name of the Bayesian Generator.

model : Optional[Model] The BoTorch model used by the generator to perform optimization.

n_monte_carlo_samples : int The number of Monte Carlo samples to use in the optimization process.

turbo_controller : SerializeAsAny[Optional[TurboController]] The Turbo Controller for trust-region Bayesian Optimization.

use_cuda : bool A flag to enable or disable CUDA usage if available.

gp_constructor : SerializeAsAny[ModelConstructor] The constructor used to generate the model for Bayesian Optimization.

numerical_optimizer : SerializeAsAny[NumericalOptimizer] The optimizer used to optimize the acquisition function in Bayesian Optimization.

max_travel_distances : Optional[List[float]] The limits for travel distances between points in normalized space.

fixed_features : Optional[Dict[str, float]] The fixed features used in Bayesian Optimization.

computation_time : Optional[pd.DataFrame] A data frame tracking computation time in seconds.

log_transform_acquisition_function : Optional[bool] Flag to determine whether the final acquisition function value should be log-transformed before optimization.

n_interpolate_points : Optional[PositiveInt] Number of interpolation points to generate between the last observation and the next observation; requires n_candidates to be 1.

n_candidates : int The number of candidates to generate in each optimization step.

Methods:

generate(self, n_candidates: int) -> List[Dict]: Generate candidates for Bayesian Optimization.

add_data(self, new_data: pd.DataFrame): Add new data to the generator for Bayesian Optimization.

train_model(self, data: pd.DataFrame = None, update_internal=True) -> Module: Train a Bayesian model for Bayesian Optimization.

propose_candidates(self, model, n_candidates=1) -> Tensor: Propose candidates for Bayesian Optimization.

get_input_data(self, data: pd.DataFrame) -> torch.Tensor: Get input data in torch.Tensor format.

get_acquisition(self, model) -> AcquisitionFunction: Get the acquisition function for Bayesian Optimization.
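
For orientation, here is a minimal usage sketch of the lifecycle shared by all concrete subclasses: construct a generator with a VOCS, add observed data, then generate candidates. It assumes the UpperConfidenceBoundGenerator subclass documented later on this page (BayesianGenerator itself is abstract); import paths may vary between Xopt versions.

import pandas as pd

from xopt import VOCS
from xopt.generators.bayesian import UpperConfidenceBoundGenerator

# define the optimization space and objective
vocs = VOCS(variables={"x": [0.0, 1.0]}, objectives={"y": "MINIMIZE"})
generator = UpperConfidenceBoundGenerator(vocs=vocs)

# generators cannot propose candidates without data
generator.add_data(pd.DataFrame({"x": [0.1, 0.5, 0.9], "y": [1.0, 0.2, 0.7]}))

# returns a list of dicts, e.g. [{"x": 0.42}]
candidates = generator.generate(1)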

Source code in xopt/generators/bayesian/bayesian_generator.py
class BayesianGenerator(Generator, ABC):
    """Bayesian Generator for Bayesian Optimization.

    Attributes:
    -----------
    name : str
        The name of the Bayesian Generator.

    model : Optional[Model]
        The BoTorch model used by the generator to perform optimization.

    n_monte_carlo_samples : int
        The number of Monte Carlo samples to use in the optimization process.

    turbo_controller : SerializeAsAny[Optional[TurboController]]
        The Turbo Controller for trust-region Bayesian Optimization.

    use_cuda : bool
        A flag to enable or disable CUDA usage if available.

    gp_constructor : SerializeAsAny[ModelConstructor]
        The constructor used to generate the model for Bayesian Optimization.

    numerical_optimizer : SerializeAsAny[NumericalOptimizer]
        The optimizer used to optimize the acquisition function in Bayesian Optimization.

    max_travel_distances : Optional[List[float]]
        The limits for travel distances between points in normalized space.

    fixed_features : Optional[Dict[str, float]]
        The fixed features used in Bayesian Optimization.

    computation_time : Optional[pd.DataFrame]
        A data frame tracking computation time in seconds.

    log_transform_acquisition_function : Optional[bool]
        Flag to determine whether the final acquisition function value should be
        log-transformed before optimization.

    n_interpolate_points : Optional[PositiveInt]
        Number of interpolation points to generate between the last observation and
        the next observation; requires n_candidates to be 1.

    n_candidates : int
        The number of candidates to generate in each optimization step.

    Methods:
    --------
    generate(self, n_candidates: int) -> List[Dict]:
        Generate candidates for Bayesian Optimization.

    add_data(self, new_data: pd.DataFrame):
        Add new data to the generator for Bayesian Optimization.

    train_model(self, data: pd.DataFrame = None, update_internal=True) -> Module:
        Train a Bayesian model for Bayesian Optimization.

    propose_candidates(self, model, n_candidates=1) -> Tensor:
        Propose candidates for Bayesian Optimization.

    get_input_data(self, data: pd.DataFrame) -> torch.Tensor:
        Get input data in torch.Tensor format.

    get_acquisition(self, model) -> AcquisitionFunction:
        Get the acquisition function for Bayesian Optimization.

    """

    name = "base_bayesian_generator"
    model: Optional[Model] = Field(
        None, description="botorch model used by the generator to perform optimization"
    )
    n_monte_carlo_samples: int = Field(
        128, description="number of monte carlo samples to use"
    )
    turbo_controller: SerializeAsAny[Optional[TurboController]] = Field(
        default=None, description="turbo controller for trust-region BO"
    )
    use_cuda: bool = Field(False, description="flag to enable cuda usage if available")
    gp_constructor: SerializeAsAny[ModelConstructor] = Field(
        StandardModelConstructor(), description="constructor used to generate model"
    )
    numerical_optimizer: SerializeAsAny[NumericalOptimizer] = Field(
        LBFGSOptimizer(),
        description="optimizer used to optimize the acquisition " "function",
    )
    max_travel_distances: Optional[List[float]] = Field(
        None,
        description="limits for travel distance between points in normalized space",
    )
    fixed_features: Optional[Dict[str, float]] = Field(
        None, description="fixed features used in Bayesian optimization"
    )
    computation_time: Optional[pd.DataFrame] = Field(
        None,
        description="data frame tracking computation time in seconds",
    )
    log_transform_acquisition_function: Optional[bool] = Field(
        False,
        description="flag to log transform the acquisition function before optimization",
    )
    n_interpolate_points: Optional[PositiveInt] = None

    n_candidates: int = 1

    @field_validator("model", mode="before")
    def validate_torch_modules(cls, v):
        if isinstance(v, str):
            if v.startswith("base64:"):
                v = decode_torch_module(v)
            elif os.path.exists(v):
                v = torch.load(v)
        return v

    @field_validator("gp_constructor", mode="before")
    def validate_gp_constructor(cls, value):
        constructor_dict = {"standard": StandardModelConstructor}
        if value is None:
            value = StandardModelConstructor()
        elif isinstance(value, ModelConstructor):
            value = value
        elif isinstance(value, str):
            if value in constructor_dict:
                value = constructor_dict[value]()
            else:
                raise ValueError(f"{value} not found")
        elif isinstance(value, dict):
            name = value.pop("name")
            if name in constructor_dict:
                value = constructor_dict[name](**value)
            else:
                raise ValueError(f"{value} not found")

        return value

    @field_validator("numerical_optimizer", mode="before")
    def validate_numerical_optimizer(cls, value):
        optimizer_dict = {"grid": GridOptimizer, "LBFGS": LBFGSOptimizer}
        if value is None:
            value = LBFGSOptimizer()
        elif isinstance(value, NumericalOptimizer):
            pass
        elif isinstance(value, str):
            if value in optimizer_dict:
                value = optimizer_dict[value]()
            else:
                raise ValueError(f"{value} not found")
        elif isinstance(value, dict):
            name = value.pop("name")
            if name in optimizer_dict:
                value = optimizer_dict[name](**value)
            else:
                raise ValueError(f"{value} not found")
        return value

    @field_validator("turbo_controller", mode="before")
    def validate_turbo_controller(cls, value, info: ValidationInfo):
        """note default behavior is no use of turbo"""
        optimizer_dict = {
            "optimize": OptimizeTurboController,
            "safety": SafetyTurboController,
        }
        if isinstance(value, TurboController):
            pass
        elif isinstance(value, str):
            # create turbo controller from string input
            if value in optimizer_dict:
                value = optimizer_dict[value](info.data["vocs"])
            else:
                raise ValueError(
                    f"{value} not found, available values are "
                    f"{optimizer_dict.keys()}"
                )
        elif isinstance(value, dict):
            # create turbo controller from dict input
            if "name" not in value:
                raise ValueError("turbo input dict needs to have a `name` attribute")
            name = value.pop("name")
            if name in optimizer_dict:
                # pop unnecessary elements
                for ele in ["dim"]:
                    value.pop(ele, None)

                value = optimizer_dict[name](vocs=info.data["vocs"], **value)
            else:
                raise ValueError(
                    f"{value} not found, available values are "
                    f"{optimizer_dict.keys()}"
                )
        return value

    @field_validator("computation_time", mode="before")
    def validate_computation_time(cls, value):
        if isinstance(value, dict):
            value = pd.DataFrame(value)

        return value

    def add_data(self, new_data: pd.DataFrame):
        self.data = pd.concat([self.data, new_data], axis=0)

    def generate(self, n_candidates: int) -> list[dict]:
        """
        Generate candidates using Bayesian Optimization.

        Parameters:
        -----------
        n_candidates : int
            The number of candidates to generate in each optimization step.

        Returns:
        --------
        List[Dict]
            A list of dictionaries containing the generated candidates.

        Raises:
        -------
        NotImplementedError
            If the number of candidates is greater than 1, and the generator does not
            support batch candidate generation.

        RuntimeError
            If the generator contains no data. Call the 'add_data' method to add
            data before generating candidates.

        Notes:
        ------
        This method generates candidates for Bayesian Optimization based on the
        provided number of candidates. It updates the internal model with the current
        data and calculates the candidates by optimizing the acquisition function.
        The method returns the generated candidates in the form of a list of dictionaries.
        """

        self.n_candidates = n_candidates
        if n_candidates > 1 and not self.supports_batch_generation:
            raise NotImplementedError(
                "This Bayesian algorithm does not currently support parallel candidate "
                "generation"
            )

        # if no data exists raise error
        if self.data is None:
            raise RuntimeError(
                "no data contained in generator, call `add_data` "
                "method to add data, see also `Xopt.random_evaluate()`"
            )

        else:
            # dict to track runtimes
            timing_results = {}

            # update internal model with internal data
            start_time = time.perf_counter()
            model = self.train_model(self.data)
            timing_results["training"] = time.perf_counter() - start_time

            # propose candidates given model
            start_time = time.perf_counter()
            candidates = self.propose_candidates(model, n_candidates=n_candidates)
            timing_results["acquisition_optimization"] = (
                time.perf_counter() - start_time
            )

            # post process candidates
            result = self._process_candidates(candidates)

            # append timing results to dataframe (if it exists)
            if self.computation_time is not None:
                self.computation_time = pd.concat(
                    (
                        self.computation_time,
                        pd.DataFrame(timing_results, index=[0]),
                    ),
                    ignore_index=True,
                )
            else:
                self.computation_time = pd.DataFrame(timing_results, index=[0])

            if self.n_interpolate_points is not None:
                if self.n_candidates > 1:
                    raise RuntimeError(
                        "cannot generate interpolated points for "
                        "multiple candidate generation"
                    )
                else:
                    assert len(result) == 1
                    result = interpolate_points(
                        pd.concat(
                            (self.data.iloc[-1:][self.vocs.variable_names], result),
                            axis=0,
                            ignore_index=True,
                        ),
                        num_points=self.n_interpolate_points,
                    )

            return result.to_dict("records")

    def train_model(self, data: pd.DataFrame = None, update_internal=True) -> Module:
        """
        Returns a ModelListGP containing independent models for the objectives and
        constraints

        """
        if data is None:
            data = self.data
        if data.empty:
            raise ValueError("no data available to build model")

        # get input bounds
        variable_bounds = deepcopy(self.vocs.variables)

        # add fixed feature bounds if requested
        if self.fixed_features is not None:
            # get bounds for each fixed_feature (vocs bounds take precedence)
            for key in self.fixed_features:
                if key not in variable_bounds:
                    if key not in data:
                        raise KeyError(
                            "generator data needs to contain fixed feature "
                            f"column name `{key}`"
                        )
                    f_data = data[key]
                    bounds = [f_data.min(), f_data.max()]
                    if bounds[1] - bounds[0] < 1e-8:
                        bounds[1] = bounds[0] + 1e-8
                    variable_bounds[key] = bounds

        _model = self.gp_constructor.build_model(
            self.model_input_names,
            self.vocs.output_names,
            data,
            {name: variable_bounds[name] for name in self.model_input_names},
            **self._tkwargs,
        )

        if update_internal:
            self.model = _model
        return _model

    def propose_candidates(self, model, n_candidates=1):
        """
        given a GP model, propose candidates by numerically optimizing the
        acquisition function

        """
        # update TurBO state if used with the last `n_candidates` points
        if self.turbo_controller is not None:
            self.turbo_controller.update_state(self.data, n_candidates)

        # calculate optimization bounds
        bounds = self._get_optimization_bounds()

        # get acquisition function
        acq_funct = self.get_acquisition(model)

        # get candidates
        candidates = self.numerical_optimizer.optimize(acq_funct, bounds, n_candidates)
        return candidates

    def get_input_data(self, data: pd.DataFrame) -> torch.Tensor:
        """
        Convert input data to a torch tensor.

        Parameters:
        -----------
        data : pd.DataFrame
            The input data in the form of a pandas DataFrame.

        Returns:
        --------
        torch.Tensor
            A torch tensor containing the input data.

        Notes:
        ------
        This method takes a pandas DataFrame as input data and converts it into a
        torch tensor. It specifically selects columns corresponding to the model's
        input names (variables), and the resulting tensor is configured with the data
        type and device settings from the generator.
        """
        return torch.tensor(data[self.model_input_names].to_numpy(), **self._tkwargs)

    def get_acquisition(self, model):
        """
        Define the acquisition function based on the given GP model.

        Parameters:
        -----------
        model : Model
            The BoTorch model to be used for generating the acquisition function.

        Returns:
        --------
        acquisition_function : AcquisitionFunction

        Raises:
        -------
        ValueError
            If the provided 'model' is None. A valid model is required to create the
            acquisition function.
        """
        if model is None:
            raise ValueError("model cannot be None")

        # get base acquisition function
        acq = self._get_acquisition(model)

        # apply constraints if specified in vocs
        # TODO: replace with direct constrained acquisition function calls
        # see SampleReducingMCAcquisitionFunction in botorch for rationale
        if len(self.vocs.constraints):
            try:
                sampler = acq.sampler
            except AttributeError:
                sampler = self._get_sampler(model)

            acq = ConstrainedMCAcquisitionFunction(
                model, acq, self._get_constraint_callables(), sampler=sampler
            )

        # apply fixed features if specified in the generator
        if self.fixed_features is not None:
            # get input dim
            dim = len(self.model_input_names)
            columns = []
            values = []
            for name, value in self.fixed_features.items():
                columns += [self.model_input_names.index(name)]
                values += [value]

            acq = FixedFeatureAcquisitionFunction(
                acq_function=acq, d=dim, columns=columns, values=values
            )

        if self.log_transform_acquisition_function:
            acq = LogAcquisitionFunction(acq)

        return acq

    def get_optimum(self):
        """select the best point(s) given by the
        model using the Posterior mean"""
        c_posterior_mean = ConstrainedMCAcquisitionFunction(
            self.model,
            qUpperConfidenceBound(
                model=self.model, beta=0.0, objective=self._get_objective()
            ),
            self._get_constraint_callables(),
        )

        result = self.numerical_optimizer.optimize(
            c_posterior_mean, self._get_bounds(), 1
        )

        return self._process_candidates(result)

    def visualize_model(self, **kwargs):
        """displays the GP models"""
        return visualize_generator_model(self, **kwargs)

    def _process_candidates(self, candidates: Tensor):
        """process pytorch candidates from optimizing the acquisition function"""
        logger.debug("Best candidate from optimize: %s", candidates)

        if self.fixed_features is not None:
            results = pd.DataFrame(
                candidates.detach().cpu().numpy(), columns=self._candidate_names
            )
            for name, val in self.fixed_features.items():
                results[name] = val

        else:
            results = self.vocs.convert_numpy_to_inputs(
                candidates.detach().cpu().numpy(), include_constants=False
            )

        return results

    def _get_sampler(self, model):
        input_data = self.get_input_data(self.data)
        sampler = get_sampler(
            model.posterior(input_data),
            sample_shape=torch.Size([self.n_monte_carlo_samples]),
        )
        return sampler

    @abstractmethod
    def _get_acquisition(self, model):
        pass

    def _get_objective(self):
        """return default objective (scalar objective) determined by vocs"""
        return create_mc_objective(self.vocs, self._tkwargs)

    def _get_constraint_callables(self):
        """return default objective (scalar objective) determined by vocs"""
        constraint_callables = create_constraint_callables(self.vocs)
        if len(constraint_callables) == 0:
            constraint_callables = None
        return constraint_callables

    @property
    def _tkwargs(self):
        # set device and data type for generator
        device = "cpu"
        if self.use_cuda:
            if torch.cuda.is_available():
                device = "cuda"
            else:
                warnings.warn(
                    "Cuda requested in generator options but not found on "
                    "machine! Using CPU instead"
                )

        return {"dtype": torch.double, "device": device}

    @property
    def model_input_names(self):
        """variable names corresponding to trained model"""
        variable_names = self.vocs.variable_names
        if self.fixed_features is not None:
            for name, val in self.fixed_features.items():
                if name not in variable_names:
                    variable_names += [name]
        return variable_names

    @property
    def _candidate_names(self):
        """variable names corresponding to generated candidates"""
        variable_names = self.vocs.variable_names
        if self.fixed_features is not None:
            for name in self.fixed_features:
                if name in variable_names:
                    variable_names.remove(name)
        return variable_names

    def _get_bounds(self):
        """convert bounds from vocs to torch tensors"""
        return torch.tensor(self.vocs.bounds, **self._tkwargs)

    def _get_optimization_bounds(self):
        """
        Get optimization bounds based on the union of several domains.

        Returns:
        --------
        torch.Tensor
            Tensor containing the optimized bounds.

        Notes:
        ------
        This method calculates the optimization bounds based on several factors:

        - If 'max_travel_distances' is specified, the bounds are modified to limit
            the maximum travel distances between points in normalized space.
        - If 'turbo_controller' is not None, the bounds are updated according to the
            trust region specified by the controller.
        - If 'fixed_features' are included in the variable names from the VOCS,
            the bounds associated with those features are removed.

        """
        bounds = deepcopy(self._get_bounds())

        # if specified modify bounds to limit maximum travel distances
        if self.max_travel_distances is not None:
            max_travel_bounds = self._get_max_travel_distances_region(bounds)
            bounds = rectilinear_domain_union(bounds, max_travel_bounds)

        # if using turbo, update turbo state and set bounds according to turbo state
        if self.turbo_controller is not None:
            # set the best value
            turbo_bounds = self.turbo_controller.get_trust_region(self.model)
            bounds = rectilinear_domain_union(bounds, turbo_bounds)

        # if fixed features key is in vocs then we need to remove the bounds
        # associated with that key
        if self.fixed_features is not None:
            # grab variable name indices that are NOT in fixed features
            indices = []
            for idx, name in enumerate(self.vocs.variable_names):
                if name not in self.fixed_features:
                    indices += [idx]

            # grab indexed bounds
            bounds = bounds.T[indices].T

        return bounds

    def _get_max_travel_distances_region(self, bounds):
        """
        Calculate the region for maximum travel distances based on the current bounds
        and the last observation.

        Parameters:
        -----------
        bounds : torch.Tensor
            The optimization bounds based on the union of several domains.

        Returns:
        --------
        torch.Tensor
            The bounds for the maximum travel distances region.

        Raises:
        -------
        ValueError
            If the length of max_travel_distances does not match the number of
            variables in bounds.

        Notes:
        ------
        This method calculates the region in which the next candidates for
        optimization should be generated based on the maximum travel distances
        specified. The region is centered around the last observation in the
        optimization space. The `max_travel_distances` parameter should be a list of
        maximum travel distances for each variable.

        """
        if len(self.max_travel_distances) != bounds.shape[-1]:
            raise ValueError(
                f"length of max_travel_distances must match the number of "
                f"variables {bounds.shape[-1]}"
            )

        # get last point
        if self.data is None:
            raise ValueError(
                "No data exists to specify max_travel_distances "
                "from, add data first to use during BO"
            )
        last_point = torch.tensor(
            self.data[self.vocs.variable_names].iloc[-1].to_numpy(), **self._tkwargs
        )

        # bound lengths based on vocs for normalization
        lengths = self.vocs.bounds[1, :] - self.vocs.bounds[0, :]

        # get maximum travel distances
        max_travel_distances = torch.tensor(
            self.max_travel_distances, **self._tkwargs
        ) * torch.tensor(lengths, **self._tkwargs)
        max_travel_bounds = torch.stack(
            (last_point - max_travel_distances, last_point + max_travel_distances)
        )

        return max_travel_bounds

model_input_names property

variable names corresponding to trained model

generate(n_candidates)

Generate candidates using Bayesian Optimization.

Parameters:

n_candidates : int The number of candidates to generate in each optimization step.

Returns:

List[Dict] A list of dictionaries containing the generated candidates.

Raises:

NotImplementedError If the number of candidates is greater than 1, and the generator does not support batch candidate generation.

RuntimeError If the generator contains no data. Call the 'add_data' method to add data before generating candidates.

Notes:

This method generates candidates for Bayesian Optimization based on the provided number of candidates. It updates the internal model with the current data and calculates the candidates by optimizing the acquisition function. The method returns the generated candidates in the form of a list of dictionaries.
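
As a continuation of the quick-start sketch near the top of this page, the snippet below generates a small batch of candidates (supported here because UpperConfidenceBoundGenerator sets supports_batch_generation) and inspects the timing bookkeeping that generate maintains.

# one dict per candidate point
candidates = generator.generate(4)
for point in candidates:
    print(point)  # e.g. {"x": 0.42}

# per-call training and acquisition-optimization runtimes in seconds
print(generator.computation_time)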

Source code in xopt/generators/bayesian/bayesian_generator.py
def generate(self, n_candidates: int) -> list[dict]:
    """
    Generate candidates using Bayesian Optimization.

    Parameters:
    -----------
    n_candidates : int
        The number of candidates to generate in each optimization step.

    Returns:
    --------
    List[Dict]
        A list of dictionaries containing the generated candidates.

    Raises:
    -------
    NotImplementedError
        If the number of candidates is greater than 1, and the generator does not
        support batch candidate generation.

    RuntimeError
        If the generator contains no data. Call the 'add_data' method to add
        data before generating candidates.

    Notes:
    ------
    This method generates candidates for Bayesian Optimization based on the
    provided number of candidates. It updates the internal model with the current
    data and calculates the candidates by optimizing the acquisition function.
    The method returns the generated candidates in the form of a list of dictionaries.
    """

    self.n_candidates = n_candidates
    if n_candidates > 1 and not self.supports_batch_generation:
        raise NotImplementedError(
            "This Bayesian algorithm does not currently support parallel candidate "
            "generation"
        )

    # if no data exists raise error
    if self.data is None:
        raise RuntimeError(
            "no data contained in generator, call `add_data` "
            "method to add data, see also `Xopt.random_evaluate()`"
        )

    else:
        # dict to track runtimes
        timing_results = {}

        # update internal model with internal data
        start_time = time.perf_counter()
        model = self.train_model(self.data)
        timing_results["training"] = time.perf_counter() - start_time

        # propose candidates given model
        start_time = time.perf_counter()
        candidates = self.propose_candidates(model, n_candidates=n_candidates)
        timing_results["acquisition_optimization"] = (
            time.perf_counter() - start_time
        )

        # post process candidates
        result = self._process_candidates(candidates)

        # append timing results to dataframe (if it exists)
        if self.computation_time is not None:
            self.computation_time = pd.concat(
                (
                    self.computation_time,
                    pd.DataFrame(timing_results, index=[0]),
                ),
                ignore_index=True,
            )
        else:
            self.computation_time = pd.DataFrame(timing_results, index=[0])

        if self.n_interpolate_points is not None:
            if self.n_candidates > 1:
                raise RuntimeError(
                    "cannot generate interpolated points for "
                    "multiple candidate generation"
                )
            else:
                assert len(result) == 1
                result = interpolate_points(
                    pd.concat(
                        (self.data.iloc[-1:][self.vocs.variable_names], result),
                        axis=0,
                        ignore_index=True,
                    ),
                    num_points=self.n_interpolate_points,
                )

        return result.to_dict("records")

get_acquisition(model)

Define the acquisition function based on the given GP model.

Parameters:

model : Model The BoTorch model to be used for generating the acquisition function.

Returns:

acquisition_function : AcquisitionFunction

Raises:

ValueError If the provided 'model' is None. A valid model is required to create the acquisition function.
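
For illustration, here is a sketch of evaluating the assembled acquisition function on a coarse grid. It reuses the one-variable generator from the quick-start sketch and follows the BoTorch convention of batch x q x d input shapes.

import torch

model = generator.train_model()
acq = generator.get_acquisition(model)

# 11 grid points, each a q=1 batch in d=1 dimensions
x = torch.linspace(0.0, 1.0, 11, dtype=torch.double).reshape(-1, 1, 1)
with torch.no_grad():
    values = acq(x)  # one acquisition value per grid point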

Source code in xopt/generators/bayesian/bayesian_generator.py
def get_acquisition(self, model):
    """
    Define the acquisition function based on the given GP model.

    Parameters:
    -----------
    model : Model
        The BoTorch model to be used for generating the acquisition function.

    Returns:
    --------
    acquisition_function : AcquisitionFunction

    Raises:
    -------
    ValueError
        If the provided 'model' is None. A valid model is required to create the
        acquisition function.
    """
    if model is None:
        raise ValueError("model cannot be None")

    # get base acquisition function
    acq = self._get_acquisition(model)

    # apply constraints if specified in vocs
    # TODO: replace with direct constrained acquisition function calls
    # see SampleReducingMCAcquisitionFunction in botorch for rationale
    if len(self.vocs.constraints):
        try:
            sampler = acq.sampler
        except AttributeError:
            sampler = self._get_sampler(model)

        acq = ConstrainedMCAcquisitionFunction(
            model, acq, self._get_constraint_callables(), sampler=sampler
        )

    # apply fixed features if specified in the generator
    if self.fixed_features is not None:
        # get input dim
        dim = len(self.model_input_names)
        columns = []
        values = []
        for name, value in self.fixed_features.items():
            columns += [self.model_input_names.index(name)]
            values += [value]

        acq = FixedFeatureAcquisitionFunction(
            acq_function=acq, d=dim, columns=columns, values=values
        )

    if self.log_transform_acquisition_function:
        acq = LogAcquisitionFunction(acq)

    return acq

get_input_data(data)

Convert input data to a torch tensor.

Parameters:

data : pd.DataFrame The input data in the form of a pandas DataFrame.

Returns:

torch.Tensor A torch tensor containing the input data.

Notes:

This method takes a pandas DataFrame as input data and converts it into a torch tensor. It specifically selects columns corresponding to the model's input names (variables), and the resulting tensor is configured with the data type and device settings from the generator.
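
A small sketch of the conversion, reusing the one-variable generator from above; the tensor has one column per entry in model_input_names and uses the generator's dtype/device settings (torch.double on CPU by default).

import pandas as pd

df = pd.DataFrame({"x": [0.1, 0.5], "y": [1.0, 0.2]})
inputs = generator.get_input_data(df)
# inputs.shape == (2, 1) for the single variable "x"; inputs.dtype == torch.float64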

Source code in xopt/generators/bayesian/bayesian_generator.py
def get_input_data(self, data: pd.DataFrame) -> torch.Tensor:
    """
    Convert input data to a torch tensor.

    Parameters:
    -----------
    data : pd.DataFrame
        The input data in the form of a pandas DataFrame.

    Returns:
    --------
    torch.Tensor
        A torch tensor containing the input data.

    Notes:
    ------
    This method takes a pandas DataFrame as input data and converts it into a
    torch tensor. It specifically selects columns corresponding to the model's
    input names (variables), and the resulting tensor is configured with the data
    type and device settings from the generator.
    """
    return torch.tensor(data[self.model_input_names].to_numpy(), **self._tkwargs)

get_optimum()

Select the best point(s) given by the model using the posterior mean.
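
A brief sketch, reusing the generator from the earlier examples; the method requires a trained internal model and returns a one-row DataFrame of variable values.

# ensure an internal model exists before querying the optimum
generator.train_model()
best = generator.get_optimum()  # one-row DataFrame, e.g. a single "x" column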

Source code in xopt/generators/bayesian/bayesian_generator.py
def get_optimum(self):
    """select the best point(s) given by the
    model using the Posterior mean"""
    c_posterior_mean = ConstrainedMCAcquisitionFunction(
        self.model,
        qUpperConfidenceBound(
            model=self.model, beta=0.0, objective=self._get_objective()
        ),
        self._get_constraint_callables(),
    )

    result = self.numerical_optimizer.optimize(
        c_posterior_mean, self._get_bounds(), 1
    )

    return self._process_candidates(result)

propose_candidates(model, n_candidates=1)

given a GP model, propose candidates by numerically optimizing the acquisition function
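
For context, a sketch of the two lower-level steps that generate performs internally; _process_candidates is the private helper shown in the class source above.

model = generator.train_model()  # fit a GP model to generator.data
candidates = generator.propose_candidates(model, n_candidates=1)  # torch.Tensor
result = generator._process_candidates(candidates)  # back to a DataFrame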

Source code in xopt/generators/bayesian/bayesian_generator.py
def propose_candidates(self, model, n_candidates=1):
    """
    given a GP model, propose candidates by numerically optimizing the
    acquisition function

    """
    # update TurBO state if used with the last `n_candidates` points
    if self.turbo_controller is not None:
        self.turbo_controller.update_state(self.data, n_candidates)

    # calculate optimization bounds
    bounds = self._get_optimization_bounds()

    # get acquisition function
    acq_funct = self.get_acquisition(model)

    # get candidates
    candidates = self.numerical_optimizer.optimize(acq_funct, bounds, n_candidates)
    return candidates

train_model(data=None, update_internal=True)

Returns a ModelListGP containing independent models for the objectives and constraints
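
A sketch of training on an external DataFrame without replacing the generator's internal model; training_data here is a hypothetical DataFrame whose columns match the VOCS variable and output names.

# training_data: hypothetical DataFrame with "x" and "y" columns
# update_internal=False leaves generator.model untouched
model = generator.train_model(data=training_data, update_internal=False)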

Source code in xopt/generators/bayesian/bayesian_generator.py
def train_model(self, data: pd.DataFrame = None, update_internal=True) -> Module:
    """
    Returns a ModelListGP containing independent models for the objectives and
    constraints

    """
    if data is None:
        data = self.data
    if data.empty:
        raise ValueError("no data available to build model")

    # get input bounds
    variable_bounds = deepcopy(self.vocs.variables)

    # add fixed feature bounds if requested
    if self.fixed_features is not None:
        # get bounds for each fixed_feature (vocs bounds take precedence)
        for key in self.fixed_features:
            if key not in variable_bounds:
                if key not in data:
                    raise KeyError(
                        "generator data needs to contain fixed feature "
                        f"column name `{key}`"
                    )
                f_data = data[key]
                bounds = [f_data.min(), f_data.max()]
                if bounds[1] - bounds[0] < 1e-8:
                    bounds[1] = bounds[0] + 1e-8
                variable_bounds[key] = bounds

    _model = self.gp_constructor.build_model(
        self.model_input_names,
        self.vocs.output_names,
        data,
        {name: variable_bounds[name] for name in self.model_input_names},
        **self._tkwargs,
    )

    if update_internal:
        self.model = _model
    return _model

validate_turbo_controller(value, info)

Note: by default no turbo controller is used.
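
A sketch of the three input forms this validator accepts; "optimize" and "safety" are the registered controller names, remaining dict keys are forwarded to the controller constructor, and the import path for a direct instance is assumed here.

# as a registered string name
gen = UpperConfidenceBoundGenerator(vocs=vocs, turbo_controller="optimize")

# as a dict with a required `name` key
gen = UpperConfidenceBoundGenerator(vocs=vocs, turbo_controller={"name": "safety"})

# or as a controller instance constructed directly
from xopt.generators.bayesian.turbo import OptimizeTurboController

gen = UpperConfidenceBoundGenerator(
    vocs=vocs, turbo_controller=OptimizeTurboController(vocs)
)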

Source code in xopt/generators/bayesian/bayesian_generator.py
@field_validator("turbo_controller", mode="before")
def validate_turbo_controller(cls, value, info: ValidationInfo):
    """note default behavior is no use of turbo"""
    optimizer_dict = {
        "optimize": OptimizeTurboController,
        "safety": SafetyTurboController,
    }
    if isinstance(value, TurboController):
        pass
    elif isinstance(value, str):
        # create turbo controller from string input
        if value in optimizer_dict:
            value = optimizer_dict[value](info.data["vocs"])
        else:
            raise ValueError(
                f"{value} not found, available values are "
                f"{optimizer_dict.keys()}"
            )
    elif isinstance(value, dict):
        # create turbo controller from dict input
        if "name" not in value:
            raise ValueError("turbo input dict needs to have a `name` attribute")
        name = value.pop("name")
        if name in optimizer_dict:
            # pop unnecessary elements
            for ele in ["dim"]:
                value.pop(ele, None)

            value = optimizer_dict[name](vocs=info.data["vocs"], **value)
        else:
            raise ValueError(
                f"{value} not found, available values are "
                f"{optimizer_dict.keys()}"
            )
    return value

visualize_model(**kwargs)

displays the GP models

Source code in xopt/generators/bayesian/bayesian_generator.py
def visualize_model(self, **kwargs):
    """displays the GP models"""
    return visualize_generator_model(self, **kwargs)

BayesianExplorationGenerator

Bases: BayesianGenerator

Source code in xopt/generators/bayesian/bayesian_exploration.py
class BayesianExplorationGenerator(BayesianGenerator):
    name = "bayesian_exploration"
    supports_batch_generation: bool = True

    __doc__ = "Bayesian exploration generator\n" + formatted_base_docstring()

    @field_validator("vocs", mode="after")
    def validate_vocs(cls, v, info: ValidationInfo):
        if v.n_objectives != 0:
            raise ValueError("this generator only supports observables")
        return v

    def _get_acquisition(self, model):
        sampler = self._get_sampler(model)
        qPV = qPosteriorVariance(
            model,
            sampler=sampler,
            objective=self._get_objective(),
        )

        return qPV

    def _get_objective(self):
        """return exploration objective, which only captures the output of the first
        model output"""

        return create_exploration_objective(self.vocs, self._tkwargs)
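
A usage sketch: the vocs validator above rejects objectives, so exploration is configured with observables only.

from xopt import VOCS
from xopt.generators.bayesian import BayesianExplorationGenerator

# observables only; the validator raises if objectives are present
vocs = VOCS(variables={"x": [0.0, 1.0]}, observables=["y"])
explorer = BayesianExplorationGenerator(vocs=vocs)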

MOBOGenerator

Bases: MultiObjectiveBayesianGenerator

Source code in xopt/generators/bayesian/mobo.py
class MOBOGenerator(MultiObjectiveBayesianGenerator):
    name = "mobo"
    __doc__ = """Implements Multi-Objective Bayesian Optimization using the Expected
            Hypervolume Improvement acquisition function"""

    def _get_objective(self):
        return create_mobo_objective(self.vocs, self._tkwargs)

    def get_acquisition(self, model):
        """
        Returns a function that can be used to evaluate the acquisition function
        """
        if model is None:
            raise ValueError("model cannot be None")

        # get base acquisition function
        acq = self._get_acquisition(model)

        # apply fixed features if specified in the generator
        if self.fixed_features is not None:
            # get input dim
            dim = len(self.model_input_names)
            columns = []
            values = []
            for name, value in self.fixed_features.items():
                columns += [self.model_input_names.index(name)]
                values += [value]

            acq = FixedFeatureAcquisitionFunction(
                acq_function=acq, d=dim, columns=columns, values=values
            )

        return acq

    def _get_acquisition(self, model):
        inputs = self.get_input_data(self.data)
        sampler = self._get_sampler(model)

        if self.log_transform_acquisition_function:
            acqclass = qLogNoisyExpectedHypervolumeImprovement
        else:
            acqclass = qNoisyExpectedHypervolumeImprovement

        acq = acqclass(
            model,
            X_baseline=inputs,
            constraints=self._get_constraint_callables(),
            ref_point=self.torch_reference_point,
            sampler=sampler,
            objective=self._get_objective(),
            cache_root=False,
            prune_baseline=True,
        )

        return acq
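
A usage sketch, assuming the reference_point field inherited from MultiObjectiveBayesianGenerator (a dict keyed by objective name):

from xopt import VOCS
from xopt.generators.bayesian import MOBOGenerator

vocs = VOCS(
    variables={"x": [0.0, 1.0]},
    objectives={"y1": "MINIMIZE", "y2": "MINIMIZE"},
)
mobo = MOBOGenerator(vocs=vocs, reference_point={"y1": 10.0, "y2": 10.0})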

get_acquisition(model)

Returns a function that can be used to evaluate the acquisition function

Source code in xopt/generators/bayesian/mobo.py
def get_acquisition(self, model):
    """
    Returns a function that can be used to evaluate the acquisition function
    """
    if model is None:
        raise ValueError("model cannot be None")

    # get base acquisition function
    acq = self._get_acquisition(model)

    # apply fixed features if specified in the generator
    if self.fixed_features is not None:
        # get input dim
        dim = len(self.model_input_names)
        columns = []
        values = []
        for name, value in self.fixed_features.items():
            columns += [self.model_input_names.index(name)]
            values += [value]

        acq = FixedFeatureAcquisitionFunction(
            acq_function=acq, d=dim, columns=columns, values=values
        )

    return acq

UpperConfidenceBoundGenerator

Bases: BayesianGenerator

Source code in xopt/generators/bayesian/upper_confidence_bound.py
class UpperConfidenceBoundGenerator(BayesianGenerator):
    name = "upper_confidence_bound"
    beta: float = Field(2.0, description="Beta parameter for UCB optimization")
    supports_batch_generation: bool = True
    __doc__ = (
        """Bayesian optimization generator using Upper Confidence Bound

Attributes
----------
beta : float, default 2.0
    Beta parameter for UCB optimization, controlling the trade-off between exploration
    and exploitation. Higher values of beta prioritize exploration.

    """
        + formatted_base_docstring()
    )

    @field_validator("vocs")
    def validate_vocs_without_constraints(cls, v):
        if v.constraints:
            warnings.warn(
                f"Using {cls.__name__} with constraints may lead to numerical issues if the base acquisition "
                f"function has negative values."
            )
        return v

    @field_validator("log_transform_acquisition_function")
    def validate_log_transform_acquisition_function(cls, v):
        if v:
            raise ValueError(
                "Log transform cannot be applied to potentially negative UCB "
                "acquisition function."
            )

    def _get_acquisition(self, model):
        if self.n_candidates > 1:
            # MC sampling for generating multiple candidate points
            sampler = self._get_sampler(model)
            acq = qUpperConfidenceBound(
                model,
                sampler=sampler,
                objective=self._get_objective(),
                beta=self.beta,
            )
        else:
            # analytic acquisition function for single candidate generation
            weights = torch.zeros(self.vocs.n_outputs).to(**self._tkwargs)
            weights = set_botorch_weights(weights, self.vocs)
            posterior_transform = ScalarizedPosteriorTransform(weights)
            acq = UpperConfidenceBound(
                model, beta=self.beta, posterior_transform=posterior_transform
            )

        return acq
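
A short note on beta: larger values weight the posterior standard deviation more heavily (exploration), while beta = 0 reduces the acquisition function to the posterior mean, as used by get_optimum above. A usage sketch, reusing the single-objective vocs from the quick-start example:

from xopt.generators.bayesian import UpperConfidenceBoundGenerator

# favor exploration over exploitation
ucb = UpperConfidenceBoundGenerator(vocs=vocs, beta=4.0)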