The problem of approximating the Pareto front of a multiobjective optimization problem can be reformulated as the problem of finding a set that maximizes the hypervolume indicator. This paper...Show moreThe problem of approximating the Pareto front of a multiobjective optimization problem can be reformulated as the problem of finding a set that maximizes the hypervolume indicator. This paper establishes the analytical expression of the Hessian matrix of the mapping from a (fixed size) collection of n points in the d-dimensional decision space (or m dimensional objective space) to the scalar hypervolume indicator value. To define the Hessian matrix, the input set is vectorized, and the matrix is derived by analytical differentiation of the mapping from a vectorized set to the hypervolume indicator. The Hessian matrix plays a crucial role in second-order methods, such as the Newton-Raphson optimization method, and it can be used for the verification of local optimal sets. So far, the full analytical expression was only established and analyzed for the relatively simple bi-objective case. This paper will derive the full expression for arbitrary dimensions (m ≥ 2 objective functions). For the practically important three-dimensional case, we also provide an asymptotically efficient algorithm with time complexity in O(n log n) for the exact computation of the Hessian Matrix’ non-zero entries. We establish a sharp bound of 12m−6 for the number of non-zero entries. Also, for the general m-dimensional case, a compact recursive analytical expression is established, and its algorithmic implementation is discussed. Also, for the general case, some sparsity results can be established; these results are implied by the recursive expression. To validate and illustrate the analytically derived algorithms and results, we provide a few numerical examples using Python and Mathematica implementations. Open-source implementations of the algorithms and testing data are made available as a supplement to this paper.Show less
Real-world optimization scenarios under uncertainty and noise are typically handled with robust optimization techniques, which re-formulate the original optimization problem into a robust... Show moreReal-world optimization scenarios under uncertainty and noise are typically handled with robust optimization techniques, which re-formulate the original optimization problem into a robust counterpart, e.g., by taking an average of the function values over different perturbations to a specific input. Solving the robust counterpart instead of the original problem can significantly increase the associated computational cost, which is often overlooked in the literature to the best of our knowledge. Such an extra cost brought by robust optimization might depend on the problem landscape, the dimensionality, the severity of the uncertainty, and the formulation of the robust counterpart.This paper targets an empirical approach that evaluates and compares the computational cost brought by different robustness formulations in Kriging-based optimization on a wide combination (300 test cases) of problems, uncertainty levels, and dimensions. We mainly focus on the CPU time taken to find robust solutions, and choose five commonly-applied robustness formulations: `"mini-max robustness'', "mini-max regret robustness'', "expectation-based robustness'', ``dispersion-based robustness'', and "composite robustness'' respectively. We assess the empirical performance of these robustness formulations in terms of a fixed budget and a fixed target analysis, from which we find that "mini-max robustness'' is the most practical formulation w.r.t.~the associated computational cost. Show less
Introducing new algorithmic ideas is a key part of the continuous improvement of existing optimization algorithms. However, when introducing a new component into an existing algorithm, assessing... Show moreIntroducing new algorithmic ideas is a key part of the continuous improvement of existing optimization algorithms. However, when introducing a new component into an existing algorithm, assessing its potential benefits is a challenging task. Often, the component is added to a default implementation of the underlying algorithm and compared against a limited set of other variants. This assessment ignores any potential interplay with other algorithmic ideas that share the same base algorithm, which is critical in understanding the exact contributions being made. We explore a more extensive procedure, which uses hyperparameter tuning as a means of assessing the benefits of new algorithmic components. This allows for a more robust analysis by not only focusing on the impact on performance, but also by investigating how this performance is achieved. We implement our suggestion in the context of the Modular CMA-ES framework, which was redesigned and extended to include some new modules and several new options for existing modules, mostly focused on the step-size adaptation method. Our analysis highlights the differences between these new modules, and identifies the situations in which they have the largest contribution. Show less
Structural Bias (SB) is an important type of algorithmic deficiency within iterative optimisation heuristics. However, methods for detecting structural bias have not yet fully matured, and recent... Show moreStructural Bias (SB) is an important type of algorithmic deficiency within iterative optimisation heuristics. However, methods for detecting structural bias have not yet fully matured, and recent studies have uncovered many interesting questions. One of these is the question of how structural bias can be related to anisotropy. Intuitively, an algorithm that is not isotropic would be considered structurally biased. However, there have been cases where algorithms appear to only show SB in some dimensions. As such, we investigate whether these algorithms actually exhibit anisotropy, and how this impacts the detection of SB. We find that anisotropy is very rare, and even in cases where it is present, there are clear tests for SB which do not rely on any assumptions of isotropy, so we can safely expand the suite of SB tests to encompass these kinds of deficiencies not found by the original tests.We propose several additional testing procedures for SB detection and aim to motivate further research into the creation of a robust portfolio of tests. This is crucial since no single test will be able to work effectively with all types of SB we identify. Show less
A new acquisition function is proposed for solving robust optimization problems via Bayesian Optimization. The proposed acquisition function reflects the need for the robust instead of the nominal... Show moreA new acquisition function is proposed for solving robust optimization problems via Bayesian Optimization. The proposed acquisition function reflects the need for the robust instead of the nominal optimum, and is based on the intuition of utilizing the higher moments of the improvement. The efficacy of Bayesian Optimization based on this acquisition function is demonstrated on four test problems, each affected by three different levels of noise. Our findings suggest the promising nature of the proposed acquisition function as it yields a better robust optimal value of the function in 6/12 test scenarios when compared with the baseline. Show less
The imbalanced classification problem is very relevant in both academic and industrial applications. The task of finding the best machine learning model to use for a specific imbalanced dataset is... Show moreThe imbalanced classification problem is very relevant in both academic and industrial applications. The task of finding the best machine learning model to use for a specific imbalanced dataset is complicated due to a large number of existing algorithms, each with its own hyperparameters. The Combined Algorithm Selection and Hyperparameter optimization (CASH) has been introduced to tackle both aspects at the same time. However, CASH has not been studied in detail in the class imbalance domain, where the best combination of resampling technique and classification algorithm is searched for, together with their optimized hyperparameters. Thus, we target the CASH problem for imbalanced classification. We experiment with a search space of 5 classification algorithms, 21 resampling approaches and 64 relevant hyperparameters in total. Moreover, we investigate performance of 2 well-known optimization approaches: Random search and Tree Parzen Estimators approach which is a kind of Bayesian optimization. For comparison, we also perform grid search on all combinations of resampling techniques and classification algorithms with their default hyperparameters. Our experimental results show that a Bayesian optimization approach outperforms the other approaches for CASH in this application domain. Show less
Neural Architecture Search (NAS) aims to optimize deep neural networks' architecture for better accuracy or smaller computational cost and has recently gained more research interests. Despite... Show moreNeural Architecture Search (NAS) aims to optimize deep neural networks' architecture for better accuracy or smaller computational cost and has recently gained more research interests. Despite various successful approaches proposed to solve the NAS task, the landscape of it, along with its properties, are rarely investigated. In this paper, we argue for the necessity of studying the landscape property thereof and propose to use the so-called Exploratory Landscape Analysis (ELA) techniques for this goal. Taking a broad set of designs of the deep convolutional network, we conduct extensive experimentation to obtain their performance. Based on our analysis of the experimental results, we observed high similarities between well-performing architecture designs, which is then used to significantly narrow the search space to improve the efficiency of any NAS algorithm. Moreover, we extract the ELA features over the NAS landscapes on three common image classification data sets, MNIST, Fashion, and CIFAR-10, which shows that the NAS landscape can be distinguished for those three data sets. Also, when comparing to the ELA features of the well-known Black-Box optimization Benchmarking (BBOB) problem set, we found out that the NAS landscapes surprisingly form a new problem class on its own, which can be separated from all 24 BBOB problems. Given this interesting observation, we, therefore, state the importance of further investigation on selecting an efficient optimizer for the NAS landscape as well as the necessity of augmenting the current benchmark problem set. Show less
Recently, AlphaZero has achieved outstanding performance in playing Go, Chess, and Shogi. Players in AlphaZero consist of a combination of Monte Carlo Tree Search and a deep neural network, that is... Show moreRecently, AlphaZero has achieved outstanding performance in playing Go, Chess, and Shogi. Players in AlphaZero consist of a combination of Monte Carlo Tree Search and a deep neural network, that is trained using self-play. The unified deep neural network has a policy-head and a value-head, and during training, the optimizer minimizes the sum of policy loss and value loss. However, it is not clear if and under which circumstances other formulations of the loss function are better. Therefore, we perform experiments with different combinations of these two minimization targets. In contrast to many recent papers who adopt single run experiments and use the whole history Elo ratings from self-play, we propose to use repeated runs. The results show that this method can describe the training performance quite well within each training run, but there is a high self-play bias, such that it is incomparable among different training runs. Therefore, inspired by the AlphaGo series papers, a self-play bias avoiding performance assessment, final best player Elo rating, is adopted to evaluate the playing strength in a direct competition between the evolved players. For relatively small games, based on this new evaluation method, surprisingly, minimizing only value loss achieves the strongest playing strength in the final best players' round-robin tournament. These results indicate that more research is needed into the relative importance of value function and policy function in small games. Show less