In benchmarking studies with simulated data sets in which two or more statistical methods are compared, one may go beyond the search for a universally winning method and investigate how the winning method varies over patterns of characteristics of the data or of the data-generating mechanism. Interestingly, this problem bears strong formal similarities to the problem of finding optimal treatment regimes in biostatistics when two or more treatment alternatives are available for the same medical problem or disease. It is outlined how optimal data-analytic regimes, that is to say, rules for optimally calling in statistical methods, can be derived from benchmarking studies with simulated data by means of supervised classification methods (e.g., classification trees). The approach is illustrated by analyses of data from a benchmarking study that compares two different algorithms for the estimation of a two-mode additive clustering model.
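As a rough illustration of deriving a data-analytic regime with a classification tree, the sketch below fits a depth-1 tree (a single split) to benchmark outcomes. Everything in it is an assumption made for illustration: the data characteristic (`noise_level`), the method labels "A" and "B", and the result values are invented and do not come from the study described above.

```python
# Hypothetical benchmark results: one row per simulated data set.
# Each row is (noise_level, winner), where winner is the method that
# performed best on that data set. All values are made up.
results = [
    (0.05, "A"), (0.10, "A"), (0.15, "A"), (0.20, "A"),
    (0.30, "B"), (0.35, "B"), (0.40, "B"), (0.50, "B"),
]


def majority(labels):
    """Return the most frequent label (ties broken alphabetically)."""
    return min(set(labels), key=lambda lab: (-labels.count(lab), lab))


def fit_stump(rows):
    """Fit a depth-1 classification tree on a single characteristic.

    Tries every midpoint between consecutive observed values as a
    threshold and keeps the one with the fewest misclassified data sets.
    Returns (errors, threshold, label_below, label_above).
    """
    values = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [w for x, w in rows if x <= threshold]
        right = [w for x, w in rows if x > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(w != left_label for w in left)
                  + sum(w != right_label for w in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best


errors, threshold, below, above = fit_stump(results)


def recommend(noise_level):
    """The derived regime: which method to call in for a given data set."""
    return below if noise_level <= threshold else above
```

The function `recommend` plays the role of the regime: it maps a data characteristic to the method expected to win. A full classification tree would repeat the same split search recursively over several characteristics.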
Feature Network Models (FNM) are graphical structures that represent proximity data in a discrete space with the use of features. A statistical inference theory is introduced, based on the additivity properties of networks and the linear regression framework. Considering features as predictor variables leads in a natural way to a univariate multiple regression problem with positivity restrictions on the parameters, which represent edge lengths in the network representation. Theoretical standard errors and confidence intervals are obtained for the parameters, and their performance is evaluated by Monte Carlo simulation. When the feature structure is not known in advance, a strategy is proposed to select an adequate subset of features that strikes a good compromise between model fit and model complexity, using Gray codes and the positive lasso. The same statistical inference theory also holds for additive trees, which are special cases of FNM. Standard errors and confidence intervals, model tests, and prediction error are obtained for the estimates of the branch lengths of additive trees. The dissertation concludes by demonstrating that there exists a universal network representation of city-block models based on key elements of the network representation: betweenness, metric segmental additivity, and internal nodes.
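To make the regression formulation concrete, the sketch below treats features as predictors of pairwise distances and estimates non-negative feature weights. It is a minimal illustration under stated assumptions, not the dissertation's procedure: the feature matrix and weights are invented, the distance between two objects is assumed to be the sum of the weights of the features on which they differ, and the estimator is plain non-negative least squares by coordinate descent rather than the positive lasso.

```python
from itertools import combinations

# Hypothetical binary feature matrix: 4 objects (rows) by 3 features (columns).
features = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
]
true_weights = [2.0, 1.0, 3.0]  # made-up feature weights (edge lengths)

# One regression row per object pair: predictor k is 1 when the two
# objects differ on feature k, and 0 otherwise.
pairs = list(combinations(range(len(features)), 2))
X = [[abs(a - b) for a, b in zip(features[i], features[j])] for i, j in pairs]

# Noise-free distances generated from the assumed additive model.
y = [sum(w * x for w, x in zip(true_weights, row)) for row in X]


def nnls_coordinate_descent(X, y, sweeps=200):
    """Least squares with non-negativity constraints on every weight.

    Cycles through the weights; each one is set to its unconstrained
    one-dimensional optimum given the others, then clipped at zero.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(sweeps):
        for k in range(n_features):
            num = 0.0
            den = 0.0
            for row, target in zip(X, y):
                # Fitted value from all features except feature k.
                partial = sum(w[m] * row[m] for m in range(n_features) if m != k)
                num += row[k] * (target - partial)
                den += row[k] * row[k]
            w[k] = max(0.0, num / den) if den > 0 else 0.0
    return w


estimated = nnls_coordinate_descent(X, y)
```

Because the simulated distances are noise-free and the design has full column rank, `estimated` should recover `true_weights` up to numerical tolerance. With noisy data, some weights may be clipped to zero, which is where the positivity restriction matters.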