Statistical considerations guide the design and implementation of the experiments that scientists perform. For instance, in a clinical trial on the efficacy of a treatment, its effect has to be observed in a certain minimum number of patients in order to make confident assertions about its efficacy in the general population; observing only one or two patients is, to take an extreme example, not enough. Sound statistical methods are required to assess the quality of these general assertions. Currently, the most flexible methods allow experimentalists to analyze their data as it is collected, and to decide on the basis of their findings whether to gather more data, stop an experiment, or start a new one. In short, they allow experimentalists to ask "are we there yet?" while their experiments are ongoing. This is crucial in applications such as the monitoring of clinical trials, online experimentation, and quality control in engineering. The statistical methods that make this degree of flexibility possible are known in the statistical community as anytime-valid methods; they are the main focus of this dissertation. In this work, a number of mathematical results about optimal anytime-valid methods are shown. Group-invariant models, the analysis of time-to-event data, and prediction with expert advice are investigated.
This dissertation is about Bayesian learning from data. How can humans and computers learn from data? This question is at the core of both statistics and, as its name already suggests, machine learning. Bayesian methods are widely used in these fields, yet they have certain limitations and problems of interpretation. In two chapters of this dissertation, we examine such a limitation and overcome it by extending the standard Bayesian framework. In two other chapters, we discuss how different philosophical interpretations of Bayesianism affect mathematical definitions and theorems about Bayesian methods and their use in practice. While some researchers see the Bayesian framework as normative (all statistics should be based on Bayesian methods), in the two remaining chapters we apply Bayesian methods in a pragmatic way: merely as a tool for interesting learning problems (that could also have been addressed by non-Bayesian methods).
This thesis is composed of papers on four topics: Bayesian theory for the sparse normal means problem, specifically for the horseshoe prior (Chapters 1-3), Bayesian theory for community detection (Chapter 4), nested model selection (Chapter 5), and the application of competing risk methods in the presence of time-dependent clustering (Chapter 6).
Iterson, M. van; Haagen, H.H.H.B.M. van; Goeman, J.J. 2012
In this thesis, novel statistical methods are developed for the analysis of high-dimensional microarray data. In short: Chapter 1 gives an overview of the most important research methods developed so far. Chapter 2 describes a method for testing association of the expression of gene sets (pathways) with a patient-level response variable, which can be continuous or two-valued. Chapter 3 extends the methodology of Chapter 2 to survival as a response variable. Chapter 4 presents a goodness-of-fit test for the multinomial regression model, which can be used to extend the methodology of Chapter 2 to multi-valued outcomes. Chapter 5 presents a general theoretical framework for the tests of Chapters 2-4 and derives optimality properties for these tests. Chapter 6 presents a method for predicting a response variable from high-dimensional data, based on latent variables. Chapter 7 presents a visualization tool for improved presentation of scatterplots with many thousands of dots.