Classical statistical methods, such as p-values, are difficult for researchers to apply correctly. For example, they do not allow drawing conclusions from a study early, or extending a study with extra research groups that want to make their data available later. Sadly, in practice this often leads to faulty application of statistics and subsequent invalidity of experiment conclusions. Partly for this reason, interest in safe, anytime-valid inference (SAVI) with e-values has recently emerged. This framework offers the same functionality as classical statistics, but also provides researchers with plenty of flexibility, for example by enabling early stopping and effect estimation at any time, extending a study in hindsight, and analyzing data located across multiple hospitals. In this thesis, this theory is further developed for performing SAVI in scenarios applicable to healthcare, specifically for several use cases in psychiatry. It is explored how one could set up real-time psychiatry research in practice in an automated manner, combining text mining with network analysis techniques for data preparation and exploration and then confirming hypotheses with SAVI. Through this, the work in this thesis contributes to an environment where continuous learning from routinely collected healthcare data for better personalized recommendations is the new standard.
In this work, we attempt to answer the question: "How to learn robust and interpretable rule-based models from data for machine learning and data mining, and define their optimality?" Rules provide a simple form of storing and sharing information about the world. As humans, we use rules every day, such as a physician who diagnoses someone with flu using the rule "if a person has either a fever or sore throat (among others), then she has the flu". Even though an individual rule can only describe simple events, several aggregated rules can represent more complex scenarios, such as the complete set of diagnostic rules employed by a physician. The use of rules spans many fields in computer science, and in this dissertation, we focus on rule-based models for machine learning and data mining. Machine learning focuses on learning the model that best predicts future (previously unseen) events from historical data. Data mining aims to find interesting patterns in the available data. To answer our question, we use the Minimum Description Length (MDL) principle, which allows us to define the statistical optimality of rule-based models. Furthermore, we empirically show that this formulation is highly competitive for real-world problems.
Hoogerwerf, M.A.; Koopman, J.P.R.; Janse, J.J.; Langenberg, M.C.C.; Schuijlenburg, R. van; Kruize, Y.C.M.; ... ; Roestenberg, M. 2021
Background. Controlled human hookworm infections could significantly contribute to the development of a hookworm vaccine. However, current models are hampered by low and unstable egg output, reducing generalizability and increasing sample sizes. This study aims to investigate the safety, tolerability, and egg output of repeated exposure to hookworm larvae. Methods. Twenty-four healthy volunteers were randomized, double-blindly, to 1, 2, or 3 doses of 50 Necator americanus L3 larvae at 2-week intervals. Volunteers were monitored weekly and were treated with albendazole at week 20. Results. There was no association between larval dose and number or severity of adverse events. Geometric mean egg loads stabilized at 697, 1668, and 1914 eggs per gram feces for the 1 x 50L3, 2 x 50L3, and 3 x 50L3 groups, respectively. Bayesian statistical modeling showed that egg count variability relative to the mean was reduced with a second infectious dose; however, the third dose did not increase egg load or decrease variability. We therefore suggest 2 x 50L3 as an improved challenge dose. Model-based simulations indicate that increased frequency of stool sampling optimizes the power of hypothetical vaccine trials. Conclusions. Repeated infection with hookworm larvae increased egg counts to levels comparable to the field and reduced relative variability in egg output without aggravating adverse events.
James, T.A.; Viti, S.; Yusef-Zadeh, F.; Royster, M.; Wardle, M. 2021
This thesis develops statistical methods for the analysis of high dimensional data: high dimensional networks reconstruction (Chapters 1-4), incorporation of prior information in networks reconstruction (Chapters 2-4), incorporation of prior knowledge in genetic association studies (Chapter 3).
Klinkenberg, D.; Hahne, S.J.M.; Woudenberg, T.; Wallinga, J. 2018
This thesis covers different problems concerning the evaluation of DNA evidence. It is mainly divided into two parts: the first regards the DIP-STR genotyping techniques. It addresses the imperative need of developing a model to assign the likelihood ratio for DIP-STR results, and assesses, from a statistical and forensic perspective, the advantages of this novel marker system compared to traditional marker systems, such as STR and Y-STR. The second part deals with several more general statistical aspects involved in the evaluation of DNA evidence. It aims at defining the differences between full Bayesian methods and ad hoc plug-in approximations, and at solving the rare type match problem for Y-STR data. The issues of the different reductions of data and of the levels of uncertainty involved in frequentist solutions are also discussed. These two parts are connected in the final project, by developing a Bayesian solution for the rare type match problem for the DIP-STR marker system. Moreover, the initial model for DIP-STR data is improved in the light of the statistical discussion of the second part: any ad hoc solution is avoided to obtain a full Bayesian approach.
Many statistical methods rely on models of reality in order to learn from data and to make predictions about future data. By necessity, these models usually do not match reality exactly, but are either wrong (none of the hypotheses in the model provides an accurate description of reality) or underspecified (the hypotheses in the model describe only part of the data). In this thesis, we discuss three scenarios involving models that are wrong or underspecified. In each case, we find that standard statistical methods may fail, sometimes dramatically, and present different methods that continue to perform well even if the models are wrong or underspecified. The first two of these scenarios involve regression problems and investigate AIC (Akaike's Information Criterion) and Bayesian statistics. The third scenario has the famous Monty Hall problem as a special case, and considers the question of how we can update our belief about an unknown outcome given new evidence when the precise relation between outcome and evidence is unknown.
To fail or not to fail: Clinical trials in depression investigates the causes of the high failure rate of clinical trials in depression research. Apart from the difficulties in the search for new antidepressants during drug discovery, faulty clinical trial designs hinder their evaluation during drug development. This thesis focuses on three important aspects of clinical trials in depression: clinical endpoints, data analysis and trial design-related factors.
In this thesis, a collection of papers is put together dealing with various quantitative aspects of predictive modelling and archaeological prospection. Among the issues covered are the effects of survey bias on the archaeological data used for predictive modelling, and the complexities of testing predictive models using both old and new archaeological data. Furthermore, an attempt is made to reconcile the worlds of expert judgment and quantitative analysis by means of multicriteria decision making techniques and Bayesian statistics. The thesis also offers some alternative approaches to predictive modelling, like using prehistoric land use reconstructions, and the integration of social and cultural factors into the models. It also gives an up-to-date review of the international and Dutch state of affairs in archaeological predictive modelling.