Time-series forecasting through modelling sequences of temporally dependent observations has many industrial and scientific applications. While machine learning models have been widely used to create time-series forecasting models, creating efficient and performant time-series forecasting models is a complex task for domain users. Automated Machine Learning (AutoML) is a growing field that aims to make the process of creating machine-learning models accessible to non-experts in machine learning. This is achieved by optimising machine learning pipelines automatically. Time-series machine-learning pipelines include various specialised pre-processing steps that are not currently supported by existing AutoML systems. This dissertation investigates how AutoML can be extended to time-series data analysis problems such as time-series forecasting. Several challenges arise when developing specialised AutoML systems for time-series forecasting. For instance, advanced machine-learning pipelines that can extract time-series features and select well-suited machine-learning models need to be developed. In addition, extra hyperparameters such as the window size, which indicates how many historical data points are helpful, need to be optimised by the AutoML system. This dissertation addresses these issues. We provide a comprehensive overview of the AutoML research field, including hyperparameter optimisation techniques, neural architecture search, and existing AutoML systems. Next, we investigate the use of AutoML for short-term forecasting, single-step-ahead time-series forecasting, and multi-step time-series forecasting with time-series features.
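The role of the window-size hyperparameter mentioned in this abstract can be illustrated with a minimal sketch. The function name `make_windows` and the `horizon` parameter are assumptions made for illustration and do not come from the dissertation; the sketch only shows how a series is cut into (history, target) pairs, which is the step an AutoML system would repeat for each candidate window size.

```python
from typing import List, Tuple


def make_windows(
    series: List[float], window_size: int, horizon: int = 1
) -> Tuple[List[List[float]], List[List[float]]]:
    """Cut a series into (history, target) pairs.

    window_size: number of past observations used as model input
                 (the hyperparameter an AutoML system would tune).
    horizon:     number of future observations to predict
                 (1 = single-step ahead, >1 = multi-step).
    Both names are hypothetical, chosen for this sketch.
    """
    if window_size < 1 or horizon < 1:
        raise ValueError("window_size and horizon must be positive")

    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    # Last index at which a full history plus a full target still fits.
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        split = start + window_size
        inputs.append(series[start:split])
        targets.append(series[split:split + horizon])
    return inputs, targets


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_windows(series, window_size=3)
# X holds three histories of length 3; y holds the value following each one.
```

A larger window gives the model more history per example but leaves fewer training examples, which is one reason the value is worth searching over rather than fixing by hand.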
This thesis focuses on data found in the field of computational drug discovery. New insight can be obtained by applying machine learning in various ways and in a variety of domains. Two studies delved into the application of proteochemometrics (PCM), a machine learning technique that can be used to find relations in protein-ligand bioactivity data and then predict, using a virtual screen, whether compounds that have never been tested on a particular protein, or set of proteins, are active against it. With this, sets of compounds were suggested for experimental validation that were significant in a myriad of ways. Another study investigated the mutational patterns in cancer, using a large dataset of mutation data and identifying several motifs in G protein-coupled receptors. The thesis also contains the work done on the Papyrus dataset, a large-scale bioactivity dataset that focuses on standardising data for computational drug discovery and providing an out-of-the-box set that can be used in a variety of settings.
AI-powered emotion recognition, typing with thoughts or eavesdropping virtual assistants: three non-fictional examples illustrate how AI may impact society. AI-related products and services increasingly find their way into daily life. Are the EU's fundamental rights to privacy and data protection equipped to protect individuals effectively? In addressing this question, the dissertation concludes that no new legal framework is needed. Instead, adjustments are required. First, the extent of adjustments depends on the AI discipline. There is no such thing as 'the AI'. AI covers various concepts, including the disciplines of machine learning, natural language processing, computer vision, affective computing and automated reasoning. Second, the extent of adjustments depends on the type of legal problem: legal provisions are violated (type 1), cannot be enforced (type 2) or are not fit for purpose (type 3). Type 2 and 3 problems require either adjustments of current provisions or new judicial interpretations. Two instruments might be helpful for more effective legislation: rebuttable presumptions and reversal of the burden of proof. In some cases, the solution is technical, not legal. Research in AI should solve reasoning deficiencies in AI systems and their lack of common sense.
Contrary to common belief, sign languages are distinct across different communities and cultures, evolving organically through interactions among deaf people, rather than being based on spoken languages. Each sign language has its own grammar, vocabulary, and cultural nuances, with variations even within a single country, showcasing the diverse communication methods within the deaf community. Deaf individuals often face encouragement to use spoken language techniques like lipreading or text communication, highlighting a bias towards spoken languages. This is compounded by the lack of sign languages in linguistic technologies, emphasizing the need for more inclusive research and development. This dissertation aims to address this gap using machine and deep learning to improve sign language processing and recognition. It covers six chapters, introducing methods for video-based sign annotation, webcam-based sign language dictionary search, and ranking systems for sign suggestions. It also explores tools for visualizing and comparing sign language variation, contributing valuable resources to linguistic research.
This thesis investigates the contribution of quantum computers to machine learning, a field called Quantum Machine Learning. Quantum Machine Learning promises innovative perspectives and methods for solving complex problems in machine learning, leveraging the unique capabilities of quantum computers. These computers differ fundamentally from classical computers by exploiting certain quantum mechanical phenomena. The thesis explores various proposals within quantum machine learning, such as the application of quantum algorithms in topological data analysis. With respect to topological data analysis, results demonstrate that quantum algorithms solve problems considered inefficient in classical settings. The thesis also explores structural risk minimization in quantum machine learning models, identifying crucial design choices for new quantum machine learning models. Additionally, it introduces quantum models in reinforcement learning, which deliver comparable performance to traditional models and are superior in certain scenarios. The final part identifies learning tasks in computational learning theory where quantum learning algorithms have exponential advantages. In summary, this thesis contributes to understanding how quantum computers can address complex machine learning problems, from topological data analysis to reinforcement learning and computational learning tasks.
The research in this dissertation aims to optimise blood donation processes in the framework of the Dutch national blood bank Sanquin. The primary health risk for blood donors is iron deficiency, which is evaluated based on donors' hemoglobin and ferritin levels. If either of these levels is inadequate, donors are deferred from donation. Deferral due to low hemoglobin levels occurs on-site, meaning that donors have already traveled to the blood bank and then have to return home without donating, which is demotivating for the donor and inefficient for the blood bank. A large part of this dissertation therefore has the objective to develop a prediction model for donors' hemoglobin levels, based on historical measurements and donor characteristics. The prediction model that was developed reduces the deferral rate by approximately 60% (from 3% to 1% for women, and from 1% to 0.4% for men), showing the potential of using data to enhance blood bank policy efficiency. Additionally, the model predictions were made explainable, providing the blood bank with insights into why specific predictions are made. These insights increase our understanding of the relationships between donor characteristics and hemoglobin levels. If this prediction model were implemented in practice, the explanations could also be shared with donors to help them understand why they are (not) invited to donate, which could also contribute to donor satisfaction and retention. In a collaborative effort with blood banks in Australia, Belgium, Finland and South Africa, the same prediction model was applied to data from each blood bank. Despite differences in blood bank policies and donor demographics, the models found similar associations with the predictor variables in all countries.
Differences in performance could mostly be attributed to differences in deferral rates, with blood banks with higher deferral rates obtaining higher model accuracy. Beyond hemoglobin prediction models, additional research questions are explored. One study aims to identify determinants of ferritin levels in donors through repeated measurements and by linking these to environmental variables. Another study involves modeling the pharmacokinetics of antibodies in COVID-19 recovered donors and finding relationships between patient characteristics, symptoms, and antibody levels over time. In summary, the research in this dissertation shows the potential within the wealth of data collected by blood banks. The proposed data-driven donation strategies not only decrease deferral rates but also increase donor retention and understanding. This comprehensive approach allows Sanquin to provide more personalised feedback to donors regarding their iron status, ultimately optimising the blood donation process and contributing to the overall efficacy of blood banking systems.
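The relative reductions implied by the deferral rates quoted in this abstract can be checked with a short calculation. This is only a sketch that assumes the rounded percentages given in the text are the inputs; the dissertation itself may report more precise figures.

```latex
% Relative reduction = (old rate - new rate) / old rate,
% using the rounded deferral rates quoted in the abstract.
\begin{align}
  \text{women:}\quad \frac{3\% - 1\%}{3\%} &= \frac{2}{3} \approx 67\%, \\
  \text{men:}\quad \frac{1\% - 0.4\%}{1\%} &= 0.6 = 60\%.
\end{align}
```

Both values are in line with the stated reduction of approximately 60%.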
Learning software design is known to be a difficult and challenging task for students. This dissertation studies different didactic approaches for learning software design to improve the way we teach students software design. The research in the dissertation asks whether we can assess software design skills, what guidance is needed to improve students’ understanding of software design, and how to motivate and engage students in learning software design. The research explores the following: an instrument for measuring software design skills based on design principles, the gamification of learning software design, revealing students’ software design strategies, the use of peer-reflection for uncovering the difficulties students have during software design tasks, the use of teaching assistants as a bridge between the lecturer and the students, the automation of grading software designs with machine learning, guiding feedback by a pedagogical agent, and a workshop for engaging students in the process of software development. The research contributes to the future education of software design.
Inverse problems are problems where we want to estimate the values of certain parameters of a system given observations of the system. Such problems occur in several areas of science and engineering. Inverse problems are often ill-posed, which means that the observations of the system do not uniquely define the parameters we seek to estimate, or that the solution is highly sensitive to small changes in the observation. In order to solve such problems, therefore, we need to make use of additional knowledge about the system at hand. One such form of prior information is given by the notion of sparsity. Sparsity refers to the knowledge that the solution to the inverse problem can be expressed as a combination of a few terms. The sparsity of a solution can be controlled explicitly or implicitly. An explicit way to induce sparsity is to minimize the number of non-zero terms in the solution. Implicit use of sparsity can be made, for example, by making adjustments to the algorithm used to arrive at the solution. In this thesis we studied various inverse problems that arise in different application areas, such as tomographic imaging and equation learning for biology, and showed how ideas of sparsity can be used in each case to design effective algorithms to solve such problems.
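The explicit way of inducing sparsity described in this abstract, minimizing the number of non-zero terms, is commonly written as the optimization problem below. The notation is an assumption introduced here for illustration and is not taken from the thesis: $A$ denotes the forward operator, $y$ the observations, $x$ the unknown parameters, and $\epsilon$ a tolerance for measurement noise.

```latex
% Assumed notation (not from the abstract): A is the forward operator,
% y the observations, x the unknown parameters, \epsilon a noise tolerance.
\begin{align}
  \hat{x} &= \arg\min_{x} \; \|x\|_0
  \quad \text{subject to} \quad \|Ax - y\|_2 \le \epsilon, \\
  \|x\|_0 &= \#\{\, i : x_i \neq 0 \,\}.
\end{align}
```

Here $\|x\|_0$ simply counts the non-zero entries of $x$, so the constraint keeps the solution consistent with the observations while the objective keeps it sparse.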
The societal burden of spinal conditions is vast and continues to grow with the increasing prevalence of patients with spinal degenerative disease, spinal metastases, and spinal infections. Recent applications of artificial intelligence in healthcare have shown great promise, and similar extensions in spine surgery may improve decision-making. The purpose of this thesis was to examine the utility of predictive analytics and natural language processing in spine surgery.
The aim of this thesis is to determine diagnostic performance of machine learning in differentiating between atypical cartilaginous tumor (ACT) and high-grade chondrosarcoma (CS) based on radiomic features derived from magnetic resonance imaging (MRI) and computed tomography (CT). In chapter 2, the concept of radiomics of musculoskeletal sarcomas is introduced and a systematic review on radiomic feature reproducibility and validation strategies is conducted. In chapter 3, a preliminary study is performed to investigate the performance of MRI radiomics-based machine learning in discriminating ACT from high-grade CS, using a single-center cohort, in comparison with an expert radiologist. In chapter 4, the influence of interobserver segmentation variability on the reproducibility of CT and MRI radiomic features of cartilaginous bone tumors is assessed. In chapter 5, the performance of CT radiomics-based machine learning in discriminating ACT from high-grade CS of long bones is determined and validated using independent data from a multicenter cohort, compared to an expert radiologist. In chapter 6, the performance of MRI radiomics-based machine learning in differentiating between ACT and grade II CS of long bones is determined and validated using independent data from a multicenter cohort, in comparison with an expert radiologist. Finally, in chapter 7, the main results and implications of this thesis are summarized and discussed.
Despite improved surgical and adjuvant treatment options, malignant brain tumors remain non-curable to date. The thin line between treatment effectiveness and patient harm underpins the importance of tailoring clinical management to the individual brain tumor patient. Over the past decades, the volume and complexity of clinically derived patient data (i.e., imaging, genomics, free text, etc.) have increased exponentially. Machine learning provides a vast range of algorithms that can learn from this data and guide clinical decision-making by providing accurate patient-level predictions. The current thesis describes several studies along the continuum of the machine learning spectrum as it applies to neurosurgical oncology. Part I investigates postoperative complications and risk factors in patients operated on for a primary malignant brain tumor. Part II describes the development of a model for the prediction of individual-patient survival in glioblastoma patients. Part III encompasses the development of a natural language processing framework for automated medical text analysis. Machine learning algorithms should be considered as an extension to statistical approaches and exist along a continuum determined by how much is specified by humans and how much is learnt by the machine. Although machine learning algorithms can produce highly accurate predictions based on high-dimensional data, clinicians and researchers should interpret the clinical implications of these predictions on a case-by-case basis.
Image registration is the process of aligning images by finding the spatial relation between the images. Assuming two images, called the fixed and moving images, are taken at different times, at different spatial locations, or via different imaging techniques, the aim of image registration is to find an optimal transformation that aligns the fixed and the moving images. Performing automatic, fast image registration with less manual fine-tuning can speed up numerous medical image processing procedures. In addition, an automatic quality assessment of registration can speed up this time-consuming task. In this thesis, we developed a fast learning-based image registration technique called RegNet. Predicting registration error can be useful for evaluation of registration procedures, which is important for the adoption of registration techniques in the clinic. In addition, quantitative error prediction can be helpful in improving the registration quality. In this thesis, we proposed two quality assessment mechanisms using random forests (RF) and convolutional long short-term memory (ConvLSTM), of which the latter is faster and more accurate.
In this work, we attempt to answer the question: "How to learn robust and interpretable rule-based models from data for machine learning and data mining, and define their optimality?" Rules provide a simple form of storing and sharing information about the world. As humans, we use rules every day, such as the physician who diagnoses someone with flu, represented by "if a person has either a fever or a sore throat (among others), then she has the flu". Even though an individual rule can only describe simple events, several aggregated rules can represent more complex scenarios, such as the complete set of diagnostic rules employed by a physician. The use of rules spans many fields in computer science, and in this dissertation, we focus on rule-based models for machine learning and data mining. Machine learning focuses on learning the model that best predicts future (previously unseen) events from historical data. Data mining aims to find interesting patterns in the available data. To answer our question, we use the Minimum Description Length (MDL) principle, which allows us to define the statistical optimality of rule-based models. Furthermore, we empirically show that this formulation is highly competitive for real-world problems.
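For readers unfamiliar with the MDL principle named in this abstract, its simplest textbook form (the two-part code) can be written as follows. This is a general statement of the principle and not necessarily the exact formulation used in the dissertation, which may rely on a refined variant; the symbols $\mathcal{M}$ (the set of candidate rule-based models), $D$ (the data) and $L(\cdot)$ (code length in bits) are assumptions introduced here.

```latex
% Two-part MDL, textbook form. Assumed notation: \mathcal{M} is the model
% class, D the data, L(.) a code length in bits.
\begin{equation}
  M^{*} = \arg\min_{M \in \mathcal{M}} \bigl[ L(M) + L(D \mid M) \bigr]
\end{equation}
```

The first term penalizes complex models (many or long rules) and the second penalizes poor fit, so the minimizer balances the two without a separately tuned regularization weight.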
Particles are omnipresent in biopharmaceutical products. In protein-based therapeutics such particles are generally associated with impurities, either derived from the drug product itself (e.g., protein aggregates) or from extrinsic contaminations (e.g., cellulose fibers). These impurities can affect product stability, as well as cause adverse effects once introduced into the human body. Particulate impurities are present over a wide range of sizes (from nanometers to millimeters), making them difficult to characterize using a single method. Novel drug products may also contain particles that act as the active pharmaceutical ingredient (API; e.g., living cells) or a drug delivery vehicle (e.g., lipid nanoparticles). Unwanted immunotoxicity and inconsistent in vivo functionality can result from particle instability and aggregate formation. Therefore, the efficacy and safety of these therapeutics are dependent on the particle composition, quantity and size distribution. Consequently, well-established methods are required to quantify and characterize particles in the submicron- and micron-size ranges. In this thesis, we developed new approaches which allow for comprehensive characterization of the particle populations present in biopharmaceutical products, either as impurities or as the API. Furthermore, the performed work focused on comparing different particle characterization techniques to allow a better understanding of the limitations and strengths of each method applied.
The ongoing increase in antimicrobial resistance combined with the low discovery of novel antibiotics is a serious threat to our health care. Genome mining has given new potential to the field of natural product discovery, as thousands of biosynthetic gene clusters (BGCs) are discovered for which the natural product is not known. Ribosomally synthesized and post-translationally modified peptides (RiPPs) represent a highly diverse class of natural products. The large number of different modifications that can be applied to a RiPP results in a large variety of chemical structures, but also stems from a large genetic variety in BGCs. As a result, no single method can effectively mine for all RiPP BGCs, making it an interesting source for new molecules. In this thesis, new methods are explored to mine genomes for the BGCs of novel RiPP variants, with a focus on discovering RiPPs that have new modifications. RRE-Finder is a new tool for the detection of RiPP Recognition Elements, domains that are often found in RiPP BGCs. DecRiPPter is another tool that employs machine learning models to discover new RiPP precursor genes encoded in the genomes. Both tools can be used to prioritize novel RiPP BGCs. Two candidate BGCs are characterized, one of which could be shown to specify a new RiPP, validating the approach.
Inflammatory Bowel Diseases (IBD) such as Crohn’s disease (CD) and ulcerative colitis (UC) are chronic immunological digestive diseases with a progressive character and are associated with significant healthcare costs. Different solutions have been proposed, such as innovation in care monitoring or implementation of electronic health (eHealth). IBD is one of many chronic diseases that could benefit from eHealth: adding smartphone applications to the toolbox for care management has the potential to improve disease understanding, enhance medication adherence, improve patient-physician communication, and enable earlier interventions by medical professionals when problems arise. Furthermore, the accessibility of Big Data and increased computational resources have paved the way for Artificial Intelligence (AI) to provide potential solutions for the management of prototypical complex diseases with advanced heterogeneity and alternating disease states, like IBD. In this thesis we assessed the current economic and psychosocial impact of IBD by assessing its effect on indirect costs, productivity and caregiving. Furthermore, we examined whether we can proactively identify IBD patients’ needs using eHealth and Artificial Intelligence. Lastly, we analyzed the impact of monitoring IBD patients using eHealth interventions in order to facilitate the delivery of high-value care.
This thesis describes the importance of being able to control the selectivity of potential drug candidates. It explains how computational models are employed to predict and rationalize compound-protein binding (affinity) and therewith, selectivity of compounds. Moreover, it shows that selectivity can purposely be tuned to target either a single protein or an entire panel of proteins. The challenges of selectivity modeling are addressed based on case studies in the sodium-dependent glucose co-transporters, G protein-coupled receptors, and kinases.
Real-life processes are characterized by dynamics involving time. Examples are walking, sleeping, disease progress in medical treatment, and events in a workflow. To understand complex behavior one needs expressive models, parsimonious enough to gain insight. Uncertainty is often fundamental for process characterization, e.g., because we sometimes can observe phenomena only partially. This makes probabilistic graphical models a suitable framework for process analysis. In this thesis, new probabilistic graphical models that offer the right balance between expressiveness and interpretability are proposed, inspired by the analysis of complex, real-world problems. We first investigate processes by introducing latent variables, which capture abstract notions from observable data (e.g., intelligence, health status). Such models often provide more accurate descriptions of processes. In medicine, such models can also reveal insight on patient treatment, such as predictive symptoms. The second viewpoint looks at processes by identifying time points in the data where the relationships between observable variables change. This provides an alternative characterization of process change. Finally, we try to better understand processes by identifying subgroups of data that deviate from the whole dataset, e.g., process workflows whose event dynamics differ from the general workflow.
People diagnosed with Borderline Personality Disorder (BPD) continuously struggle with knowing who they are and maintaining relationships. Fortunately, psychotherapies for BPD have proven effective. However, not everyone benefits from treatment, with particular challenges remaining in social relations and finding meaning in life. Therefore, it is important to understand how we can better support people with BPD. We know that identity disturbances relate to interpersonal difficulties but we do not really understand how. Therefore, we investigated how interactions with others are influenced by how people see themselves, in the general population and in people diagnosed with BPD. To this end, we studied brain activation and the role of childhood trauma and low self-esteem. In addition, we investigated whether self-views can be strengthened using positive memories. We found that the way people respond to critiques and compliments relates to how positively or negatively they see themselves. Moreover, vivid positive memories can benefit mood and self-esteem. However, people with BPD seem to not sufficiently distance themselves from critiques nor engage in positive memories and compliments. Finding the right balance between distance from critiques and engagement with a positive self-image may break the cycle of negative self-knowledge and contribute to better social interactions.
Huntington’s disease (HD) is a progressive autosomal dominant neurodegenerative disorder with a broad spectrum of clinical features. The disease is caused by a mutation in the Huntingtin gene (HTT) on the short arm of chromosome 4. In September 2015, the first-in-human study looking into the safety of an intrathecally administered antisense oligonucleotide therapy to reduce mutant HTT (mHTT) protein was launched in HD patients, where the drug proved to be safe and the intended mHTT lowering was demonstrated. The aim of this thesis is to find biomarkers that correspond with disease state and measure progression in different stages of HD, which in turn can be used as suitable objective surrogate clinical trial endpoints. We put special emphasis on longitudinal study designs, as these provide the most useful clinical progression and parameter change associations. Although previous neuroimaging studies have shown potential markers, findings remain inconsistent or lack association with disease state. As such, further exploration of neuroimaging techniques is of great relevance. Using different approaches to evaluate the potential usefulness of specific markers, we demonstrate biomarkers that may assist in the objective assessment of a potential disease-modifying intervention.