This thesis focuses on data found in the field of computational drug discovery. New insight can be obtained by applying machine learning in various ways and in a variety of domains. Two studies delved into the application of proteochemometrics (PCM), a machine learning technique that can be used to find relations in protein-ligand bioactivity data and then to predict, through a virtual screen, the bioactivity of compounds that had never been tested on a particular protein or set of proteins. On this basis, sets of compounds that were notable in several respects were suggested for experimental validation. Another study investigated mutational patterns in cancer, applying a large dataset of mutation data and identifying several motifs in G protein-coupled receptors. The thesis also contains the work done on the Papyrus dataset, a large-scale bioactivity dataset that focuses on standardising data for computational drug discovery and providing an out-of-the-box set that can be used in a variety of settings.
AI-powered emotion recognition, typing with thoughts or eavesdropping virtual assistants: three non-fictional examples illustrate how AI may impact society. AI-related products and services increasingly find their way into daily life. Are the EU's fundamental rights to privacy and data protection equipped to protect individuals effectively? In addressing this question, the dissertation concludes that no new legal framework is needed. Instead, adjustments are required. First, the extent of adjustments depends on the AI discipline. There is no such thing as 'the AI': AI covers various concepts, including the disciplines machine learning, natural language processing, computer vision, affective computing and automated reasoning. Second, the extent of adjustments depends on the type of legal problem: legal provisions are violated (type 1), cannot be enforced (type 2) or are not fit for purpose (type 3). Type 2 and 3 problems require either adjustments of current provisions or new judicial interpretations. Two instruments might be helpful for more effective legislation: rebuttable presumptions and reversal of proof. In some cases, the solution is technical, not legal. Research in AI should solve reasoning deficiencies in AI systems and their lack of common sense.
Contrary to common belief, sign languages are distinct across different communities and cultures, evolving organically through interactions among deaf people, rather than being based on spoken languages. Each sign language has its own grammar, vocabulary, and cultural nuances, with variations even within a single country, showcasing the diverse communication methods within the deaf community. Deaf individuals often face encouragement to use spoken language techniques like lipreading or text communication, highlighting a bias towards spoken languages. This is compounded by the lack of sign languages in linguistic technologies, emphasizing the need for more inclusive research and development. This dissertation aims to address this gap using machine and deep learning to improve sign language processing and recognition. It covers six chapters, introducing methods for video-based sign annotation, webcam-based sign language dictionary search, and ranking systems for sign suggestions. It also explores tools for visualizing and comparing sign language variation, contributing valuable resources to linguistic research.
This thesis investigates the contribution of quantum computers to machine learning, a field called Quantum Machine Learning. Quantum Machine Learning promises innovative perspectives and methods for solving complex problems in machine learning, leveraging the unique capabilities of quantum computers. These computers differ fundamentally from classical computers by exploiting certain quantum mechanical phenomena. The thesis explores various proposals within quantum machine learning, such as the application of quantum algorithms in topological data analysis. With respect to topological data analysis, the results demonstrate that quantum algorithms can solve problems whose classical solutions are considered inefficient. The thesis also explores structural risk minimization in quantum machine learning models, identifying crucial design choices for new quantum machine learning models. Additionally, it introduces quantum models in reinforcement learning, which deliver performance comparable to traditional models and are superior in certain scenarios. The final part identifies learning tasks in computational learning theory where quantum learning algorithms have exponential advantages. In summary, this thesis contributes to understanding how quantum computers can address complex machine learning problems, from topological data analysis to reinforcement learning and computational learning tasks.
The research in this dissertation aims to optimise blood donation processes in the framework of the Dutch national blood bank Sanquin. The primary health risk for blood donors is iron deficiency, which is evaluated based on donors' hemoglobin and ferritin levels. If either of these levels is inadequate, donors are deferred from donation. Deferral due to low hemoglobin levels occurs on-site, meaning that donors have already traveled to the blood bank and then have to return home without donating, which is demotivating for the donor and inefficient for the blood bank. A large part of this dissertation therefore has the objective to develop a prediction model for donors' hemoglobin levels, based on historical measurements and donor characteristics. The prediction model that was developed reduces the deferral rate by approximately 60% (from 3% to 1% for women, and from 1% to 0.4% for men), showing the potential of using data to enhance blood bank policy efficiency. Additionally, the model predictions were made explainable, providing the blood bank with insights into why specific predictions are made. These insights increase our understanding of the relationships between donor characteristics and hemoglobin levels. If this prediction model were implemented in practice, the explanations could also be shared with the donor to help them understand why they are (not) invited to donate, which could also contribute to donor satisfaction and retention. In a collaborative effort with blood banks in Australia, Belgium, Finland and South Africa, the same prediction model was applied to data from each blood bank. Despite differences in blood bank policies and donor demographics, the models found similar associations with the predictor variables in all countries.
Differences in performance could mostly be attributed to differences in deferral rates, with blood banks with higher deferral rates obtaining higher model accuracy. Beyond hemoglobin prediction models, additional research questions are explored. One study aims to identify determinants of ferritin levels in donors through repeated measurements, linking these to environmental variables. Another study involves modeling the pharmacokinetics of antibodies in COVID-19 recovered donors, and finding relationships between patient characteristics, symptoms, and antibody levels over time. In summary, the research in this dissertation shows the potential within the wealth of data collected by blood banks. The proposed data-driven donation strategies not only decrease deferral rates but also increase donor retention and understanding. This comprehensive approach allows Sanquin to provide more personalised feedback to donors regarding their iron status, ultimately optimising the blood donation process and contributing to the overall efficacy of blood banking systems.
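The core idea of predicting a donor's next hemoglobin value from historical measurements can be sketched in a few lines. The exponential weighting, decay factor, and threshold below are illustrative assumptions only, not the dissertation's actual model, which uses richer donor characteristics:

```python
def predict_deferral(past_hb, threshold, decay=0.6):
    """Predict the next hemoglobin value as an exponentially weighted
    average of past measurements (most recent first), and flag whether
    the donor would fall below the deferral threshold."""
    weights = [decay ** i for i in range(len(past_hb))]
    predicted = sum(w * h for w, h in zip(weights, past_hb)) / sum(weights)
    return predicted, predicted < threshold

# Hypothetical donor with hemoglobin values in mmol/L, most recent first;
# 7.8 mmol/L is used here purely as an example deferral threshold.
hb, deferred = predict_deferral([8.0, 8.2, 8.4], threshold=7.8)
```

Making such a prediction before sending an invitation lets the blood bank avoid inviting donors who are likely to be deferred on-site, which is how a reduction in deferral rates can be achieved.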
Bacteriophages, or phages for short, are the most abundant biological entity in nature. They shape bacterial communities and are a major driving force in bacterial evolution. Their ubiquitous nature and their potential use in medical and industrial applications make them attractive targets for fundamental and applied scientific studies. Understanding their structure and function at the molecular level is essential for understanding phage life cycles. In this thesis, I applied different cryo-EM techniques combined with advanced image processing and artificial intelligence methods to gain insight into the structure and function of two bacteriophages. In both cases, these phages contain flexible elements which are essential for the infection process. While biologically highly interesting, these flexible components are especially challenging for structural studies. With the advances in computer technology and electron microscopy, researchers can now use a variety of methods to study the structure and function of proteins and biological macromolecular machines. The studies presented in this thesis provide valuable insights into phages with flexible components, and offer a useful workflow for researchers working on similar problems.
Background: Recent advances in data-driven computational approaches have been helpful in devising tools to objectively diagnose psychiatric disorders. However, current machine learning studies, limited to small homogeneous samples and differing in methodology and imaging collection protocols, are difficult to compare and generalize. Here we aimed to classify individuals with PTSD versus controls and to assess generalizability using large heterogeneous brain datasets from the ENIGMA-PGC PTSD Working Group. Methods: We analyzed brain MRI data from 3,477 structural-MRI, 2,495 resting-state fMRI, and 1,952 diffusion-MRI scans. First, we identified the brain features that best distinguish individuals with PTSD from controls using traditional machine learning methods. Second, we assessed the utility of the denoising variational autoencoder (DVAE) and evaluated its classification performance. Third, we assessed the generalizability and reproducibility of both models using a leave-one-site-out cross-validation procedure for each modality. Results: We found lower performance in classifying PTSD vs. controls with data from over 20 sites (60% test AUC for s-MRI, 59% for rs-fMRI and 56% for D-MRI), as compared to other studies run on single-site data. The performance increased when classifying PTSD versus healthy controls (HC) without trauma history in each modality (75% AUC). The classification performance remained intact when applying the DVAE framework, which reduced the number of features. Finally, we found that the DVAE framework achieved better generalization to unseen datasets compared with the traditional machine learning frameworks, albeit performance was only slightly above chance.
Conclusion: These results have the potential to provide a baseline classification performance for PTSD when using large-scale neuroimaging datasets. Our findings show that the choice of control group can heavily affect classification performance. The DVAE framework provided better generalizability for the multi-site data. This may be more significant in clinical practice, since the neuroimaging-based diagnostic DVAE classification models are much less site-specific, rendering them more generalizable.
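Leave-one-site-out cross-validation, used above to measure generalization across the 20+ sites, simply holds out all scans from one acquisition site per fold. A minimal sketch, with illustrative site labels:

```python
def leave_one_site_out(site_labels):
    """Yield (held_out_site, train_indices, test_indices) tuples,
    holding out every scan from one acquisition site per fold."""
    for site in sorted(set(site_labels)):
        test = [i for i, s in enumerate(site_labels) if s == site]
        train = [i for i, s in enumerate(site_labels) if s != site]
        yield site, train, test

# One label per scan; each fold is tested on a site never seen in training,
# so the score reflects cross-site generalization rather than site effects.
folds = list(leave_one_site_out(["siteA", "siteA", "siteB", "siteC", "siteB"]))
```
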
The recent surge in deployment and use of generative machine learning models has sparked an interest in the relationship between AI and creativity, or more specifically in the debate over whether machines can exhibit human-level creativity. This is by no means a new discussion, going back decades if not centuries. The debate has been approached from multiple angles, and a general consensus has not yet been reached. In this position paper, we present the long-standing debate as it formed across various fields such as cognitive science, philosophy, and computing, approaching it mainly from a historical perspective. Along the way we identify how the various views relate to recent developments in machine learning models and argue our own position regarding the question of whether machines can exhibit human-level creativity. As such we aim to involve computer scientists and AI practitioners in the ongoing debate.
Transport inspectorates are looking for novel methods to identify dangerous behavior, ultimately to reduce the risks associated with the movement of people and goods. We explore a data-driven approach to arrive at smart inspections of vehicles. Inspections are smart when they are performed (1) accurately, (2) automatically, (3) fairly, and (4) in an interpretable manner. We leverage tools from the network science and machine learning domains to encode the behavioral aspects of vehicles. Tools used in this thesis include community detection, link prediction, and assortativity. We explore their applicability and provide technical methods. In the final chapter, we also discuss the matter of fairness in machine learning.
Sewer pipes are an essential infrastructure in modern society and their proper operation is important for public health. To keep sewer pipes operational as much as possible, periodical inspections for defects are performed. Instead of repairing sewer pipes only when a problem becomes critical, such inspections allow municipalities to plan maintenance. Sewer pipe inspections are an attractive target for automation. While automation generally promises improvements in assessment quality and processing efficiency, in this case it would also decrease the variability in assessments, which is a current problem. Besides the reasons for automating, the means for automating are also at hand: a lot of (visual) data has been gathered over the past decades which may be used to train algorithms. This thesis compiles the results of five years of research into the possible automation of sewer pipe inspections with the tools of machine learning and computer vision. Three distinct, yet complementary approaches to automating sewer pipe inspections are described:
- Image-Based Unsupervised Anomaly Detection
- Convolutional Neural Network Classification
- Stereovision and Geometry Reconstruction
Novel entities may pose risks to humans and the environment. The small particle size and relatively large surface area of micro- and nanoparticles (MNPs) make them capable of adsorbing other novel entities, leading to the formation of aggregated contamination. In this dissertation, we utilized advanced computational methods, such as molecular simulation, data mining, machine learning, and quantitative structure-activity relationship modeling. These methods were used to investigate the mechanisms of interaction between MNPs and other novel entities, the joint toxic action of MNPs and other novel entities, the factors affecting their joint toxicity to ecological species, as well as to quantitatively predict the interaction forces between MNPs and other novel entities, and the toxicity of their mixtures. The results indicate that understanding the mechanisms of interactions between novel entities and their modes of joint toxic action can provide an important theoretical basis for establishing effective risk assessment procedures to mitigate the effects of novel entities on ecosystems and human health. Furthermore, this dissertation provides important technical support and a practical basis for the quantitative prediction of the environmental behavior and toxicological effects of novel entities and their mixtures by applying various advanced in silico methods individually or in combination.
This thesis looks at Artificial Intelligence (AI) and its potential to revolutionise the healthcare sector. The first part of this thesis focuses on the responsible development and validation of AI-based clinical prediction algorithms, exploring the prime considerations in this process. The second part of this thesis addresses the opportunities for classical statistics and machine learning techniques for developing prediction algorithms. It also examines the performance, potential, and challenges of AI prediction algorithms for clinical practice. The conclusion states that cross-discipline collaboration, exchangeability of knowledge and results, and validation of AI for healthcare practice are essential for realising the potential of AI in healthcare.
In this thesis, we examine various systems through the lens of several numerical methods. We delve into questions concerning thermalization in closed unitary systems, lattice gauge theories, and the intriguing properties of deep neural network phase spaces. Leveraging modern advancements in both software and hardware, we scrutinize these systems in greater detail, accessing previously unreachable regimes.
Filippo, O. de; Cammann, V.L.; Pancotti, C.; Vece, D. di; Silverio, A.; Schweiger, V.; ... ; Templin, C. 2023
Aims: Takotsubo syndrome (TTS) is associated with a substantial rate of adverse events. We sought to design a machine learning (ML)-based model to predict the risk of in-hospital death and to perform a clustering of TTS patients to identify different risk profiles.
Methods and results: A ridge logistic regression-based ML model for predicting in-hospital death was developed on 3482 TTS patients from the International Takotsubo (InterTAK) Registry, randomly split into a training and an internal validation cohort (75% and 25% of the sample size, respectively) and evaluated in an external validation cohort (1037 patients). Thirty-one clinically relevant variables were included in the prediction model. Model performance represented the primary endpoint and was assessed according to area under the curve (AUC), sensitivity and specificity. As a secondary endpoint, a K-medoids clustering algorithm was designed to stratify patients into phenotypic groups based on the 10 most relevant features emerging from the main model. The overall incidence of in-hospital death was 5.2%. The InterTAK-ML model showed an AUC of 0.89 (0.85–0.92), a sensitivity of 0.85 (0.78–0.95) and a specificity of 0.76 (0.74–0.79) in the internal validation cohort, and an AUC of 0.82 (0.73–0.91), a sensitivity of 0.74 (0.61–0.87) and a specificity of 0.79 (0.77–0.81) in the external cohort for in-hospital death prediction. By exploiting the 10 variables showing the highest feature importance, TTS patients were clustered into six groups associated with different risks of in-hospital death (28.8% vs. 15.5% vs. 5.4% vs. 1.0% vs. 0.8% vs. 0.5%), which were consistent also in the external cohort.
Conclusion: An ML-based approach for the identification of TTS patients at risk of adverse short-term prognosis is feasible and effective. The InterTAK-ML model showed unprecedented discriminative capability for the prediction of in-hospital death.
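K-medoids clustering, used above to derive the phenotypic groups, restricts each cluster centre to an actual patient, which keeps every cluster interpretable as a representative case. A toy sketch with naive initialisation (real implementations such as PAM use smarter seeding and precomputed dissimilarity matrices):

```python
def k_medoids(points, k, dist, n_iter=100):
    """Alternate between assigning points to the nearest medoid and
    re-picking each medoid as the member minimising intra-cluster distance."""
    medoids = list(range(k))  # naive init: the first k points
    for _ in range(n_iter):
        # assign every point to its nearest medoid
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
        # each medoid becomes the member with minimal total distance
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):  # converged
            return medoids, clusters
        medoids = new_medoids
    return medoids, clusters

# Two well-separated 1-D "patient profiles" recover one medoid per group.
medoids, clusters = k_medoids([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], k=2,
                              dist=lambda a, b: abs(a - b))
```
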
BACKGROUND: Pain evaluation remains largely subjective in neurosurgical practice, but machine learning provides the potential for objective pain assessment tools. OBJECTIVE: To predict daily pain levels using speech recordings from personal smartphones of a cohort of patients with diagnosed neurological spine disease. METHODS: Patients with spine disease were enrolled through a general neurosurgical clinic with approval from the institutional ethics committee. At-home pain surveys and speech recordings were administered at regular intervals through the Beiwe smartphone application. Praat audio features were extracted from the speech recordings to be used as input to a K-nearest neighbors (KNN) machine learning model. The pain scores were transformed from a 0 to 10 scale to low and high pain for better discriminative capacity. RESULTS: A total of 60 patients were enrolled, and 384 observations were used to train and test the prediction model. Using the KNN prediction model, an accuracy of 71% with a positive predictive value of 0.71 was achieved in classifying pain intensity into high and low. The model showed 0.71 precision for high pain and 0.70 precision for low pain. Recall of high pain was 0.74, and recall of low pain was 0.67. The overall F1 score was 0.73. CONCLUSION: Our study uses a KNN to model the relationship between speech features and pain levels collected from personal smartphones of patients with spine disease. The proposed model is a stepping stone for the development of objective pain assessment in neurosurgical clinical practice.
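K-nearest neighbors, the model used above, classifies a new recording by majority vote among the training samples closest to it in feature space. A minimal sketch with invented two-dimensional audio features (the study extracted far more Praat features than this):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Toy feature vectors (e.g. pitch, jitter) labelled by reported pain level.
features = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["low", "low", "high", "high"]
prediction = knn_predict(features, labels, [5.5, 5.0])
```

Because KNN keeps the raw training samples, each prediction can be traced back to the specific recordings that produced it, which fits the study's goal of interpretable pain assessment.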
Ouwerkerk, J.; Feleus, S.; Zwaan, K.F. van der; Li, Y.L.; Roos, M.; Roon-Mom, W.M.C. van; ... ; Mina, E. 2023
Background: In biomedicine, machine learning (ML) has proven beneficial for the prognosis and diagnosis of different diseases, including cancer and neurodegenerative disorders. For rare diseases, however, the requirement for large datasets often prevents this approach. Huntington's disease (HD) is a rare neurodegenerative disorder caused by a CAG repeat expansion in the coding region of the huntingtin gene. The world's largest observational study for HD, Enroll-HD, describes over 21,000 participants. As such, Enroll-HD is amenable to ML methods. In this study, we pre-processed and imputed Enroll-HD with ML methods to maximise the inclusion of participants and variables. With this dataset we developed models to improve the prediction of the age at onset (AAO) and compared them to the well-established Langbehn formula. In addition, we used recurrent neural networks (RNNs) to demonstrate the utility of ML methods for longitudinal datasets, assessing driving capabilities by learning from previous participant assessments.
Results: Simple pre-processing imputed around 42% of missing values in Enroll-HD, and 167 variables were retained as a result of imputing with ML. We found that multiple ML models were able to outperform the Langbehn formula. The best ML model (light gradient boosting machine) improved the prognosis of AAO compared to the Langbehn formula by 9.2%, based on root mean squared error in the test set. In addition, our ML model provides a more accurate prognosis for a wider CAG repeat range compared to the Langbehn formula. Driving capability was predicted with an accuracy of 85.2%. The resulting pre-processing workflow and code to train the ML models are available for related HD predictions at: https://github.com/JasperO98/hdml/tree/main.
Conclusions: Our pre-processing workflow made it possible to resolve the missing values and include most participants and variables in Enroll-HD. We show the added value of an ML approach, which improved AAO predictions and allowed for the development of an advisory model that can assist clinicians and participants in estimating future driving capability.
Archaeologists are creating ever-increasing amounts of textual data. So much, in fact, that manual reading and inspection has become practically impossible. By leveraging computational approaches, it is possible to extract relevant information from this big data, allowing for more efficient research and new analyses. In this chapter, methods and techniques to extract information from archaeological texts through Machine Learning are introduced and discussed, with a focus on practical examples. After reading the chapter, you should have a clear grasp of the possibilities of text mining in archaeology, the current state of research, and enough information to start your own text analyses.
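One of the simplest text-mining techniques that such an introduction can build on is TF-IDF term weighting, which surfaces the words that characterise one document against the rest of a corpus. A toy sketch; the excavation-report snippets below are invented for illustration:

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=3):
    """Rank each document's terms by term frequency times inverse
    document frequency; terms appearing in every document score zero."""
    tokenised = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenised for t in set(doc))  # document frequency
    n = len(docs)
    keywords = []
    for doc in tokenised:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        keywords.append(sorted(scores, key=scores.get, reverse=True)[:top_n])
    return keywords

# Invented report snippets; 'found' appears in all of them, so it never
# surfaces as a distinguishing keyword.
reports = ["pottery sherd found in trench",
           "pottery kiln found near settlement",
           "coin hoard found in field"]
top_terms = tfidf_keywords(reports)
```
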
Radiography is an important technique to inspect objects, with applications in airports and hospitals. X-ray imaging is also essential in industry, for instance in food safety checks for the presence of foreign objects. Computed tomography (CT) enables more accurate visualizations of an object in 3D, but requires more computation time. Spectral X-ray imaging is an important recent development to balance these conflicting goals of speed and accuracy. This technique enables separation of detected X-ray photons in terms of energy. More information can be extracted from spectral images, which allows for better separation of materials. Deep learning is another important recent technique that enables machines to quickly carry out processing tasks, by training models with large volumes of task-specific data. In this dissertation we present new processing methods that use spectral imaging and machine learning, with a special focus on industrial processes. We design a workflow using CT to efficiently generate large volumes of machine learning training data. In addition, we develop a compression method for efficient processing of large volumes of spectral data and two new spectral CT methods to produce more accurate reconstructions. The presented methods are designed for effective use in industry.
The focus of this thesis is on the technical methods which help promote the movement towards Trustworthy AI, specifically within the Inspectorate of the Netherlands. The goal is to develop and assess the technical methods which are required to shift the actions of the Inspectorate to a data-driven paradigm, concretely under a supervised classification framework of machine learning. The aspect of reliability is addressed as a data quality concern, viz. missingness and noise. The aspect of fairness is addressed as a counter to bias in the selection process of inspections. The conclusion is that, whilst no complete solution has yet been suggested, it is possible to address the concerns related to data quality and data bias, culminating in well-performing classification models which are reliable and fair.
Introduction: Predicting checkpoint inhibitor treatment outcomes in melanoma is a relevant task, due to the unpredictable and potentially fatal toxicity and high costs for society. However, accurate biomarkers for treatment outcomes are lacking. Radiomics is a technique to quantitatively capture tumour characteristics on readily available computed tomography (CT) imaging. The purpose of this study was to investigate the added value of radiomics for predicting clinical benefit from checkpoint inhibitors in melanoma in a large, multicenter cohort. Methods: Patients who received first-line anti-PD1 +/- anti-CTLA4 treatment for advanced cutaneous melanoma were retrospectively identified from nine participating hospitals. For every patient, up to five representative lesions were segmented on baseline CT, and radiomics features were extracted. A machine learning pipeline was trained on the radiomics features to predict clinical benefit, defined as stable disease for more than 6 months or response per RECIST 1.1 criteria. This approach was evaluated using a leave-one-centre-out cross-validation and compared to a model based on previously discovered clinical predictors. Lastly, a combination model was built on the radiomics and clinical models. Results: A total of 620 patients were included, of which 59.2% experienced clinical benefit. The radiomics model achieved an area under the receiver operating characteristic curve (AUROC) of 0.607 [95% CI, 0.562-0.652], lower than that of the clinical model (AUROC=0.646 [95% CI, 0.600-0.692]). The combination model yielded no improvement over the clinical model in terms of discrimination (AUROC=0.636 [95% CI, 0.592-0.680]) or calibration.
The output of the radiomics model was significantly correlated with three out of five input variables of the clinical model (p < 0.001). Discussion: The radiomics model achieved a moderate predictive value for clinical benefit, which was statistically significant. However, the radiomics approach was unable to add value to the simpler clinical model, most likely due to the overlap in predictive information learned by both models. Future research should focus on the application of deep learning, spectral CT-derived radiomics, and a multimodal approach for accurately predicting benefit from checkpoint inhibitor treatment in advanced melanoma. (c) 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).