This thesis focuses on data found in the field of computational drug discovery. New insight can be obtained by applying machine learning in various ways and in a variety of domains. Two studies delved into the application of proteochemometrics (PCM), a machine learning technique that can be used to find relations in protein-ligand bioactivity data and then, in a virtual screen, predict the activity of compounds that have never been tested on a particular protein or set of proteins. On this basis, sets of compounds were suggested for experimental validation. Another study investigated mutational patterns in cancer, applying a large dataset of mutation data and identifying several motifs in G protein-coupled receptors. The thesis also contains the work done on the Papyrus dataset, a large-scale bioactivity dataset that focuses on standardising data for computational drug discovery and providing an out-of-the-box set that can be used in a variety of settings.
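To make the PCM idea concrete, the following is a minimal sketch, not the thesis's actual pipeline: protein and compound descriptors are concatenated into one feature vector, a single model learns bioactivity across protein-ligand pairs, and that model then scores compounds never tested on a chosen protein. All descriptors and activity values here are synthetic placeholders.

```python
# Minimal proteochemometrics (PCM) sketch with placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder descriptors: 50 proteins x 32 features, 200 compounds x 128 features.
protein_desc = rng.normal(size=(50, 32))
compound_desc = rng.normal(size=(200, 128))

# Observed protein-ligand pairs with a synthetic pChEMBL-like activity value.
pairs = rng.integers(0, [50, 200], size=(1000, 2))
X = np.hstack([protein_desc[pairs[:, 0]], compound_desc[pairs[:, 1]]])
y = rng.normal(6.0, 1.0, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Virtual screen: score every compound against one protein of interest.
target = 7
screen = np.hstack([np.tile(protein_desc[target], (200, 1)), compound_desc])
ranked = np.argsort(model.predict(screen))[::-1]
print("Top-5 compound indices for protein", target, ":", ranked[:5])
```

Because the protein is part of the input, a single PCM model can rank candidates even for proteins with few or no measured ligands, which is what enables the virtual screens described above.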
This thesis introduces the concept of "physics-based inverse design", working from the notion that the physical driving forces governing functionality are inherently encoded in independently parameterized energy functions, and can therefore be resolved through inverse design strategies. The thesis describes the development of EVO-MD, a Python-based implementation of the physics-based inverse design concept. EVO-MD is capable of automatically setting up, performing, and analyzing molecular dynamics simulations, allowing for the evolutionary optimization of complex and dynamic features in peptides. Examples of such applications include the optimization of lipid composition and curvature sensors, and the development of peptides with antiviral properties.
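The following is a toy sketch of the evolutionary loop underlying this kind of inverse design, assuming a simple mutate-score-select scheme: peptide sequences are mutated, scored by a fitness function, and the fittest are kept. In EVO-MD the fitness would come from an automated molecular dynamics simulation; here a mean Kyte-Doolittle hydropathy score stands in so the example runs on its own.

```python
# Toy evolutionary optimization of peptide sequences (stand-in fitness).
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
KD = dict(zip(AMINO_ACIDS, [1.8, 2.5, -3.5, -3.5, 2.8, -0.4, -3.2, 4.5, -3.9,
                            3.8, 1.9, -3.5, -1.6, -3.5, -4.5, -0.8, -0.7, 4.2, -0.9, -1.3]))

def fitness(seq):
    # Placeholder: mean hydropathy. EVO-MD would instead run an MD
    # simulation and extract a physical observable, e.g. a binding energy.
    return sum(KD[a] for a in seq) / len(seq)

def mutate(seq, rate=0.1):
    return "".join(random.choice(AMINO_ACIDS) if random.random() < rate else a
                   for a in seq)

random.seed(0)
population = ["".join(random.choices(AMINO_ACIDS, k=20)) for _ in range(50)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                               # elitist selection
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

best = max(population, key=fitness)
print("best:", best, "fitness:", round(fitness(best), 2))
```

The expensive part in practice is the fitness evaluation; the evolutionary bookkeeping itself stays this simple even when each score requires a full simulation.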
This thesis investigates how the assessment of the circular economy (CE) at the macro-economic level can be facilitated and promoted. First, a study on the socio-economic and environmental impacts of international agricultural supply chains is presented to exemplify how Multi-Regional Environmentally Extended Input-Output (MR EEIO) data can be used to support policy making. Then, a Python software package (pycirk) and methods for standardized and replicable CE scenarios are presented, with a case study on the global environmental and socio-economic impacts of CE strategies. The thesis also presents an easy-to-use, open-source, web-based tool for CE scenario construction and analysis (RaMa-Scene). Through these studies, MR EEIO appears to be an adequate tool to assess CE scenarios. However, the implementation of CE interventions will require a variety of micro-level changes across the current international production and consumption system, and in many cases more detailed data are required than what is currently available in existing MR EEIO databases. Data availability for CE assessment could be increased through the use of Computer-Aided Technologies and Artificial Intelligence methods in combination with Life Cycle Inventory modelling and MR EEIO databases, but this is only one potential way forward. In fact, the industrial ecology and circular economy communities have many opportunities ahead to improve data collection practices by leveraging digital technologies and artificial intelligence methods. However, coordination within these scientific communities is needed to ensure that the full potential of these technological developments is harnessed for the benefit of a sustainable circular economy and society.
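At the core of any EEIO assessment is the Leontief model: total output x needed to satisfy final demand y is x = (I - A)^{-1} y, and an environmental extension f converts output into impacts. The toy example below illustrates this logic with two sectors and invented numbers; it is not drawn from pycirk or any real database, and the "CE scenario" shown is just a crude coefficient change.

```python
# Toy environmentally extended input-output (EEIO) calculation.
import numpy as np

A = np.array([[0.1, 0.2],       # technical coefficients: inputs per unit output
              [0.3, 0.1]])      # sectors: 0 = agriculture, 1 = manufacturing
y = np.array([100.0, 50.0])     # final demand per sector
f = np.array([0.5, 1.2])        # CO2 emissions per unit of sector output

L = np.linalg.inv(np.eye(2) - A)        # Leontief inverse
x = L @ y                               # total output required to meet demand
emissions = f * x                       # environmental extension
print("output:", x.round(1), "emissions:", emissions.round(1))

# A crude CE scenario: reduce manufacturing's input requirements by 20%
# (e.g. through recycling) and recompute the footprint.
A_ce = A.copy(); A_ce[:, 1] *= 0.8
x_ce = np.linalg.inv(np.eye(2) - A_ce) @ y
print("CE emissions:", (f * x_ce).round(1))
```

Scenario tools like pycirk essentially automate edits of this kind, at the scale of full multi-regional tables rather than a 2x2 example.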
As a result of major technological advances in healthcare, increasing amounts of data are collected during the conduct of clinical trials. It is essential to realize, however, that data in themselves are of little or no value. To be of optimal use, data must be analyzed, interpreted, and processed. Machine learning strategies can offer useful and adequate solutions to this end. This thesis contains machine learning approaches applied to various clinical datasets. The classical data consist of electrical signals from the electrocardiogram (ECG) obtained in healthy subjects, the innovative data originate from measurements in a driving simulator, and the emerging data are derived from DNA analysis of the micro-organisms present on the skin of patients with skin diseases. We showed that the number of ECGs influenced the accuracy of the estimated QT-interval prolongation for all QT correction formulas employed. Using SHapley Additive exPlanations (SHAP) values, the impact of individual features on the prediction of the physiological age of the heart was determined. We used machine learning for a better assessment of the driving performance of drivers who had taken medication. Finally, we showed that the most important micro-organisms for discriminating seborrhoeic dermatitis, besides Cutibacterium and Staphylococcus, were relatively rare, so that these micro-organisms can easily be overlooked in standard analyses. With this, we demonstrated that machine learning can be applied to data derived from clinical trials to detect and evaluate the effects of drugs and other interventions at an early stage.
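The SHAP analysis mentioned above attributes each prediction to individual input features. The sketch below shows the general pattern with the `shap` package's TreeExplainer on synthetic data; the feature names and the "heart age" target are invented stand-ins, not the thesis's actual ECG features or model.

```python
# Minimal SHAP feature-attribution sketch on synthetic "heart age" data.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = ["QT_interval", "heart_rate", "QRS_duration", "PR_interval"]
X = rng.normal(size=(500, 4))
age = 50 + 8 * X[:, 0] - 4 * X[:, 1] + rng.normal(scale=2, size=500)  # toy target

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, age)
shap_values = shap.TreeExplainer(model).shap_values(X)

# Mean absolute SHAP value = global importance of each feature.
for name, imp in zip(feature_names, np.abs(shap_values).mean(axis=0)):
    print(f"{name}: {imp:.2f}")
```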
In order to answer the research question, the dissertation is divided into four parts. Part I examines the ratio legis of the 1999 Montreal Convention to determine to what extent uniformity is a principal aim of the convention that must be pursued in its application. Part II analyses the factors which already existed at the time of the convention's signing and prevented its uniform application. Part III scrutinizes the fragmentation factors that only appeared during the lifespan of the convention. Part IV makes various suggestions to improve the uniform application of the convention and to reduce its fragmentation. The author concludes the research with a list of no fewer than ten recommendations to protect the aim of uniformity of the international air carrier liability regime established by the convention.
This thesis is part of a bigger project, HEPGAME (High Energy Physics Game). The main objective of HEPGAME is the utilization of AI solutions, particularly Monte Carlo Tree Search (MCTS), for the simplification of HEP calculations. One of the issues is solving mathematical expressions of interest with millions of terms. These calculations can be handled by FORM, a program for symbolic manipulation. Since these calculations are computationally intensive and take a large amount of time, FORM was parallelized to solve them in a reasonable amount of time. Therefore, any new algorithm based on MCTS should also be parallelized. This requirement was behind the problem statement of the thesis: "How do we design a structured pattern-based parallel programming approach for efficient parallelism of MCTS for both multi-core and manycore shared-memory machines?"

To answer this question, the thesis approached the MCTS parallelization problem at three levels: (1) the implementation level, (2) the data structure level, and (3) the algorithm level.

At the implementation level, we proposed task-level parallelization over thread-level parallelization. Task-level parallelization provides efficient parallelism for MCTS, utilizing the cores of both multi-core and manycore machines.

At the data structure level, we presented a lock-free data structure that guarantees correctness. A lock-free data structure (1) removes the synchronization overhead when a parallel program needs many tasks to feed its cores and (2) improves both performance and scalability.

At the algorithm level, we first explained how to use the pipeline pattern to parallelize MCTS and overcome search overhead. Then, through a step-by-step approach, we proposed and detailed a structured parallel programming approach for Monte Carlo Tree Search.
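As a deliberately simple illustration of the task-level style, the sketch below runs independent MCTS searches as tasks in a worker pool and merges the root statistics (root parallelization). This is far simpler than the thesis's lock-free tree parallelization, but it shows the pattern: submit self-contained work units and let the runtime map them onto cores. The toy domain and reward function are invented.

```python
# Task-based root-parallel MCTS on a toy one-step decision problem.
import math
import random
from concurrent.futures import ProcessPoolExecutor

MOVES = range(10)

def rollout(move, rng):
    # Toy reward: noisy, with move 7 slightly better. A real search would
    # simulate the game (or expression rewrite) to a terminal state here.
    return rng.random() + (0.3 if move == 7 else 0.0)

def search(seed, iterations=5000):
    rng = random.Random(seed)
    visits = [0] * len(MOVES)
    wins = [0.0] * len(MOVES)
    for i in range(1, iterations + 1):
        # UCT selection over the root's children.
        move = max(MOVES, key=lambda m: float("inf") if visits[m] == 0 else
                   wins[m] / visits[m] + math.sqrt(2 * math.log(i) / visits[m]))
        wins[move] += rollout(move, rng)
        visits[move] += 1
    return visits

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:          # one independent task per seed
        results = list(pool.map(search, range(8)))
    totals = [sum(r[m] for r in results) for m in MOVES]
    print("best move:", max(MOVES, key=lambda m: totals[m]))
```

Root parallelization needs no shared tree at all; the thesis's contribution is precisely the harder case where tasks cooperate on one tree, which is what motivates the lock-free data structure described above.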
A P300-based Brain Computer Interface (BCI) character speller, also known as a P300 speller, has been an important communication pathway, under extensive research, for people who have lost motor ability, such as patients with Amyotrophic Lateral Sclerosis or spinal-cord injury. A P300 speller allows humans to spell characters directly using eye gaze, thereby building communication between the human brain and a computer. Unfortunately, P300 spellers are still not used in daily life and remain in an experimental stage at research labs, because the performance and efficiency of current P300 spellers are unacceptably low for BCI users in their daily life. Therefore, in this thesis, we have focused our attention on developing high-performance and efficient P300 spellers in order to bring them into practical use. More specifically, to increase the performance of a P300 speller, we have developed methods to increase the character spelling accuracy and the Information Transfer Rate. To improve the efficiency of a P300 speller, we have developed methods to reduce the number of sensors needed to acquire EEG signals as well as to reduce the complexity of the classifier used in a P300 speller, without losing performance.
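The classification step at the heart of a P300 speller can be sketched as follows, assuming the usual setup: EEG epochs following each row/column flash are classified as "target" (containing a P300 response) or "non-target". Synthetic signals stand in for real EEG, and scikit-learn's LDA stands in for the thesis's classifiers.

```python
# Minimal P300 target/non-target classification sketch on synthetic epochs.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_epochs, n_channels, n_samples = 400, 8, 100
X = rng.normal(size=(n_epochs, n_channels, n_samples))
y = rng.integers(0, 2, size=n_epochs)            # 1 = target flash

# Add a crude P300-like positive deflection ~300 ms after target flashes.
p300 = np.exp(-0.5 * ((np.arange(n_samples) - 60) / 8.0) ** 2)
X[y == 1] += 2.0 * p300

clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X.reshape(n_epochs, -1), y, cv=5)
print("mean accuracy:", scores.mean().round(3))
```

Reducing the number of sensors, as pursued in the thesis, amounts to shrinking `n_channels` here while keeping accuracy acceptable, which directly cuts setup time and classifier complexity.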
This study presents an agent-based simulation model exploring the patterns of presence and absence of Late Pleistocene Neanderthals in western Europe. HomininSpace implements a parameterized, generic demographic and social model of hominin dispersal while avoiding parameter value biases and explicitly modelled handicaps. Models are simulated through time within a high-resolution environment where reconstructed temperatures and precipitation levels influence the carrying capacity of the landscape. Model parameter values are assigned and varied automatically while optimizing the match with Neanderthal archaeology, using a Genetic Algorithm (GA) inspired by the processes of natural selection. The system is able to traverse the huge parameter space created by all possible parameter value combinations to find those values that result in a simulation that matches well with the archaeological record, in the form of radiometrically obtained presence data.
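A toy version of this GA-driven calibration, under invented names and numbers, looks as follows: candidate parameter vectors are scored by how well a (here, fake) simulation reproduces known presence/absence data, then recombined and mutated. In HomininSpace the `simulate` step would be a full demographic simulation scored against dated sites.

```python
# Toy genetic algorithm calibrating simulation parameters against presence data.
import random

TARGET_PRESENCE = [1, 1, 0, 1, 0, 0, 1, 1]   # stand-in for dated site occupations

def simulate(params):
    # Placeholder simulation: a "site" counts as occupied if the
    # parameter-controlled score clears a threshold.
    birth_rate, mobility, threshold = params
    return [1 if (birth_rate * i + mobility) % 1.0 > threshold else 0
            for i in range(len(TARGET_PRESENCE))]

def fitness(params):
    return sum(s == t for s, t in zip(simulate(params), TARGET_PRESENCE))

def offspring(a, b):
    child = [random.choice(pair) for pair in zip(a, b)]             # uniform crossover
    i = random.randrange(len(child))
    child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))  # mutation
    return child

random.seed(1)
pop = [[random.random() for _ in range(3)] for _ in range(40)]
for generation in range(50):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [offspring(*random.sample(pop[:10], 2)) for _ in range(30)]

best = max(pop, key=fitness)
print("best parameters:", [round(p, 2) for p in best], "matches:", fitness(best))
```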
Mining time series is a machine learning subfield that focuses on a particular data structure, where variables are measured over (short or long) periods of time. In this thesis we focus on multivariate time series, with multiple variables measured over the same period of time. In most cases, such variables are collected at different sampling rates. When combined, these variables can be explored with machine learning methods for multiple purposes.

Firstly, we consider the possibility of unsupervised learning. In this case, we propose a pattern recognition method that discovers subsets of variables that show consistent behavior in a number of shared time segments. Furthermore, in a supervised setting, given a dependent variable (target), we propose a method that aggregates independent variables into meaningful features.

In addition to the methods above, we provide two tools in the form of Software as a Service, through which users without a programming background can intuitively follow the learning and testing methodologies for both methods.

Finally, we present an applied study of machine learning to improve the performance of speed skating athletes. Here, we perform an in-depth analysis of historical data in order to help optimize performance results.
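A small example of the preprocessing problem described above: variables recorded at different sampling rates are aligned on a common time grid and aggregated into per-segment features for a learner. The variable names and data are illustrative, and pandas stands in for the thesis's own aggregation methods.

```python
# Align multivariate time series sampled at different rates, then aggregate.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Heart rate at 1 Hz, speed at 10 Hz, over the same 60-second period.
hr = pd.Series(rng.normal(150, 5, 60),
               index=pd.date_range("2024-01-01", periods=60, freq="1s"))
speed = pd.Series(rng.normal(12, 1, 600),
                  index=pd.date_range("2024-01-01", periods=600, freq="100ms"))

# Resample onto a common 1-second grid.
df = pd.DataFrame({"hr": hr, "speed": speed.resample("1s").mean()})

# Aggregate each 10-second segment into simple features.
features = df.resample("10s").agg({"hr": ["mean", "std"], "speed": ["mean", "max"]})
print(features)
```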
Many databases do not consist of a single table of fixed dimensions, but of objects that are related to each other: the databases are relational, or structured. We study the discovery of patterns in such data. In our approach, a data analyst specifies constraints on patterns that she believes to be of interest, and the computer searches for patterns that satisfy these constraints. An important constraint on which we focus is the constraint that a pattern should have a significant number of occurrences in the data. Constraints like this allow the search to be performed reasonably efficiently. We develop algorithms for searching patterns that are represented in formal first-order logic, tree data structures, and graph data structures. We perform experiments in which these algorithms, and algorithms proposed by other researchers, are compared with each other, and study which properties determine the efficiency of the algorithms. As a result, we are able to develop more efficient algorithms. As an application we study the discovery of fragments in molecular datasets. The aim is to discover fragments that relate the structure of molecules to their activity.
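The occurrence constraint is what makes such searches tractable: support is anti-monotone, so any extension of an infrequent pattern is itself infrequent and whole branches of the search can be pruned. The minimal Apriori-style miner below illustrates this on itemsets; the thesis mines the richer first-order logic, tree, and graph representations, but the pruning principle is the same. The data here are invented.

```python
# Minimal frequent-pattern miner showing support-based pruning.
from itertools import combinations

transactions = [{"C", "O", "N"}, {"C", "O"}, {"C", "N"}, {"O", "N"}, {"C", "O", "N"}]
min_support = 3  # a pattern must occur in at least 3 transactions

def support(itemset):
    return sum(itemset <= t for t in transactions)

# Level 1: frequent single items.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

level = frequent
while level:
    # Extend only frequent patterns: the anti-monotone support constraint
    # guarantees no frequent pattern is missed by this pruning.
    candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
    level = [c for c in candidates if support(c) >= min_support]
    frequent += level

for pattern in frequent:
    print(set(pattern), "support:", support(pattern))
```

For molecular fragment discovery, the items become subgraphs and the support test becomes subgraph isomorphism, which is exactly where the efficiency properties studied in the thesis start to matter.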