Engineered nanomaterials (ENMs) are ubiquitous nowadays, finding application in different fields of technology and in various consumer products. Virtually any chemical can be manipulated at the nano-scale to display unique characteristics, which makes ENMs appealing over larger-sized materials. As the production and development of ENMs have increased considerably over time, so too have concerns regarding their adverse effects and environmental impacts. It is infeasible to assess the risks associated with every single ENM through in vivo or in vitro experiments. As an alternative, in silico methods can be employed to evaluate ENMs. To perform such an evaluation, we collected data from databases and the literature to create classification models based on machine learning algorithms, in accordance with the principles laid out by the OECD for the creation of QSARs. The aim was to investigate the performance of various machine learning algorithms in predicting a well-defined in vivo toxicity endpoint (Daphnia magna immobilization) and to identify which features are important drivers of D. magna in vivo nanotoxicity. Results indicated highly comparable performance between all algorithms, with predictive performance exceeding ∼0.7 for all evaluated metrics (e.g. accuracy, sensitivity, specificity, balanced accuracy, Matthews correlation coefficient, area under the receiver operating characteristic curve). The random forest, artificial neural network, and k-nearest neighbor models displayed the best performance, but only marginally better than the other models.
Furthermore, the variable importance analysis indicated that molecular descriptors and physicochemical properties were generally important within most models, while features related to the exposure conditions produced slightly conflicting results. Lastly, results also indicate that reliable and robust machine learning models can be generated for in vivo endpoints with smaller datasets.
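The workflow described above (train a classifier on a small dataset, then score it with the listed metrics) can be sketched with scikit-learn. This is a minimal illustration, not the study's actual pipeline: the synthetic data stands in for the collected nanotoxicity dataset, and all model settings are assumptions.

```python
# Illustrative sketch: a random forest classifier on a small synthetic binary
# dataset (stand-in for D. magna immobilization data), evaluated with the same
# families of metrics named in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             matthews_corrcoef, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a small toxicity dataset
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0,
                                          stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
y_pred = model.predict(X_te)
y_prob = model.predict_proba(X_te)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_te, y_pred),
    "balanced_accuracy": balanced_accuracy_score(y_te, y_pred),
    "mcc": matthews_corrcoef(y_te, y_pred),
    "roc_auc": roc_auc_score(y_te, y_prob),
}
print(metrics)
```

A variable-importance analysis of the kind mentioned in the abstract could then read `model.feature_importances_` for the fitted random forest.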
Gorostiola Gonzalez, M.; Janssen, A.P.A.; IJzerman, A.P.; Heitman, L.H.; Westen, G.J.P. van 2022
The integration of machine learning and structure-based methods has proven valuable in the past as a way to prioritize targets and compounds in early drug discovery. In oncological research, these methods can be highly beneficial in addressing the diversity of neoplastic diseases portrayed by the different hallmarks of cancer. Here, we review six use case scenarios for integrated computational methods, namely driver prediction, computational mutagenesis, (off-)target prediction, binding site prediction, virtual screening, and allosteric modulation analysis. We address the heterogeneity of integration approaches and individual methods, while acknowledging their current limitations and highlighting their potential to bring drugs for personalized oncological therapies to the market faster.
Wall, H.E.C. van der; Hassing, G.J.; Doll, R.J.; Westen, G.J.P. van; Cohen, A.F.; Selder, J.L.; ... ; Gal, P. 2022
Objective: The aim of the present study was to develop a neural network to characterize the effect of aging on the ECG in healthy volunteers. Moreover, the impact of the various ECG features on aging was evaluated. Methods & results: A total of 6228 healthy subjects without structural heart disease were included in this study. A neural network regression model was created to predict the age of the subjects based on their ECG; 577 parameters derived from a 12-lead ECG of each subject were used to develop and validate the neural network. A tenfold cross-validation was performed, using 118 subjects for validation in each fold. Using SHapley Additive exPlanations (SHAP) values, the impact of the individual features on the prediction of age was determined. Of the 6228 subjects tested, 1808 (29%) were female, and the mean age was 34 years (range 18-75 years). Physiologic age was estimated as a continuous variable with an average error of 6.9 ± 5.6 years (R2 = 0.72 ± 0.04). The correlation was slightly stronger for men (R2 = 0.74) than for women (R2 = 0.66). The most important features for the prediction of physiologic age were T wave morphology indices in leads V4 and V5, and P wave amplitude in leads aVR and II. Conclusion: The application of machine learning to the ECG using a neural network regression model allows accurate estimation of physiologic cardiac age. This technique could be used to pick up subtle age-related cardiac changes, but also to estimate the reversal of these age-associated effects by administered treatments.
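The regression-plus-explanation setup described above can be sketched in a few lines. This is a toy stand-in, not the study's model: the synthetic features replace the 577 real ECG parameters, and permutation importance is used here as a simple substitute for the SHAP analysis.

```python
# Minimal sketch: a neural-network regressor predicting "age" from invented
# features, followed by a feature-importance query. All data are synthetic.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # 8 invented "ECG features"
age = 40 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2, size=500)

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(X, age)
r2 = model.score(X, age)

# Which features drive the predicted "physiologic age"?
imp = permutation_importance(model, X, age, n_repeats=5, random_state=0)
top = int(np.argmax(imp.importances_mean))
print(f"train R^2 = {r2:.2f}, most important feature index = {top}")
```

In the study itself, SHAP values rather than permutation importance were used to rank the ECG features.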
Stel, W. van der; Yang, H.; Le Dévédec, S.E.; Water, B. van de; Beltman, J.B.; Danen, E.H.J. 2022
Cells can adjust their mitochondrial morphology by altering the balance between mitochondrial fission and fusion to adapt to stressful conditions. The connection between a chemical perturbation, changes in mitochondrial function, and altered mitochondrial morphology is not well understood. Here, we made use of high-throughput high-content confocal microscopy to assess the effects of distinct classes of oxidative phosphorylation (OXPHOS) complex inhibitors on mitochondrial parameters in a concentration- and time-resolved manner. Mitochondrial morphology phenotypes were clustered based on machine learning algorithms and mitochondrial integrity patterns were mapped. In parallel, changes in mitochondrial membrane potential (MMP), mitochondrial and cellular ATP levels, and viability were microscopically assessed. We found that inhibition of MMP, mitochondrial ATP production, and oxygen consumption rate (OCR) using sublethal concentrations of complex I and III inhibitors did not trigger mitochondrial fragmentation. Instead, complex V inhibitors, which suppressed ATP and OCR but increased MMP, provoked a more fragmented mitochondrial morphology. In agreement, complex V but not complex I or III inhibitors triggered proteolytic cleavage of the mitochondrial fusion protein OPA1. The relation between increased MMP and fragmentation did not extend beyond OXPHOS complex inhibitors: increasing MMP by blocking the mPTP did not lead to OPA1 cleavage or mitochondrial fragmentation, and the OXPHOS uncoupler FCCP was associated with OPA1 cleavage and MMP reduction.
Altogether, our findings connect vital mitochondrial functions and phenotypes in a high-throughput high-content confocal microscopy approach that helps in understanding chemical-induced toxicity caused by OXPHOS-complex-perturbing chemicals.
In many real-world applications today, it is critical to continuously record and monitor certain machine or system health indicators to discover malfunctions or other abnormal behavior at an early stage and prevent potential harm. The demand for such reliable monitoring systems is expected to increase in the coming years. Particularly in the industrial context, in the course of ongoing digitization, it is becoming increasingly important to analyze growing volumes of data in an automated manner using state-of-the-art algorithms. In many practical applications, one has to deal with temporal data in the form of data streams or time series. The problem of detecting unusual (or anomalous) behavior in time series is commonly referred to as time series anomaly detection. Anomalies are events observed in the data that do not conform to the normal or expected behavior when viewed in their temporal context. This thesis focuses on unsupervised machine learning algorithms for anomaly detection in time series. In an unsupervised learning setup, a model attempts to learn the normal behavior in a time series — which might already be contaminated with anomalies — without any external assistance. The model can then use its learned notion of normality to detect anomalous events.
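The unsupervised setting described above can be illustrated with a toy detector: learn "normal" behavior from the series itself (a rolling mean and standard deviation) and flag points that deviate strongly in their temporal context. The window size and threshold are illustrative choices, not taken from the thesis.

```python
# Toy unsupervised anomaly detector on a synthetic time series:
# flag points far outside the recent (rolling) normal behavior.
import numpy as np

rng = np.random.default_rng(1)
series = np.sin(np.linspace(0, 20, 400)) + rng.normal(scale=0.1, size=400)
series[250] += 3.0                       # inject one anomalous spike

window = 30
anomalies = []
for t in range(window, len(series)):
    ctx = series[t - window:t]           # recent temporal context
    z = (series[t] - ctx.mean()) / (ctx.std() + 1e-9)
    if abs(z) > 4.0:                     # far outside learned normal behavior
        anomalies.append(t)

print(anomalies)
```

Real methods in this area replace the rolling statistics with learned models (e.g. forecasting or reconstruction models), but the detection principle of scoring deviation from learned normality is the same.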
The archaeology domain produces large amounts of text, too much to effectively read or manually search through for research. To alleviate this problem, we created a search system (called AGNES) which combines full-text search with entity and geographical search. We first created a manually labelled dataset to train a Named Entity Recognition model, which is used to extract entities from text. We also conducted a user requirement study and a usability evaluation of the system, to make sure it is suitable for archaeological research. In a case study on Early Medieval cremations, we show that using AGNES leads to a knowledge increase compared to the knowledge of experts gathered using previously available search engines. This shows that this kind of intelligent search system can help with literature research, find more relevant data, and lead to a better understanding of the past.
Background: Given the strong relationship between depression and anxiety, there is an urge to investigate their shared and specific long-term course determinants. The current study aimed to identify and compare the main determinants of the 9-year trajectories of combined and pure depression and anxiety symptom severity. Methods: Respondents with a 6-month depression and/or anxiety diagnosis (n=1,701) provided baseline data on 152 sociodemographic, clinical and biological variables. Depression and anxiety symptom severity, assessed at baseline and at 2-, 4-, 6- and 9-year follow-up, was used to identify data-driven course-trajectory subgroups for general psychological distress, pure depression, and pure anxiety severity scores. For each outcome (class probability), a SuperLearner (SL) algorithm identified an optimally weighted (minimum mean squared error) combination of machine-learning prediction algorithms. For each outcome, the top determinants in the SL were identified by determining variable importance, and correlations between each SL-predicted and observed outcome (rho-pred) were calculated. Results: Low to high prediction correlations (rho-pred: 0.41-0.91, median=0.73) were found. In the SL, important determinants of psychological distress were age, young age of onset, respiratory rate, participation disability, somatic disease, low income, minor depressive disorder and mastery score. For the course of pure depression and anxiety symptom severity, similar determinants were found. Specific determinants of the pure-depression course included several types of healthcare use, and of the pure-anxiety course somatic arousal and psychological distress. Limitations: Limited sample size for machine learning.
Conclusions: The determinants of depression- and anxiety-severity course are mostly shared. Domain-specific exceptions are healthcare use for the depression course, and somatic arousal and distress for the anxiety-severity course.
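The SuperLearner idea used above (a cross-validated, optimally weighted combination of several prediction algorithms) can be sketched with scikit-learn's stacking ensemble, which learns the combination weights via a meta-learner. The data and base learners here are illustrative stand-ins, not the study's 152 clinical variables or its exact algorithm library.

```python
# Sketch of a SuperLearner-style ensemble: several base learners combined by
# a cross-validation-trained meta-learner (logistic regression).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(),   # learns the weighting of the base models
    cv=5,
)
score = cross_val_score(stack, X, y, cv=5).mean()
print(f"5-fold CV accuracy of the ensemble: {score:.2f}")
```

The original SuperLearner additionally constrains the meta-learner to a convex combination minimizing cross-validated mean squared error; the stacking sketch above relaxes that constraint.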
OBJECTIVE: Sleep deprivation is known to affect driving behavior and may lead to serious car accidents, similar to the effects of, e.g., alcohol. In a previous study, we demonstrated that the use of machine learning techniques allows adequate characterization of abnormal driving behavior after alprazolam and/or alcohol intake. In the present study, we extend this approach to sleep deprivation and test the model for characterization of new interventions. We aimed to classify abnormal driving behavior after sleep deprivation and, using a machine learning model, tested whether this model could also pick up abnormal driving behavior resulting from other interventions. METHODS: Data were collected during a previous study, in which 24 subjects were tested after being sleep-deprived and after a well-rested night. Features were calculated from several driving parameters, such as the lateral position, speed of the car, and steering speed. In the present study, we used a gradient boosting model to classify sleep deprivation. The model was validated using a 5-fold cross-validation technique. Next, probability scores were used to identify the overlap between driving behavior after sleep deprivation and driving behavior affected by other interventions. In the current study, alprazolam, alcohol, and placebo were used to test and validate the approach. RESULTS: The sleep deprivation model detected abnormal driving behavior in the simulator with an accuracy of 77 ± 9%. Abnormal driving behavior after alprazolam, and to a lesser extent also after alcohol intake, showed remarkably similar characteristics to sleep deprivation.
The average probability score was 0.79 for the alprazolam measurements and 0.63 for alcohol, but only 0.27 and 0.30 for the placebo measurements, matching the expected relative drowsiness. CONCLUSION: We developed a model detecting abnormal driving induced by sleep deprivation. The model shows the similarities in driving characteristics between sleep deprivation and other interventions, i.e., alcohol and alprazolam. Consequently, our model for sleep deprivation may serve as a next reference point for a driving test battery of newly developed drugs.
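The classification setup described above can be sketched as follows. This is a hedged illustration, not the study's model: the synthetic features stand in for the lateral-position, speed, and steering-speed summaries, and the labels stand in for sleep-deprived vs. rested drives.

```python
# Sketch: gradient boosting classifier for "sleep deprived" vs "rested",
# validated with 5-fold CV, then usable for probability scoring of new drives.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 6))              # invented driving-feature summaries
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0)
# Out-of-fold probability that each drive is "sleep deprived"
proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
acc = ((proba > 0.5).astype(int) == y).mean()
print(f"5-fold CV accuracy: {acc:.2f}")
```

New interventions (such as the alprazolam, alcohol, and placebo drives in the study) would then be scored by fitting the model on the labeled data and reading off `predict_proba` for the new measurements.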
In link prediction, the goal is to predict which links will appear in the future of an evolving network. To estimate the performance of such models in a supervised machine learning setup, disjoint and independent train and test sets are needed. However, objects in a real-world network are inherently related to each other; therefore, it is far from trivial to separate candidate links into these disjoint sets. Here we characterize and empirically investigate the two dominant approaches from the literature for creating separate train and test sets in link prediction, referred to as random and temporal splits. Comparing the performance of these two approaches on several large temporal network datasets, we find evidence that random splits may result in too optimistic results, whereas a temporal split may give a fairer and more realistic indication of performance. Results appear robust to the selection of temporal intervals. These findings will be of interest to researchers who employ link prediction or other machine learning tasks in networks.
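The two splitting strategies compared above can be illustrated on a toy list of timestamped links. The edge list and cutoff are invented; the point is that the temporal split respects time ordering while the random split does not.

```python
# Temporal vs random train/test splits for timestamped links (edges).
import random

edges = [(u, v, t) for t, (u, v) in enumerate(
    [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (4, 5), (1, 5), (5, 6)])]

# Temporal split: everything observed before a cutoff time is training data,
# everything after is test data, mimicking real forecasting.
cutoff = 4
train_temporal = [e for e in edges if e[2] < cutoff]
test_temporal = [e for e in edges if e[2] >= cutoff]

# Random split: links are shuffled regardless of when they appeared, which
# can leak future information into the training set and inflate performance.
random.seed(0)
shuffled = edges[:]
random.shuffle(shuffled)
train_random, test_random = shuffled[:4], shuffled[4:]

print(len(train_temporal), len(test_temporal))
```

In the temporal split, no training edge postdates any test edge; in the random split, the training set will typically contain edges from the "future" of some test edges, which is the source of the overly optimistic results noted above.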
Barrientos, A.; Holdship, J.R.; Solar, M.; Martín, S.; Rivilla, V.M.; Viti, S.; ... ; Humire, P. 2021
Algorithms have become increasingly common, and with this development, so have algorithms that approximate human speech. This has introduced new issues with which courts and legislators will have to grapple. Courts in the United States have found that search engine results are a form of speech that is protected by the Constitution, and cases in Europe concerning liability for autocomplete suggestions have led to varied results. Beyond these instances, insights into how courts handle algorithmic speech are few and far between. By focusing on three categories of algorithmic speech, defined as curated production, interactive/responsive production, and semiautonomous production, this Article analyzes these various forms of algorithmic speech within the international framework for freedom of expression. After a brief introduction of that framework and a look at approaches to algorithmic speech in the United States, the Article examines whether the creators or controllers of different forms of algorithms should be considered content providers or mere intermediaries, a determination which ultimately has implications for liability, which is also explored. The Article then looks at possible interferences with algorithmic speech and at how such interferences may be examined under the three-part test (particular attention is paid to the balancing of rights and interests at play) in order to answer the question of the extent to which algorithmic speech is worthy of protection under international standards of freedom of expression. Finally, other relevant issues surrounding algorithmic speech are discussed that will have an impact going forward, many of which involve questions of policy and societal values that accompany granting algorithmic speech protection.
It is a common technique in global optimization with expensive black-box functions to learn a surrogate model of the response function from past evaluations and to use it to decide on the location of future evaluations. In surrogate-model-assisted optimization, selecting the right modeling technique without preliminary knowledge about the objective function can be challenging. It might be beneficial if the algorithm trains many different surrogate models and selects the model with the smallest training error. This approach is known as model selection. In this thesis, a generalization of this approach is developed. Instead of choosing a single model, the optimal convex combination of model predictions is used to combine surrogate models into one more accurate ensemble surrogate model. This approach is studied in a fundamental way, by first evaluating minimalistic ensembles of only two surrogate models in detail and then proceeding to ensembles with more surrogate models. Finally, the approach is adopted and evaluated in the context of sequential parameter optimization. Besides discussing the general strategy, the optimal frequency of learning the convex combination weights is investigated. The results provide insights into the performance, scalability, and robustness of the approach.
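The two-model case studied in the thesis can be sketched numerically: given the predictions of two surrogate models at validation points, find the convex combination w*m1 + (1-w)*m2 minimizing squared error. For two models this has a closed form (the unconstrained optimum projected onto [0, 1]). The response function and surrogates below are invented for illustration.

```python
# Optimal convex combination of two surrogate models' predictions.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(4 * x)                         # true (expensive) response

m1 = y + rng.normal(scale=0.05, size=50)  # surrogate 1: small noisy error
m2 = y + 0.5 * (x - 0.5)                  # surrogate 2: systematic bias

# Minimize ||w*m1 + (1-w)*m2 - y||^2 over w in [0, 1]: closed-form optimum,
# clipped to the feasible interval (valid because the objective is convex).
d = m1 - m2
w = float(np.clip(np.dot(y - m2, d) / np.dot(d, d), 0.0, 1.0))
ensemble = w * m1 + (1 - w) * m2

def mse(p):
    return float(np.mean((p - y) ** 2))

print(f"w = {w:.2f}; MSE: m1 = {mse(m1):.4f}, m2 = {mse(m2):.4f}, "
      f"ensemble = {mse(ensemble):.4f}")
```

By construction the ensemble can never be worse than either endpoint on the data used to fit w (w = 1 recovers m1, w = 0 recovers m2); the thesis extends this idea to more than two surrogates and studies how often the weights should be relearned.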
A P300-based Brain-Computer Interface (BCI) character speller, also known as a P300 speller, has been an important communication pathway, under extensive research, for people who have lost motor ability, such as patients with Amyotrophic Lateral Sclerosis or spinal-cord injury. A P300 speller allows human beings to spell characters directly using eye gazes, thereby building communication between the human brain and a computer. Unfortunately, P300 spellers are still not used in daily life and remain at an experimental stage in research labs. The reason for this situation is that the performance and efficiency of current P300 spellers are unacceptably low for BCI users in their daily life. Therefore, in this thesis, we have focused our attention on developing high-performance and efficient P300 spellers in order to bring them into practical use. More specifically, to increase the performance of a P300 speller, we have developed methods to increase the character spelling accuracy and the Information Transfer Rate. To improve the efficiency of a P300 speller, we have developed methods to reduce the number of sensors needed to acquire EEG signals as well as to reduce the complexity of the classifier used in a P300 speller, without losing performance.
The high-velocity tail of the total velocity distribution of stars provides essential insight into fundamental properties of the Galaxy. Hypervelocity stars (HVSs), travelling on unbound orbits coming from the Galactic Centre, are powerful tracers of the underlying Galactic gravitational potential and can shed light on the stellar population in the proximity of our massive black hole. Runaway stars, ejected from the stellar disk, provide information on binary evolution and on the dynamical processes of stellar clusters. The advent of the data from the European Space Agency satellite Gaia has revolutionized our knowledge of high-velocity stars. In my PhD thesis, entitled "Hunting for the fastest stars in the Milky Way", I present my work on searching for the fastest stars in the Milky Way. I start by presenting our modelling work on predicting the HVS population expected to be contained in the Gaia catalogue. Then I illustrate the data mining techniques built and implemented to find these rare objects in the first and second data releases of Gaia. Finally, I conclude by discussing how HVSs can be used to constrain the Galactic dark matter halo and the binary population in the Galactic Centre.
Brandsen, A.; Lambers, K.; Verberne, S.; Wansleeben, M. 2019
In this paper, we present the results of user requirement solicitation for a search system of grey literature in archaeology, specifically Dutch excavation reports. This search system uses Named Entity Recognition and Information Retrieval techniques to create an effective and effortless search experience. Specifically, we used Conditional Random Fields to identify entities, with an average accuracy of 56%. This is a baseline result, and we identified many possibilities for improvement. These entities were indexed in Elasticsearch, and a user interface was developed on top of the index. This proof of concept was used in user requirement solicitation and evaluation with a group of end users. Feedback from this group indicated that there is a dire need for such a system and that the first results are promising.
Mining time series is a machine learning subfield that focuses on a particular data structure, where variables are measured over (short or long) periods of time. In this thesis we focus on multivariate time series, with multiple variables measured over the same period of time. In most cases, such variables are collected at different sampling rates. When combined, these variables can be explored with machine learning methods for multiple purposes. Firstly, we consider the possibility of unsupervised learning. In this case, we propose a pattern recognition method that discovers subsets of variables that show consistent behavior in a number of shared time segments. Furthermore, in a supervised setting, given a dependent variable (target), we propose a method that aggregates independent variables into meaningful features. In addition to the methods above, we provide two tools in the form of Software as a Service, where users without a programming background can intuitively follow the learning and testing methodologies for both methods. Finally, we present an applied study of machine learning to improve speed skating athletes' performance. Here, we make a deep analysis of historical data in order to help optimize performance results.
Many databases do not consist of a single table of fixed dimensions, but of objects that are related to each other: the databases are relational, or structured. We study the discovery of patterns in such data. In our approach, a data analyst specifies constraints on patterns that she believes to be of interest, and the computer searches for patterns that satisfy these constraints. An important constraint on which we focus is the constraint that a pattern should have a significant number of occurrences in the data. Constraints like this allow the search to be performed reasonably efficiently. We develop algorithms for searching for patterns that are represented in formal first-order logic, tree data structures and graph data structures. We perform experiments in which these algorithms, and algorithms proposed by other researchers, are compared with each other, and we study which properties determine the efficiency of the algorithms. As a result, we are able to develop more efficient algorithms. As an application, we study the discovery of fragments in molecular datasets. The aim is to discover fragments that relate the structure of molecules to their activity.