Toxicological information as needed for risk assessments of chemical compounds is often sparse. Unfortunately, gathering new toxicological information experimentally often involves animal testing. Simulated alternatives, e.g., quantitative structure–activity relationship (QSAR) models, are preferred to infer the toxicity of new compounds. Aquatic toxicity data collections consist of many related tasks, each predicting the toxicity of new compounds on a given species. Since many of these tasks are inherently low-resource, i.e., involve few associated compounds, this is challenging. Meta-learning is a subfield of artificial intelligence that can lead to more accurate models by enabling the utilization of information across tasks. In our work, we benchmark various state-of-the-art meta-learning techniques for building QSAR models, focusing on knowledge sharing between species. Specifically, we employ and compare transformational machine learning, model-agnostic meta-learning, fine-tuning, and multi-task models. Our experiments show that established knowledge-sharing techniques outperform single-task approaches. We recommend the use of multi-task random forest models for aquatic toxicity modeling, which matched or exceeded the performance of other approaches and robustly produced good results in the low-resource settings we studied. This model functions at the species level, predicting toxicity for multiple species across various phyla, with flexible exposure duration and a large chemical applicability domain.
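The core idea behind the multi-task models recommended above is to pool compounds from all species into one training set, encoding species identity as extra input features so that data-rich species inform predictions for low-resource ones. The sketch below illustrates only this pooling idea; a tiny k-nearest-neighbour regressor stands in for the random forest used in the paper, and all descriptors, species names, and toxicity values are invented.

```python
def featurise(descriptors, species, all_species):
    # Concatenate chemical descriptors with a one-hot species encoding,
    # so one shared model can serve every species (the multi-task idea).
    onehot = [1.0 if s == species else 0.0 for s in all_species]
    return list(descriptors) + onehot

def knn_predict(train, query, k=3):
    # Minimal k-NN regressor standing in for the paper's random forest;
    # plain Euclidean distance over the combined feature vector.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, y)
        for x, y in train
    )
    top = dists[:k]
    return sum(y for _, y in top) / len(top)

species_list = ["daphnia", "fish"]
# (descriptors, species, toxicity) triples -- entirely made up.
data = [
    ((0.1, 0.2), "daphnia", 1.0),
    ((0.9, 0.8), "daphnia", 3.0),
    ((0.15, 0.25), "fish", 1.2),
    ((0.85, 0.9), "fish", 3.1),
]
train = [(featurise(d, s, species_list), y) for d, s, y in data]

# A query for the low-resource "fish" task still draws on daphnia data:
# with k=3, the third-nearest neighbour is a daphnia compound.
pred = knn_predict(train, featurise((0.12, 0.22), "fish", species_list), k=3)
```

The one-hot encoding keeps same-species neighbours closer, so the model prefers within-task evidence but falls back on cross-species information when a task has few compounds.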
König, H.M.T.; Bosman, A.W.; Hoos, H.H.; Rijn, J.N. van 2023
Many fields of computational science advance through improvements in the algorithms used for solving key problems. These advancements are often facilitated by benchmarks and competitions that enable performance comparisons and rankings of solvers. Simultaneously, meta-algorithmic techniques, such as automated algorithm selection and configuration, enable performance improvements by utilizing the complementary strengths of different algorithms or configurable algorithm components. In fact, meta-algorithms have become major drivers in advancing the state of the art in solving many prominent computational problems. However, meta-algorithmic techniques are complex and difficult to use correctly, while their incorrect use may reduce their efficiency or, in extreme cases, even lead to performance losses. Here, we introduce the Sparkle platform, which aims to make meta-algorithmic techniques more accessible to non-expert users, and to make these techniques more broadly available in the context of competitions, to further enable the assessment and advancement of the true state of the art in solving challenging computational problems. To achieve this, Sparkle implements standard protocols for algorithm selection and configuration that support easy and correct use of these techniques. Following an experiment, Sparkle generates a report containing results, problem instances, algorithms, and other relevant information, for convenient use in scientific publications.
Given the common problem of missing data in real-world applications from various fields, such as remote sensing, ecology and meteorology, the interpolation of missing spatial and spatio-temporal data can be of tremendous value. Existing methods for spatial interpolation, most notably Gaussian processes and spatial autoregressive models, tend to suffer from (a) a trade-off between modelling local or global spatial interaction, (b) the assumption there is only one possible path between two points, and (c) the assumption of homogeneity of intermediate locations between points. Addressing these issues, we propose a value propagation-based spatial interpolation method called VPint, inspired by Markov reward processes (MRPs), and introduce two variants thereof: (i) a static discount (SD-MRP) and (ii) a data-driven weight prediction (WP-MRP) variant. Both these interpolation variants operate locally, while implicitly accounting for global spatial relationships in the entire system through recursion. We evaluated our proposed methods by comparing the mean absolute error, root mean squared error, peak signal-to-noise ratio and structural similarity of interpolated grid cells to those of 8 common baselines. Our analysis involved detailed experiments on a synthetic and two real-world datasets, as well as experiments on convergence and scalability. Empirical results demonstrate the competitive advantage of VPint on randomly missing data, where it performed better than baselines in terms of mean absolute error and structural similarity, as well as spatially clustered missing data, where it performed best on 2 out of 3 datasets.
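The static-discount (SD-MRP) variant described above can be sketched in a few lines: each missing grid cell repeatedly takes the discounted mean of its neighbours' values, so information propagates locally while repeated updates spread it through the whole grid. The grid, discount factor, and iteration count below are illustrative assumptions, not the authors' implementation.

```python
def sd_mrp_interpolate(grid, discount=0.9, iterations=50):
    # grid: list of rows; None marks a missing cell, floats are observed.
    rows, cols = len(grid), len(grid[0])
    known = {(r, c) for r in range(rows) for c in range(cols)
             if grid[r][c] is not None}
    # Initialise missing cells to 0; observed cells keep their values.
    values = [[grid[r][c] if grid[r][c] is not None else 0.0
               for c in range(cols)] for r in range(rows)]
    for _ in range(iterations):
        new = [row[:] for row in values]
        for r in range(rows):
            for c in range(cols):
                if (r, c) in known:
                    continue  # observed cells stay fixed
                nbrs = [values[r + dr][c + dc]
                        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= r + dr < rows and 0 <= c + dc < cols]
                # Discounted mean of the 4-neighbourhood.
                new[r][c] = discount * sum(nbrs) / len(nbrs)
        values = new
    return values

grid = [[1.0, None, 3.0],
        [None, None, None],
        [1.0, None, 3.0]]
result = sd_mrp_interpolate(grid)
```

Although each update is purely local, the recursion lets distant observed cells influence every missing cell, which is the global/local balance the abstract highlights; the WP-MRP variant would replace the single `discount` with learned, data-driven edge weights.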
Bargaining can be used to resolve mixed-motive games in multi-agent systems. Although there is an abundance of negotiation strategies implemented in automated negotiating agents, most agents are based on single fixed strategies, while it is acknowledged that there is no single best-performing strategy for all negotiation settings. In this paper, we focus on bargaining settings where opponents are repeatedly encountered, but the bargaining problems change. We introduce a novel method that automatically creates and deploys a portfolio of complementary negotiation strategies using a training set and optimise pay-off in never-before-seen bargaining settings through per-setting strategy selection. Our method relies on the following contributions. We introduce a feature representation that captures characteristics for both the opponent and the bargaining problem. We model the behaviour of an opponent during a negotiation based on its actions, which is indicative of its negotiation strategy, in order to be more effective in future encounters. Our combination of feature-based methods generalises to new negotiation settings, as in practice, over time, it selects effective counter strategies in future encounters. Our approach is tested in an ANAC-like tournament, and we show that we are capable of winning such a tournament with a 5.6% increase in pay-off compared to the runner-up agent.
In the context of the current COVID-19 pandemic, various sophisticated epidemic and machine learning models have been used for forecasting. These models, however, rely on carefully selected architectures and detailed data that are often only available for specific regions. Automated machine learning (AutoML) addresses these challenges by making it possible to automatically create forecasting pipelines in a data-driven manner, resulting in high-quality predictions. In this paper, we study the role of open data along with AutoML systems in acquiring high-performance forecasting models for COVID-19. Here, we adapted the AutoML framework auto-sklearn to the time series forecasting task and introduced two variants for multi-step ahead COVID-19 forecasting, which we refer to as (a) multi-output and (b) repeated single output forecasting. We studied the usefulness of anonymised open mobility datasets (place visits and the use of different transportation modes) in addition to open mortality data. We evaluated three drift adaptation strategies to deal with concept drifts in data by (i) refitting our models on part of the data, (ii) the full data, or (iii) retraining the models completely. We compared the performance of our AutoML methods in terms of RMSE with five baselines on two testing periods (over 2020 and 2021). Our results show that combining mobility features and mortality data improves forecasting accuracy. Furthermore, we show that when faced with concept drifts, our method refitted on recent data using place visits mobility features outperforms all other approaches for 22 of the 26 countries considered in our study.
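The two multi-step strategies named above differ in how the forecast horizon is produced: a multi-output model emits all horizon steps in one shot, while repeated single-output forecasting applies a one-step model recursively, feeding each prediction back in as the newest observation. The sketch below contrasts the two with trivial trend models standing in for the auto-sklearn pipelines; the series and "models" are illustrative assumptions only.

```python
HORIZON = 3

def multi_output_forecast(series, horizon=HORIZON):
    # "Multi-output": one model maps the history to all horizon steps
    # at once; here a linear trend (average first difference) stands in
    # for the learned pipeline.
    avg_diff = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + (i + 1) * avg_diff for i in range(horizon)]

def repeated_single_output_forecast(series, horizon=HORIZON):
    # "Repeated single output": a one-step model (here, last value plus
    # last difference) is applied recursively; each prediction becomes
    # part of the input for the next step.
    history = list(series)
    for _ in range(horizon):
        history.append(history[-1] + (history[-1] - history[-2]))
    return history[-horizon:]

deaths = [10, 12, 15, 18]  # invented weekly mortality counts
```

Note the practical trade-off this exposes: the recursive variant can compound one-step errors across the horizon, while the multi-output variant must learn all steps jointly from a single training signal.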
Wang, C.; Baratchi, M.; Bäck, T.H.W.; Hoos, H.H.; Limmer, S.; Olhofer, M. 2022
Early time series classification (EarlyTSC) involves the prediction of a class label based on partial observation of a given time series. Most EarlyTSC algorithms consider the trade-off between accuracy and earliness as two competing objectives, using a single dedicated hyperparameter. To obtain insights into this trade-off requires finding a set of non-dominated (Pareto efficient) classifiers. So far, this has been approached through manual hyperparameter tuning. Since the trade-off hyperparameters only provide indirect control over the earliness-accuracy trade-off, manual tuning is tedious and tends to result in many sub-optimal hyperparameter settings. This complicates the search for optimal hyperparameter settings and forms a hurdle for the application of EarlyTSC to real-world problems. To address these issues, we propose an automated approach to hyperparameter tuning and algorithm selection for EarlyTSC, building on developments in the fast-moving research area known as automated machine learning (AutoML). To deal with the challenging task of optimising two conflicting objectives in early time series classification, we propose MultiETSC, a system for multi-objective algorithm selection and hyperparameter optimisation (MO-CASH) for EarlyTSC. MultiETSC can potentially leverage any existing or future EarlyTSC algorithm and produces a set of Pareto optimal algorithm configurations from which a user can choose a posteriori. As an additional benefit, our proposed framework can incorporate and leverage time-series classification algorithms not originally designed for EarlyTSC for improving performance on EarlyTSC; we demonstrate this property using a newly defined, "naive" fixed-time algorithm.
In an extensive empirical evaluation of our new approach on a benchmark of 115 data sets, we show that MultiETSC performs substantially better than baseline methods, ranking highest (avg. rank 1.98) compared to conceptually simpler single-algorithm (2.98) and single-objective alternatives (4.36).
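The a-posteriori choice described above rests on computing the non-dominated (Pareto-optimal) set of configurations over the two minimised objectives, earliness and classification error. A minimal sketch of that filtering step follows; the candidate tuples are invented for illustration and do not come from the paper's benchmark.

```python
def pareto_front(points):
    # A point q dominates p if q is no worse than p in both objectives
    # and strictly better in at least one (equivalently: q != p here,
    # with both coordinates <=). Objectives: (earliness, error), both
    # to be minimised.
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

candidates = [
    (0.2, 0.30),  # classifies very early, but with high error
    (0.5, 0.15),  # balanced configuration
    (0.6, 0.20),  # dominated by (0.5, 0.15): later AND less accurate
    (0.9, 0.10),  # waits long, but most accurate
]
front = pareto_front(candidates)
```

The user then picks from `front` according to their own earliness-accuracy preference, which is exactly what a single trade-off hyperparameter cannot offer directly.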
Background: Predicting the onset and course of mood and anxiety disorders is of clinical importance but remains difficult. We compared the predictive performances of traditional logistic regression, basic probabilistic machine learning (ML) methods, and automated ML (Auto-sklearn). Methods: Data were derived from the Netherlands Study of Depression and Anxiety. We compared how well multinomial logistic regression, a naïve Bayes classifier, and Auto-sklearn predicted depression and anxiety diagnoses at a 2-, 4-, 6-, and 9-year follow-up, operationalized as binary or categorical variables. Predictor sets included demographic and self-report data, which can be easily collected in clinical practice at two initial time points (baseline and 1-year follow-up). Results: At baseline, participants were 42.2 years old, 66.5% were women, and 53.6% had a current mood or anxiety disorder. The three methods were similarly successful in predicting (mental) health status, with correct predictions for up to 79% (95% CI 75–81%). However, Auto-sklearn was superior when assessing a more complex dataset with individual item scores. Conclusions: Automated ML methods added only limited value, compared to traditional data modelling, when predicting the onset and course of depression and anxiety. However, they hold potential for automation and may be better suited for complex datasets.
Bontempi, G.; Chavarriaga, R.; De Canck, H.; Girardi, E.; Hoos, H.H.; Kilbane-Dawe, I.; ... ; Maratea, M. 2021
A volunteer effort by Artificial Intelligence (AI) researchers has shown it can deliver significant research outcomes rapidly to help tackle COVID-19. Within two months, CLAIRE's self-organising volunteers delivered the world's first comprehensive curated repository of COVID-19-related datasets useful for drug repurposing, drafted review papers on the role CT/X-ray scan analysis and robotics could play, and progressed research in other areas. Given the pace required and the nature of voluntary efforts, the teams faced a number of challenges. These offer insights into how to better prepare for future volunteer scientific efforts and large-scale, data-dependent AI collaborations in general. We offer seven recommendations on how to best leverage such efforts and collaborations in the context of managing future crises.
Serban, A.; Blom, K. van der; Hoos, H.H.; Visser, J. 2021