As the popularity of Location-based Social Networks increases, designing accurate models for Point-of-Interest (POI) recommendation receives more attention. POI recommendation is often performed by incorporating contextual information into previously designed recommendation algorithms. Major contextual information considered in POI recommendation includes location attributes (i.e., exact coordinates of a location, category, and check-in time), user attributes (i.e., comments, reviews, tips, and check-ins made to the locations), and other information, such as the distance of the POI from the user's main activity location and the social ties between users. The right selection of such factors can significantly impact the performance of POI recommendation. However, previous research does not consider the impact of combining these different factors. In this article, we propose different contextual models and analyze the fusion of different major contextual information in POI recommendation. The major contributions of this article are as follows: (i) providing an extensive survey of context-aware location recommendation; (ii) quantifying and analyzing the impact of different contextual information (e.g., social, temporal, spatial, and categorical) on POI recommendation, using available baselines and two new linear and non-linear models that can incorporate all the major contextual information into a single recommendation model; and (iii) evaluating the considered models using two well-known real-world datasets. Our results indicate that while modeling geographical and temporal influences can improve recommendation quality, fusing all other contextual information into a recommendation model is not always the best strategy.
Nasri, M.; Tsou, Y.-T.; Koutamanis, A.; Baratchi, M.; Giest, S.; Reidsma, D.; Rieffe, C. 2022
Social participation at schoolyards is crucial for children's development. Yet, schoolyard environments contain features that can hinder children's social participation. In this paper, we empirically examine schoolyards to identify existing obstacles. Traditionally, this type of study requires huge amounts of detailed information about children in a given environment. Collecting such data is exceedingly difficult and expensive. In this study, we present a novel sensor data-driven approach for gathering this information and examining the effect of schoolyard environments on children's behaviours in light of schoolyard affordances and individual effectivities. Sensor data is collected from 150 children at two primary schools, using location trackers, proximity tags, and Multi-Motion receivers to measure locations, face-to-face contacts, and activities. Results show strong potential for this data-driven approach, as it allows collecting data from individuals and their interactions with schoolyard environments, examining the triad of physical, social, and cultural affordances in schoolyards, and identifying factors that significantly impact children's behaviours. Based on this approach, we further obtain better knowledge on the impact of these factors and identify limitations in schoolyard designs, which can inform schools, designers, and policymakers about current problems and practical solutions.
Nasri, M.; Tsou, Y-T.; Koutamanis, A.; Baratchi, M.; Giest, S.; Reidsma, D.; Rieffe, C. 2022
Given the common problem of missing data in real-world applications from various fields, such as remote sensing, ecology and meteorology, the interpolation of missing spatial and spatio-temporal data can be of tremendous value. Existing methods for spatial interpolation, most notably Gaussian processes and spatial autoregressive models, tend to suffer from (a) a trade-off between modelling local or global spatial interaction, (b) the assumption there is only one possible path between two points, and (c) the assumption of homogeneity of intermediate locations between points. Addressing these issues, we propose a value propagation-based spatial interpolation method called VPint, inspired by Markov reward processes (MRPs), and introduce two variants thereof: (i) a static discount (SD-MRP) and (ii) a data-driven weight prediction (WP-MRP) variant. Both these interpolation variants operate locally, while implicitly accounting for global spatial relationships in the entire system through recursion. We evaluated our proposed methods by comparing the mean absolute error, root mean squared error, peak signal-to-noise ratio and structural similarity of interpolated grid cells to those of 8 common baselines. Our analysis involved detailed experiments on a synthetic and two real-world datasets, as well as experiments on convergence and scalability. Empirical results demonstrate the competitive advantage of VPint on randomly missing data, where it performed better than baselines in terms of mean absolute error and structural similarity, as well as spatially clustered missing data, where it performed best on 2 out of 3 datasets.
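The static-discount idea behind SD-MRP can be illustrated with a minimal sketch: missing grid cells repeatedly take the discounted mean of their neighbours, while observed cells stay fixed, so local updates implicitly propagate global information through recursion. This is a simplified illustration, not the authors' VPint implementation; the discount value and grid are made up.

```python
import numpy as np

def sd_mrp_interpolate(grid, gamma=0.9, iterations=100):
    """Fill NaN cells by repeatedly assigning the discounted mean of the
    4-neighbourhood; observed cells are reset to their true values each pass."""
    known = ~np.isnan(grid)
    filled = np.where(known, grid, np.nanmean(grid))  # initialise gaps with global mean
    for _ in range(iterations):
        padded = np.pad(filled, 1, mode="edge")
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        filled = np.where(known, grid, gamma * neighbours)
    return filled

grid = np.array([[1.0, np.nan, 3.0],
                 [np.nan, np.nan, np.nan],
                 [5.0, np.nan, 7.0]])
result = sd_mrp_interpolate(grid)
```

Each missing cell ends up bounded by the discounted values of nearby observations, which is the locally-operating, globally-consistent behaviour the abstract describes.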
Kefalas, M.; Stein, B. van; Baratchi, M.; Apostolidis, A.; Bäck, T.H.W. 2022
In the context of the current COVID-19 pandemic, various sophisticated epidemic and machine learning models have been used for forecasting. These models, however, rely on carefully selected architectures and detailed data that is often only available for specific regions. Automated machine learning (AutoML) addresses these challenges by making it possible to automatically create forecasting pipelines in a data-driven manner, resulting in high-quality predictions. In this paper, we study the role of open data along with AutoML systems in acquiring high-performance forecasting models for COVID-19. Here, we adapted the AutoML framework auto-sklearn to the time series forecasting task and introduced two variants for multi-step ahead COVID-19 forecasting, which we refer to as (a) multi-output and (b) repeated single output forecasting. We studied the usefulness of anonymised open mobility datasets (place visits and the use of different transportation modes) in addition to open mortality data. We evaluated three drift adaptation strategies to deal with concept drifts in data by (i) refitting our models on part of the data, (ii) the full data, or (iii) retraining the models completely. We compared the performance of our AutoML methods in terms of RMSE with five baselines on two testing periods (over 2020 and 2021). Our results show that combining mobility features and mortality data improves forecasting accuracy. Furthermore, we show that when faced with concept drifts, our method refitted on recent data using place visits mobility features outperforms all other approaches for 22 of the 26 countries considered in our study.
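The "repeated single output" variant can be sketched as recursive multi-step forecasting: a one-step model predicts the next value, and that prediction is fed back as input for the following step. The toy one-step model below (window mean) is a stand-in for the auto-sklearn pipelines used in the paper; names and parameters here are illustrative.

```python
def recursive_forecast(history, one_step_model, horizon, window=3):
    """Repeated single-output forecasting: apply a one-step model recursively,
    appending each prediction to the input buffer for the next step."""
    buf = list(history)
    preds = []
    for _ in range(horizon):
        nxt = one_step_model(buf[-window:])  # predict one step ahead
        preds.append(nxt)
        buf.append(nxt)                      # feed prediction back in
    return preds

# toy one-step model: predict the mean of the most recent window
forecast = recursive_forecast([1.0, 2.0, 3.0],
                              lambda w: sum(w) / len(w), horizon=2)
```

The multi-output alternative would instead train a single model that emits all horizon steps at once, avoiding the error accumulation that recursion can introduce.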
Wang, C.; Baratchi, M.; Bäck, T.H.W.; Hoos, H.H.; Limmer, S.; Olhofer, M. 2022
Early time series classification (EarlyTSC) involves the prediction of a class label based on partial observation of a given time series. Most EarlyTSC algorithms consider the trade-off between accuracy and earliness as two competing objectives, using a single dedicated hyperparameter. Obtaining insights into this trade-off requires finding a set of non-dominated (Pareto efficient) classifiers. So far, this has been approached through manual hyperparameter tuning. Since the trade-off hyperparameters only provide indirect control over the earliness-accuracy trade-off, manual tuning is tedious and tends to result in many sub-optimal hyperparameter settings. This complicates the search for optimal hyperparameter settings and forms a hurdle for the application of EarlyTSC to real-world problems. To address these issues, we propose an automated approach to hyperparameter tuning and algorithm selection for EarlyTSC, building on developments in the fast-moving research area known as automated machine learning (AutoML). To deal with the challenging task of optimising two conflicting objectives in early time series classification, we propose MultiETSC, a system for multi-objective algorithm selection and hyperparameter optimisation (MO-CASH) for EarlyTSC. MultiETSC can potentially leverage any existing or future EarlyTSC algorithm and produces a set of Pareto optimal algorithm configurations from which a user can choose a posteriori. As an additional benefit, our proposed framework can incorporate and leverage time-series classification algorithms not originally designed for EarlyTSC to improve performance on EarlyTSC; we demonstrate this property using a newly defined, "naive" fixed-time algorithm.
In an extensive empirical evaluation of our new approach on a benchmark of 115 data sets, we show that MultiETSC performs substantially better than baseline methods, ranking highest (avg. rank 1.98) compared to conceptually simpler single-algorithm (2.98) and single-objective alternatives (4.36).
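The notion of a Pareto-efficient set over the earliness-accuracy trade-off can be illustrated with a small sketch: given candidate configurations scored on two objectives to be minimised (earliness fraction, error rate), only configurations not dominated by any other are kept. This is an illustration of the concept, not MultiETSC's actual selection code; the example configurations are made up.

```python
def pareto_front(points):
    """Return the non-dominated points, minimising both objectives.
    A point p is dominated if some other point q is no worse in both
    objectives (and q is a different point)."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

# (earliness, error) pairs for four hypothetical configurations
configs = [(0.2, 0.10), (0.3, 0.05), (0.5, 0.04), (0.4, 0.20)]
front = pareto_front(configs)
```

A user would then choose a posteriori from the front, e.g. the earliest configuration whose error is still acceptable.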
Kefalas, M.; Baratchi, M.; Apostolidis, A.; Herik, D. van den; Bäck, T.H.W. 2021
How do we make sure that all citizens in a city have access to enough green space? An increasing part of the world’s population lives in urban areas, where contact with nature is largely reduced to street trees and parks. As potential tree planting sites and financial resources are limited, determining the best planting site can be formulated as an optimization problem with constraints. Can we locate these sites based on the popularity of nearby venues? How can we ensure that we include groups of people who tend to spend time in tree-deprived areas? Currently, tree planting sites are chosen based on criteria from spatial-visual, physical and biological, and functional categories. As these criteria do not give any insights into which citizens are benefiting from the tree placement, we propose new data-driven tree planting policies that take socio-cultural aspects, as represented by the citizens’ behavior, into account. We combine a Location Based Social Network (LBSN) mobility data set with tree location data sets, both of New York City and Paris, as a case study. The effect of four different policies is evaluated on simulated movement data and assessed on the average, overall exposure to trees as well as on how much inequality in tree exposure is mitigated.
Sa, N.C. de; Baratchi, M.; Hauser, L.T.; Bodegom, P. van 2021
Remote sensing (RS) of biophysical variables plays a vital role in providing the information necessary for understanding spatio-temporal dynamics in ecosystems. The hybrid approach retrieves biophysical variables from RS by combining Machine Learning (ML) algorithms with surrogate data generated by Radiative Transfer Models (RTMs). The susceptibility of the ill-posed solutions to noise currently constrains further application of hybrid approaches. Here, we explored how noise affects the performance of ML algorithms for biophysical trait retrieval. We focused on synthetic Sentinel-2 (S2) data generated using the PROSAIL RTM and four commonly applied ML algorithms: Gaussian Processes (GPR), Random Forests (RFR), Artificial Neural Networks (ANN), and Multi-task Neural Networks (MTN). After identifying which biophysical variables can be retrieved from S2 using a Global Sensitivity Analysis, we evaluated the performance loss of each algorithm using the Mean Absolute Percentage Error (MAPE) with increasing noise levels. We found that, for S2 data, Carotenoid concentrations are uniquely dependent on band 2, Chlorophyll is almost exclusively dependent on the visible ranges, and Leaf Area Index, water, and dry matter contents are mostly dependent on infrared bands. Without added noise, GPR was the best algorithm (<0.05%), followed by the MTN (<3%) and ANN (<5%), with the RFR performing very poorly (<50%). The addition of noise critically affected the performance of all algorithms (>20%), even at low levels of added noise (approximately 5%). Overall, both neural networks performed significantly better than GPR and RFR when noise was added, with the MTN being slightly better than the ANN.
Our results imply that the performance of the commonly used algorithms in hybrid-RTM inversion is pervasively sensitive to noise. The implication is that more advanced models or approaches are necessary to minimize the impact of noise and enable accurate, near real-time RS monitoring of biophysical traits.
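The MAPE metric used above to quantify performance loss has a simple closed form: the mean of per-sample absolute errors relative to the true values, expressed as a percentage. A minimal sketch (the example values are made up, not from the study):

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: 100/n * sum(|y - y_hat| / |y|)."""
    return 100.0 * sum(abs(t - p) / abs(t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)

error = mape([100.0, 200.0], [110.0, 190.0])
```

Because the denominator is the true value, MAPE is scale-free across traits, which is why it suits comparing retrieval error over variables with different units; it is, however, undefined when a true value is zero.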
Arp, L.; Vreumingen, D. van; Gawehns, D.; Baratchi, M. 2020
Urban movement data as collected by location-based social networks provides valuable information about routes and specific roads that people are likely to drive on. This allows us to pinpoint roads that occur in many routes and are thus sensitive to congestion. Redistributing some of the traffic to avoid unnecessary use of these roads could be a key factor in improving traffic flow. Many of the previously proposed approaches to combat congestion are either static (e.g. a city tax) or do not incorporate any movement data and hence ignore how citizens use the infrastructure. In this work, we present a method to redistribute traffic through the introduction of externally imposed variable costs to each road segment, assuming that all drivers seek to drive the cheapest route. We propose using a metaheuristic optimisation approach to minimise total travel times by optimising a set of road-specific variable cost parameters, which are used as input for an objective function based on Greenshields traffic flow theory. We evaluate the performance of this approach within the context of a case study on the city centre of Tokyo. An optimisation scenario was defined for this city using public spatial road network data, and movement data acquired from Foursquare. Experimental results on this case study show that, depending on the number of cars on the road network, our proposed method has the potential to achieve an improvement between 1.35% (437 hours for 112,985 drivers) and 13.15% (925 hours for 31,584 drivers) of total travel time, compared to that of a currently operational road network configuration with no imposed variable costs.
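Greenshields' classic model, which the objective function above is based on, assumes speed falls linearly with traffic density: v = v_free * (1 - k / k_jam), so segment travel time rises sharply as density approaches the jam density. A minimal sketch (the free-flow speed and jam density below are illustrative, not values from the study):

```python
def greenshields_speed(k, v_free=50.0, k_jam=120.0):
    """Speed (km/h) at density k (vehicles/km): v = v_free * (1 - k/k_jam),
    clipped at zero once density reaches the jam density."""
    return max(0.0, v_free * (1.0 - k / k_jam))

def travel_time_hours(length_km, k, **kwargs):
    """Travel time over a segment; infinite once traffic is fully jammed."""
    v = greenshields_speed(k, **kwargs)
    return float("inf") if v == 0.0 else length_km / v

t = travel_time_hours(10.0, 60.0)  # 10 km segment at half the jam density
```

Summing such per-segment times over all drivers' cheapest routes gives a total-travel-time objective of the kind the metaheuristic can minimise by adjusting per-road variable costs.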
Chilipirea, C.; Baratchi, M.; Dobre, C.; Steen, M. van 2018