In link prediction, the goal is to predict which links will appear in the future of an evolving network. To estimate the performance of these models in a supervised machine learning setting, disjoint and independent train and test sets are needed. However, objects in a real-world network are inherently related to each other. Therefore, it is far from trivial to separate candidate links into these disjoint sets. Here we characterize and empirically investigate the two dominant approaches from the literature for creating separate train and test sets in link prediction, referred to as random and temporal splits. Comparing the performance of these two approaches on several large temporal network datasets, we find evidence that random splits may produce overly optimistic results, whereas a temporal split may give a fairer and more realistic indication of performance. Results appear robust to the selection of temporal intervals. These findings will be of interest to researchers who use link prediction or perform other machine learning tasks on networks.
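The difference between the two splitting strategies can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes each link is stored as a hypothetical tuple (u, v, t), where t is the time the link first appears, and the function names and example edges are made up for illustration. A random split ignores t, so test links can predate training links; a temporal split trains only on links that appear before a chosen cut-off time.

```python
import random
from typing import List, Tuple

# Hypothetical edge format: (u, v, t), where t is the time the link first appears.
Edge = Tuple[int, int, float]


def random_split(edges: List[Edge], test_frac: float = 0.2, seed: int = 0):
    """Random split: shuffle all links and hold out a fraction, ignoring time.

    Test links may predate training links, so the model can learn from the "future".
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(edges: List[Edge], t_split: float):
    """Temporal split: train on links that appear before t_split, test on later ones."""
    train = [e for e in edges if e[2] < t_split]
    test = [e for e in edges if e[2] >= t_split]
    return train, test


if __name__ == "__main__":
    edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0), (0, 3, 4.0), (1, 3, 5.0)]
    train, test = temporal_split(edges, t_split=4.0)
    print(train)  # links that appear at t = 1, 2, 3
    print(test)   # links that appear at t = 4, 5
```

The sketch only partitions observed (positive) links; a full link prediction setup would also sample candidate pairs that do not become links, and would compute features for the test interval from the network observed before the cut-off.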
The goal of this paper is to learn the dynamics of truck co-driving behaviour. Understanding this behaviour is important because co-driving can have a positive impact on the environment. In the so-called co-driving network, trucks are nodes, while links indicate that two trucks frequently drive together. To understand the network's dynamics, we use a link prediction approach employing a machine learning classifier. The features of the classifier can be categorized into spatio-temporal features, neighbourhood features, path features, and node features. These very different types of features allow us to understand the social processes underlying co-driving behaviour. Our work is based on spatio-temporal data not studied before, collected from 18 million truck movements in the Netherlands. We find that co-driving behaviour is best described by neighbourhood features, and to a lesser extent by path and spatio-temporal features. Node features are deemed unimportant. The findings suggest that the dynamics of a truck co-driving network exhibit clear social network effects.
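The abstract does not list which neighbourhood features were used. As a rough illustration only, the sketch below computes three widely used neighbourhood scores for a candidate pair of trucks (common neighbours, the Jaccard coefficient, and the Adamic-Adar index), assuming an undirected co-driving network stored as adjacency sets. The helper names and the example graph are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Undirected adjacency sets; each link (u, v) means u and v frequently drive together."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def neighbourhood_features(g: Graph, u: int, v: int) -> Dict[str, float]:
    """Common neighbourhood scores for a candidate pair (u, v)."""
    nu, nv = g.get(u, set()), g.get(v, set())
    common = nu & nv
    union = nu | nv
    # Adamic-Adar: shared neighbours with many co-driving partners count for less.
    adamic_adar = sum(1.0 / math.log(len(g[w])) for w in common if len(g[w]) > 1)
    return {
        "common_neighbours": float(len(common)),
        "jaccard": len(common) / len(union) if union else 0.0,
        "adamic_adar": adamic_adar,
    }


if __name__ == "__main__":
    # Hypothetical network: trucks 1..4, where trucks 1 and 4 share co-driver 3.
    g = build_graph([(1, 2), (1, 3), (2, 3), (3, 4)])
    print(neighbourhood_features(g, 1, 4))
```

In a classifier such as the one described above, scores like these would be computed for every candidate pair and combined with path, spatio-temporal, and node features as input columns.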
Numerical models of chemical transport have been used to simulate the complex processes involved in the formation and transport of air pollutants. Although these models can predict the spatiotemporal variability of a variety of chemical species, their accuracy is often limited. Therefore, in the past two decades, data assimilation methods have been applied to use the available measurements to improve the forecast. Nowadays, machine learning techniques provide new opportunities for improving air quality forecasts. A case study on PM10 concentrations during a dust storm is performed. PM10 concentrations are known to be caused by multiple emission sources, e.g., dust from the desert and anthropogenic emissions. Accurate modeling of the PM10 concentration levels due to local anthropogenic emissions is essential for an adequate evaluation of the dust level. However, real-time measurement of local emissions is not possible, so no direct data are available. In fact, the lack of timely emission inventories is one of the main reasons that current numerical chemical transport models cannot produce accurate anthropogenic PM10 simulations. Using machine learning techniques to estimate local emissions from past observations is a promising approach. We report how it can be combined with data assimilation to considerably improve the accuracy of the air quality forecast.