Documents
- Accepted Manuscript
- open access
Experimental evaluation of train and test split strategies in link prediction
In link prediction, the goal is to predict which links will appear in the future of an evolving network. To estimate the performance of link prediction models in a supervised machine learning setting, disjoint and independent train and test sets are needed. However, objects in a real-world network are inherently related to each other. Therefore, it is far from trivial to separate candidate links into these disjoint sets.
Here we characterize and empirically investigate the two dominant approaches from the literature for creating separate train and test sets in link prediction, referred to as random and temporal splits. Comparing the performance of these two approaches on several large temporal network datasets, we find evidence that random splits may yield overly optimistic results, whereas a temporal split may give a fairer and more realistic indication of performance. Results appear robust to the selection of temporal intervals. These findings will be of interest to researchers who employ link prediction or other machine learning tasks in networks.
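The difference between the two split types named in the abstract can be illustrated with a minimal sketch. It is a simplified illustration and not the procedure used in the paper: it assumes the network is given as a list of (source, target, timestamp) tuples, and the function names random_split and temporal_split, as well as the parameters test_fraction, seed, and split_time, are hypothetical.

```python
import random


def random_split(edges, test_fraction=0.25, seed=0):
    """Assign timestamped edges to train/test at random, ignoring time.

    Hypothetical helper: train and test edges may come from the same period.
    """
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(edges, split_time):
    """Train on edges observed before split_time, test on all later edges.

    Hypothetical helper: every test edge is at least as recent as split_time.
    """
    train = [e for e in edges if e[2] < split_time]
    test = [e for e in edges if e[2] >= split_time]
    return train, test


# Toy temporal network: (source, target, timestamp)
toy_edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3), ("d", "e", 4)]
train_r, test_r = random_split(toy_edges)
train_t, test_t = temporal_split(toy_edges, split_time=3)
```

In the random variant, a test edge can be older than some training edges, so the model may be evaluated on links from a period it has already seen. In the temporal variant, evaluation only uses links that appear after the training period, which is closer to predicting the future of the network.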
- All authors
- Bruin, G.J. de; Veenman, C.J.; Herik, H.J. van den; Takes, F.W.
- Editor(s)
- Benito, R.M.; Cherifi, C.; Cherifi, H.; Moro, E.; Rocha, L.M.; Sales-Pardo, M.
- Date
- 2021-01-05
- Title of host publication
- Complex networks & their applications IX
- Pages
- 79 - 91
- ISBN (print)
- 9783030653507
- ISBN (electronic)
- 9783030653514
Publication Series
- Name
- 944
Conference
- Conference
- Complex Networks
- Date
- 2020-12-01 - 2020-12-03
- Location
- Madrid, Spain