In this paper, we address query-based summarization of discussion threads. New users can profit from the information shared in the forum if they can find back previously posted information. However, discussion threads on a single topic can easily comprise dozens or hundreds of individual posts. Our aim is to summarize forum threads given real web search queries. We created a data set with search queries from a discussion forum's search engine log and the discussion threads that were clicked by the user who entered the query. For 120 thread–query combinations, a reference summary was made by five different human raters. We compared two methods for automatic summarization of the threads: a query-independent method based on post features, and Maximum Marginal Relevance (MMR), a method that takes the query into account. We also compared four different word embedding representations as alternatives to standard word vectors in extractive summarization. We find (1) that the agreement between human summarizers does not improve when a query is provided; (2) that the query-independent post features as well as a centroid-based baseline outperform MMR by a large margin; (3) that combining the post features with query similarity gives a small improvement over the use of post features alone; and (4) that for the word embeddings, a match in domain appears to be more important than corpus size and dimensionality. However, the differences between the models were not reflected by differences in the quality of the summaries created with the help of these models.
We conclude that query-based summarization with web queries is challenging because the queries are short, and a click on a result is not a direct indicator of the relevance of the result.
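The MMR method compared in the abstract above selects posts that are relevant to the query while penalizing redundancy with posts already selected. A minimal sketch of the generic MMR selection loop is given below; it is not the authors' implementation, and the bag-of-words cosine similarity and the lambda value are illustrative assumptions:

```python
from collections import Counter
import math

def cosine(a, b):
    # cosine similarity between two bag-of-words Counters
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query, posts, k=2, lam=0.7):
    """Greedily pick k posts maximizing
    lam * sim(post, query) - (1 - lam) * max_{s in selected} sim(post, s).
    Returns indices of the selected posts in selection order."""
    q = Counter(query.lower().split())
    docs = [Counter(p.lower().split()) for p in posts]
    selected = []
    while len(selected) < min(k, len(docs)):
        best, best_score = None, float("-inf")
        for i, d in enumerate(docs):
            if i in selected:
                continue
            relevance = cosine(d, q)
            redundancy = max((cosine(d, docs[j]) for j in selected), default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

posts = ["python forum help", "python forum help please", "cooking recipes thread"]
print(mmr("python help", posts, k=2))  # most query-relevant post first
```

The lambda parameter trades off query relevance against novelty; with lam=1.0 the method reduces to plain query-similarity ranking.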
O'Donnell, M.; Nelson, L.D.; Ackermann, E.; Aczel, B.; Akhtar, A.; Aldrovandi, S.; ... ; Zrubka, M. 2018
Dijksterhuis and van Knippenberg (1998) reported that participants primed with a category associated with intelligence (“professor”) subsequently performed 13% better on a trivia test than participants primed with a category associated with a lack of intelligence (“soccer hooligans”). In two unpublished replications of this study designed to verify the appropriate testing procedures, Dijksterhuis, van Knippenberg, and Holland observed a smaller difference between conditions (2%–3%) as well as a gender difference: Men showed the effect (9.3% and 7.6%), but women did not (0.3% and −0.3%). The procedure used in those replications served as the basis for this multilab Registered Replication Report. A total of 40 laboratories collected data for this project, and 23 of these laboratories met all inclusion criteria. Here we report the meta-analytic results for those 23 direct replications (total N = 4,493), which tested whether performance on a 30-item general-knowledge trivia task differed between these two priming conditions (results of supplementary analyses of the data from all 40 labs, N = 6,454, are also reported). We observed no overall difference in trivia performance between participants primed with the “professor” category and those primed with the “hooligan” category (0.14%) and no moderation by gender.
Verberne, S.; Krahmer, E.; Hendrickx, I.; Wubben, S.; Bosch, A. van den 2018
© 2017 The Author(s) In this paper we address extractive summarization of long threads in online discussion fora. We present an elaborate user evaluation study to determine human preferences in forum summarization and to create a reference data set. We showed long threads to ten different raters and asked them to create a summary by selecting the posts that they considered to be the most important for the thread. We study the agreement between human raters on the summarization task, and we show how multiple reference summaries can be combined to develop a successful model for automatic summarization. We found that although the inter-rater agreement for the summarization task was slight to fair, the automatic summarizer obtained reasonable results in terms of precision, recall, and ROUGE. Moreover, when human raters were asked to choose between the summary created by another human and the summary created by our model in a blind side-by-side comparison, they judged the model's summary equal to or better than the human summary in over half of the cases. This shows that even for a summarization task with low inter-rater agreement, a model can be trained that generates sensible summaries. In addition, we investigated the potential for personalized summarization. However, the results for the three raters involved in this experiment were inconclusive. We release the reference summaries as a publicly available dataset.
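The ROUGE evaluation mentioned in the abstract above measures n-gram overlap between a system summary and a reference summary. A minimal sketch of ROUGE-1 recall (unigram overlap with count clipping) is shown below; the original work likely used a full ROUGE toolkit, so this is only an illustration of the underlying measure:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams that also appear
    in the candidate, clipping each word's count at the candidate count."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge1_recall("the cat sat", "the cat sat down"))  # → 0.75
```

Recall-oriented ROUGE rewards summaries that cover the reference content; precision and F-measure variants divide the same overlap by the candidate length instead.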
Eenbergen, M.C. van; Poll-Franse, L.V. van de; Krahmer, E.; Verberne, S.; Mols, F. 2018