To what degree should we ascribe cognitive capacities to Large Language Models (LLMs), such as the ability to reason about intentions and beliefs known as Theory of Mind (ToM)? Here we add to this emerging debate by (i) testing 11 base- and instruction-tuned LLMs on capabilities relevant to ToM beyond the dominant false-belief paradigm, including non-literal language usage and recursive intentionality; (ii) using newly rewritten versions of standardized tests to gauge LLMs’ robustness; (iii) prompting and scoring for open besides closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks. We find that instruction-tuned LLMs from the GPT family outperform other models, and often also children. Base-LLMs are mostly unable to solve ToM tasks, even with specialized prompting. We suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes into account interlocutor and context. We conclude by arguing for a nuanced perspective on ToM in LLMs.
Tseng, R.; Verberne, S.; Putten, P.W.H. van der 2023
Sign language lexica are a useful resource for researchers and people learning sign languages. Current implementations allow a user to search for a sign either by its gloss or by selecting its primary features such as handshape and location. This study focuses on exploring a reverse search functionality where a user can sign a query sign in front of a webcam and retrieve a set of matching signs. By extracting different body joint combinations (upper body, dominant hand's arm and wrist) using the pose estimation framework OpenPose, we compare four techniques (PCA, UMAP, DTW and Euclidean distance) as distance metrics between 20 query signs, each performed by eight participants on a 1200 sign lexicon. The results show that UMAP and DTW can predict a matching sign with 80% and 71% accuracy respectively at the top-20 retrieved signs using the movement of the dominant hand's arm. Using DTW and adding more sign instances from other participants in the lexicon, the accuracy can be raised to 90% at the top-10 ranking. Our results suggest that our methodology can be used with no training in any sign language lexicon regardless of its size.
Mason, C.; Putten, P.W.H. van der; Duijn, M. van 2020
Computer simulations have been used to model psychological and sociological phenomena in order to provide insight into how they affect human behavior and population-wide systems. In this study, three agent-based simulations (ABSs) were developed to model opinion dynamics in an online social media context. The main focus was to test the effects of ‘social identity’ and ‘certainty’ on social influence. When humans interact, they influence each other’s opinions and behavior. It was hypothesized that the influence of other agents based on ingroup/outgroup perceptions can lead to extremism and polarization under conditions of uncertainty. The first two simulations isolated social identity and certainty respectively to see how social influence would shape the attitude formation of the agents, and the opinion distribution by extension. Problems with previous models were remedied to some extent, but not fully resolved. The third combined the two to see if the limitations of both designs would be ameliorated with added complexity. The combination proved to be moderating, and while stable opinion clusters form, extremism and polarization do not develop in the system without added forces.
Fragkiadakis, M.; Nyst, V.A.S.; Putten, P.W.H. van der 2020
This study presents a new methodology to search sign language lexica, using a full sign as input for a query. Thus, a dictionary user can look up information about a sign by signing the sign to a webcam. The recorded sign is then compared to potential matching signs in the lexicon. As such, it provides a new way of searching sign language dictionaries to complement existing methods based on (spoken language) glosses or phonological features, like handshape or location. The method utilizes OpenPose to extract the body and finger joint positions. Dynamic Time Warping (DTW) is used to quantify the variation of the trajectory of the dominant hand and the average trajectories of the fingers. Ten people with various degrees of sign language proficiency participated in this study. Each subject viewed a set of 20 signs from the newly compiled Ghanaian sign language lexicon and was asked to replicate the signs. The results show that DTW can predict the matching sign with 87% and 74% accuracy at the Top-10 and Top-5 ranking level respectively by using only the trajectory of the dominant hand. Additionally, more proficient signers obtain 90% accuracy at the Top-10 ranking. The methodology also has the potential to be used as a variation measurement tool to quantify the difference in signing between different signers or sign languages in general.
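As a rough illustration of the retrieval idea described in this abstract, the sketch below ranks lexicon entries by their DTW distance to a query trajectory. It is a minimal textbook DTW, not the authors' implementation: the function names `dtw_distance` and `top_k`, the representation of a sign as a list of 2D hand positions, and the toy lexicon are all assumptions made for the example, and no pose extraction with OpenPose is included.

```python
import math


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance only in a
                cost[i][j - 1],      # advance only in b
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def top_k(query, lexicon, k=10):
    """Return the k glosses whose stored trajectory is closest to the query."""
    ranked = sorted(lexicon, key=lambda gloss: dtw_distance(query, lexicon[gloss]))
    return ranked[:k]


# Hypothetical toy lexicon: gloss -> dominant-hand trajectory (assumed data).
lexicon = {
    "A": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "B": [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)],
}
query = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]
best = top_k(query, lexicon, k=1)
```

Because DTW aligns sequences of different lengths, the query here may contain more frames than the stored sign; a Top-k accuracy figure would then be the share of queries whose correct gloss appears in the returned list.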
Hees, M. van; Putten, P.W.H. van der; Lamers, M.H. 2018
Heavy Metal is a popular subculture, and in itself is highly tribalized, which makes it an interesting domain to research how cultures and subcultures relate and evolve. To study this, we scrape the Encyclopaedia Metallum heavy metal music archive website to generate a large-scale networked data set. Bands are linked through shared musicians, and each band can be labelled with multiple user-contributed genres. By applying Word2Vec on genre co-occurrences, and hierarchical network clustering on the band collaboration graph, we gain insight into how music genres relate to each other. While the Word2Vec results show some interesting patterns with regard to the observed clusters, the hierarchical clustering proves to be more inconclusive, partially caused by factors beyond genre that generate the network. From a machine learning point of view, this case is an instance of the more general problem of understanding label structure in networked data.
This paper introduces The Morality Machine, a system that tracks ethical sentiment in Twitter discussions. Empirical approaches to ethics are rare, and to our knowledge this system is the first to take a machine learning approach. It is based on Moral Foundations Theory, a framework of moral values that are assumed to be universal. Carefully handcrafted keyword dictionaries for Moral Foundations Theory exist, but experiments demonstrate that models that do not leverage these have similar or superior performance, thus proving the value of a more pure machine learning approach.