To what degree should we ascribe cognitive capacities to Large Language Models (LLMs), such as the ability to reason about intentions and beliefs known as Theory of Mind (ToM)? Here we add to this emerging debate by (i) testing 11 base- and instruction-tuned LLMs on capabilities relevant to ToM beyond the dominant false-belief paradigm, including non-literal language usage and recursive intentionality; (ii) using newly rewritten versions of standardized tests to gauge LLMs’ robustness; (iii) prompting and scoring for open as well as closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks. We find that instruction-tuned LLMs from the GPT family outperform other models, and often also children. Base-LLMs are mostly unable to solve ToM tasks, even with specialized prompting. We suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes into account interlocutor and context. We conclude by arguing for a nuanced perspective on ToM in LLMs.
Tseng, R.; Verberne, S.; Putten, P.W.H. van der 2023
Conversational artificial agents and artificially intelligent (AI) voice assistants are becoming increasingly popular. Digital virtual assistants such as Siri, or conversational devices such as Amazon Echo or Google Home are permeating everyday life, and are designed to be more and more humanlike in their speech. This study investigates the effect this can have on one’s conformity with an AI assistant. In the 1950s, Solomon Asch already demonstrated the power and danger of conformity amongst people. In these classical experiments, test persons were asked to answer relatively simple questions, whilst others pretending to be participants tried to convince the test person to give wrong answers. These studies were later replicated with embodied robots, but these physical robots are still rare. In light of our increasing reliance on AI assistants, this study investigates to what extent an individual will conform to a disembodied virtual assistant. We also investigate if there is a difference between a group that interacts with an assistant that communicates through text, one that has a robotic voice and one that has a humanlike voice. The assistant attempts to subtly influence participants’ final responses in a general knowledge quiz, and we measure how often participants change their answer after having been given advice. Results show that participants conformed significantly more often to the assistant with a human voice than the one that communicated through text.
Sign language lexica are a useful resource for researchers and people learning sign languages. Current implementations allow a user to search a sign either by its gloss or by selecting its primary features such as handshape and location. This study focuses on exploring a reverse search functionality where a user can sign a query sign in front of a webcam and retrieve a set of matching signs. By extracting different body joint combinations (upper body, dominant hand's arm and wrist) using the pose estimation framework OpenPose, we compare four techniques (PCA, UMAP, DTW and Euclidean distance) as distance metrics between 20 query signs, each performed by eight participants on a 1200 sign lexicon. The results show that UMAP and DTW can predict a matching sign with an 80% and 71% accuracy respectively at the top-20 retrieved signs using the movement of the dominant hand's arm. Using DTW and adding more sign instances from other participants in the lexicon, the accuracy can be raised to 90% at the top-10 ranking. Our results suggest that our methodology can be used with no training in any sign language lexicon regardless of its size.
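The "accuracy at the top-k retrieved signs" figures quoted above can be read as the fraction of queries whose correct sign appears among the first k results. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `top_k_accuracy` and the list-of-glosses input format are assumptions made for illustration.

```python
def top_k_accuracy(rankings, truths, k):
    """Fraction of queries whose true gloss is among the first k retrieved glosses.

    rankings: one ranked list of glosses per query (best match first); hypothetical format.
    truths:   the correct gloss for each query, in the same order.
    """
    if not truths or len(rankings) != len(truths):
        raise ValueError("rankings and truths must be non-empty and of equal length")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)
```

With 20 query signs from eight participants, `rankings` would hold 160 ranked lists under this reading; how the study aggregated over participants is not stated in the abstract.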
Fragkiadakis, M.; Nyst, V.A.S.; Putten, P.W.H. van der 2021
The annotation process of sign language corpora in terms of glosses is a highly labor-intensive task, but a condition for a reliable quantitative analysis. During the annotation process the researcher typically defines the precise time slot in which a sign occurs and then enters the appropriate gloss for the sign. The aim of this project is to develop a set of tools to assist the annotation of the signs and their formal features in a video irrespective of its content and quality. Recent advances in the field of deep learning have led to the development of accurate and fast pose estimation frameworks. In this study, such a framework (namely OpenPose) has been used to develop three different methods and tools to facilitate the annotation process. The first tool estimates the span of a sign sequence and creates empty slots in an annotation file. The second tool detects whether a sign is one- or two-handed. The last tool recognizes the different handshapes presented in a video sample. All tools can be easily re-trained to fit the needs of the researcher.
Siebelt, M.; Das, D.; Moosdijk, A. van den; Warren, T.; Putten, P.W.H. van der; Weegen, W. van der 2021
Background and purpose: Machine learning (ML) techniques are a form of artificial intelligence able to analyze big data. Analyzing the outcome of (digital) questionnaires, ML might recognize different patterns in answers that might relate to different types of pathology. With this study, we investigated the proof-of-principle of ML-based diagnosis in patients with hip complaints using a digital questionnaire and the Kellgren and Lawrence (KL) osteoarthritis score. Patients and methods: 548 patients (> 55 years old) scheduled for consultation of hip complaints were asked to participate in this study and fill in an online questionnaire. Our questionnaire consists of 27 questions related to general history-taking and validated patient-related outcome measures (Oxford Hip Score and a Numeric Rating Scale for pain). 336 fully completed questionnaires were related to their classified diagnosis (either hip osteoarthritis, bursitis or tendinitis, or other pathology). Different AI techniques were used to relate questionnaire outcome and hip diagnoses. Resulting area under the curve (AUC) and classification accuracy (CA) are reported to identify the best scoring AI model. The accuracy of different ML models was compared using questionnaire outcome with and without radiologic KL scores for degree of osteoarthritis. Results: The most accurate ML model for diagnosis of patients with hip complaints was the Random Forest model (AUC 82%, 95% CI 0.78–0.86; CA 69%, CI 0.64–0.74) and most accurate analysis with addition of KL scores was with a Support Vector Machine model (AUC 89%, CI 0.86–0.92; CA 83%, CI 0.79–0.87). Interpretation: Analysis of self-reported online questionnaires related to hip complaints can differentiate between basic hip pathologies. The addition of radiological scores for osteoarthritis further improves these outcomes.
Mason, C.; Putten, P.W.H. van der; Duijn, M. van 2020
Computer simulations have been used to model psychological and sociological phenomena in order to provide insight into how they affect human behavior and population-wide systems. In this study, three agent-based simulations (ABSs) were developed to model opinion dynamics in an online social media context. The main focus was to test the effects of ‘social identity’ and ‘certainty’ on social influence. When humans interact, they influence each other’s opinions and behavior. It was hypothesized that the influence of other agents based on ingroup/outgroup perceptions can lead to extremism and polarization under conditions of uncertainty. The first two simulations isolated social identity and certainty respectively to see how social influence would shape the attitude formation of the agents, and the opinion distribution by extension. Problems with previous models were remedied to some extent, but not fully resolved. The third combined the two to see if the limitations of both designs would be ameliorated with added complexity. The combination proved to be moderating, and while stable opinion clusters form, extremism and polarization do not develop in the system without added forces.
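To make the interplay of social identity and certainty concrete, here is a hedged toy sketch of a single pairwise interaction. The abstract does not specify the update rule used in the simulations, so everything here is an assumption chosen for illustration: the `Agent` fields, the `influenced_opinion` function, the attraction-to-ingroup and repulsion-from-outgroup rule, and the `rate` parameter.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    opinion: float    # position on a single issue, in [-1, 1] (assumed scale)
    group: str        # social identity label (hypothetical)
    certainty: float  # 0 = fully uncertain, 1 = fully certain (assumed scale)


def influenced_opinion(agent, other, rate=0.5):
    """Return agent's opinion after one interaction with other.

    Assumed rule: move toward ingroup members and away from outgroup
    members, with the step size shrinking as the agent's certainty grows.
    """
    direction = 1.0 if agent.group == other.group else -1.0
    shift = rate * (1.0 - agent.certainty) * direction * (other.opinion - agent.opinion)
    # Keep the result inside the assumed opinion range.
    return max(-1.0, min(1.0, agent.opinion + shift))
```

Under this rule a fully certain agent never moves, while an uncertain agent drifts toward ingroup opinions and is pushed away from outgroup ones; a full simulation would apply such a step repeatedly over many randomly paired agents.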
Fragkiadakis, M.; Nyst, V.A.S.; Putten, P.W.H. van der 2020
This study presents a new methodology to search sign language lexica, using a full sign as input for a query. Thus, a dictionary user can look up information about a sign by signing the sign to a webcam. The recorded sign is then compared to potential matching signs in the lexicon. As such, it provides a new way of searching sign language dictionaries to complement existing methods based on (spoken language) glosses or phonological features, like handshape or location. The method utilizes OpenPose to extract the body and finger joint positions. Dynamic Time Warping (DTW) is used to quantify the variation of the trajectory of the dominant hand and the average trajectories of the fingers. Ten people with various degrees of sign language proficiency participated in this study. Each subject viewed a set of 20 signs from the newly compiled Ghanaian sign language lexicon and was asked to replicate the signs. The results show that DTW can predict the matching sign with 87% and 74% accuracy at the Top-10 and Top-5 ranking level respectively by using only the trajectory of the dominant hand. Additionally, more proficient signers obtain 90% accuracy at the Top-10 ranking. The methodology also has the potential to be used as a variation measurement tool to quantify the difference in signing between different signers or sign languages in general.
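The retrieval idea described above, ranking lexicon signs by their DTW distance to a query trajectory, can be sketched as follows. This is a minimal illustration using the textbook DTW recurrence, not the authors' implementation. It assumes each sign is already reduced to a list of (x, y) positions of one keypoint per video frame; the names `dtw_distance`, `top_k_matches`, and the dictionary layout of `lexicon` are hypothetical, and any normalization the study applied is omitted.

```python
from math import dist, inf


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of points.

    Uses the standard O(len(a) * len(b)) recurrence with Euclidean
    point-to-point cost; both sequences must be non-empty.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a only, in b only, or in both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def top_k_matches(query, lexicon, k=10):
    """Return the k glosses whose stored trajectory is closest to the query.

    lexicon: dict mapping gloss -> trajectory (assumed layout).
    """
    ranked = sorted(lexicon, key=lambda gloss: dtw_distance(query, lexicon[gloss]))
    return ranked[:k]
```

Because DTW aligns sequences of different lengths, a slower or faster rendition of the same movement can still score a low distance, which is one plausible reason the technique suits signers of varying proficiency.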
Hees, M. van; Putten, P.W.H. van der; Lamers, M.H. 2018
Heavy Metal is a popular subculture, and in itself is highly tribalized, which makes it an interesting domain to research how cultures and subcultures relate and evolve. To study this, we scrape the Encyclopaedia Metallum heavy metal music archive website to generate a large-scale networked data set. Bands are linked through shared musicians, and each band can be labelled with multiple user-contributed genres. By applying Word2Vec on genre co-occurrences, and hierarchical network clustering on the band collaboration graph, we gain insight into how music genres relate to each other. While the Word2Vec results show some interesting patterns with regard to the observed clusters, the hierarchical clustering proves to be more inconclusive, partially caused by factors beyond genre that generate the network. From a machine learning point of view, this case is an instance of the more general problem of understanding label structure in networked data.
This paper introduces The Morality Machine, a system that tracks ethical sentiment in Twitter discussions. Empirical approaches to ethics are rare, and to our knowledge this system is the first to take a machine learning approach. It is based on Moral Foundations Theory, a framework of moral values that are assumed to be universal. Carefully handcrafted keyword dictionaries for Moral Foundations Theory exist, but experiments demonstrate that models that do not leverage these have similar or superior performance, thus proving the value of a purer machine learning approach.
Data mining can be seen as a process, with modeling as the core step. However, other steps such as planning, data preparation, evaluation and deployment are of key importance for applications. This thesis studies data mining in the context of these other steps with the goal of improving data mining applicability. We introduce cases that provide an end-to-end overview and serve as motivating examples, and then focus on specific research topics. We discuss the problem of data mining across multiple sources, with data fusion as a potential solution. This is an interesting research topic, as it removes barriers for applications and data mining can be used to carry out the fusion. We then analyze a large-scale experiment in real-world data mining. We use the bias-variance evaluation framework across all steps in the process to investigate the large spread in results for a data mining competition. We conclude with a study advocating model profiling for novel classifiers. Given that it is unlikely that a novel classifier outperforms all competing classifiers across all problems, it is more interesting to characterize on what problems it performs best and to what other algorithms its behavior is most similar.