Children are the focal point for studying the link between language and Theory of Mind (ToM) competence. Language and ToM are often studied with younger children and standardized tests, but as both are social competences, data and methods with higher ecological validity are critical. We leverage a corpus of 442 freely-told stories by Dutch children aged 4-12, recorded in their everyday classroom environments, to study language and ToM with NLP-tools. We labelled stories according to the mental depth of story characters children create, as a proxy for their ToM competence ‘in action’, and built a classifier with features encoding linguistic competences identified in existing work as predictive of ToM. We obtain good and fairly robust results (F1-macro = .71), relative to the complexity of the task for humans. Our results are explainable in that we link specific linguistic features such as lexical complexity and sentential complementation, that are relatively independent of children’s ages, to higher levels of character depth. This confirms and extends earlier work, as our study includes older children and socially embedded data from a different domain. Overall, our results support the idea that language and ToM are strongly interlinked, and that in narratives the former can scaffold the latter.
Kroon, M.S.; Barbiers, L.C.J.; Odijk, J.; Pas, S.L. van der 2020
In this paper we present a systematic approach to detect and rank hypotheses about possible syntactic differences for further investigation by leveraging parallel data and using the Minimum Description Length (MDL) principle. We deploy the SQS-algorithm (‘Summarising event seQuenceS’; Tatti and Vreeken 2012) – an MDL-based algorithm – to mine ‘typical’ sequences of Part of Speech (POS) tags for each language under investigation. We create a shortlist of potential syntactic differences based on the number of parallel sentences with a mismatch in pattern occurrence. We applied our method to parallel corpora of English, Dutch and Czech sentences from the Europarl v7 corpus (Koehn 2005). The approach proved useful in both retrieving POS building blocks of a language as well as pointing to meaningful syntactic differences between languages. Despite a clear sensitivity to tagging accuracy, our results and approach are promising.