This dissertation centers on the question of whether syntactic differences between languages can be detected automatically, and if so, how. With the enormous number of natural languages and dialects, the very high level of variation they exhibit between one another, and the technically infinite number of possible sentences per language or dialect, systematic manual comparison is a hugely daunting task. The field would therefore significantly benefit from the (partial) automatization of the process, as it would increase the scale, speed, systematicity and reproducibility of research. Over the course of five chapters it is shown through case studies involving English, Dutch, German, Czech and Hungarian that correct hypotheses on syntactic differences between languages can be generated automatically from parallel corpora through the use of the minimum description length principle, counting mismatches between part-of-speech pattern occurrences, word alignment and mapping annotation from an annotated language onto another unannotated language. The tools developed for the purposes of this research work well and can aid a linguist significantly in their search for differences or similarities, but do not replace the human researcher.
In this paper, we capture the crosslinguistic variation in Bantu nominal structure in a unified analysis of gender on n (Kramer 2014, 2015). We demonstrate that this analysis accounts for the morphosyntactic properties of basic nouns as well as locative and diminutive derivations. Moreover, it allows us to capture intra- and inter-language morphosyntactic variation by reference to just three parameters – one strictly morphological and two structural. The presence of one or two n heads, and the size of the complement distinguish between different types of locatives (structural variation); the presence or absence of a spell-out rule of adjacent n heads differentiates “stacking” versus “non-stacking” prefixes in diminutive and augmentative derivations (morphological variation only).
Kroon, M.S.; Barbiers, L.C.J.; Odijk, J.; Pas, S.L. van der 2020
In this paper we present a systematic approach to detect and rank hypotheses about possible syntactic differences for further investigation by leveraging parallel data and using the Minimum Description Length (MDL) principle. We deploy the SQS-algorithm (‘Summarising event seQuenceS’; Tatti and Vreeken 2012) – an MDL-based algorithm – to mine ‘typical’ sequences of Part of Speech (POS) tags for each language under investigation. We create a shortlist of potential syntactic differences based on the number of parallel sentences with a mismatch in pattern occurrence. We applied our method to parallel corpora of English, Dutch and Czech sentences from the Europarl v7 corpus (Koehn 2005). The approach proved useful both in retrieving POS building blocks of a language and in pointing to meaningful syntactic differences between languages. Despite a clear sensitivity to tagging accuracy, our results and approach are promising.
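The mismatch-counting step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the patterns are hand-picked rather than mined with SQS, both languages are assumed to share one POS tagset, a pattern is assumed to "occur" only as a contiguous run of tags, and the names `occurs` and `rank_mismatches` as well as the toy data are hypothetical.

```python
from collections import Counter


def occurs(pattern, tags):
    """Return True if `pattern` appears as a contiguous run inside `tags`.

    Assumption: contiguous matching only; this is a simplification chosen
    for the sketch, not a description of how SQS matches patterns.
    """
    n = len(pattern)
    return any(tuple(tags[i:i + n]) == tuple(pattern)
               for i in range(len(tags) - n + 1))


def rank_mismatches(patterns, pairs):
    """Count, per pattern, the sentence pairs where it occurs on one side only.

    `pairs` is a list of (tags_a, tags_b) POS-tag sequences for parallel
    sentences. Returns (pattern, count) tuples, highest count first.
    """
    counts = Counter({pattern: 0 for pattern in patterns})
    for tags_a, tags_b in pairs:
        for pattern in patterns:
            if occurs(pattern, tags_a) != occurs(pattern, tags_b):
                counts[pattern] += 1
    return counts.most_common()


# Toy parallel data (hypothetical): language A places adjectives before
# the noun, language B places them after it.
pairs = [
    (["DET", "ADJ", "NOUN", "VERB"], ["DET", "NOUN", "ADJ", "VERB"]),
    (["PRON", "VERB", "DET", "NOUN"], ["PRON", "VERB", "DET", "NOUN"]),
    (["DET", "ADJ", "NOUN"], ["DET", "NOUN", "ADJ"]),
]
patterns = [("ADJ", "NOUN"), ("DET", "NOUN"), ("PRON", "VERB")]

ranking = rank_mismatches(patterns, pairs)
for pattern, count in ranking:
    print(" ".join(pattern), count)
```

Patterns with many mismatching sentence pairs rise to the top of the ranking and would be the first candidates for a linguist to inspect. In this toy data the adjective-noun order difference surfaces, while the pattern shared by both languages scores zero.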
The Bantu languages show much variation in object marking, two parameters being (1) their behaviour in ditransitives (symmetric or asymmetric) and (2) the number of object markers allowed (single or multiple). This paper reveals that a combination of these parameter settings in a sample of 50+ Bantu languages results in an almost-gap, the AWSOM correlation: “asymmetry wants single object marking”. A Minimalist featural analysis is presented of Bantu object marking as agreement with a defective goal (van der Wal 2015) and parametric variation in the distribution of 𝜙 features on low functional heads (e.g. Appl) accounts for both the AWSOM and Sambaa as the one exception to the AWSOM.