This study examines a single English language usage problem, number agreement in the variants of the species noun phrase (e.g. 'these kinds of errors' vs. 'these kind of errors' vs. 'errors of this kind'), from three perspectives: those of linguists, prescriptivists and the general public. The study, framed by the descriptions given in modern reference grammars and theoretical analyses (the linguists), is conducted within the historical perspective of the advice given in usage guides published between 1770 and 2010 and beyond (the prescriptivists). The general public is represented in an online attitude survey of the variant forms, and by an analysis of a corpus of un-copy-edited academic writing that was compiled specifically for this study. The main findings are (i) that there is a great deal of harmony between the views of the three groups studied, and that, on the basis of these analyses, the popular view of 'descriptive' linguists in conflict with 'prescriptive' usage guides is not justified; and (ii) that the innovative use of multiple contextualised examples in the attitude survey contributes to the suggestion of 'a cline of acceptability' on the part of the general public, rather than a simple 'acceptable' vs. 'unacceptable' stance.
This paper explores how linguistic data annotation can be made (semi-)automatic by means of machine learning. More specifically, we focus on the use of “contextualized word embeddings” (i.e. vectorized representations of the meaning of word tokens based on the sentential context in which they appear) extracted by large language models (LLMs). In three example case studies, we assess how the contextualized embeddings generated by LLMs can be combined with different machine learning approaches to serve as a flexible, adaptable semi-automated data annotation tool for corpus linguists. Subsequently, to evaluate which approach is most reliable across the different case studies, we use a Bayesian framework for model comparison, which estimates the probability that the performance of a given classification approach is stronger than that of an alternative approach. Our results indicate that combining contextualized word embeddings with metric fine-tuning yields highly accurate automatic annotations.
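The idea of estimating the probability that one classification approach outperforms another can be illustrated with a minimal sketch. This is not the paper's actual framework, which the abstract does not specify; it assumes two approaches evaluated on independent test sets of equal size, a uniform Beta(1, 1) prior on each accuracy, and hypothetical counts. The function name `prob_a_beats_b` is invented for illustration.

```python
import random


def prob_a_beats_b(correct_a, correct_b, n_items, draws=20000, seed=0):
    """Monte Carlo estimate of P(accuracy_A > accuracy_B).

    Assumption: each accuracy has an independent Beta(1, 1) prior, so its
    posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Draw one plausible accuracy for each approach from its posterior.
        acc_a = rng.betavariate(1 + correct_a, 1 + n_items - correct_a)
        acc_b = rng.betavariate(1 + correct_b, 1 + n_items - correct_b)
        if acc_a > acc_b:
            wins += 1
    return wins / draws


# Hypothetical counts: approach A labels 180 of 200 tokens correctly, B 165.
p = prob_a_beats_b(180, 165, 200)
print(f"P(A more accurate than B) = {p:.3f}")
```

A value close to 1 would suggest that approach A is credibly more accurate than B on this data; a value near 0.5 would mean the evaluation does not separate the two.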
Over the centuries, the French language has had a lot of influence on the Dutch language. Thousands of words from French entered Dutch and apart from that, Dutch has borrowed morphological elements such as suffixes from French. Moreover, it is assumed that the popularity of certain Dutch morphosyntactic constructions can be attributed to language contact with French. Despite the fact that histories of Dutch often speak of so-called ‘Frenchification’ because of these French influences, hardly any empirical research has been carried out so far on the actual influence of French on Dutch. The aim of this thesis is to provide insight into the influence that French had on the Dutch language between 1500 and 1900. This is done by means of corpus analyses with the diachronic Language of Leiden corpus, which comprises texts from Leiden from different social domains. The corpus analyses aim to trace the language changes in Dutch as a consequence of language contact with French on three language levels: lexicon, morphology, and morphosyntax. In this way, this thesis aims to contribute to a better understanding of the historical language contact between Dutch and French.
In linguistic research, present-day Dutch has been characterized as a pluricentric language, meaning that there are multiple centers from which language norms spread. Within the Dutch language area, we can discern one center in the Northern Netherlands (the Randstad area) and one in the Southern Netherlands (around the province of Brabant). Traditional histories of the language suggest that pluricentricity for Dutch is a relatively recent phenomenon, dating back to the beginning of the 20th century. However, based on findings from empirical historical-linguistic research, we could expect to situate pluricentricity at least 100 years earlier in time. This dissertation therefore provides an in-depth study in which pluricentricity is put into a broader historical perspective. Through systematic corpus analyses, this dissertation aims to assess the usefulness of the modern concept of pluricentricity in Dutch language history. A total of six linguistic features is examined in the Historical Corpus of Dutch (HCD), a new multi-genre, diachronic corpus, involving central and peripheral regions in both the North and the South. Moreover, by integrating Northern and Southern varieties of Dutch in the study, and by mapping the interactions between the different regions, we want to lay the foundation for an integrated history of Dutch.
Conditionals, or if-then sentences, form a crucial ingredient of everyday reasoning and argumentation, as they enable us to express our thoughts about possible states of the world. They are used in very different ways, and the main aim of this dissertation is to investigate to what extent these different uses of conditionals are connected to one another and to their grammatical features. The first part of this dissertation presents an analysis of conditionals in terms of implicatures of 'unassertiveness' and 'connectedness'. Insights from semantics, pragmatics, cognitive linguistics, and neighbouring fields are combined. In the second part, the analysis is tested on a corpus of spoken and written Dutch discourse. To investigate the relation between the meaning and grammar of conditionals, several cluster analyses are conducted. The results show that grammatical features such as verb tense and modal marking do not, or only weakly, license generalised implicatures of unassertiveness and connectedness. This outcome sheds light on difficulties in applying general categories of conditionals to language use data, and it suggests that the fundamentals of categorising conditional constructions need revision. The dissertation shows the benefits of combining semantic and pragmatic analyses of conditionals. It provides an extensive discussion of classifications of conditionals, an overview of the grammatical features of Dutch conditionals, and it presents cluster analyses using state-of-the-art machine-learning techniques.
The study should therefore be of interest to anyone concerned with the syntax, semantics, and pragmatics of conditionals, and to anyone working on Dutch grammar, corpus linguistics, theories of argumentation, and the interface between semantics and pragmatics.
Computer-assisted corpus linguistics is one of the main points of convergence between linguistic and computational methods. In particular, the use of diachronic linguistic corpora provides opportunities for the quantitative analysis of phenomena concerning language change through time. This dissertation offers contributions to three stages of research involving diachronic corpora: (a) corpus building and compilation; (b) the design of tools and algorithms for data exploration; and (c) data analysis for linguistic, cultural and historical research. Two resources are first presented: a Web scraper of comments from news portals, and a diachronic corpus composed of comments published on a news website. Then, I propose a generalizable method to assist in the identification of periods of establishment and obsolescence of linguistic items in a diachronic corpus based on the frequency of these items in the corpus. This method may be employed for the analysis of any collection of linguistic items, regardless of language or historical period. Finally, I describe how diachronic corpora might be used for quantitative linguistic investigation by proposing a framework centered on the investigation of vocabulary through a diachronic approach, and demonstrate its applicability by analyzing the use of the term 'fake news' in the media.
This dissertation provides new insights into language variation and change in late eighteenth- and early nineteenth-century Dutch. More specifically, it investigates whether and to what extent official language policy measures exerted influence on actual language practice. During the nation-building period around 1800, the Northern Netherlands witnessed the introduction of a national language policy, which aimed at the spread of a homogeneous written standard variety of Dutch, symbolising 'the' nation. In concrete terms, these top-down endeavours resulted in the first official codification of the Dutch orthography (Siegenbeek 1804) and grammar (Weiland 1805). Despite marking a decisive turning point in the standardisation history of Dutch, the effectiveness of the so-called schrijftaalregeling 'written language regulation' has never been investigated empirically. Taking a historical-sociolinguistic approach, this dissertation aims to fill this research gap by examining the impact of language policy on patterns of variation and change. How successful was the schrijftaalregeling in disseminating the officialised norms across the population at large, as envisaged by the government? Making use of the newly compiled Going Dutch Corpus, a diachronic multi-genre corpus comprising more than 420,000 words of authentic usage data (private letters, diaries and travelogues, newspapers), a wide range of orthographic and morphosyntactic features is analysed.
How did common people write in the late eighteenth century? Little is known about this topic as yet, since our knowledge is mainly based on printed texts written by a small part of the (male) elite population. This dissertation, written from a sociolinguistic point of view, gives us new insights into late-eighteenth-century language use. For this purpose a large number of Dutch private letters has been used. These letters were captured by the English in times of warfare between the Dutch and the English and are still preserved at the National Archives in Kew (London). The research is based on a selection of approximately 400 letters, written between 1776 and 1784 by Dutch male and female letter writers from all social ranks. This study into late-eighteenth-century language variation can be regarded as a first broad exploration of this valuable material. Therefore, various linguistic phenomena have been examined: forms of address, negation, reflexivity and reciprocity, schwa-apocope, deletion of final -n, diminutives, and the genitive and alternative constructions. The case studies clearly establish more variety in eighteenth-century written language than previous studies suggested. Almost every linguistic feature under discussion appears to show social variation, and gender and social class, in particular, are influential factors.
This study explores the role of linguistic data in the reconstruction of Dolgan (pre)history. While most ethno-linguistic groups have a longstanding history and a clear ethnic and linguistic affiliation, the formation of the Dolgans has been a relatively recent development, and their ethnic origins as well as their linguistic affiliation have been a matter of debate. According to some scholars, the Dolgans, who inhabit the Taimyr Peninsula and the Anabar district of the Republic of Sakha (Yakutia), are Turkic people who adopted a Tungusic name and certain Tungusic cultural features. Others hold the view that they have Tungusic origins but shifted to a Turkic language. Migrations and frequent contacts with other ethnic groups complicate a reconstruction of their past. Accepting the idea that contact settings may correlate with linguistic outcomes, contact-induced changes in Dolgan are analysed and used to infer information about the nature of the contact settings in which they occurred. The linguistic conclusions are interpreted in a multidisciplinary context, integrating insights from history and ethnography as well as from population genetics. In particular, linguistic patterns of contact influence are correlated with genetic admixture patterns, providing new insights into the prehistoric migration patterns of the Dolgans. Due to its holistic approach, this study provides an example of the innovative ways in which data from different disciplines can be combined to gain a deeper understanding of a people's past and identity, and provides a valuable contribution to the investigation of Siberian history.
In the National Archives in Kew, London, a treasure is kept which is of great importance for the history of the Dutch language: a collection of seventeenth-century letters written by men and women from various social backgrounds. Given the fact that much of the linguistic research on seventeenth-century Dutch has been perforce based on printed texts and linguistic data produced by a relatively small number of upper-class (usually male) writers, not much is known with certainty about the everyday Dutch of seventeenth-century lower- and middle-class people. The letters hidden in the National Archives can change this. In this dissertation, a corpus of 595 letters written between 1664 and 1672 is examined from a sociolinguistic perspective. The topics treated are: forms of address, reflexivity and reciprocity, negation, schwa-apocope, diminutives, and the genitive and alternative constructions. The case studies show that there was still a lot of variation in seventeenth-century Dutch and that some linguistic changes had not progressed as far in the everyday Dutch of 'ordinary' people as previous research has suggested. Furthermore, it is shown that gender and social class are important factors of influence on seventeenth-century language use, especially when interpreted in terms of education and writing experience.
Words may have multiple interpretations. Generally, native speakers do not perceive this as a problem, because the context provides enough clues as to what is meant. For non-native speakers and students of dead languages, however, the existence of multiple interpretations sometimes does raise problems. This suggests that the context is not the only clue native speakers use to interpret words. This dissertation studies what types of context Dutch speakers need to interpret the poly-interpretable word ergens (‘somewhere/anywhere’, modal particle). The results of this investigation were used to find out more about the Ancient Greek form που (‘somewhere, anywhere’, modal particle). This thesis shows that the study of contextual cues that allow native speakers to interpret their language provides insights that may be used in the study of dead languages. The modal interpretations of ergens and που turned out to be quite different, but the context of both words clearly showed recurring (albeit different) patterns. Knowledge of the common interpretation of words in specific contexts seems crucial for their interpretation, suggesting that it is not words themselves that carry meaning, but words-in-context.