In current practice, when dating the root of a Bayesian language phylogeny, the researcher is required to supply some of the information beforehand, including a distribution of root ages and dates for some nodes serving as calibration points. In addition to the potential subjectivity that this leaves room for, the problem arises that for many of the language families of the world there are no available internal calibration points. Here we address the following questions: can a new Bayesian framework that overcomes these problems be introduced, and how well does it perform? The new framework that we present is generalized in the sense that no family-specific priors or calibration points are needed. We moreover introduce a way to overcome another potential source of subjectivity in Bayesian tree inference as commonly practiced, namely that of manual cognate identification; instead, we apply an automated approach. Dates are obtained by fitting a Gamma regression model to tree lengths and known time depths for 30 phylogenetically independent calibration points. This model is used to predict the time depths of both the root and the internal nodes for 116 language families, producing a total of 1,287 dates for families and subgroups. It turns out that the results are similar to those of published Bayesian studies of individual language families. The performance of the method is compared to automated glottochronology, which is an update of the classical method of Swadesh drawing upon automated cognate recognition and a new formula for deriving a time depth from percentages of shared cognates. It is also compared to a third dating method, that of the Automated Similarity Judgment Program (ASJP).
In terms of errors and correlations with known dates, ASJP works better than the new method, and both work better than automated glottochronology.
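The Gamma regression step above can be illustrated with a minimal sketch. It assumes a log link and the logarithm of tree length as the single predictor; the abstract does not specify the link, the predictor transformation or the software used, so these choices, the function names (`fit_gamma_log_link`, `predict_age`) and the calibration numbers are all hypothetical. The fit uses iteratively reweighted least squares (IRLS), which for a Gamma GLM with a log link reduces to repeated ordinary least squares.

```python
import math


def _ols(x, z):
    """Intercept and slope of an ordinary least-squares line z ~ x."""
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mz - slope * mx, slope


def fit_gamma_log_link(x, y, max_iter=100, tol=1e-10):
    """Fit E[y] = exp(b0 + b1 * x) as a Gamma GLM with log link via IRLS.

    With a log link and the Gamma variance function V(mu) = mu**2, the IRLS
    weights are all 1, so each step is plain least squares on the working
    response z = eta + (y - mu) / mu.
    """
    b0, b1 = _ols(x, [math.log(v) for v in y])  # start from a log-linear fit
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        new_b0, new_b1 = _ols(x, z)
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def predict_age(b0, b1, tree_length):
    """Predicted time depth (mean of the fitted Gamma) for a given tree length."""
    return math.exp(b0 + b1 * math.log(tree_length))


# Made-up calibration pairs (hypothetical): tree lengths and known ages in years.
lengths = [0.8, 1.1, 1.5, 2.0, 2.6, 3.1]
ages = [1500.0, 2100.0, 2600.0, 3900.0, 4500.0, 6100.0]
b0, b1 = fit_gamma_log_link([math.log(t) for t in lengths], ages)
print(f"predicted age for tree length 1.8: {predict_age(b0, b1, 1.8):.0f} years")
```

In practice a GLM library would normally be used for this; the hand-written IRLS only keeps the example dependency-free.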
This paper explores the application of quantitative methods to study the effect of various factors on phonetic word duration in ten languages. Data on most of these languages were collected in fieldwork aimed at documenting spontaneous speech in mostly endangered languages, to be used for multiple purposes, including the preservation of cultural heritage and community work. Here we show the feasibility of studying processes of online acceleration and deceleration of speech across languages using such data, which have not previously been considered for this purpose. Our results show that it is possible to detect a consistent effect of higher word frequency leading to faster articulation, even in the relatively small language documentation corpora used here. We also show that nouns tend to be pronounced more slowly than verbs when controlling for other factors. Comparison of the effects of these and other factors shows that some of them are difficult to capture with the current data and methods, including potential effects of cross-linguistic differences in morphological complexity. In general, this paper argues for widening the cross-linguistic scope of phonetic and psycholinguistic research by including the wealth of language documentation data that has recently become available.
The terms “language” and “dialect” are ingrained, but linguists nevertheless tend to agree that it is impossible to apply a non-arbitrary distinction such that two speech varieties can be identified as either distinct languages or two dialects of one and the same language. A database of lexical information for more than 7,500 speech varieties, however, unveils a strong tendency for linguistic distances to be bimodally distributed. For a given language group, the linguistic distances pertaining to either cluster can be teased apart by identifying a mixture of normal distributions within the data, separating them by fitting curves, and finding the point where the curves cross. The thresholds identified are remarkably consistent across data sets, qualifying their mean as a universal criterion for distinguishing between language and dialect pairs. The mean of the thresholds identified translates into a temporal distance of around one to one-and-a-half millennia (1,075–1,635 years).
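The threshold step described above, finding where two fitted normal curves cross, can be sketched as follows. The sketch assumes the mixture weights, means and standard deviations have already been estimated (for example by expectation-maximization); the function name `crossing_point` is hypothetical, and the parameter values in the usage lines are invented for illustration on an arbitrary distance scale, not taken from the database. Equating the two weighted log-densities gives a quadratic in x, and the root lying between the two means is taken as the threshold.

```python
import math
from statistics import NormalDist


def crossing_point(w1, m1, s1, w2, m2, s2):
    """Point between m1 and m2 where w1*N(m1, s1) and w2*N(m2, s2) are equal.

    Setting the two weighted log-densities equal gives a*x**2 + b*x + c = 0.
    """
    a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = (m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2)
         + math.log(w1 / s1) - math.log(w2 / s2))
    if abs(a) < 1e-12:  # equal spreads: the equation is linear in x
        return -c / b
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("the weighted densities do not cross")
    roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]
    lo, hi = sorted((m1, m2))
    between = [r for r in roots if lo <= r <= hi]
    if not between:
        raise ValueError("no crossing between the two means")
    return between[0]


# Illustrative (made-up) parameters: a lower-distance and a higher-distance component.
threshold = crossing_point(0.3, 0.4, 0.12, 0.7, 0.9, 0.08)
print(f"threshold: {threshold:.3f}")
# Sanity check: the two weighted densities agree at the threshold.
print(0.3 * NormalDist(0.4, 0.12).pdf(threshold), 0.7 * NormalDist(0.9, 0.08).pdf(threshold))
```

Pairs whose distance falls below the threshold would be assigned to the lower-distance component (taken here to be the dialect pairs) and the rest to the higher-distance one.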
In 2004, Lacadena and Wichmann proposed a set of orthographic rules for the Maya script. The choice of using one of three different patterns of syn- or disharmonic spellings allowed Maya scribes to signal whether word-final syllables contained a short vowel, a long vowel or a glottal stop. In our earlier paper we focused on the lexical evidence for these orthographic «harmony rules». Although it was stated that the rules apply equally well when a suffix is involved and when no suffix is involved, the data relating to the former situation were not discussed in detail. Providing that discussion is the aim of the present paper.
In this paper we first test whether there is statistical support for a transitivity hierarchy viewed as an implicational hierarchy. To that end we construct data-driven transitivity hierarchies of two-place verb meanings based on the Valency Patterns Leipzig (ValPaL) database using Guttman scaling. We look at how well the hierarchies conform to strict scalarity (one-dimensionality) and, through matrix randomization, test whether their strengths are significant. We then go on to construct slightly different hierarchies based on simple counts of instances of two-participant coding frames for a given verb meaning across languages, rather than through the Guttman scaling procedure, which yields less resolution and is not designed for missing data. Finally, we assess whether the members of the hierarchies fall into semantic verb classes. The concluding section summarizes the results.
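Guttman scaling and the randomization test mentioned above can be illustrated with a minimal sketch on a complete 0/1 matrix (rows = languages, columns = verb meanings, 1 = coded with the transitive frame). The error-counting convention (cells that differ from the ideal pattern implied by each row's total), the row-shuffling randomization scheme and the function names are assumptions for illustration; they are not necessarily the procedure used in the paper, and the toy matrix is made up rather than drawn from ValPaL.

```python
import random


def reproducibility(matrix):
    """Guttman coefficient of reproducibility for a complete 0/1 matrix.

    Columns are ordered from most to least often 1; a row with k ones is
    ideally 1 in the first k columns of that order. Every cell that differs
    from this ideal pattern counts as one error (an assumed convention).
    """
    n_cols = len(matrix[0])
    col_totals = [sum(row[j] for row in matrix) for j in range(n_cols)]
    order = sorted(range(n_cols), key=lambda j: -col_totals[j])
    errors = 0
    for row in matrix:
        ideal = set(order[:sum(row)])
        errors += sum(1 for j in range(n_cols) if (row[j] == 1) != (j in ideal))
    return 1 - errors / (len(matrix) * n_cols)


def randomization_p_value(matrix, n_perm=999, seed=0):
    """Share of row-shuffled matrices scaling at least as well as the observed one.

    Shuffling within rows keeps each language's number of transitive verbs
    but destroys any shared ordering of verb meanings (assumed null model).
    """
    rng = random.Random(seed)
    observed = reproducibility(matrix)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in matrix]
        if reproducibility(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy matrix (hypothetical): one row, [1, 0, 1, 0], breaks the perfect scale.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]
cr, p = randomization_p_value(toy, n_perm=199)
print(f"CR = {cr:.3f}, p = {p:.3f}")  # CR is 0.900 for this toy matrix
```

Ties between columns with equal totals are broken by column position, since Python's sort is stable; a real analysis would also need a policy for the missing data that, as noted above, Guttman scaling does not handle.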
By force of nature, every bit of spoken language is produced at a particular speed. However, this speed is not constant: speakers regularly speed up and slow down. Variation in speech rate is influenced by a complex combination of factors, including the frequency and predictability of words, their information status, and their position within an utterance. Here, we use speech rate as an index of word-planning effort and focus on the time window during which speakers prepare the production of words from the two major lexical classes, nouns and verbs. We show that, when naturalistic speech is sampled from languages all over the world, there is a robust cross-linguistic tendency for slower speech before nouns compared with verbs, both in terms of slower articulation and more pauses. We attribute this slowdown effect to the increased amount of planning that nouns require compared with verbs. Unlike verbs, nouns can typically only be used when they represent new or unexpected information; otherwise, they have to be replaced by pronouns or be omitted. These conditions on noun use appear to outweigh potential advantages stemming from differences in internal complexity between nouns and verbs. Our findings suggest that, beneath the staggering diversity of grammatical structures and cultural settings, there are robust universals of language processing that are intimately tied to how speakers manage referential information when they communicate with one another.
It is widely assumed that one of the fundamental properties of spoken language is the arbitrary relation between sound and meaning. Some exceptions in the form of nonarbitrary associations have been documented in linguistics, cognitive science, and anthropology, but these studies only involved small subsets of the 6,000+ languages spoken in the world today. By analyzing word lists covering nearly two-thirds of the world’s languages, we demonstrate that a considerable proportion of 100 basic vocabulary items carry strong associations with specific kinds of human speech sounds, occurring persistently across continents and linguistic lineages (linguistic families or isolates). Prominent among these relations are property words (“small” and i, “full” and p or b) and body-part terms (“tongue” and l, “nose” and n). The areal and historical distribution of these associations suggests that they often emerge independently rather than being inherited or borrowed. Our results therefore have important implications for the language sciences, given that nonarbitrary associations have been proposed to play a critical role in the emergence of cross-modal mappings, the acquisition of language, and the evolution of our species’ unique communication system.