The amount of archaeological literature is growing rapidly. Until recently, these data were only accessible through metadata search. We implemented a text retrieval engine for a large archaeological text collection (~658 million words). In archaeological information retrieval (IR), domain-specific entities such as locations, time periods and artefacts play a central role. This motivated the development of a named entity recognition (NER) model to annotate the full collection with archaeological named entities. In this article, we present ArcheoBERTje, a BERT (Bidirectional Encoder Representations from Transformers) model pre-trained on Dutch archaeological texts. We compare the model’s quality and output on an NER task to a generic multilingual model and a generic Dutch model. We also investigate ensemble methods that combine multiple BERT models, as well as combining the best BERT model with a domain thesaurus using conditional random fields. We find that ArcheoBERTje significantly outperforms both the multilingual and the Dutch model, with a smaller standard deviation between runs, reaching an average F1 score of 0.735. The model also outperforms ensemble methods combining the three models. Combining ArcheoBERTje predictions with explicit domain knowledge from the thesaurus did not increase the F1 score. We quantitatively and qualitatively analyse the differences between the vocabulary and output of the BERT models on the full collection and provide some valuable insights into the effect of fine-tuning for specific domains.
Our results indicate that for a highly specific text domain such as archaeology, further pre-training on domain-specific data increases the model’s quality on NER by a much larger margin than shown for other domains in the literature, and that domain-specific pre-training makes the addition of domain knowledge from a thesaurus unnecessary.
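The abstract above mentions ensemble methods for combining the three BERT models but does not say which combination rule was used. As an illustration only, here is a minimal Python sketch of one common rule, token-level majority voting over BIO labels, with ties broken in favour of the model listed first (assumed here to be the strongest one). The functions `majority_vote` and `repair_bio` and the label names (`PER` for period, `ART` for artefact, `LOC` for place) are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token BIO labels from several NER models by majority vote.

    predictions[m][t] is the label that model m assigns to token t.
    Ties are broken in favour of the model listed first (assumption:
    the caller lists the strongest model first).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same tokens")

    combined: List[str] = []
    for t in range(n_tokens):
        votes = Counter(p[t] for p in predictions)
        top = max(votes.values())
        # Walk the models in priority order; the first label with the
        # highest vote count wins, which resolves ties deterministically.
        winner = next(p[t] for p in predictions if votes[p[t]] == top)
        combined.append(winner)
    return combined


def repair_bio(labels: Sequence[str]) -> List[str]:
    """Turn any I-X tag that does not continue an X entity into B-X."""
    fixed: List[str] = []
    prev = "O"
    for label in labels:
        if label.startswith("I-") and prev[2:] != label[2:]:
            label = "B-" + label[2:]
        fixed.append(label)
        prev = label
    return fixed


# Hypothetical labels from three models for the tokens
# "Romeinse munt uit Leiden" (PER = period, ART = artefact, LOC = place).
model_a = ["B-PER", "B-ART", "O", "B-LOC"]
model_b = ["B-PER", "I-PER", "O", "B-LOC"]
model_c = ["O", "B-ART", "O", "O"]
print(repair_bio(majority_vote([model_a, model_b, model_c])))
# ['B-PER', 'B-ART', 'O', 'B-LOC']
```

Because voting happens per token, the combined sequence can contain an `I-` tag that does not continue an entity; `repair_bio` converts such tags to `B-` so the output stays a valid BIO sequence.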
Alcàntara-Rodríguez, M.; Francozo, M.; Andel, T.R. van 2021
The early colonial period witnessed new scales of connectivity and unprecedented projects of resource extraction across the Spanish Americas. Yet such transformations also drew heavily on preexisting Indigenous landscapes, technologies, and institutions. Drawing together recent discussions in archaeology and geography about mobility and resource materialities, this article takes the early colonial route as a central object of investigation and contributes to emerging interpretive frameworks that make sense of Spanish colonialism in the Americas as a variable, large-scale, and materially constituted process. Using three case studies (the ruta de Colón on the island of Hispaniola, the routes connecting the southeastern Caribbean islands with mainland South America, and the ruta de la plata in the south-central Andes), we develop a comparative archaeological analysis that reveals divergent trajectories of persistence, appropriation, and erasure in the region's routes and regimes of extraction and mobility during the fifteenth and sixteenth centuries.
Geertsma, I.P.; Françozo, M.; Andel, T. van; Alcantara Rodriguez, M. 2021
Comparing Apples and Pears: The Hidden Diversity of Central African Bush Mangoes (Irvingiaceae). The fruits of Irvingiaceae trees, commonly known as "bush mangoes" or "mangues sauvages," are crucial foods for Central African human populations, as well as for local wildlife. The oil-rich kernels of Irvingiaceae play an important role in local diet, well-being, and livelihoods. When collected for sale, they enter the international market for non-timber forest products (NTFPs), which represents a considerable source of income for Central African countries. Despite the importance of bush mangoes, the literature is generally imprecise about exactly which Irvingiaceae species are present in local diets and NTFP markets. Few botanical studies include local names and uses of the different Irvingiaceae species, while ethnographic and social studies rarely corroborate their identifications by collecting vouchers. In this study, we combined ethnographic research and botanical collection to verify which Irvingiaceae species were consumed and collected for trade by the Baka, a group of forager-horticulturalists in southeastern Cameroon. We provide evidence of the floristic diversity hidden behind the term "bush mangoes", as well as of the Baka's knowledge and uses of Irvingiaceae fruits. We discuss the importance of eight Irvingiaceae species for Baka livelihoods, as well as potential threats to the future of these valuable trees.
Alcantara Rodriguez, M.; Pombo Geertsma, I.; De Campos Françozo, M.; Andel, T.R. van 2020
This paper presents WODAN2.0, a workflow using Deep Learning for the automated detection of multiple archaeological object classes in LiDAR data from the Netherlands. WODAN2.0 is developed to rapidly and systematically map archaeology in large and complex datasets. To investigate its practical value, a large, random test dataset was developed alongside a small, non-random one; the large dataset better represents the real-world situation of scarce archaeological objects in different types of complex terrain. To reduce the number of false positives caused by specific regions in the research area, a novel approach called Location-Based Ranking has been developed and implemented. Experiments show that WODAN2.0 achieves a performance of circa 70% for barrows and Celtic fields on the small, non-random test dataset, while performance on the large, random test dataset is lower: circa 50% for barrows, circa 46% for Celtic fields, and circa 18% for charcoal kilns. The results show that introducing Location-Based Ranking and bagging improves performance by between 17% and 35%. However, WODAN2.0 does not reach general human performance when compared to the results of a citizen science project conducted in the same research area.
Increasing deforestation affects tropical forests, threatening the livelihoods of local populations who subsist on forest resources. The disappearance of wild plants and animals and the increasing influence of market economies affect local health, well-being, and diet. The impact of these changes on wild meat consumption has been well documented, but little attention has been given to wild edible plants, despite their importance as sources of calories and micronutrients. Furthermore, the relationships among food behavior strategies adopted by local populations, their psycho-cultural representations of food, and their food preferences have been poorly explored. In this study, we investigate food behaviors with an emphasis on the role of wild edible plants among a forager-horticulturalist society from the Congo Basin: the Baka. By combining an ethnobotanical survey with data from interviews (n = 536) related to food behaviors and representations of food, our data show that the Baka valorize both agricultural and marketable foods, and that wild plants represent a minor part of their diet, both in frequency and diversity. Finally, by examining how some wild edible plants have shifted from being eaten to being sold, we explore how market-oriented uses of wild edible plants may affect dietary behaviors and biocultural resilience.
Brandsen, A.; Verberne, S.; Lambers, K.; Wansleeben, M. 2020
In this paper, we present the development of a training dataset for Dutch Named Entity Recognition (NER) in the archaeology domain. This dataset was created because there is a dire need for semantic search within archaeology, to allow archaeologists to find structured information in collections of Dutch excavation reports, currently totalling around 60,000 reports (658 million words) and growing rapidly. This search task requires NER. We created rigorous annotation guidelines in an iterative process, then instructed five archaeology students to annotate a number of documents. The resulting dataset contains ~31k annotations across six entity types (artefact, time period, place, context, species and material). The inter-annotator agreement is 0.95, and when we used this data for machine learning, we observed an increase in F1 score from 0.51 to 0.70 compared to a machine learning model trained on a dataset created in prior work. This indicates that the data is of high quality and can confidently be used to train NER classifiers.
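The F1 scores reported above (0.51 and 0.70) depend on how predicted entities are matched against the gold annotations, which the abstract does not specify. The sketch below assumes exact-span matching, as in CoNLL-style evaluation: a predicted entity counts as correct only if its start, end and type all match a gold entity. `bio_to_spans` and `entity_f1` are hypothetical helper names, and the labels are illustrative.

```python
from typing import Optional, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def bio_to_spans(labels: Sequence[str]) -> Set[Span]:
    """Collect entity spans from a BIO label sequence.

    An I-X tag that does not continue an X entity starts a new entity,
    mirroring the lenient behaviour of the CoNLL evaluation script.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, label in enumerate(list(labels) + ["O"]):
        continues = label.startswith("I-") and label[2:] == etype
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if label.startswith("B-") or (label.startswith("I-") and not continues):
            start, etype = i, label[2:]
    return spans


def entity_f1(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)  # entities matching in span and type
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ART", "O"]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under this scheme the mislabelled entity at token 3 counts both as a false positive and a false negative, which is why entity-level F1 is typically stricter than token-level accuracy.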
Alcantara Rodriguez, M.; Francozo, M.; Andel, T.R. van 2019
The Historia Naturalis Brasiliae (HNB, 1648) is the most complete seventeenth-century treatise on Brazilian flora and fauna. Scientists Marcgrave and Piso depicted hundreds of plants and described uses, vernacular names, and diseases in Dutch Brazil. We aimed to verify whether these plants are still used in similar ways. We used herbarium vouchers and taxonomic literature to identify the species described in the HNB, and reviewed historical and modern ethnobotanical literature to analyze whether the HNB documented plants and uses specific to the northeast region. We highlighted Old World species, as they indicate plant introduction before and during the trans-Atlantic slave trade and the exchange of African ethnobotanical knowledge. Of the 378 species found in the HNB, 256 (68%) were useful, mostly for healing and food, and were used in a similar way (80%) both in the seventeenth century and in modern Brazil. Only one species (Swartzia pickelii) is endemic to northeast Brazil, while the others are more widely distributed. The HNB includes one of the first reports of African crops in Brazil, such as sesame, okra, and spider plant. This study offers insights into indigenous and African plant knowledge retained since the creation of the HNB and acknowledges its non-European contributors.
Brandsen, A.; Lambers, K.; Verberne, S.; Wansleeben, M. 2019
In this paper, we present the results of user requirement solicitation for a search system for grey literature in archaeology, specifically Dutch excavation reports. This search system uses Named Entity Recognition and Information Retrieval techniques to create an effective and effortless search experience. Specifically, we used Conditional Random Fields to identify entities, with an average accuracy of 56%. This is a baseline result, and we identified many possibilities for improvement. These entities were indexed in Elasticsearch, and a user interface was developed on top of the index. This proof of concept was used in user requirement solicitation and evaluation with a group of end users. Feedback from this group indicated that there is a dire need for such a system, and that the first results are promising.