Natural history collections provide invaluable sources for researchers with different disciplinary backgrounds, aspiring to study the geographical distribution of flora and fauna across the globe... Show moreNatural history collections provide invaluable sources for researchers with different disciplinary backgrounds, aspiring to study the geographical distribution of flora and fauna across the globe as well as other evolutionary processes. They are of paramount importance for mapping out long-term changes: from culture, to ecology, to how natural history is practiced.This thesis describes computational methods for knowledge extraction from archives of natural history collections---here referring to handwritten manuscripts and hand-drawn illustrations. As we are dealing with heterogeneous real-world data, the task becomes exceptionally challenging. Small samples and a long-tailed distribution, sometimes with very fine-grained distinctions between classes, hamper model learning. Prior knowledge is therefore needed to bootstrap the learning process. Moreover, archival content can be difficult to interpret and integrate, and should therefore be formally described for data integration within and across collections. By serving extracted knowledge to the Semantic Web, collections are made amenable for research and integration with other biodiversity resources on the Web. Show less
In this paper we analyse the classification of zoological illustrations. Historically, zoological illustrations were the modus operandi for the documentation of new species, and now serve as... Show moreIn this paper we analyse the classification of zoological illustrations. Historically, zoological illustrations were the modus operandi for the documentation of new species, and now serve as crucial sources for long-term ecological and biodiversity research. By employing computational methods for classification, the data can be made amenable to research. Automated species identification is challenging due to the long-tailed nature of the data, and the millions of possible classes in the species taxonomy. Success commonly depends on large training sets with many examples per class, but images from only a subset of classes are digitally available, and many images are unlabelled, since labelling requires domain expertise. We explore zero-shot learning to address the problem, where features are learned from classes with medium to large samples, which are then transferred to recognise classes with few or no training samples. We specifically explore how distributed, multi-modal background knowledge from data providers, such as the Global Biodiversity Information Facility (GBIF), iNaturalist, and the Biodiversity Heritage Library (BHL), can be used to share knowledge between classes for zero-shot learning. We train a prototypical network for zero-shot classification, and introduce fused prototypes (FP) and hierarchical prototype loss (HPL) to optimise the model. Finally, we analyse the performance of the model for use in real-world applications. The experimental results are encouraging, indicating potential for use of such models in an expert support system, but also express the difficulty of our task, showing a necessity for research into computer vision methods that are able to learn from small samples. Show less
Large collections of historical biodiversity expeditions are housed in natural history museums throughout the world. Potentially they can serve as rich sources of data for cultural historical and... Show moreLarge collections of historical biodiversity expeditions are housed in natural history museums throughout the world. Potentially they can serve as rich sources of data for cultural historical and biodiversity research. However, they exist as only partially catalogued specimen repositories and images of unstructured, non-standardised, hand-written text and drawings. Although many archival collections have been digitised, disclosing their content is challenging. They refer to historical place names and outdated taxonomic classifications and are written in multiple languages. Efforts to transcribe the hand-written text can make the content accessible, but semantically describing and interlinking the content would further facilitate research. We propose a semantic model that serves to structure the named entities in natural history archival collections. In addition, we present an approach for the semantic annotation of these collections whilst documenting their provenance. This approach serves as an initial step for an adaptive learning approach for semi-automated extraction of named entities from natural history archival collections. The applicability of the semantic model and the annotation approach is demonstrated using image scans from a collection of 8, 000 field book pages gathered by the Committee for Natural History of the Netherlands Indies between 1820 and 1850, and evaluated together with domain experts from the field of natural and cultural history. Show less