Natural history collections provide invaluable sources for researchers from different disciplinary backgrounds who aspire to study the geographical distribution of flora and fauna across the globe, as well as other evolutionary processes. They are of paramount importance for mapping out long-term changes: from culture, to ecology, to how natural history is practiced. This thesis describes computational methods for knowledge extraction from archives of natural history collections, here referring to handwritten manuscripts and hand-drawn illustrations. Because the data are heterogeneous and drawn from the real world, the task is exceptionally challenging. Small samples and a long-tailed distribution, sometimes with very fine-grained distinctions between classes, hamper model learning. Prior knowledge is therefore needed to bootstrap the learning process. Moreover, archival content can be difficult to interpret and integrate, and should therefore be formally described for data integration within and across collections. By serving extracted knowledge to the Semantic Web, collections are made amenable to research and integration with other biodiversity resources on the Web.
Identifying genes involved in functional differences between similar tissues from expression profiles is challenging, because the expected differences in expression levels are small. To exemplify this challenge, we studied the expression profiles of two skeletal muscles, deltoid and biceps, in healthy individuals. We provide a series of guides and recommendations for the analysis of this type of study. These include how to account for batch effects and inter-individual differences to optimize the detection of gene signatures associated with tissue function. We provide guidance on selecting optimal settings for constructing gene co-expression networks, through parameter sweeps and calculation of the overlap with an established knowledge network. Our main recommendation is to combine data-driven approaches, such as differential gene expression analysis and gene co-expression network analysis, with hypothesis-driven approaches, such as gene set connectivity analysis. Accordingly, we detected differences in metabolic gene expression between deltoid and biceps that were supported by both data- and hypothesis-driven approaches. Finally, we provide a bioinformatic framework that supports the biological interpretation of expression profiles from related tissues using this combination of approaches, which is available at github.com/tabbassidaloii/AnalysisFrameworkSimilarTissues.
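The parameter sweep mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' framework (that is available at the repository linked in the abstract) and rests on several assumptions: the only swept setting is an absolute Pearson correlation threshold, the overlap with the knowledge network is measured as the Jaccard index of edge sets, and all function names, gene names, and expression values are hypothetical toy examples.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expr, threshold):
    """Connect gene pairs whose absolute correlation reaches the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def jaccard(a, b):
    """Overlap of two edge sets (assumed overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sweep(expr, knowledge_edges, thresholds):
    """Score each threshold by the overlap with the knowledge network."""
    known = {tuple(sorted(edge)) for edge in knowledge_edges}
    return {t: jaccard(coexpression_edges(expr, t), known) for t in thresholds}


def best_threshold(scores):
    """Return the threshold with the highest overlap score."""
    return max(scores, key=scores.get)


# Hypothetical toy data: gene -> expression values across four samples.
toy_expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [1.0, -1.0, 1.0, -1.0],
}
toy_knowledge = [("A", "B")]  # hypothetical established interaction
toy_scores = sweep(toy_expr, toy_knowledge, [0.3, 0.9])
```

In this toy setting, a lenient threshold admits weakly correlated pairs that the knowledge network does not support, which lowers the overlap score; a real analysis would sweep more settings and likely use a dedicated network construction method.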