In this thesis we will explore the use of fuzzy systems theory for applications in bioinformatics. The theory of fuzzy systems is concerned with formulating decision problems in data sets that are ill-defined. It supports the transfer from a subjective human classification to a numerical scale. In this manner it affords the testing of hypotheses and the separation of the classes in the data. We first formulate problems in terms of a fuzzy system and then develop and test algorithms in terms of their performance with data from the domain of the life sciences. From the results and the performance, we will learn about the usefulness of fuzzy systems for the field, as well as their applicability to this kind of problem and their practicality for the computation itself.
This thesis is dedicated to the empirical study of image analysis in HT/HC screening studies. An HT/HC screen often produces extensive amounts of data that cannot be analyzed manually. An automated image analysis solution is therefore a prerequisite for an objective understanding of the raw image data. Compared to general application domains, the efficiency of HT/HC image analysis depends strongly on image quantity and quality. Accordingly, this thesis addresses two major procedures in the image analysis step of an HT/HC screening study, namely image segmentation and object tracking. Moreover, this thesis focuses on extending generic computer science and machine learning theory into the design of dedicated algorithms for HT/HC image analysis. Additionally, this thesis exemplifies a practical implementation of an image analysis and data analysis workflow via empirical case studies with different image modalities and experiment settings. The data analysis theory, however, is illustrated only in general terms, without further elaboration. Finally, the thesis briefly addresses supplementary infrastructure for end-user interaction and data visualization.
High-grade osteosarcoma is a primary bone tumor with complex genetic alterations, for which targeted therapy is lacking. The aim of this thesis was to use high-throughput molecular data analysis of high-grade osteosarcoma specimens and model systems, in order to learn more about osteosarcomagenesis and to find possible ways to inhibit this process. By analyzing different microarray data types using a systems biology approach, genomic instability was identified as an important driver of osteosarcomagenesis. A protective role of macrophages against metastasis of osteosarcoma was detected. In addition, the IR/IGF1R and PI3K/Akt signaling pathways were discovered as potential targets for treatment. This thesis provides the first steps in unraveling the genomic and transcriptomic landscape of high-grade osteosarcoma, and provides a biological rationale for certain new options for adjuvant treatment of this highly genomically unstable tumor.
This thesis combines the use of standard bioinformatics analyses with the development of new computational techniques to study the evolution and genetic diversity of picornaviruses and nidoviruses. It integrates two lines of research (genetics-based virus classification and evolutionary dynamics of gene length) and aims at unveiling commonalities in the biology of these and other RNA viruses as well as assisting applied research in virology.
This dissertation mainly focuses on interdisciplinary approaches for biomedical knowledge discovery. This required special efforts in developing systematic strategies to integrate various data sources and techniques, leading to improved discovery of mechanistic insights on human diseases. Chapter one examines how combining various bioinformatics-based strategies can significantly improve the characterization of the OPMD mouse model. On the basis of our extensive analysis, we discuss how this approach to knowledge discovery helped us shed some light on how this model system relates to OPMD pathophysiology in humans. In Chapter two, we expand on this combinatory approach by conducting a cross-species data analysis. In this study, we looked for common patterns that emerge when assessing the transcriptome data from three OPMD model systems and patients. This strategy led to unravelling the most prominent molecular pathway involved in OPMD pathology. The third chapter pursues a similar goal: identifying molecular and pathophysiological features shared between OPMD and the common process of skeletal muscle ageing. A study focused on the universality of biological processes, in the light of evolutionary mechanisms and common functional features, led to novel discoveries. This work helped us uncover remarkable insights on molecular mechanisms of ageing muscles and protein aggregation. Chapters four and five take a different route by tackling the field of computational biology. These chapters aim to extend network inference by providing novel strategies for the exploitation and integration of multiple data sources. We show that these developments allow more robust regulatory mechanisms to be inferred while translations and predictions are made across very different datasets, platforms, and organisms. Finally, the dissertation concludes with an outlook on ways the field of systems biology can evolve in order to offer enhanced, diversified and robust strategies for knowledge discovery.
This thesis spans several years of work dedicated to understanding fish genomes. The first chapter describes the genome of the first fish for which the entire genome was sequenced through a large-scale international project: Fugu rubripes, the pufferfish. In particular, it highlights how this fish has a genome that contains as many genes as the human genome, although it is ten times smaller. It also shows that the majority of genes that are found in the human genome can be found in this fish genome as well. In the second chapter we compared fish genomes to the human genome to find regions that have been preserved during evolution and which are therefore likely to have a function, even though they are not genes. We showed that they are indeed functional, and that they help to regulate other genes. Knowing all the genes in the genome, we could then interrogate how they are expressed, i.e. whether they are switched "on" or "off". In particular, in chapter 4 we looked at how a specific gene is in charge of gradually switching off genes that are inherited from the mother in a newborn fish embryo. Finally, in the last chapter, since genome sequencing is now becoming much cheaper and simpler to achieve, we set out to map the genome of the common carp, and we discuss the best approaches and strategies to obtain a good genome sequence for this species. The common carp is a candidate model system for high-throughput screening.
A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on Osteoarthritis (OA) and Parkinson’s disease (PD) and in drug discovery. Because OA and PD are characterized by clinical heterogeneity, a more sensitive classification of the cohort of patients may contribute to the search for the underlying disease mechanisms. In drug discovery, subtyping may improve the understanding of the similarity (and distance) between different phenotypic effects as induced by drugs and chemicals. Our second scenario aims to compare text classification algorithms. First, we show that common classifiers achieve comparable performance on most problems. Second, tightly constrained SVM solutions are high performers. In that situation, most training documents are bounded support vectors, the SVM reduces to a nearest mean classifier, and no training is necessary, which raises a question about the merits of SVM in sparse bag-of-words feature spaces. Also, SVM is shown to suffer from performance deterioration for particular combinations of training set size and number of features. This relates to outlying documents of distinct classes overlapping in the feature space.
In general, biological and chemical causes for harmful effects were studied through bioinformatics and cheminformatics efforts. A database of human genetic variants in G protein-coupled receptors was constructed, and differences between neutral and harmful variants were studied. A database of compounds with their mutagenicity data was constructed, and substructures were extracted that distinguish between Ames positive and Ames negative compounds. Keywords: cheminformatics, chemoinformatics, bioinformatics, databases, data mining, drug discovery, SNPs, polymorphisms, substructures.