The use of existing medications for diseases they were not originally developed for is called drug repositioning. A popular drug repositioning method for finding new drugs against specific cancer types is to search for drugs that are expected to bring the gene expression activity of cancer cells back to that of healthy cells (‘normalization’). One of the main research goals of this thesis was to investigate whether this method could also be used on the gene expression profiles of individual tumors, enabling personalization of drug repositioning candidates for each patient. We initially had some success with this approach, but this eventually led to a systematic validation of the underlying principle using almost 10,000 tumor samples across 18 different tumor types. Unfortunately, the predictive power of the method was found to be much lower than previously reported, and the part that remained could be nullified by correcting the gene expression profiles of the drugs for the downstream effects of reduced cell division. These results indicate that the current use of the method does not yield drug repositioning candidates specific to a tumor type but is only able to select generally cell-toxic drugs.
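The ‘normalization’ idea in the abstract above can be illustrated with a small sketch. This is not the thesis's actual pipeline: the scoring function, the drug names, and the toy values are all assumptions made for illustration. The sketch assumes that a disease signature (tumor minus healthy expression per gene) and a drug signature (treated minus untreated expression per gene) are available over the same genes, and it scores a drug by the Pearson correlation between the two, so that a more negative score suggests a stronger expected reversal.

```python
import numpy as np


def reversal_score(disease_signature, drug_signature):
    """Pearson correlation between two per-gene signatures.

    A more negative value means the drug shifts expression in the
    direction opposite to the disease (hypothetical scoring choice).
    """
    d = np.asarray(disease_signature, dtype=float)
    g = np.asarray(drug_signature, dtype=float)
    return float(np.corrcoef(d, g)[0, 1])


def rank_drugs(disease_signature, drug_signatures):
    """Return (drug, score) pairs sorted from strongest reversal to weakest."""
    scores = {
        name: reversal_score(disease_signature, signature)
        for name, signature in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Toy data: five genes, values are made-up log fold changes.
tumor_vs_normal = [2.0, 1.5, -1.0, -2.0, 0.5]
drug_signatures = {
    "drug_A": [-1.8, -1.2, 0.9, 2.1, -0.4],  # roughly opposite to the disease
    "drug_B": [1.9, 1.4, -1.1, -1.8, 0.6],   # roughly the same as the disease
    "drug_C": [0.1, -0.2, 0.0, 0.3, -0.1],   # weak effect
}

ranking = rank_drugs(tumor_vs_normal, drug_signatures)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A score of this kind says nothing about why a drug reverses the signature. The abstract's finding is that, once drug profiles are corrected for the downstream effects of reduced cell division, the remaining predictive power disappears, and that correction is not modeled here.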
Metagenomics enables the detection of all the genetic material of organisms present in a sample, making it a pathogen-agnostic approach for detecting common, rare, or novel pathogens that are not included in conventional testing. Unlike traditional polymerase chain reaction (PCR) testing, it does not require the clinician to have a prior hypothesis about which pathogen to expect. This thesis focuses on diagnostic yield, clinical findings, and enhancing technical opportunities in viral metagenomics. The identification, typing, and quantification of viruses by means of viral metagenomics as a diagnostic tool are evaluated. Technical aspects are appraised for improved sensitivity and specificity of the wet and dry (bioinformatic) lab components of viral metagenomics. The use of a metagenomic protocol for virus discovery directly in a patient sample is assessed, and the best methods and approaches for performing genetic analysis of the SARS-CoV-2 virus are investigated. Viral metagenomic testing results in the identification of more viruses and is therefore a valuable addition to current diagnostic test protocols. Additionally, it is a useful test for virus discovery and monitoring during infectious disease outbreaks caused by novel viruses.
Cancer immunotherapy currently benefits only a minority of patients. In this study, we approach this problem by studying the microenvironment and its immunological signature throughout the body, focusing on systemic immunity with new technologies such as mass cytometry. By highlighting specific immunological patterns in cancer, we were able to associate responsive immune cells with a positive outcome, paving the way to improving immunotherapy in cancer.
Data derived from genomics and transcriptomics are used to further develop our understanding of polycystic kidney diseases and to identify novel drugs for their treatment.
Neurodegenerative diseases are hallmarked by protein inclusions and cell loss in disease-related brain regions, but the molecular mechanisms that lead to the pathological and symptomatic hallmarks of neurodegeneration are still not fully understood. In this thesis, we make use of bioinformatics approaches to analyze a high-resolution spatial gene expression atlas of the healthy human brain generated by the Allen Institute for Brain Science. Spatial transcriptomics allows examining the molecular and functional organization of the human brain and can be combined with neuroimaging data to identify brain regions and anatomical structures that are vulnerable to cell loss in neurodegenerative diseases. By combining both data modalities, we examined healthy molecular functions in brain regions associated with disease vulnerability based on neuroimaging features, namely gray matter loss within brain networks in individuals with Parkinson’s disease, Huntington’s disease, and individuals at risk of schizophrenia. With this thesis, we have shown that by applying data-driven computational methods we can explore the whole genome and find gene expression patterns informative of regional brain vulnerability in neurodegenerative diseases. Our methods can similarly be applied to unravel the molecular mechanisms in other neurodegenerative diseases, and potentially even reveal shared mechanisms between neurological disorders.
The order Nidovirales, including families Coronaviridae and Arteriviridae, is a monophyletic group of highly divergent (+)ssRNA viruses that infect vertebrate and invertebrate hosts; they share conserved genome organization and replication mechanisms. The genome sequence is the only information available about many newly discovered nidoviruses, whose number is increasing rapidly, driven by technological advances. This development makes comparative genomics, an approach that has already been used extensively in nidovirology, increasingly important. In this thesis, diverse methods of comparative genomics were used to address scientific questions about the composition and evolution of the nidovirus genome and proteome, and their connection to the biology of nidoviruses. Three studies were conducted in collaboration with experimental researchers, and ranged from the analysis of the highly divergent polyprotein N-terminus in arteriviruses, to the identification of the fifth universally conserved domain of nidoviruses, and to the characterization of a nidovirus with the largest known RNA genome. The latter study prompted the development of a bioinformatics tool facilitating functional annotation of large multidomain polyproteins. The thesis illustrates how the notion of nidovirus-specific conservation has been steadily refined as a result of recent discoveries.
This thesis addresses the molecular mechanisms underlying osteoarthritis. Specifically, it investigates genome-wide which genes are differentially expressed in affected compared with healthy cartilage from osteoarthritis patients. This is done in the context of epigenetic regulation of gene expression, specifically by DNA methylation, in light of the local genetic context in the form of point mutations.
In this thesis we will explore the use of fuzzy systems theory for applications in bioinformatics. The theory of fuzzy systems is concerned with formulating decision problems in data sets that are ill-defined. It supports the transfer from a subjective human classification to a numerical scale. In this manner it affords the testing of hypotheses and the separation of the classes in the data. We first formulate problems in terms of a fuzzy system and then develop and test algorithms in terms of their performance with data from the domain of the life sciences. From the results and the performance, we will learn about the usefulness of fuzzy systems for the field, as well as their applicability to this kind of problem and their practicality for the computation itself.
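The transfer from a subjective classification to a numerical scale, as described in the abstract above, can be sketched with a triangular membership function, one common building block of fuzzy systems. The abstract does not say which membership functions the thesis uses, so the function shape, the labels "low", "medium", and "high", and the 0 to 10 scale below are all assumptions for illustration.

```python
def triangular_membership(x, left, peak, right):
    """Degree (0.0 to 1.0) to which x belongs to a fuzzy set.

    The set has zero membership at or outside [left, right] and full
    membership at peak. Requires left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for a measurement on a 0-10 scale.
fuzzy_sets = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}


def fuzzify(x):
    """Map a crisp value to its membership degree in every fuzzy set."""
    return {
        label: triangular_membership(x, *bounds)
        for label, bounds in fuzzy_sets.items()
    }


print(fuzzify(2.5))  # partly "low", partly "medium", not "high"
```

A value such as 2.5 then belongs to "low" and "medium" to equal degrees instead of being forced into a single class, which is the kind of graded decision the abstract refers to.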
This dissertation describes the development of glyco-bioinformatics tools that facilitate the high-throughput data processing of glycomics and glycoproteomics experiments, specifically for both MALDI-TOF-MS (Chapter 2) and LC-ESI-MS (Chapter 3). The developed methods also provide various quality control parameters that assist the researcher in curating both the measured spectra and quantified analytes, thereby providing high-quality data in a high-throughput manner. The tools that were developed within this thesis have been used to identify the influence of glycosylation on trypsin efficacy of Immunoglobulin G (Chapter 3) and to study two biological cohorts: specifically, to investigate serum N-glycosylation during and after pregnancy (Chapter 5) and to identify the differences in N-glycosylation between maternal and fetal serum and IgG (Chapter 6).
This thesis demonstrates the application of bioinformatics to investigate the mechanisms that are implicated in Huntington’s Disease (HD). HD is an inherited neurodegenerative disorder, and although the cause of the disease has been known since 1993, we still lack a cure or a treatment that can effectively treat the symptoms of HD. In order to tackle such a complicated case study, we followed a multidisciplinary approach to exploit the expertise and knowledge of people with diverse scientific backgrounds (chapter 2). This blend of disciplines facilitates constant collaboration between bioinformaticians, wet lab technicians, biologists, computer engineers and data scientists. A collaborative eScience model is proposed as a way to combine state-of-the-art computational analysis and laboratory work (chapter 3). At the same time, we explored methods to preserve the results, materials and methods involved in the experiment to increase the reproducibility and reusability of our research (chapter 4). In chapter 5 we identified disease signatures in blood that are functionally similar to signatures in brain. These are proposed as candidate biomarkers to be used as a monitoring tool for the state of the disease in brain, but also as a means to determine whether a treatment is successful or not.
In this thesis I focus on the application of bioinformatics to analyze RNA. The type of experimental data of interest is sequencing data generated with various Next Generation Sequencing techniques: nuclear RNA, cytoplasmic RNA, captured polyadenylated RNA fragments, etc. I highlight the need to develop new tools (e.g., to analyze nuclear RNA) and give a showcase example of implementing such a tool and demonstrating its usability on a real biological experiment. The thesis also covers existing tools for various types of RNA analysis and shows how these tools can be tweaked and expanded to answer certain biological questions (e.g., studying changes in RNA specific to human aging). I also show how current bioinformatic approaches can be used in a particularly complex study such as investigating cancer (in this thesis, breast cancer) pathogenesis.
Advances in technology have turned modern biology into a data-intensive enterprise. The advent of high-output technologies like microarrays and next-generation sequencing has resulted in researchers grappling not just with huge volumes but also with multiple types of data. While generation and storage of high-quality data are an important research focus, it is increasingly recognized that translating data into actionable information and insight is a critical research challenge. To infer reliable conclusions from the data, it is often necessary to integrate large amounts of heterogeneous data with different formats and semantics. Given the breadth and volume of data involved, this goal is best achieved through automated methods and tools for data integration and workflow management. This thesis presents automated strategies that combine bioinformatics and statistical methods to identify novel biomarkers in high-throughput OMICs datasets pertaining to the metabolic syndrome and to gain mechanistic insight into the underlying biological processes. An underlying theme in this thesis is the use of data-driven approaches that generate plausible hypotheses, which are followed by experimental verification.
This thesis is dedicated to the empirical study of image analysis in HT/HC screen studies. An HT/HC screening often produces extensive amounts of image data that cannot be manually analyzed. Thus, an automated image analysis solution is a prerequisite for an objective understanding of the raw image data. Compared with general application domains, the efficiency of HT/HC image analysis is highly dependent on image quantity and quality. Accordingly, this thesis addresses two major procedures, namely image segmentation and object tracking, in the image analysis step of an HT/HC screen study. Moreover, this thesis focuses on extending generic computer science and machine learning theory into the design of dedicated algorithms for HT/HC image analysis. Additionally, this thesis exemplifies a practical implementation of an image analysis and data analysis workflow via empirical case studies with different image modalities and experiment settings. The data analysis theory, however, is illustrated only in general terms without further elaboration. Finally, the thesis briefly addresses supplementary infrastructures for end-user interaction and data visualization.
High-grade osteosarcoma is a primary bone tumor with complex genetic alterations, for which targeted therapy is lacking. The aim of this thesis was to use high-throughput molecular data analysis of high-grade osteosarcoma specimens and model systems in order to learn more about osteosarcomagenesis and to find possible ways to inhibit this process. By analyzing different microarray data types using a systems biology approach, genomic instability was identified as an important driver of osteosarcomagenesis. A protective role of macrophages against metastasis of osteosarcoma was detected. In addition, the IR/IGF1R and PI3K/Akt signaling pathways were discovered as potential targets for treatment. This thesis provides the first steps in unraveling the genomic and transcriptomic landscape of high-grade osteosarcoma, and provides a biological rationale for certain new options for adjuvant treatment of this highly genomically unstable tumor.
This thesis combines the use of standard bioinformatics analyses with the development of new computational techniques to study the evolution and genetic diversity of picornaviruses and nidoviruses. It integrates two lines of research (genetics-based virus classification and evolutionary dynamics of gene length) and aims at unveiling commonalities in the biology of these and other RNA viruses as well as assisting applied research in virology.
This dissertation mainly focuses on interdisciplinary approaches for biomedical knowledge discovery. This required special efforts in developing systematic strategies to integrate various data sources and techniques, leading to improved discovery of mechanistic insights into human diseases. Chapter one looks at how combining various bioinformatics-based strategies can significantly improve the characterization of the OPMD mouse model. On the basis of our extensive analysis, we discuss how this approach to knowledge discovery helped us shed some light on how this model system relates to OPMD pathophysiology in humans. In Chapter two, we expand on this combinatory approach by conducting a cross-species data analysis. In this study, we looked for common patterns that emerge when assessing the transcriptome data from three OPMD model systems and patients. This strategy led to unravelling the most prominent molecular pathway involved in OPMD pathology. The third chapter pursues a similar goal: to identify molecular and pathophysiological features shared between OPMD and the common process of skeletal muscle ageing. Focusing on the universality of biological processes, in the light of evolutionary mechanisms and common functional features, led to novel discoveries. This work helped us uncover remarkable insights into the molecular mechanisms of ageing muscles and protein aggregation. Chapters four and five take a different route by tackling the field of computational biology. These chapters aim to extend network inference by providing novel strategies for the exploitation and integration of multiple data sources. We show that these developments allow more robust regulatory mechanisms to be inferred while translations and predictions are made across very different datasets, platforms, and organisms. Finally, the dissertation concludes with an outlook on ways the field of systems biology can evolve in order to offer enhanced, diversified and robust strategies for knowledge discovery.
This thesis spans several years of work dedicated to understanding fish genomes. The first chapter describes the genome of the first fish for which the entire genome was sequenced through a large-scale international project, Fugu rubripes, the pufferfish. In particular, it highlights how this fish has a genome that contains as many genes as the human genome, although it is ten times smaller. It also shows that the majority of genes that are found in the human genome can be found in this fish genome as well. In the second chapter we compared fish genomes to the human genome to find regions that have been preserved during evolution and which are therefore likely to have a function, even though they are not genes. We showed that they are indeed functional and that they help to regulate other genes. Knowing all the genes in the genome, we could then interrogate how they are expressed, i.e. whether they are switched 'on' or 'off'; in particular, in chapter 4 we looked at how a specific gene is in charge of gradually switching off genes that are inherited from the mother in a newborn fish embryo. Finally, in the last chapter, since genome sequencing is now becoming much cheaper and simpler to achieve, we set out to map the genome of the common carp, and we discuss the best approaches and strategies to obtain a good genome sequence for this species. The common carp is a candidate model system for high-throughput screening.
A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on Osteoarthritis (OA) and Parkinson’s disease (PD) and in drug discovery. Because OA and PD are characterized by clinical heterogeneity, a more sensitive classification of the cohort of patients may contribute to the search for the underlying disease mechanisms. In drug discovery, subtyping may improve the understanding of the similarity (and distance) between different phenotypic effects as induced by drugs and chemicals. Our second scenario aims to compare text classification algorithms. First, we show that common classifiers achieve comparable performance on most problems. Second, tightly constrained SVM solutions are high performers. In that situation, most training documents are bounded support vectors, SVM reduces to a nearest mean classifier and no training is necessary, which raises a question about the merits of SVM in sparse bag-of-words feature spaces. Also, SVM is shown to suffer from performance deterioration for particular combinations of training set size and number of features. This relates to outlying documents of distinct classes overlapping in the feature space.
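The nearest mean classifier mentioned in the abstract above can be sketched in a few lines. This is a generic illustration and not the thesis's experimental setup: the toy word counts, the class labels, and the use of Euclidean distance are assumptions. The only "training" is computing one mean vector per class, and a new document receives the label of the closest mean.

```python
import numpy as np


def fit_class_means(X, y):
    """Compute one mean feature vector per class label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    means = np.vstack([X[y == label].mean(axis=0) for label in labels])
    return labels, means


def predict_nearest_mean(X, labels, means):
    """Assign each row of X to the class whose mean is closest (Euclidean)."""
    X = np.asarray(X, dtype=float)
    # distances has shape (n_samples, n_classes)
    distances = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return labels[np.argmin(distances, axis=1)]


# Toy bag-of-words counts (hypothetical): one row per document,
# one column per vocabulary word.
X_train = [[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 2]]
y_train = ["sports", "sports", "politics", "politics"]

labels, means = fit_class_means(X_train, y_train)
predictions = predict_nearest_mean([[4, 0, 0], [0, 1, 3]], labels, means)
print(list(predictions))
```

Because the model consists only of class means, there are no hyperparameters to optimize, which is why a tightly constrained SVM behaving like this classifier raises the question the abstract poses.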
In general, biological and chemical causes for harmful effects were studied through bioinformatics and cheminformatics efforts. A database of human genetic variants in G protein-coupled receptors was constructed, and differences between neutral and harmful variants were studied. A database of compounds with their mutagenicity data was constructed, and substructures were extracted that distinguish between Ames positive and Ames negative compounds. Keywords: cheminformatics, chemoinformatics, bioinformatics, databases, data mining, drug discovery, SNPs, polymorphisms, substructures.