The present study compared genetically modified (GM) crops with crops from different farming practices using high-resolution tandem mass spectrometry (HR-MS) and proteomics bioinformatics tools. In a previously published study, a number of significant differences in nutritional and elemental composition were found between a selection of GM, non-GM conventionally farmed, and organic soybeans. In the present study, the proteome-level equivalence of the same samples was assessed using HR-MS. Direct comparison of tandem mass spectra and bottom-up proteomics bioinformatics indicated that the proteomes of all samples investigated were very similar overall, with only a few distinct protein expression clusters obtained for GM and organic samples. Standard bottom-up proteome analyses identified 1025 soy proteins; of these, 39 were found to be differentially expressed (p < 0.01) between GM, non-GM conventionally farmed, and organically farmed soybeans. Subsequent bioinformatics analyses of these proteins highlighted several potentially affected biochemical pathways that could contribute to the compositional differences reported earlier. In addition, protein markers separating conventionally and organically farmed soybean seeds were found, and peptide markers for the detection of GM soy in food and feed samples are described. Taken together, the data presented here show that HR-MS-based proteomics approaches can be used for the detection of transgenic events in food- and feed-grade soy and for the differentiation of organically and conventionally farmed plants, and that they can provide mechanistic explanations of effects observed on the phenotypic level of GM plants. 
HR-MS and proteomic bioinformatics should thus be considered key tools when developing molecular panel approaches for detection and safety assessments of novel crop varieties destined for use in feed and food.
Velden, J. van der; Asselbergs, F.W.; Bakkers, J.; Batkai, S.; Bertrand, L.; Bezzina, C.R.; ... ; Thum, T. 2022
Cardiovascular diseases represent a major cause of morbidity and mortality, necessitating research to improve diagnostics and to discover and test novel preventive and curative therapies, all of which warrant experimental models that recapitulate human disease. The translation of basic science results to clinical practice is a challenging task, in particular for complex conditions such as cardiovascular diseases, which often result from multiple risk factors and co-morbidities. This difficulty might lead some individuals to question the value of animal research, citing the translational 'valley of death', which largely reflects the fact that studies in rodents are difficult to translate to humans. This view is also influenced by the fact that new, human-derived in vitro models can recapitulate aspects of disease processes. However, it would be a mistake to think that animal models cannot provide a vital step in the translational pathway, as they do provide important pathophysiological insights into disease mechanisms, particularly on an organ and systemic level. While stem cell-derived human models have the potential to become key in testing the toxicity and effectiveness of new drugs, we need to be realistic and carefully validate all new human-like disease models. In this position paper, we highlight recent advances in trying to reduce the number of animals used for cardiovascular research, ranging from stem cell-derived models to in situ modelling of heart properties, bioinformatic models based on large datasets, and improved current animal models, which show clinically relevant characteristics observed in patients with a cardiovascular disease. 
We aim to provide a guide to help researchers design experiments that translate bench findings to clinical routine, taking replacement, reduction and refinement (3R) as a guiding concept.
In this thesis we will explore the use of fuzzy systems theory for applications in bioinformatics. The theory of fuzzy systems is concerned with formulating decision problems in data sets that are ill-defined. It supports the transfer from a subjective human classification to a numerical scale. In this manner it affords the testing of hypotheses and the separation of the classes in the data. We first formulate problems in terms of a fuzzy system and then develop algorithms and test their performance with data from the life sciences. From the results and the performance, we will learn about the usefulness of fuzzy systems for the field, their applicability to this kind of problem, and the practicality of the computation itself.
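As a minimal sketch of what "transfer from a subjective human classification to a numerical scale" can look like, the snippet below maps a measurement onto degrees of membership in three fuzzy sets. The triangular shape, the labels "low", "medium" and "high", and the normalised range [0, 1] are illustrative assumptions and are not taken from the thesis.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0..1) of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for a measurement normalised to [0, 1].
# Each tuple is (left foot, peak, right foot) of the triangle.
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(x):
    """Return the membership degree of x in every fuzzy set."""
    return {label: triangular(x, *params) for label, params in FUZZY_SETS.items()}


if __name__ == "__main__":
    print(fuzzify(0.25))
```

A value such as 0.25 then belongs partly to "low" and partly to "medium" rather than being forced into one crisp class, which is the property that makes fuzzy sets suited to ill-defined categories.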
In this thesis I focus on the application of bioinformatics to analyze RNA. The type of experimental data of interest is sequencing data generated with various Next Generation Sequencing techniques: nuclear RNA, cytoplasmic RNA, captured polyadenylated RNA fragments, etc. I highlight the necessity of developing new tools (e.g., to analyze nuclear RNA) and give a showcase example of implementing such a tool and demonstrating its usability on a real biological experiment. The thesis also covers existing tools to perform various types of RNA analysis and shows how these tools can be tweaked and expanded to answer certain biological questions (e.g., studying changes in RNA specific to human aging). I also show how current bioinformatic approaches can be used in a particularly complex study such as investigating the pathogenesis of cancer (in this thesis, breast cancer).
Advances in technology have turned modern biology into a data-intensive enterprise. The advent of high-output technologies such as microarrays and next-generation sequencing has resulted in researchers grappling not just with huge volumes but also with multiple types of data. While generation and storage of high-quality data are an important research focus, it is increasingly recognized that translating data into actionable information and insight is a critical research challenge. To infer reliable conclusions from the data, it is often necessary to integrate large amounts of heterogeneous data with different formats and semantics. Given the breadth and volume of data involved, this goal is best achieved through automated methods and tools for data integration and workflow management. This thesis presents automated strategies that combine bioinformatics and statistical methods to identify novel biomarkers in high-throughput OMICs datasets pertaining to the metabolic syndrome and to gain mechanistic insight into the underlying biological processes. An underlying theme in this thesis is data-driven approaches that generate plausible hypotheses, which are followed by experimental verification.
Background: Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, the discovery of SNPs associated with metabolite levels is challenging since testing all metabolites measured in typical metabolomics studies with all SNPs comes with a severe multiple testing penalty. We have developed an automated workflow approach that utilizes prior knowledge of biochemical pathways present in databases like KEGG and BioCyc to generate a smaller SNP set relevant to the metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets. Results: Re-analysis of the published GWAS dataset from Illig et al. (Nature Genetics, 2010) using a pathway-based workflow (http://www.myexperiment.org/packs/319.html) confirmed previously identified hits and identified a new locus of human metabolic individuality, associating Aldehyde dehydrogenase family 1 L1 (ALDH1L1) with serine/glycine ratios in blood. Replication in an independent GWAS dataset of phospholipids (Demirkan et al., PLoS Genetics, 2012) identified two novel loci supported by additional literature evidence: GPAM (Glycerol-3-phosphate acyltransferase) and CBS (Cystathionine beta-synthase). 
In addition, the workflow approach provided novel insight into the affected pathways and relevance of some of these gene-metabolite pairs in disease development and progression. Conclusions: We demonstrate the utility of automated exploitation of background knowledge present in pathway databases for the analysis of GWAS datasets of metabolomic phenotypes. We report novel loci and potential biochemical mechanisms that contribute to our understanding of the genetic basis of metabolic variation and its relationship to disease development and progression.
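The effect of pathway-based SNP selection on the multiple testing penalty can be sketched as follows. This is an illustration only: it assumes a Bonferroni correction (the abstract does not say which correction the workflow uses), and the metabolite, gene and SNP identifiers are made-up placeholders rather than entries from KEGG or BioCyc.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def pathway_snps(metabolite, metabolite_to_genes, gene_to_snps):
    """Collect SNPs located in genes linked to a metabolite via pathway knowledge."""
    snps = set()
    for gene in metabolite_to_genes.get(metabolite, ()):
        snps.update(gene_to_snps.get(gene, ()))
    return snps


# Hypothetical prior knowledge; a real workflow would query a pathway database.
METABOLITE_TO_GENES = {"metabolite_x": ["GENE_A", "GENE_B"]}
GENE_TO_SNPS = {
    "GENE_A": ["snp_a1", "snp_a2"],
    "GENE_B": ["snp_b1"],
    "GENE_C": ["snp_c1", "snp_c2", "snp_c3"],
}

if __name__ == "__main__":
    all_snps = {snp for snps in GENE_TO_SNPS.values() for snp in snps}
    selected = pathway_snps("metabolite_x", METABOLITE_TO_GENES, GENE_TO_SNPS)
    print("all SNPs:     ", len(all_snps), bonferroni_threshold(0.05, len(all_snps)))
    print("pathway SNPs: ", len(selected), bonferroni_threshold(0.05, len(selected)))
```

Because fewer SNPs are tested per metabolite, the per-test threshold is less stringent, which is the motivation the abstract gives for restricting the SNP set.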
This thesis is dedicated to the empirical study of image analysis in HT/HC screen studies. An HT/HC screen often produces amounts of image data too extensive to be analyzed manually. Thus, an automated image analysis solution is a prerequisite for an objective understanding of the raw image data. Compared to general application domains, the efficiency of HT/HC image analysis is highly dependent on image quantity and quality. Accordingly, this thesis addresses two major procedures in the image analysis step of an HT/HC screen study, namely image segmentation and object tracking. Moreover, this thesis focuses on extending generic computer science and machine learning theory into the design of dedicated algorithms for HT/HC image analysis. Additionally, this thesis exemplifies a practical implementation of an image analysis and data analysis workflow via empirical case studies with different image modalities and experimental settings. The data analysis theory, however, is illustrated only in general terms, without further elaboration. Finally, the thesis briefly addresses supplementary infrastructure for end-user interaction and data visualization.
High-grade osteosarcoma is a primary bone tumor with complex genetic alterations, for which targeted therapy is lacking. The aim of this thesis was to use high-throughput molecular data analysis of high-grade osteosarcoma specimens and model systems in order to learn more about osteosarcomagenesis and to find possible ways to inhibit this process. By analyzing different microarray data types using a systems biology approach, genomic instability was identified as an important driver of osteosarcomagenesis. A protective role of macrophages against metastasis of osteosarcoma was detected. In addition, the IR/IGF1R and PI3K/Akt signaling pathways were discovered as potential targets for treatment. This thesis provides the first steps in unraveling the genomic and transcriptomic landscape of high-grade osteosarcoma, and provides a biological rationale for certain new options for adjuvant treatment of this highly genomically unstable tumor.
This thesis combines the use of standard bioinformatics analyses with the development of new computational techniques to study the evolution and genetic diversity of picornaviruses and nidoviruses. It integrates two lines of research (genetics-based virus classification and evolutionary dynamics of gene length) and aims at unveiling commonalities in the biology of these and other RNA viruses as well as assisting applied research in virology.
This dissertation mainly focuses on interdisciplinary approaches for biomedical knowledge discovery. This required special efforts in developing systematic strategies to integrate various data sources and techniques, leading to improved discovery of mechanistic insights into human diseases. Chapter one looks at how combining various bioinformatics-based strategies can significantly improve the characterization of the OPMD mouse model. On the basis of our extensive analysis, we discuss how this approach to knowledge discovery helped us to shed some light on how this model system relates to OPMD pathophysiology in humans. In Chapter two, we expand on this combinatory approach by conducting a cross-species data analysis. In this study, we looked for common patterns that emerge from the transcriptome data of three OPMD model systems and patients. This strategy led to unravelling the most prominent molecular pathway involved in OPMD pathology. The third chapter pursues a similar goal: identifying molecular and pathophysiological features shared between OPMD and the common process of skeletal muscle ageing. A study focused on the universality of biological processes, in the light of evolutionary mechanisms and common functional features, led to novel discoveries. This work helped us uncover remarkable insights into the molecular mechanisms of ageing muscles and protein aggregation. Chapters four and five take a different route by tackling the field of computational biology. These chapters aim to extend network inference by providing novel strategies for the exploitation and integration of multiple data sources. 
We show that these developments allow more robust regulatory mechanisms to be inferred while translations and predictions are made across very different datasets, platforms, and organisms. Finally, the dissertation concludes with an outlook on ways the field of systems biology can evolve in order to offer enhanced, diversified and robust strategies for knowledge discovery.
This thesis spans several years of work dedicated to understanding fish genomes. The first chapter describes the genome of the first fish for which the entire genome was sequenced through a large-scale international project: Fugu rubripes, the pufferfish. In particular, it highlights how this fish has a genome that contains as many genes as the human genome, although it is ten times smaller. It also shows that the majority of genes that are found in the human genome can be found in this fish genome as well. In the second chapter we compared fish genomes to the human genome to find regions that have been preserved during evolution and which are therefore likely to have a function, even though they are not genes. We showed that they are indeed functional and that they help to regulate other genes. Knowing all the genes in the genome, we could then interrogate how they are expressed, i.e. whether they are switched 'on' or 'off'. In particular, in chapter 4 we looked at how a specific gene is in charge of gradually switching off genes that are inherited from the mother in a newborn fish embryo. Finally, in the last chapter, since genome sequencing is now becoming much cheaper and simpler to achieve, we set out to map the genome of the common carp, and we discuss the best approaches and strategies to obtain a good genome sequence for this species. The common carp is a candidate model system for high-throughput screening.
A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on Osteoarthritis (OA) and Parkinson’s disease (PD) and in drug discovery. Because OA and PD are characterized by clinical heterogeneity, a more sensitive classification of the cohort of patients may contribute to the search for the underlying disease mechanisms. In drug discovery, subtyping may improve the understanding of the similarity (and distance) between different phenotypic effects as induced by drugs and chemicals. Our second scenario aims to compare text classification algorithms. First, we show that common classifiers achieve comparable performance on most problems. Second, tightly constrained SVM solutions are high performers. In that situation, most training documents are bounded support vectors, SVM reduces to a nearest mean classifier and no training is necessary, which raises a question about the merits of SVM in sparse bag-of-words feature spaces. Also, SVM is shown to suffer from performance deterioration for particular combinations of training set size and number of features. This relates to outlying documents of distinct classes overlapping in the feature space.
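For readers unfamiliar with the nearest mean classifier mentioned above, here is a minimal sketch on bag-of-words vectors. It is a generic illustration, not the thesis's implementation; the whitespace tokenisation, the squared Euclidean distance and the toy documents are assumptions chosen for brevity.

```python
from collections import Counter, defaultdict


def bag_of_words(text):
    """Sparse word-count vector for a document (naive whitespace tokenisation)."""
    return Counter(text.lower().split())


def class_means(documents, labels):
    """Mean bag-of-words vector per class; this is the whole 'model'."""
    sums = defaultdict(Counter)
    counts = Counter()
    for doc, label in zip(documents, labels):
        sums[label].update(bag_of_words(doc))
        counts[label] += 1
    return {
        label: {word: total / counts[label] for word, total in vec.items()}
        for label, vec in sums.items()
    }


def squared_distance(vec, mean):
    """Squared Euclidean distance between two sparse vectors."""
    words = set(vec) | set(mean)
    return sum((vec.get(w, 0) - mean.get(w, 0)) ** 2 for w in words)


def predict(text, means):
    """Assign the class whose mean vector is closest to the document."""
    vec = bag_of_words(text)
    return min(means, key=lambda label: squared_distance(vec, means[label]))


if __name__ == "__main__":
    docs = [
        "gene protein sequence",
        "protein gene expression",
        "stock market price",
        "market price index",
    ]
    labels = ["bio", "bio", "fin", "fin"]
    means = class_means(docs, labels)
    print(predict("gene expression data", means))
```

The only fitted quantities are the per-class averages, so there is no iterative optimisation step, which is the sense in which such a classifier needs "no training".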
In general, biological and chemical causes for harmful effects were studied through bioinformatics and cheminformatics efforts. A database of human genetic variants in G protein-coupled receptors was constructed, and differences between neutral and harmful variants were studied. A database of compounds with their mutagenicity data was constructed, and substructures were extracted that distinguish between Ames positive and Ames negative compounds. Keywords: cheminformatics, chemoinformatics, bioinformatics, databases, data mining, drug discovery, SNPs, polymorphisms, substructures.