Search results

(1 - 20 of 43)

Hsieh, P.H.; Lopes-Ramos, C.M.; Zucknick, M.; Sandve, G.K.; Glass, K.; Kuijjer, M.L. 2023

Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data

Article / Letter to editor

open access

Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns across a group of samples. Coordinated expression of genes may indicate that they are controlled by the same transcriptional regulatory program, or involved in common biological processes. Gene co-expression is generally estimated from RNA-Sequencing data, which are commonly normalized to remove technical variability. Here, we demonstrate that certain normalization methods, in particular quantile-based methods, can introduce false-positive associations between genes. These false-positive associations can consequently hamper downstream co-expression network analysis. Quantile-based normalization can, however, be extremely powerful. In particular, when preprocessing large-scale heterogeneous data, quantile-based normalization methods such as smooth quantile normalization can be applied to remove technical variability while maintaining global differences in expression for samples with different biological attributes.

Fleming, R.M.T.; Haraldsdottir, H.S.; Minh, L.H.; Vuong, P.T.; Hankemeier, T.; Thiele, I. 2023

Cardinality optimization in constraint-based modelling: application to human metabolism

Article / Letter to editor

open access

Vis, J.K.; Santcroos, M.A.; Kosters, W.A.; Laros, J.F.J. 2023

A Boolean algebra for genetic variants

Article / Letter to editor

open access

Motivation: Beyond identifying genetic variants, we introduce a set of Boolean relations, which allows for a comprehensive classification of the relations of every pair of variants by taking all minimal alignments into account. We present an efficient algorithm to compute these relations, including a novel way of efficiently computing all minimal alignments within the best theoretical complexity bounds. Results: We show that these relations are common, and many non-trivial, for variants of the CFTR gene in dbSNP. Ultimately, we present an approach for the storing and indexing of variants in the context of a database that enables efficient querying for all these relations. Availability and implementation: A Python implementation is available at as well as an interface at .

Thiele, I.; Preciat, G.; Fleming, R.M. 2022

MetaboAnnotator: an efficient toolbox to annotate metabolites in genome-scale metabolic reconstructions

Article / Letter to editor

open access

Motivation: Genome-scale metabolic reconstructions have been assembled for thousands of organisms using a wide range of tools. However, metabolite annotations, required to compare and link metabolites between reconstructions, remain incomplete. Here, we aim to further extend metabolite annotation coverage using various databases and chemoinformatic approaches. Results: We developed a COBRA toolbox extension, deemed MetaboAnnotator, which facilitates the comprehensive annotation of metabolites with database independent and dependent identifiers, obtains molecular structure files, and calculates metabolite formula and charge at pH 7.2. The resulting metabolite annotations allow for subsequent cross-mapping between reconstructions and mapping of, e.g., metabolomic data. Availability and implementation: MetaboAnnotator and tutorials are freely available at https://github.com/opencobra. Supplementary information: Supplementary data are available at Bioinformatics online.

Bizzarri, D.; Reinders, M.J.T.; Beekman, M.; Slagboom, P.E.; Akker, E.B. van den 2022

MiMIR: R-shiny application to infer risk factors and endpoints from Nightingale Health's 1H-NMR metabolomics data

Article / Letter to editor

open access

Motivation: 1H-NMR metabolomics is rapidly becoming a standard resource in large epidemiological studies to acquire metabolic profiles in large numbers of samples in a relatively low-priced and standardized manner. Concomitantly, metabolomics-based models are increasingly developed that capture disease risk or clinical risk factors. These developments raise the need for user-friendly toolbox to inspect new 1H-NMR metabolomics data and project a wide array of previously established risk models. Results: We present MiMIR (Metabolomics-based Models for Imputing Risk), a graphical user interface that provides an intuitive framework for ad hoc statistical analysis of Nightingale Health's 1H-NMR metabolomics data and allows for the projection and calibration of 24 pre-trained metabolomics-based models, without any pre-required programming knowledge. Availability and implementation: The R-shiny package is available in CRAN or downloadable at , together with an extensive user manual (also available as Supplementary Documents to the article).

Osorio, D.; Kuijjer, M.L.; Cai, J.J. 2022

rPanglaoDB: an R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database

Article / Letter to editor

open access

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several samples preprocessing steps to filter and collect the desired cells experimentally before sequencing. Data integration of multiple public single-cell experiments stands as a solution for this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables a robust cell type transcriptome characterization. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type, that exhibits both the inflammatory features of macrophages and the tissue remodeling properties of fibroblasts. This constitutes the first fibrocytes’ unbiased transcriptome profile report. We compared the transcriptomic profile of the fibrocytes against the fibroblasts collected from the same tissue samples and confirm their associated relationship with healing processes in tissue damage and infection through the activation of the prostaglandin biosynthesis and regulation pathway.

Sinke, L.; Cats, D.; Heijmans, B.T. 2021

Omixer: multivariate and reproducible sample randomization to proactively counter batch effects in omics studies

Article / Letter to editor

metadata only

Motivation: Batch effects heavily impact results in omics studies, causing bias and false positive results, but software to control them preemptively is lacking. Sample randomization prior to measurement is vital for minimizing these effects, but current approaches are often ad hoc, poorly documented and ill-equipped to handle multiple batches and outcomes. Results: We developed Omixer, a Bioconductor package implementing multivariate and reproducible sample randomization for omics studies. It proactively counters correlations between technical factors and biological variables of interest by optimizing sample distribution across batches.

Lefter, M.; Vis, J.K.; Vermaat, M.; Dunnen, J.T. den; Taschner, P.E.M.; Laros, J.F.J. 2021

Mutalyzer 2: next generation HGVS nomenclature checker

Article / Letter to editor

open access

Motivation: Unambiguous variant descriptions are of utmost importance in clinical genetic diagnostics, scientific literature and genetic databases. The Human Genome Variation Society (HGVS) publishes a comprehensive set of guidelines on how variants should be correctly and unambiguously described. We present the implementation of the Mutalyzer 2 tool suite, designed to automatically apply the HGVS guidelines so users do not have to deal with the HGVS intricacies explicitly to check and correct their variant descriptions. Results: Mutalyzer is profusely used by the community, having processed over 133 million descriptions since its launch. Over a five year period, Mutalyzer reported a correct input in ~50% of cases. In 41% of the cases either a syntactic or semantic error was identified and for ~7% of cases, Mutalyzer was able to automatically correct the description.

Marissen, R.; Palmblad, M. 2021

mzRecal: universal MS1 recalibration in mzML using identified peptides in mzIdentML as internal calibrants

Article / Letter to editor

metadata only

Summary: In mass spectrometry-based proteomics, accurate peptide masses improve identifications, alignment and quantitation. Getting the most out of any instrument therefore requires proper calibration. Here, we present a new stand-alone software, mzRecal, for universal automatic recalibration of data from all common mass analyzers using standard open formats and based on physical principles.

Mohammed, Y.; Bhowmick, P.; Michaud, S.A.; Sickmann, A.; Borchers, C.H. 2021

Mouse Quantitative Proteomics Knowledgebase: reference protein concentration ranges in 20 mouse tissues using 5000 quantitative proteomics assays

Article / Letter to editor

metadata only

Motivation: Laboratory mouse is the most used animal model in biological research, largely due to its high conserved synteny with human. Researchers use mice to answer various questions ranging from determining a pathological effect of knocked out/in gene to understanding drug metabolism. Our group developed >5000 quantitative targeted proteomics assays for 20 mouse tissues and determined the concentration ranges of a total of >1600 proteins using heavy labeled internal standards. We describe here MouseQuaPro; a knowledgebase that hosts this collection of carefully curated experimental data. Results: The web-based application includes protein concentrations from >700 mouse tissue samples from three common research strains, corresponding to >200k experimentally determined concentrations. The knowledgebase integrates the assay and protein concentration information with their human orthologs, functional and molecular annotations, biological pathways, related human diseases and known gene expressions. At its core are the protein concentration ranges, which provide insights into (dis)similarities between tissues, strains and sexes. MouseQuaPro implements advanced search as well as filtering functionalities with a simple interface and interactive visualization. This information-rich resource provides an initial map of protein absolute concentration in mouse tissues and allows guided design of proteomics phenotyping experiments. The knowledgebase is available on mousequapro.proteincentre.com.

Villegas-Morcillo, A.; Makrodimitris, S.; Ham, R.C.H.J. van; Gomez, A.M.; Sanchez, V.; Reinders, M.J.T. 2021

Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function

Article / Letter to editor

open access

Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data which are not available for this task. However, a very large amount of protein sequences without functional labels is available. Results: We applied an existing deep sequence model that had been pretrained in an unsupervised setting on the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining.

Ubels, J.; Schaefers, T.; Punt, C.; Guchelaar, H.J.; Ridder, J. de 2020

RAINFOREST: a random forest approach to predict treatment benefit in data from (failed) clinical drug trials

Article in monograph or in proceedings

metadata only

Motivation: When phase III clinical drug trials fail their endpoint, enormous resources are wasted. Moreover, even if a clinical trial demonstrates a significant benefit, the observed effects are often small and may not outweigh the side effects of the drug. Therefore, there is a great clinical need for methods to identify genetic markers that can identify subgroups of patients which are likely to benefit from treatment as this may (i) rescue failed clinical trials and/or (ii) identify subgroups of patients which benefit more than the population as a whole. When single genetic biomarkers cannot be found, machine learning approaches that find multivariate signatures are required. For single nucleotide polymorphism (SNP) profiles, this is extremely challenging owing to the high dimensionality of the data. Here, we introduce RAINFOREST (tReAtment benefIt prediction using raNdom FOREST), which can predict treatment benefit from patient SNP profiles obtained in a clinical trial setting. Results: We demonstrate the performance of RAINFOREST on the CAIRO2 dataset, a phase III clinical trial which tested the addition of cetuximab treatment for metastatic colorectal cancer and concluded there was no benefit. However, we find that RAINFOREST is able to identify a subgroup comprising 27.7% of the patients that do benefit, with a hazard ratio of 0.69 (P = 0.04) in favor of cetuximab. The method is not specific to colorectal cancer and could aid in reanalysis of clinical trial data and provide a more personalized approach to cancer treatment, also when there is no clear link between a single variant and treatment benefit.

Abdelaal, T.; Raadt, P. de; Lelieveldt, B.P.F.; Reinders, M.J.T.; Mahfouz, A. 2020

SCHNEL: scalable clustering of high dimensional single-cell data

Article / Letter to editor

open access

Motivation: Single cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets. Results: We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST.

Zammit, A.; Helwerda, L.; Olsthoorn, R.C.L.; Verbeek, F.J.; Gultyaev, A.P. 2020

A database of flavivirus RNA structures with a search algorithm for pseudoknots and triple base interactions

Article / Letter to editor

open access

Morgan-Lang, C.; McLaughlin, R.; Armstrong, Z.W.B.; Zhang, G.; Chan, K.; Hallam, S.J. 2020

TreeSAPP: the tree-based sensitive and accurate phylogenetic profiler

Article / Letter to editor

open access

Motivation: Microbial communities drive matter and energy transformations integral to global biogeochemical cycles, yet many taxonomic groups facilitating these processes remain poorly represented in biological sequence databases. Due to this missing information, taxonomic assignment of sequences from environmental genomes remains inaccurate. Results: We present the Tree-based Sensitive and Accurate Phylogenetic Profiler (TreeSAPP) software for functionally and taxonomically classifying genes, reactions and pathways from genomes of cultivated and uncultivated microorganisms using reference packages representing coding sequences mediating multiple globally relevant biogeochemical cycles. TreeSAPP uses linear regression of evolutionary distance on taxonomic rank to improve classifications, assigning both closely related and divergent query sequences at the appropriate taxonomic rank. TreeSAPP is able to provide quantitative functional and taxonomic classifications for both assembled and unassembled sequences and files supporting interactive tree of life visualizations. Availability and implementation: TreeSAPP was developed in Python 3 as an open-source Python package and is available on GitHub at https://github.com/hallamlab/TreeSAPP. Supplementary information: Supplementary data are available at Bioinformatics online.

Rafols, P.; Heijs, B.; Castillo, E. del; Yanes, O.; McDonnell, L.A.; Brezmes, J.; ... ; Correig, X. 2020

rMSIproc: an R package for mass spectrometry imaging data processing

Article / Letter to editor

open access

Mass spectrometry imaging (MSI) can reveal biochemical information directly from a tissue section. MSI generates a large quantity of complex spectral data which is still challenging to translate into relevant biochemical information. Here, we present rMSIproc, an open-source R package that implements a full data processing workflow for MSI experiments performed using TOF or FT-based mass spectrometers. The package provides a novel strategy for spectral alignment and recalibration, which allows to process multiple datasets simultaneously. This enables to perform a confident statistical analysis with multiple datasets from one or several experiments. rMSIproc is designed to work with files larger than the computer memory capacity and the algorithms are implemented using a multi-threading strategy. rMSIproc is a powerful tool able to take full advantage of modern computer systems to completely develop the whole MSI potential.

Gulyaeva, A.A.; Sigorskih, A.I.; Ocheredko, E.S.; Samborskiy, D.V.; Gorbalenya, A.E. 2020

LAMPA, LArge Multidomain Protein Annotator, and its application to RNA virus polyproteins

Article / Letter to editor

open access

Motivation: To facilitate accurate estimation of statistical significance of sequence similarity in profile-profile searches, queries should ideally correspond to protein domains. For multidomain proteins, using domains as queries depends on delineation of domain borders, which may be unknown. Thus, proteins are commonly used as queries that complicate establishing homology for similarities close to cutoff levels of statistical significance. Results: In this article, we describe an iterative approach, called LAMPA, LArge Multidomain Protein Annotator, that resolves the above conundrum by gradual expansion of hit coverage of multidomain proteins through re-evaluating statistical significance of hit similarity using ever smaller queries defined at each iteration. LAMPA employs TMHMM and HHsearch for recognition of transmembrane regions and homology, respectively. We used Pfam database for annotating 2985 multidomain proteins (polyproteins) composed of >1000 amino acid residues, which dominate proteomes of RNA viruses. Under strict cutoffs, LAMPA outperformed HHsearch-mediated runs using intact polyproteins as queries by three measures: number of and coverage by identified homologous regions, and number of hit Pfam profiles. Compared to HHsearch, LAMPA identified 507 extra homologous regions in 14.4% of polyproteins. This Pfam-based annotation of RNA virus polyproteins by LAMPA was also superior to RefSeq expert annotation by two measures, region number and annotated length, for 69.3% of RNA virus polyprotein entries. We rationalized the obtained results based on dependencies of HHsearch hit statistical significance for local alignment similarity score from lengths and diversities of query-target pairs in computational experiments.

Cordes, M.; Pike-Overzet, K.; Eggermond, M. van; Vloemans, S.; Baert, M.R.; Garcia Perez, L.; ... ; Akker, E. van den 2020

ImSpectR: R package to quantify immune repertoire diversity in spectratype and repertoire sequencing data

Article / Letter to editor

open access

Summary: An effective immune system is characterized by a diverse immune repertoire. There is a strong demand for accurate and quantitative methods to assess the diversity of the immune repertoire for various (pre-)clinical applications, including the diagnosis and prognosis of primary immune deficiencies, or to assess the response to therapy. Current strategies for immune diversity assessment generally comprise the visual inspection of the length distribution of rearranged T- and B-cell receptors. Visual inspections, however, are prone to subjective assessments and thus lead to biases. Here, we introduce ImSpectR, a unified approach to quantify immunodiversity using either spectratype, repertoire sequencing or single cell RNA sequencing data. ImSpectR scores various types of deviations from the expected length distribution and integrates these into one measure, allowing for robust quantitative comparisons of immune diversity across individuals or conditions.

Canouil, M.; Bouland, G.A.; Bonnefond, A.; Froguel, P.; Hart, L.M. 't; Slieker, R.C. 2020

NACHO: an R package for quality control of NanoString nCounter data

Article / Letter to editor

open access

Summary: The NanoString™ nCounter® is a platform for the targeted quantification of expression data in biofluids and tissues. While software by the manufacturer is available in addition to third parties packages, they do not provide a complete quality control (QC) pipeline. Here, we present NACHO ('NAnostring quality Control dasHbOard'), a comprehensive QC R-package. The package consists of three subsequent steps: summarize, visualize and normalize. The summarize function collects all the relevant data and stores it in a tidy format, the visualize function initiates a dashboard with plots of the relevant QC outcomes. It contains QC metrics that are measured by default by the manufacturer, but also calculates other insightful measures, including the scaling factors that are needed in the normalization step. In this normalization step, different normalization methods can be chosen to optimally preprocess data. Together, NACHO is a comprehensive method that optimizes insight and preprocessing of nCounter® data.

Abdelaal, T.; Hollt, T.; Unen, V. van; Lelieveldt, B.P.F.; Koning, F.; Reinders, M.J.T.; Mahfouz, A. 2019

CyTOFmerge: integrating mass cytometry data across multiple panels

Article / Letter to editor

metadata only

Leiden University Scholarly Publications
