Background: Inter-individual differences in drug response based on genetic variations can lead to drug toxicity and treatment inefficacy. A large part of this variability is caused by genetic variants in pharmacogenes. Unfortunately, the Single Nucleotide Variant arrays currently used in clinical pharmacogenomic (PGx) testing are unable to detect all genetic variability in these genes. Long-read sequencing, on the other hand, has been shown to be able to resolve complex (pharmaco)genes. In this study we aimed to assess the value of long-read sequencing for research and clinical PGx, focusing on the important and highly polymorphic CYP2C19 gene. Methods and Results: With a capture-based long-read sequencing panel we were able to characterize the entire region and assign variants to their allele of origin (phasing), resulting in the identification of 813 unique variants in 37 samples. To assess the clinical utility of these data, we compared the performance of three different *-allele tools (Aldy, PharmCAT and PharmaKU), which are specifically designed to assign haplotypes to pharmacogenes based on all input variants. Conclusion: We conclude that long-read sequencing can improve our ability to characterize the CYP2C19 locus and help to identify novel haplotypes, and that *-allele tools are a useful asset in phenotype prediction. Ultimately, this approach could help to better predict an individual's drug response and improve therapy outcomes. However, the added value in clinical PGx might currently be limited.
Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.
Bogaards, F.A.; Gehrmann, T.; Beekman, M.; Akker, E.B. van den; Rest, O. van de; Hangelbroek, R.W.J.; ... ; Slagboom, P.E. 2022
The response to lifestyle intervention studies is often heterogeneous, especially in older adults. Subtle responses that may represent a health gain for individuals are not always detected by classical health variables, stressing the need for novel biomarkers that detect intermediate changes in metabolic, inflammatory, and immunity-related health. Here, our aim was to develop and validate a molecular multivariate biomarker maximally sensitive to the individual effect of a lifestyle intervention: the Personalized Lifestyle Intervention Status (PLIS). We used 1H-NMR fasting blood metabolite measurements from before and after the 13-week combined physical and nutritional Growing Old TOgether (GOTO) lifestyle intervention study, in combination with fivefold cross-validation and a bootstrapping method, to train a separate PLIS score for men and women. The PLIS scores consisted of 14 and four metabolites for females and males, respectively. Performance of the PLIS score in tracking health gain was illustrated by association of the sex-specific PLIS scores with several classical metabolic health markers, such as BMI, trunk fat%, fasting HDL cholesterol, and fasting insulin, the primary outcome of the GOTO study. We also showed that the baseline PLIS score indicated which participants respond positively to the intervention. Finally, we explored PLIS in an independent physical activity lifestyle intervention study, showing similar, albeit remarkably weaker, associations of PLIS with classical metabolic health markers. To conclude, we found that the sex-specific PLIS score was able to track the individual short-term metabolic health gain of the GOTO lifestyle intervention study. The methodology used to train the PLIS score potentially provides a useful instrument to track personal responses and predict the participant's health benefit in lifestyle interventions similar to the GOTO study.
Carbo, E.C.; Sidorov, I.A.; Rijn-Klink, A.L. van; Pappas, N.; Boheemen, S. van; Mei, H.L.; ... ; Vries, J.J.C. de 2022
Viral metagenomics is increasingly applied in clinical diagnostic settings for detection of pathogenic viruses. While several benchmarking studies have been published on the use of metagenomic classifiers for abundance and diversity profiling of bacterial populations, studies on the comparative performance of the classifiers for virus pathogen detection are scarce. In this study, metagenomic data sets (n = 88) from a clinical cohort of patients with respiratory complaints were used for comparison of the performance of five taxonomic classifiers: Centrifuge, Clark, Kaiju, Kraken2, and Genome Detective. A total of 1144 positive and negative PCR results for a total of 13 respiratory viruses were used as gold standard. Sensitivity and specificity of these classifiers ranged from 83 to 100% and 90 to 99%, respectively, and were dependent on the classification level and data pre-processing. Exclusion of human reads generally resulted in increased specificity. Normalization of read counts for genome length had a minor effect on overall performance; however, it negatively affected the detection of targets with read counts around detection level. Correlation of sequence read counts with PCR Ct-values varied per classifier, per data pre-processing approach (R² range 15.1-63.4%), and per virus, with outliers up to 3 log10 reads magnitude beyond the predicted read count for viruses with high sequence diversity. In this benchmarking study, sensitivity and specificity were within the ranges of use for diagnostic practice when the cut-off for defining a positive result was considered per classifier.
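The comparison against PCR as gold standard can be illustrated with a minimal sketch. It assumes that a classifier calls a virus "detected" when its read count reaches a per-classifier cut-off; the function name, the cut-off values, and the toy read counts are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    read_counts: Sequence[int],
    pcr_positive: Sequence[bool],
    cutoff: int = 1,
) -> Tuple[float, float]:
    """Score classifier calls against PCR results (the gold standard).

    A target counts as detected when its read count is >= cutoff.
    The cutoff is an illustrative assumption, chosen per classifier.
    """
    tp = fn = tn = fp = 0
    for reads, pcr in zip(read_counts, pcr_positive):
        detected = reads >= cutoff
        if pcr and detected:
            tp += 1
        elif pcr and not detected:
            fn += 1
        elif not pcr and detected:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical read counts for eight sample/virus pairs and their PCR results.
reads = [120, 0, 3, 0, 45, 1, 0, 0]
pcr = [True, True, False, False, True, False, False, False]

low_cutoff = sensitivity_specificity(reads, pcr, cutoff=1)
high_cutoff = sensitivity_specificity(reads, pcr, cutoff=5)
print(low_cutoff, high_cutoff)
```

In this toy data, raising the cut-off removes the two low-count false positives without losing a true positive, which is the kind of trade-off that motivates choosing the cut-off per classifier.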
Lippold, S.; Ru, A.H. de; Nouta, J.; Veelen, P.A. van; Palmblad, M.; Wuhrer, M.; Haan, N. de 2020
Glycoproteomic data are often very complex, reflecting the high structural diversity of peptide and glycan portions. The use of glycopeptide-centered glycoproteomics by mass spectrometry is rapidly evolving in many research areas, leading to a demand for reliable data analysis tools. In recent years, several bioinformatic tools were developed to facilitate and improve both the identification and quantification of glycopeptides. Here, a selection of these tools was combined and evaluated with the aim of establishing a robust glycopeptide detection and quantification workflow targeting enriched glycoproteins. For this purpose, a tryptic digest from affinity-purified immunoglobulins G and A was analyzed on a nano-reversed-phase liquid chromatography-tandem mass spectrometry platform with a high-resolution mass analyzer and higher-energy collisional dissociation fragmentation. Initial glycopeptide identification based on MS/MS data was aided by the Byonic software. Additional MS1-based glycopeptide identification relying on accurate mass and retention time differences using GlycopeptideGraphMS considerably expanded the set of confidently annotated glycopeptides. For glycopeptide quantification, the performance of LaCyTools was compared to Skyline and GlycopeptideGraphMS. All quantification packages resulted in comparable glycosylation profiles but featured differences in terms of robustness and data quality control. Partial cysteine oxidation was identified as an unexpectedly abundant peptide modification and impaired the automated processing of several IgA glycopeptides. Finally, this study presents a semiautomated workflow for reliable glycoproteomic data analysis by the combination of software packages for MS/MS- and MS1-based glycopeptide identification as well as the integration of analyte quality control and quantification.
Introduction: The SARS-CoV-2 pandemic of 2020 is a prime example of the omnipresent threat of emerging viruses that can infect humans. A protocol for the identification of novel coronaviruses by viral metagenomic sequencing in diagnostic laboratories may contribute to pandemic preparedness. Aim: The aim of this study is to validate a metagenomic virus discovery protocol as a tool for coronavirus pandemic preparedness. Methods: The performance of a viral metagenomic protocol in a clinical setting for the identification of novel coronaviruses was tested using clinical samples containing SARS-CoV-2, SARS-CoV, and MERS-CoV, in combination with databases generated to contain only viruses known before the discovery dates of these coronaviruses, to mimic virus discovery. Results: Classification of NGS reads using Centrifuge and Genome Detective resulted in assignment of the reads to the closest relatives of the emerging coronaviruses. Low nucleotide and amino acid identity (81% and 84%, respectively, for SARS-CoV-2) in combination with up to 98% genome coverage were indicative of a related, novel coronavirus. Capture probes targeting vertebrate viruses, designed in 2015, enhanced both sequencing depth and coverage of the SARS-CoV-2 genome, the latter increasing from 71% to 98%. Conclusion: The model used for simulation of virus discovery enabled validation of the metagenomic sequencing protocol. The metagenomic protocol with virus probes designed before the pandemic can assist the detection and identification of novel coronaviruses directly in clinical samples.
The corpus of bioinformatics resources is huge and expanding rapidly, presenting life scientists with a growing challenge in selecting tools that fit the desired purpose. To address this, the European Infrastructure for Biological Information is supporting a systematic approach towards a comprehensive registry of tools and databases for all domains of bioinformatics, provided under a single portal (https://bio.tools). We describe here the practical means by which scientific communities, including individual developers and projects, through major service providers and research infrastructures, can describe their own bioinformatics resources and share these via bio.tools.
Raz, Y.; Akker, E.B. van den; Roest, T.; Riaz, M.; Rest, O. van de; Suchiman, H.E.D.; ... ; Slagboom, P.E. 2020
Skeletal muscles control posture, mobility and strength, and influence whole-body metabolism. Muscles are built of different types of myofibers, each having specific metabolic, molecular, and contractile properties. Fiber classification is, therefore, regarded as key to understanding muscle biology and (patho)physiology. The expression of three myosin heavy chain (MyHC) isoforms, MyHC-1, MyHC-2A, and MyHC-2X, marks myofibers in humans. Typically, myofiber classification is performed by an eye-based histological analysis. This classical approach is insufficient to capture complex fiber classes expressing more than one MyHC isoform. We, therefore, developed a methodological procedure for high-throughput characterization of myofibers on the basis of multiple isoforms. The mean fluorescence intensity of the three most abundant MyHC isoforms was measured per myofiber in muscle biopsies of 56 healthy elderly adults, and myofiber classes were identified using computational biology tools. Unsupervised clustering revealed the existence of six distinct myofiber clusters. A comparison with the visual assessment of myofibers using the same images showed that some of these myofiber clusters could not be detected or were frequently misclassified. The presence of these six clusters was reinforced by RNA expression levels of sarcomeric genes. In addition, one of the clusters, expressing all three MyHC isoforms, correlated with histological measures of muscle health. To conclude, this methodological procedure enables deep characterization of the complex muscle heterogeneity. This study opens opportunities to further investigate myofiber composition in comparative studies.
When studying the microbiome using next-generation sequencing, the DNA extraction method, sequencing procedures, and bioinformatic processing are crucial to obtain reliable data. Method choice has been demonstrated to strongly affect the final biological interpretation. We assessed the performance of three DNA extraction methods and two bioinformatic pipelines for bacterial microbiota profiling through 16S rRNA gene amplicon sequencing, using positive and negative controls for DNA extraction and sequencing and eight different types of high- or low-biomass samples. Performance was evaluated based on quality control passing, DNA yield, richness, diversity, and compositional profiles. All DNA extraction methods retrieved the theoretical relative bacterial abundance with a maximum 3-fold change, although differences were seen between methods, and library preparation and sequencing induced little variation. Bioinformatic pipelines showed different results for observed richness, but diversity and compositional profiles were comparable. DNA extraction methods were successful for feces and oral swabs, and variation induced by DNA extraction methods was lower than intersubject (biological) variation. For low-biomass samples, a mixture of genera present in negative controls and sample-specific genera, possibly representing biological signal, was observed. We conclude that the tested bioinformatic pipelines perform equally, with pipeline-specific advantages and disadvantages. Two out of three extraction methods performed equally well, while one method was less accurate regarding retrieval of compositional profiles. Lastly, we again demonstrate the importance of including negative controls when analyzing low-bacterial-biomass samples.
IMPORTANCE: Method choice throughout the workflow of a microbiome study, from sample collection to DNA extraction and sequencing procedures, can greatly affect results. This study evaluated three different DNA extraction methods and two bioinformatic pipelines by including positive and negative controls and various biological specimens. By identifying an optimal combination of DNA extraction method and bioinformatic pipeline use, we hope to contribute to increased methodological consistency in microbiota studies. Our methods were applied not only to commonly studied samples for microbiota analysis, e.g., feces, but also to more rarely studied, low-biomass samples. Microbiota composition profiles of low-biomass samples (e.g., urine and tumor biopsy specimens) were not always distinguishable from negative controls, or showed partial overlap, confirming the importance of including negative controls in microbiota studies, especially when low bacterial biomass is expected.
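The "maximum 3-fold change" criterion for the positive control can be illustrated with a short sketch. It assumes that observed read counts per genus are converted to relative abundances and compared with the theoretical composition of a mock community; the genus labels, counts, and function names below are hypothetical and do not come from the study.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per genus into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return {genus: n / total for genus, n in counts.items()}


def max_fold_change(observed: Dict[str, float], expected: Dict[str, float]) -> float:
    """Largest fold deviation (in either direction) from the expected profile."""
    worst = 1.0
    for genus, exp in expected.items():
        obs = observed.get(genus, 0.0)
        if obs == 0.0:
            return float("inf")  # an expected genus was not recovered at all
        ratio = obs / exp
        worst = max(worst, ratio, 1.0 / ratio)
    return worst


# Hypothetical mock community with four equally abundant genera.
expected_profile = {"genus_A": 0.25, "genus_B": 0.25, "genus_C": 0.25, "genus_D": 0.25}
observed_counts = {"genus_A": 500, "genus_B": 250, "genus_C": 150, "genus_D": 100}

observed_profile = relative_abundance(observed_counts)
print(max_fold_change(observed_profile, expected_profile))
```

Taking the larger of the ratio and its inverse treats over- and under-representation symmetrically, so a genus recovered at 0.4 times its expected abundance counts as a 2.5-fold change.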
White, S.J.; Laros, J.F.J.; Bakker, E.; Cambon-Thomsen, A.; Eden, M.; Leonard, S.; ... ; Dunnen, J.T. den 2017
Bottom-up glycoproteomics by liquid chromatography-mass spectrometry (LC-MS) is an established approach for assessing glycosylation in a protein- and site-specific manner. Consequently, tools are needed to automatically align, calibrate, and integrate LC-MS glycoproteomics data. We developed a modular software package designed to tackle the individual aspects of an LC-MS experiment, called LaCyTools. Targeted alignment is performed using user-defined m/z and retention time (t_r) combinations. Subsequently, sum spectra are created for each user-defined analyte group. Quantitation is performed on the sum spectra, where each user-defined analyte can have its own t_r, minimum, and maximum charge states. Consequently, LaCyTools deals with multiple charge states, gives an output per charge state if desired, and offers various analyte and spectra quality criteria. We compared throughput and performance of LaCyTools to combinations of available tools that deal with individual processing steps. LaCyTools yielded relative quantitation of equal precision (relative standard deviation <0.5%) and higher trueness due to the use of MS peak area instead of MS peak intensity. In conclusion, LaCyTools is an accurate automated data processing tool for high-throughput analysis of LC-MS glycoproteomics data. Released under the Apache 2.0 license, it is freely available on GitHub (https://github.com/Tarskin/LaCyTools).
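The quantities named in this abstract (peak area rather than peak intensity, relative quantitation, and relative standard deviation) can be sketched as follows. This is not LaCyTools code: the trapezoidal integration, the function names, and the toy spectrum are illustrative assumptions only.

```python
from statistics import mean, stdev
from typing import List, Sequence


def peak_area(mz: Sequence[float], intensity: Sequence[float], lo: float, hi: float) -> float:
    """Trapezoidal area under the spectrum for segments lying within [lo, hi]."""
    area = 0.0
    for i in range(len(mz) - 1):
        if mz[i] >= lo and mz[i + 1] <= hi:
            area += 0.5 * (intensity[i] + intensity[i + 1]) * (mz[i + 1] - mz[i])
    return area


def relative_abundances(areas: Sequence[float]) -> List[float]:
    """Normalize analyte areas to their sum (relative quantitation)."""
    total = sum(areas)
    return [a / total for a in areas]


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation across replicate measurements, in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical triangular peak sampled at five m/z points.
mz_axis = [1.0, 1.1, 1.2, 1.3, 1.4]
signal = [0.0, 10.0, 20.0, 10.0, 0.0]

print(peak_area(mz_axis, signal, 1.0, 1.4))
print(relative_abundances([1.0, 3.0]))
print(rsd_percent([9.0, 10.0, 11.0]))
```

Integrating over the peak uses every data point in the window, whereas the peak maximum depends on a single point, which is one plausible reason for area-based quantitation to be less sensitive to peak shape.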