The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
Varunjikar, M.S.; Bohn, T.; Sanden, M.; Belghit, I.; Pineda-Pampliega, J.; Palmblad, M.; ... ; Rasinger, J.D. 2023
The present study compared genetically modified (GM) crops with crops from different farming practices using high-resolution tandem mass spectrometry (HR-MS) and proteomics bioinformatics tools. In a previously published study, a number of significant differences regarding nutritional and elemental composition between a selection of GM, non-GM conventionally farmed, and organic soybeans were found. In the present study, the proteome-level equivalence of the same samples was assessed using HR-MS. Direct comparison of tandem mass spectra and bottom-up proteomics bioinformatics indicated that proteomes of all samples investigated were very similar overall, with only a few distinct protein expression clusters obtained for GM and organic samples. Standard bottom-up proteome analyses identified 1025 soy proteins; of these, 39 were found to be differentially expressed (p < 0.01) between GM, non-GM conventionally farmed, and organically farmed soybeans. Subsequent bioinformatics analyses of these proteins highlighted several potentially affected biochemical pathways that could contribute to the compositional differences reported earlier. In addition, protein markers separating conventionally and organically farmed soybean seeds were found, and peptide markers for the detection of GM soy in food and feed samples are described. Taken together, the data presented here show that HR-MS based proteomics approaches can be used for the detection of transgenic events in food and feed grade soy and the differentiation of organically and conventionally farmed plants, and can provide mechanistic explanations of effects observed on the phenotypic level of GM plants. HR-MS and proteomic bioinformatics thus should be considered key tools when developing molecular panel approaches for detection and safety assessments of novel crop varieties destined for use in feed and food.
In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goal of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.
Capillary electrophoresis has matured into a highly sensitive and widely applied analytical method over the last forty years. Here we combine text mining and computational chemistry to paint, with very broad strokes, the applicability and trends in the scientific literature on capillary electrophoresis, simultaneously demonstrating that this is not only possible but also reveals both expected and unexpected details of this history. All software and data are freely available on GitHub (https://github.com/ReinV/SCOPE) and OSF (https://osf.io/e56zt/). (c) 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.
We have explored the history of the field of capillary electrophoresis using bibliometric methods. The analysis shows that 416 prolific researchers are connected in a single, large, co-authorship network based on publications on capillary electrophoresis between 1980 and 2021, with a few pioneers having remained active throughout much of this time period. Looking at research topics revealed electrochemistry, sensors, nanotechnology and metabolomics as 'hot' topics, with fundamental method development being more 'mature', and revealed that capillary electrophoresis technology has matured over a 30-year time period, with research efforts moving from separations to quantitative measurements to biomedical applications. The citation patterns showed the strongest coupling between journals of similar scope. Interactive versions of the bibliometric network visualizations are available online at https://tinyurl.com/2z7q7wcx (researcher co-authorship network), https://tinyurl.com/2jmhsgxx (research topic network) and https://tinyurl.com/2lnfzzgn (journal bibliographic coupling citation network). (c) 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Ultrahigh resolution mass spectrometry (UHR-MS) coupled with direct infusion (DI) electrospray ionization offers a fast solution for accurate untargeted profiling. Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometers have been shown to produce a wealth of insights into complex chemical systems because they enable unambiguous molecular formula assignment even if the vast majority of signals is of unknown identity. Interlaboratory comparisons are required to apply this type of instrumentation in quality control (for food industry or pharmaceuticals), large-scale environmental studies, or clinical diagnostics. Extended comparisons employing different FT-ICR MS instruments with qualitative direct infusion analysis are scarce since the majority of detected compounds cannot be quantified. The extent to which observations can be reproduced by different laboratories remains unknown. We set up a preliminary study which encompassed a set of 17 laboratories around the globe, diverse in instrumental characteristics and applications, to analyze the same sets of extracts from commercially available standard human blood plasma and Standard Reference Material (SRM) for blood plasma (SRM1950), which were delivered at different dilutions or spiked with different concentrations of pesticides. The aim of this study was to assess the extent to which the outputs of differently tuned FT-ICR mass spectrometers, with different technical specifications, are comparable for setting the frames of a future DI-FT-ICR MS ring trial. We concluded that a cluster of five laboratories, with diverse instrumental characteristics, showed comparable and representative performance across all experiments, setting a reference to be used in a future ring trial on blood plasma.
Palmblad, M.; Asein, E.; Bergman, N.P.; Ivanova, A.; Ramasauskas, L.; Reyes, H.M.; ... ; Bergquist, J. 2022
A major obstacle for reusing and integrating existing data is finding the data that is most relevant in a given context. The primary metadata resource is the scientific literature describing the experiments that produced the data. To stimulate the development of natural language processing methods for extracting this information from articles, we have manually annotated 100 recent open access publications in Analytical Chemistry as semantic graphs. We focused on articles mentioning mass spectrometry in their experimental sections, as we are particularly interested in the topic, which is also within the domain of several ontologies and controlled vocabularies. The resulting gold standard dataset is publicly available and directly applicable to validating automated methods for retrieving this metadata from the literature. In the process, we also made a number of observations on the structure and description of experiments and open access publication in this journal.
It has long been known that biological species can be identified from mass spectrometry data alone. Ten years ago, we described a method and software tool, compareMS2, for calculating a distance between sets of tandem mass spectra, as routinely collected in proteomics. This method has seen use in species identification and mixture characterization in food and feed products, as well as other applications. Here, we present the first major update of this software, including a new metric, a graphical user interface and additional functionality. The data have been deposited to ProteomeXchange with dataset identifier PXD034932.
Driessen, M.; Plas-Duijvesteijn, S. van der; Kienhuis, A.S.; Brandhof, E.J. van den; Roodbergen, M.; Water, B. van de; ... ; Pennings, J.L.A. 2022
The zebrafish embryo (ZFE) is a promising alternative non-rodent model in toxicology, and initial studies suggested its applicability in detecting hepatic responses related to drug-induced liver injury (DILI). Here, we hypothesize that detailed analysis of underlying mechanisms of hepatotoxicity in ZFE contributes to the improved identification of hepatotoxic properties of compounds and to the reduction of rodents used for hepatotoxicity assessment. ZFEs were exposed to nine reference hepatotoxicants, targeted at induction of steatosis, cholestasis, and necrosis, and effects were compared with negative controls. Protein profiles of the individual compounds were generated using LC-MS/MS. We identified differentially expressed proteins and pathways, but as these showed considerable overlap, phenotype-specific responses could not be distinguished. This led us to identify a set of common hepatotoxicity marker proteins. At the pathway level, these were mainly associated with cellular adaptive stress responses, whereas single proteins could be linked to common hepatotoxicity-associated processes. Applying several stringency criteria to our proteomics data as well as information from other data sources resulted in a set of potential robust protein markers, notably Igf2bp1, Cox5ba, Ahnak, Itih3b.2, Psma6b, Srsf3a, Ces2b, Ces2a, Tdo2b, and Anxa1c, for the detection of adverse responses.
Palmblad, M.; Bocker, S.; Degroeve, S.; Kohlbacher, O.; Kall, L.; Noble, W.S.; Wilhelm, M. 2022
Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. These must be described in sufficient detail to apply or evaluate the performance of trained models. Here we look at and interpret the recently published and general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.
Untargeted proteomics can contribute to composition and authenticity analyses of highly processed mixed food and feed products. Here, we present the setup of an analytical flow tandem mass spectrometry method (AF-HPLC HR-MS) for analysis of insect meal from five different species. Data acquired were compared with previously published data employing spectra matching and standard bottom-up proteomics bioinformatics analyses. In addition, data were screened for insect species marker peptides and common allergens. The results obtained indicate that the performance of the newly established AF-HPLC HR-MS workflow is in line with previously published methods for insect species differentiation. Data obtained in the present study also led to the discovery of novel markers for the development of targeted MS analyses of insect species in food and feed mixes and highlighted that known allergens such as arginine kinase or tropomyosin were consistently detected across all five species tested.
Replacement of high-value fish species with cheaper varieties or mislabelling of food unfit for human consumption is a global problem violating both consumers' rights and safety. For distinguishing fish species in pure samples, DNA approaches are available; however, authentication and quantification of fish species in mixtures remains a challenge. In the present study, a novel high-throughput shotgun DNA sequencing approach applying masked reference libraries was developed and used for authentication and abundance calculations of fish species in mixed samples. Results demonstrate that the analytical protocol presented here can discriminate and predict relative abundances of different fish species in mixed samples with high accuracy. In addition to DNA analyses, shotgun proteomics tools based on direct spectra comparisons were employed on the same mixture. Similar to the DNA approach, the identification of individual fish species and the estimation of their respective relative abundances in a mixed sample were also feasible. Furthermore, the data obtained indicated that DNA sequencing using masked libraries predicted species composition of the fish mixture with higher specificity, while at a taxonomic family level, relative abundances of the different species in the fish mixture were predicted with slightly higher accuracy using proteomics tools. Taken together, the results demonstrate that both DNA and protein-based approaches presented here can be used to efficiently tackle current challenges in feed and food authentication analyses.
Lamprecht, A.L.; Palmblad, M.; Ison, J.; Schwämmle, V.; Al Manir, M.S.; Altinas, I.; ... ; Wolstencroft, K.J. 2021
Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
Science is full of overlooked and undervalued research waiting to be rediscovered. Proteomics is no exception. In this perspective, we follow the ripples from a 1960 study of Zuckerkandl, Jones, and Pauling comparing tryptic peptides across animal species. This pioneering work directly led to the molecular clock hypothesis and the ensuing explosion in molecular phylogenetics. In the decades following, proteins continued to provide essential clues on evolutionary history. While technology has continued to improve, contemporary proteomics has strayed from this larger biological context, rarely comparing species or asking how protein structure, function, and interactions have evolved. Here we recombine proteomics with molecular phylogenetics, highlighting the value of framing proteomic results in a larger biological context and how almost forgotten research, though technologically surpassed, can still generate new ideas and illuminate our work from a different perspective. Though it is infeasible to read all research published on a large topic, looking up older papers can be surprisingly rewarding when rediscovering a "gem" at the end of a long citation chain, aided by digital collections and perpetually helpful librarians. Proper literature study reduces unnecessary repetition and allows research to be more insightful and impactful by truly standing on the shoulders of giants.
In the present study, we assessed whether different legacy and novel molecular analysis approaches can detect and trace prohibited bovine material in insects reared to produce processed animal protein (PAP). Newly hatched black soldier fly (BSF) larvae were fed one of four diets for seven days: a control feeding medium (Ctl), or control feed spiked with bovine hemoglobin powder (BvHb) at 1% (wet weight, w/w) (BvHb 1%, w/w), 5% (BvHb 5%, w/w) or 10% (BvHb 10%, w/w). Another dietary group of BSF larvae, namely *BvHb 10%, was first grown on BvHb 10% (w/w), and after seven days separated from the residual material and placed in another container with control diet for seven additional days. Presence of ruminant material in insect feed and in BSF larvae was assessed in five different laboratories using (i) real time-PCR analysis, (ii) multi-target ultra-high performance liquid chromatography coupled to tandem mass spectrometry (UHPLC-MS/MS), (iii) protein-centric immunoaffinity-LC-MS/MS, (iv) peptide-centric immunoaffinity-LC-MS/MS, (v) tandem mass spectral library matching (SLM), and (vi) compound specific amino acid analysis (CSIA). All methods investigated detected ruminant DNA or BvHb in specific insect feed media and in BSF larvae, respectively. However, each method assessed displayed distinct shortcomings, which precluded distinguishing prohibited from non-prohibited ruminant material in some instances. Taken together, these findings indicate that detection of prohibited material in the insect-PAP feed chain requires a tiered combined use of complementary molecular analysis approaches. We therefore advocate the use of a combined multi-tier molecular analysis suite for the detection, differentiation and tracing of prohibited material in insect-PAP based feed chains and endorse ongoing efforts to extend the currently available battery of PAP detection approaches with MS based techniques and possibly delta C-13(AA) fingerprinting.