MotivationDeveloping a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We... Show moreMotivationDeveloping a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.ResultsWe developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and developing scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.Availability and implementationThe PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects. Show less
Palmblad, M.; Lamprecht, A.L.; Ison, J.; Schwammle, V. 2019
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased... Show moreIn constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. We apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks. Show less
Taylor, S.L.; Ruhaak, L.R.; Weiss, R.H.; Kelly, K.; Kim, K. 2017
Elementary flux mode (EFM) analysis is a powerful technique for determining the metabolic capacities and robustness of stoichiometric networks. Recently, several improvements have been made to the... Show moreElementary flux mode (EFM) analysis is a powerful technique for determining the metabolic capacities and robustness of stoichiometric networks. Recently, several improvements have been made to the algorithm for enumerating the EFMs, making the study of large models possible. However, currently these tools require high performance workstations to perform large-scale EFM computations, thus limiting their applicability. We developed a more time and memory efficient implementation of the algorithm for EFM enumeration in MATLAB, called FluxModeCalculator, which enables large-scale EFM computation on ordinary desktop computers. Show less
Kroon, M.; Lameijer, E.W.; Lakenberg, N.; Hehir-Kwa, J.Y.; Thung, D.T.; Slagboom, P.E.; ... ; Ye, K. 2016
MOTIVATION RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing... Show moreMOTIVATION RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. RESULTS We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ∼ 137 000 and 173 000 splicing events, of which on average 82 are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. AVAILABILITY The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion CONTACT y.zhang@lumc.nl; k.ye@lumc.nl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. Show less
Zhang, Y.J.; Lameijer, E.W.; Hoen, P.A.C. 't; Ning, Z.M.; Slagboom, P.E.; Ye, K. 2012