While the entirety of 'Chemical Space' is huge (and assumed to contain between 10^63 and 10^200 'small molecules'), distinct subsets of this space can nonetheless be defined according to certain structural parameters. An example of such a subspace is the chemical space spanned by endogenous metabolites, defined as 'naturally occurring' products of an organism's metabolism. To understand this part of chemical space in more detail, we analyzed the chemical space populated by human metabolites in two ways. Firstly, we performed Principal Component Analysis (PCA), hierarchical clustering and scaffold analysis of metabolites and non-metabolites to determine which chemical features are characteristic of each class of compounds. Here we found that heteroatom (both oxygen and nitrogen) content, as well as the presence of particular ring systems, was able to distinguish the two groups of compounds. Secondly, we established which molecular descriptors and classifiers are capable of distinguishing metabolites from non-metabolites, by assigning a 'metabolite-likeness' score. It was found that the combination of MDL Public Keys and Random Forest exhibited the best overall classification performance, with an AUC value of 99.13%, a specificity of 99.84% and a selectivity of 88.79%. This performance is slightly better than that of previous classifiers; interestingly, we also found that drugs occupy two distinct areas of metabolite-likeness, one being more 'synthetic' and the other more 'metabolite-like'. Also, on a truly prospective dataset of 457 compounds, 95.84% correct classification was achieved.
Overall, we are confident that we have contributed to the task of classifying metabolites, as well as to a better understanding of metabolite chemical space. This knowledge can now be used in the development of new drugs that need to resemble metabolites, and in our work particularly for assessing the metabolite-likeness of candidate molecules during metabolite identification in the metabolomics field.
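As an illustration of how classification figures of the kind reported above can be computed, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are all hypothetical. The sketch computes specificity and sensitivity (true positive rate); whether the 'selectivity' figure in the abstract corresponds to sensitivity is an assumption.

```python
def confusion_rates(labels, predictions):
    """Return (sensitivity, specificity) for binary labels, 1 = metabolite."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of metabolites recovered
    specificity = tn / (tn + fp)  # fraction of non-metabolites rejected
    return sensitivity, specificity


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 metabolites (1) and 4 non-metabolites (0),
# each with a made-up 'metabolite-likeness' score.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sensitivity, specificity = confusion_rates(labels, predictions)
area = auc(labels, scores)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} AUC={area:.3f}")
```

The pairwise AUC is quadratic in the number of compounds, which is fine for a sketch; a rank-based implementation would be preferable for datasets of realistic size.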
Profiling of metabolites is increasingly used to study the functioning of biological systems. For some studies the volume of available samples is limited to only a few microliters or even less, for example for fluids such as cerebrospinal fluid (CSF) of small animals like mice, or for the analysis of individual oocytes. Here we present an analytical method using in-liner silylation coupled to gas chromatography/mass spectrometry (GC/MS) that is suitable for metabolic profiling in ultrasmall sample volumes of 2 µL down to 10 nL. Method performance was assessed in various biosamples. Derivatization efficiencies for sugars, organic acids, and amino acids were satisfactory (105-120%), and repeatabilities were generally better than 15%, except for amino acids, which had repeatabilities up to about 35-40%. For endogenous sugars and organic acids in fetal bovine serum, the response was linear for aliquots from 10 nL up to at least 1 µL. The developed GC/MS method was applied to the analysis of different sample matrices, i.e., fetal bovine serum, mouse CSF, and aliquots of the intracellular content of Xenopus laevis oocytes. To the best of our knowledge, we present here the first comprehensive GC/MS metabolite profiles from mouse CSF and from the intracellular content of a single X. laevis oocyte.
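The repeatability and linearity figures quoted above can be illustrated with a short sketch. It assumes that repeatability is expressed as a relative standard deviation (RSD) of replicate responses and that linearity is judged from an ordinary least-squares fit of response against aliquot volume; the abstract does not state either definition. The function names and all numbers below are hypothetical, not measured data.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (sample standard deviation / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical replicate peak areas for one metabolite.
replicates = [100.0, 110.0, 90.0]
repeatability = rsd_percent(replicates)

# Hypothetical aliquot volumes (nL) and perfectly linear responses.
volumes_nl = [10.0, 100.0, 500.0, 1000.0]
responses = [2.0 * v + 5.0 for v in volumes_nl]
slope, intercept, r_squared = linear_fit(volumes_nl, responses)

print(f"RSD={repeatability:.1f}% slope={slope:.2f} "
      f"intercept={intercept:.2f} R^2={r_squared:.4f}")
```

With real data, a check of this kind would be run per metabolite, comparing the RSD against the chosen acceptance limit and inspecting R^2 and residuals across the 10 nL to 1 µL range.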