This thesis focuses on data found in the field of computational drug discovery. New insight can be obtained by applying machine learning in various ways and in a variety of domains. Two studies delved into the application of proteochemometrics (PCM), a machine learning technique that can be used to find relations in protein-ligand bioactivity data and then predict, using a virtual screen, whether compounds that had never been tested on a particular protein, or set of proteins, are likely to be active. With this, sets of compounds were suggested for experimental validation that were significant in a myriad of ways. Another study investigated the mutational patterns in cancer, using a large dataset of mutation data and identifying several motifs in G protein-coupled receptors. The thesis also contains the work done on the Papyrus dataset, a large-scale bioactivity dataset that focuses on standardising data for computational drug discovery and providing an out-of-the-box set that can be used in a variety of settings.
Gorostiola González, M.; Broek, R.L. van den; Braun, T.G.M.; Chatzopoulou Chatzi A.; Jespers, W.; IJzerman, A.P.; ... ; Westen, G.J.P. van 2023
Proteochemometric (PCM) modelling is a powerful computational drug discovery tool used in bioactivity prediction of potential drug candidates relying on both chemical and protein information. In PCM, features are computed to describe small molecules and proteins, and these features directly impact the quality of the predictive models. State-of-the-art protein descriptors, however, are calculated from the protein sequence and neglect the dynamic nature of proteins. This dynamic nature can be computationally simulated with molecular dynamics (MD). Here, novel 3D dynamic protein descriptors (3DDPDs) were designed to be applied in bioactivity prediction tasks with PCM models. As a test case, publicly available G protein-coupled receptor (GPCR) MD data from GPCRmd was used. GPCRs are membrane-bound proteins, which are activated by hormones and neurotransmitters, and constitute an important target family for drug discovery. GPCRs exist in different conformational states that allow the transmission of diverse signals and that can be modified by ligand interactions, among other factors. To translate the MD-encoded protein dynamics, two types of 3DDPDs were considered: one-hot encoded residue-specific (rs) and embedding-like protein-specific (ps) 3DDPDs. The descriptors were developed by calculating distributions of trajectory coordinates and partial charges, applying dimensionality reduction, and subsequently condensing them into vectors per residue or protein, respectively. 3DDPDs were benchmarked on several PCM tasks against state-of-the-art non-dynamic protein descriptors. Our rs- and ps3DDPDs outperformed non-dynamic descriptors in regression tasks using a temporal split and showed comparable performance with a random split and in all classification tasks. Combinations of non-dynamic descriptors with 3DDPDs did not result in increased performance. Finally, the power of 3DDPDs to capture dynamic fluctuations in mutant GPCRs was explored. The results presented here show the potential of including protein dynamic information in machine learning tasks, specifically bioactivity prediction, and open opportunities for applications in drug discovery, including oncology.
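The difference between the temporal and random splits mentioned in this abstract can be illustrated with a minimal sketch. The records, the cutoff year, and the function names below are hypothetical and are not taken from the paper; the sketch only shows the general idea that a temporal split trains on older measurements and tests on newer ones, whereas a random split ignores the publication year.

```python
import random

# Hypothetical bioactivity records: (compound_id, protein_id, year, activity).
# All values are made up for illustration only.
records = [
    ("cpd1", "gpcrA", 2008, 6.1),
    ("cpd2", "gpcrA", 2010, 7.3),
    ("cpd3", "gpcrB", 2011, 5.4),
    ("cpd4", "gpcrB", 2012, 8.0),
    ("cpd5", "gpcrA", 2013, 6.7),
    ("cpd6", "gpcrC", 2015, 7.9),
    ("cpd7", "gpcrC", 2017, 5.2),
    ("cpd8", "gpcrB", 2019, 6.5),
]


def temporal_split(rows, cutoff_year):
    """Train on records up to and including cutoff_year, test on later ones."""
    train = [r for r in rows if r[2] <= cutoff_year]
    test = [r for r in rows if r[2] > cutoff_year]
    return train, test


def random_split(rows, test_fraction, seed=0):
    """Shuffle the records and hold out a fixed fraction, ignoring the year."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_t, test_t = temporal_split(records, cutoff_year=2013)
train_r, test_r = random_split(records, test_fraction=0.25)
print(len(train_t), len(test_t), len(train_r), len(test_r))
```

A temporal split is usually regarded as the harder and more realistic setting, because the test compounds come from later work than anything the model has seen during training.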
This thesis describes the importance of being able to control the selectivity of potential drug candidates. It explains how computational models are employed to predict and rationalize compound-protein binding (affinity) and therewith, selectivity of compounds. Moreover, it shows that selectivity can purposely be tuned to target either a single protein or an entire panel of proteins. The challenges of selectivity modeling are addressed based on case studies in the sodium-dependent glucose co-transporters, G protein-coupled receptors, and kinases.
Burggraaff, L.; Oranje, P.; Gouka, R.; Pijl, P. van der; Geldof, M.; Vlijmen, H.W.T. van; ... ; Westen, G.J.P. van 2019
Over the last decades, several disciplines relevant to medicinal chemistry and preclinical drug discovery have made gigantic leaps; this includes chemistry, biology and measurement of bioactivity. Better techniques have led to massive amounts of data. Moreover, sources of chemical and bioactivity data have become available in the public domain. Hence there is a need for new techniques combining and mining these data sources. This thesis focuses on computational methods combining data from these disciplines and demonstrates that the sum of these methods leads to better quality predictions than models using the individual data sources. One of the techniques central in this thesis is proteochemometric modeling, a machine learning approach linking chemical descriptors and protein descriptors to a biologically relevant output variable. This output variable describes the activity of molecules on biological macromolecules and hence proteochemometric models can make relevant predictions for both unseen molecules and unseen macromolecules (e.g. novel viral mutants). Secondly, we present a novel technique that is able to combine information from multiple crystal structures in such a way that shared and unique pharmacophoric features can be isolated and visualized. Approaches presented here have been validated prospectively and have been shown to be widely applicable.
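The core idea of proteochemometric modeling described in this abstract, linking chemical and protein descriptors to one activity value, can be sketched as follows. This is a toy illustration under stated assumptions: the descriptor values, the names `compound_desc`, `protein_desc`, `pcm_features` and `predict`, and the training activities are all hypothetical, and a one-nearest-neighbour lookup stands in for whatever learning algorithm the thesis actually uses.

```python
import math

# Hypothetical toy descriptors; real PCM models use much richer features.
compound_desc = {"cpdA": [1.0, 0.0], "cpdB": [0.0, 1.0]}
protein_desc = {
    "wild_type": [0.2, 0.8],
    "mutant1": [0.9, 0.1],
    "mutant2": [0.8, 0.2],  # no measurements available for this protein
}


def pcm_features(compound, protein):
    """Concatenate compound and protein descriptors into one feature vector."""
    return compound_desc[compound] + protein_desc[protein]


# Hypothetical training data: (compound, protein, measured activity).
training = [
    ("cpdA", "wild_type", 7.5),
    ("cpdB", "wild_type", 5.0),
    ("cpdA", "mutant1", 5.5),
    ("cpdB", "mutant1", 6.8),
]


def predict(compound, protein):
    """Return the activity of the closest training pair (1-nearest neighbour).

    This is a stand-in model chosen for brevity, not the method of the thesis.
    """
    query = pcm_features(compound, protein)
    nearest = min(
        training,
        key=lambda row: math.dist(query, pcm_features(row[0], row[1])),
    )
    return nearest[2]


# Prediction for a protein variant that never appears in the training data.
print(predict("cpdA", "mutant2"))
```

Because the protein is part of the feature vector, the model can be queried for a protein variant without any measurements, as long as a descriptor can be computed for it.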