Excitatory amino acid transporters (EAATs) are important regulators of amino acid transport, in particular of glutamate. Recently, interest in these transporters has grown in the context of neurodegenerative diseases. This calls for ways to modulate these targets, EAAT2 and EAAT3 in particular, to drive glutamate transport. Several inhibitors (competitive and noncompetitive) exist to block glutamate transport; activators, however, remain scarce. Recently, GT949 was proposed as a selective activator of EAAT2 on the basis of a radioligand uptake assay. In the present research, we aimed to validate the use of GT949 to activate EAAT2-driven glutamate transport by applying an innovative, impedance-based, whole-cell assay (xCELLigence). A broad range of GT949 concentrations was tested in a variety of cellular environments in this assay. As expected, no activation of EAAT3 could be detected. Surprisingly, however, no activation of EAAT2 by GT949 could be observed in this assay either. To determine whether the impedance-based assay was unsuited to detect increased glutamate uptake or whether the compound does not induce activation in this setup, we performed radioligand uptake assays. Two setups were used: a novel method that differs from previously published research, and a setup that reproduces the methods used in the existing literature. Nonetheless, activation of neither EAAT2 nor EAAT3 could be observed in these assays. Furthermore, a thermal shift assay provided no evidence that GT949 binds to or stabilizes purified EAAT2.
In conclusion, the experimental evidence in the present study indicates that GT949 requires specific assay conditions that are difficult to reproduce, and the compound cannot simply be classified as an EAAT2 activator on the basis of the presented evidence. Hence, further research is required to develop the tools needed to identify new EAAT modulators and to exploit the potential of EAATs as therapeutic targets.
Gorostiola Gonzalez, M.; Sijben, H.J.; Dall' Acqua, L.; Liu, R.; IJzerman, A.P.; Heitman, L.H.; Westen, G.J.P. van 2023
Oxidative stress is the consequence of an abnormal increase in reactive oxygen species (ROS). ROS are generated mainly during metabolism, under both normal and pathological conditions, as well as upon exposure to xenobiotics. Xenobiotics can, on the one hand, disrupt the molecular machinery involved in redox processes and, on the other hand, reduce the effectiveness of antioxidant activity. Such dysregulation may lead to oxidative damage when combined with oxidative stress that exceeds the cell's capacity to detoxify ROS. In this work, a green fluorescent protein (GFP)-tagged, nuclear factor erythroid 2-related factor 2 (NRF2)-regulated sulfiredoxin reporter (Srxn1-GFP) was used to measure the antioxidant response of HepG2 cells to a large series of drugs and drug-like compounds (2230 compounds). These compounds were then classified as positive or negative depending on the cellular response and distributed among different modeling groups to establish structure-activity relationship (SAR) models. A selection of models was used to prospectively predict oxidative stress induced by a new set of compounds, which were subsequently tested experimentally to validate the model predictions. Altogether, this exercise exemplifies the challenges of developing SAR models for a phenotypic cellular readout, including model combination, chemical space selection, and interpretation of results.
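The abstract does not describe the individual SAR models. As a generic illustration only, and not the models built in this work, one of the simplest SAR classifiers predicts a compound's label (positive or negative reporter response) from its most similar training compound, with similarity measured by the Tanimoto coefficient on fingerprint bits. The fingerprint representation (a set of "on" bit indices), the example data, and the function names are assumptions.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_nearest_neighbour(query: set[int],
                              training: list[tuple[set[int], bool]]) -> bool:
    """Return the label (True = positive response) of the most similar training compound."""
    _, label = max(training, key=lambda item: tanimoto(query, item[0]))
    return label


# Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
training_set = [({1, 2, 3, 4}, True), ({7, 8, 9}, False)]
print(predict_nearest_neighbour({1, 2, 3}, training_set))  # True
```

A single nearest neighbour is the simplest choice; k-nearest-neighbour voting or combining several models are common refinements.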
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Fluorescence-guided surgery (FGS) can play a key role in improving radical resection rates by helping surgeons adequately visualize malignant tissue intraoperatively. Designed ankyrin repeat proteins (DARPins) possess optimal pharmacokinetic and other properties for in vivo imaging. This study aims to evaluate the preclinical potential of epithelial cell adhesion molecule (EpCAM)-binding DARPins as targeting moieties for near-infrared fluorescence (NIRF) and photoacoustic (PA) imaging of cancer. EpCAM-binding DARPins Ac2 and Ec4.1 and the non-binding control DARPin Off7 were conjugated to IRDye 800CW, and their binding efficacy was evaluated on EpCAM-positive HT-29 and EpCAM-negative COLO-320 human colon cancer cell lines. Thereafter, NIRF and PA imaging of all three conjugates was performed in HT-29_luc2 tumor-bearing mice. At 24 h post-injection, tumors and organs were resected and tracer biodistribution was analyzed. Ac2-800CW and Ec4.1-800CW specifically bound to HT-29 cells but not to COLO-320 cells. Next, a dose of 6 nmol and an imaging time point of 24 h were established as optimal in vivo for both DARPin tracers. At 24 h post-injection, mean tumor-to-background ratios of 2.60 ± 0.3 and 3.1 ± 0.3 were observed for Ac2-800CW and Ec4.1-800CW, respectively, allowing clear tumor delineation using the clinical Artemis NIRF imager. In non-neoplastic tissue, biodistribution analyses showed high fluorescence signal only in the liver and kidney, which reflects the clearance of the DARPin tracers. Our encouraging results show that EpCAM-binding DARPins are a promising class of targeting moieties for pan-carcinoma targeting, providing clear tumor delineation at 24 h post-injection.
The work described provides the preclinical foundation for DARPin-based bimodal NIRF/PA imaging of cancer.
Gorostiola González, M.; Broek, R.L. van den; Braun, T.G.M.; Chatzopoulou Chatzi, A.; Jespers, W.; IJzerman, A.P.; ... ; Westen, G.J.P. van 2023
Proteochemometric (PCM) modelling is a powerful computational drug discovery tool used to predict the bioactivity of potential drug candidates from both chemical and protein information. In PCM, features are computed to describe small molecules and proteins, and these features directly impact the quality of the predictive models. State-of-the-art protein descriptors, however, are calculated from the protein sequence and neglect the dynamic nature of proteins. This dynamic nature can be simulated computationally with molecular dynamics (MD). Here, novel 3D dynamic protein descriptors (3DDPDs) were designed for bioactivity prediction tasks with PCM models. As a test case, publicly available G protein-coupled receptor (GPCR) MD data from GPCRmd was used. GPCRs are membrane-bound proteins that are activated by hormones and neurotransmitters, and they constitute an important target family for drug discovery. GPCRs exist in different conformational states that allow the transmission of diverse signals and that can be modified by ligand interactions, among other factors. To translate the MD-encoded protein dynamics into descriptors, two types of 3DDPDs were considered: one-hot encoded residue-specific (rs) and embedding-like protein-specific (ps) 3DDPDs. The descriptors were developed by calculating distributions of trajectory coordinates and partial charges, applying dimensionality reduction, and subsequently condensing the results into vectors per residue or per protein, respectively. The 3DDPDs were benchmarked on several PCM tasks against state-of-the-art non-dynamic protein descriptors. Our rs- and ps3DDPDs outperformed non-dynamic descriptors in regression tasks using a temporal split and showed comparable performance with a random split and in all classification tasks.
Combining non-dynamic descriptors with 3DDPDs did not increase performance. Finally, the ability of 3DDPDs to capture dynamic fluctuations in mutant GPCRs was explored. The results presented here show the potential of including protein dynamics information in machine learning tasks, specifically bioactivity prediction, and open opportunities for applications in drug discovery, including oncology.
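The idea of condensing trajectory coordinate distributions into per-residue or per-protein vectors can be illustrated with a deliberately simplified sketch. It assumes a trajectory stored as a list of frames with one (x, y, z) coordinate per residue (for example its C-alpha atom), summarises each residue by the mean and standard deviation of its coordinates over the frames, and averages the residues into one protein-level vector. It omits the partial charges and the dimensionality-reduction step of the actual 3DDPDs, so it is not the published method.

```python
from statistics import mean, pstdev

# Hypothetical trajectory layout: one (x, y, z) coordinate per residue per frame.
Frame = list[tuple[float, float, float]]


def residue_descriptors(trajectory: list[Frame]) -> list[list[float]]:
    """Return [mean_x, mean_y, mean_z, sd_x, sd_y, sd_z] for every residue."""
    n_residues = len(trajectory[0])
    descriptors = []
    for r in range(n_residues):
        # Values of each coordinate axis of residue r across all frames.
        axes = [[frame[r][axis] for frame in trajectory] for axis in range(3)]
        descriptors.append([mean(a) for a in axes] + [pstdev(a) for a in axes])
    return descriptors


def protein_descriptor(per_residue: list[list[float]]) -> list[float]:
    """Condense per-residue vectors into one protein-level vector by averaging."""
    return [mean(column) for column in zip(*per_residue)]


# Two frames, two residues: residue 0 moves along x, residue 1 stays put.
traj = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
        [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
print(residue_descriptors(traj))
```

The standard deviations carry the dynamic information: a residue that does not move gets zeros, however its average position compares with others.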
Beerkens, B.L.H.; Snijders, I.M.; Snoeck, J.; Liu, R.; Tool, A.T.J.; Le Dévédec, S.E.; ... ; Es, D. van der 2023
The adenosine A3 receptor (A3AR) is a G protein-coupled receptor (GPCR) that exerts immunomodulatory effects in pathophysiological conditions such as inflammation and cancer. Thus far, studies of the downstream effects of A3AR activation have yielded contradictory results, motivating further investigation. Various chemical and biological tools have been developed for this purpose, ranging from fluorescent ligands to antibodies. Nevertheless, these probes are limited by their reversible mode of binding, relatively large size, and often low specificity. Therefore, in this work, we developed a clickable and covalent affinity-based probe (AfBP) to target the human A3AR. Herein, we validate the synthesized AfBP in radioligand displacement, SDS-PAGE, and confocal microscopy experiments and use it to detect endogenous A3AR expression in flow cytometry experiments. Ultimately, this AfBP will aid future studies of the expression and function of the A3AR in pathologies.
Protein kinases are a protein family that plays an important role in several complex diseases, such as cancer and cardiovascular and immunological diseases. Protein kinases have conserved ATP binding sites; as a result, inhibitors targeting these sites can show similar activities against different kinases. This can be exploited to create multitarget drugs. On the other hand, selectivity (the lack of similar activities) is desirable to avoid toxicity issues. A vast amount of protein kinase activity data is available in the public domain and can be used in many different ways. Multitask machine learning models are expected to excel on such data sets because they can learn from implicit correlations between tasks (in this case, activities against a variety of kinases). However, multitask modeling of sparse data poses two major challenges: (i) creating a balanced train-test split without data leakage and (ii) handling missing data. In this work, we construct a protein kinase benchmark set composed of two balanced splits without data leakage, created with random and dissimilarity-driven cluster-based mechanisms, respectively. This data set can be used for benchmarking and developing protein kinase activity prediction models. Overall, performance on the dissimilarity-driven cluster-based split is lower than on the random split for all models, indicating poor generalizability of the models. Nevertheless, we show that multitask deep learning models outperform single-task deep learning and tree-based models on this very sparse data set. Finally, we demonstrate that data imputation does not improve the performance of (multitask) models on this benchmark set.
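One common way to train multitask models on a sparse activity matrix without imputation, shown here as an assumption rather than the exact procedure of this work, is to mask the loss so that only observed compound-kinase pairs contribute. In this sketch, rows are compounds, columns are kinases (tasks), and `None` marks a missing measurement.

```python
from typing import Optional


def masked_mse(predictions: list[list[float]],
               targets: list[list[Optional[float]]]) -> float:
    """Mean squared error over observed (compound, kinase) pairs only.

    A None target is skipped: it adds nothing to the error and is not
    counted, so missing labels are neither penalised nor imputed.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            if target is None:
                continue
            total += (pred - target) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed labels")
    return total / count


# Two compounds x two kinases, with one missing value per compound.
preds = [[6.0, 7.0], [5.0, 8.0]]
labels = [[6.5, None], [None, 7.0]]
print(masked_mse(preds, labels))  # 0.625
```

In a deep learning framework the same idea is usually written as an element-wise loss multiplied by a 0/1 mask and divided by the mask sum.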
Inaugural lecture delivered by Prof. Dr. Gerard J.P. van Westen upon accepting the position of Professor of Artificial Intelligence & Medicinal Chemistry (Kunstmatige Intelligentie & FarmacoChemie) at Leiden University on 26 May 2023
Luukkonen, S.I.M.; Maagdenberg, H.W. van den; Emmerich, M.T.M.; Westen, G.J.P. van 2023
The factors determining a drug's success are manifold, making de novo drug design an inherently multi-objective optimisation (MOO) problem. With the advent of machine learning and optimisation methods, the field of multi-objective compound design has seen a rapid increase in developments and applications. Population-based metaheuristics and deep reinforcement learning are the most commonly used artificial intelligence methods in the field, but conditional learning methods have recently been gaining popularity. The first two approaches are coupled with a MOO strategy, most commonly an aggregation function, although Pareto-based strategies are also widespread. Besides these and conditional learning, various innovative approaches to tackle MOO in drug design have been proposed. Here we provide a brief overview of the field and the latest innovations.
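The difference between aggregation-based and Pareto-based MOO strategies can be sketched in a few lines. The compound names and objective scores below are hypothetical (two objectives, both to be maximised): a weighted sum collapses the objectives into one number and returns a single best compound, whereas Pareto ranking keeps every non-dominated trade-off.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if a is at least as good as b on all objectives and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, tuple[float, ...]]) -> list[str]:
    """Names of the compounds that no other compound dominates."""
    return [name for name, s in scores.items()
            if not any(dominates(other, s) for other in scores.values())]


def weighted_sum(s: tuple[float, ...], weights: tuple[float, ...]) -> float:
    """Aggregation function: collapse all objectives into one scalar score."""
    return sum(w * x for w, x in zip(weights, s))


# Hypothetical (affinity, selectivity) scores, both to be maximised.
candidates = {"A": (0.9, 0.2), "B": (0.6, 0.6), "C": (0.5, 0.5), "D": (0.2, 0.9)}
print(pareto_front(candidates))  # ['A', 'B', 'D']
print(max(candidates, key=lambda n: weighted_sum(candidates[n], (0.5, 0.5))))  # B
```

A weighted sum also cannot reach Pareto-optimal points lying in non-convex regions of the front, whatever weights are chosen, which is one motivation for Pareto-based strategies.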
Bournez, C.; Riool, M.; Boer, L. de; Cordfunke, R.A.; Best, L. de; Leeuwen, R. van; ... ; Westen, G.J.P. van 2023
To combat infection by microorganisms, host organisms possess a primary arsenal in the innate immune system. This arsenal includes defense peptides that can target a wide range of pathogenic organisms, including bacteria, viruses, parasites, and fungi. Here, we present the development of CalcAMP, a novel machine learning model capable of predicting the activity of antimicrobial peptides (AMPs). AMPs, in particular short ones (<35 amino acids), can become an effective solution to the multidrug resistance problem arising worldwide. Whereas finding potent AMPs through classical wet-lab techniques is still a long and expensive process, a machine learning model can help researchers rapidly identify whether peptides show potential. Our prediction model is based on a new data set constructed from the available public data on AMPs and their experimental antimicrobial activities. CalcAMP can predict activity against both Gram-positive and Gram-negative bacteria. Different features, describing either general physicochemical properties or sequence composition, were assessed to achieve higher prediction accuracy. CalcAMP can serve as a promising prediction tool to identify short AMPs among given peptide sequences.
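Sequence-composition features are one of the feature families mentioned above. The sketch below computes the simplest such descriptor, the 20-dimensional amino acid composition, together with the "<35 amino acids" length criterion for short AMPs. It is an illustration under these assumptions, not CalcAMP's actual feature set, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(sequence: str) -> list[float]:
    """Fraction of each standard amino acid in the peptide (20 features)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def is_short_peptide(sequence: str, max_length: int = 35) -> bool:
    """Length filter mirroring the '<35 amino acids' definition of short AMPs."""
    return len(sequence) < max_length


features = aa_composition("KKLLKKLL")
print(features[AMINO_ACIDS.index("K")], features[AMINO_ACIDS.index("L")])  # 0.5 0.5
```

Composition vectors have a fixed length regardless of peptide length, which makes them easy to feed to standard classifiers; they do, however, discard residue order.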
Hollander, L.S. den; Béquignon, O.J.M.; Wang, X.; Wezel, K. van; Broekhuis, J.D.; Gorostiola Gonzalez, M.; ... ; Heitman, L.H. 2023
Solute carriers (SLCs) are relatively underexplored compared to other prominent protein families such as kinases and G protein-coupled receptors. However, proteins from the SLC family play an essential role in various diseases. One such SLC is the high-affinity norepinephrine transporter (NET/SLC6A2). In contrast to most other SLCs, the NET has been relatively well studied. However, its known ligands have low chemical diversity, which makes it challenging to identify chemically novel ligands. Here, a computational screening pipeline was developed to find new NET inhibitors. The approach expands the chemical space available for modeling the NET by adding the chemical space of related proteins, which were selected using similarity networks. Prior proteochemometric models also added data from related proteins, but here we use a data-driven approach to select the optimal proteins to add to the modeled data set. After optimizing the data set, the proteochemometric model was optimized using stepwise feature selection. The final model was created using a two-step approach that combines several proteochemometric machine learning models through stacking. This model was applied to the extensive virtual compound database of Enamine: the 22,000 top-predicted of its 600 million virtual compounds were clustered, yielding 46 chemically diverse candidates. A subselection of 32 candidates was synthesized and subsequently tested in an impedance-based assay. Five hit compounds with sub-micromolar inhibitory potencies toward NET were identified (hit rate 16%); these are promising for follow-up experimental research.
This study demonstrates a data-driven approach to diversify known chemical space to identify novel ligands and is, to our knowledge, the first to select the set of related targets on the basis of sequence similarity.
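Stacking trains a meta-model on the predictions of several base models. The sketch below is a minimal, hypothetical illustration with two base models and a linear meta-model without intercept, fitted by ordinary least squares through the 2x2 normal equations; the NET pipeline combined several proteochemometric models, and its actual meta-learner is not specified here. In practice the meta-model should be fitted on out-of-fold (cross-validated) base-model predictions to avoid leakage.

```python
def fit_stacking_weights(pred_a: list[float], pred_b: list[float],
                         y: list[float]) -> tuple[float, float]:
    """Least-squares weights (w_a, w_b) so that y ~ w_a * pred_a + w_b * pred_b."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * t for a, t in zip(pred_a, y))
    sby = sum(b * t for b, t in zip(pred_b, y))
    det = saa * sbb - sab * sab  # determinant of the normal-equation matrix
    if det == 0:
        raise ValueError("base-model predictions are collinear")
    # Cramer's rule for [[saa, sab], [sab, sbb]] @ [w_a, w_b] = [say, sby]
    w_a = (say * sbb - sab * sby) / det
    w_b = (saa * sby - sab * say) / det
    return w_a, w_b


def stacked_predict(weights: tuple[float, float], a: float, b: float) -> float:
    """Meta-model prediction for one compound from the two base-model outputs."""
    return weights[0] * a + weights[1] * b


# Hypothetical out-of-fold predictions of two base models and the true activities.
base_a = [1.0, 2.0, 3.0, 4.0]
base_b = [2.0, 1.0, 0.0, 1.0]
activity = [1.7, 1.3, 0.9, 1.9]  # constructed as 0.3 * base_a + 0.7 * base_b
print(fit_stacking_weights(base_a, base_b, activity))  # approximately (0.3, 0.7)
```

A linear meta-model keeps the combination interpretable: each weight shows how much the final prediction relies on one base model.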
Liu, X.; Ye, K.; Vlijmen, H. van; IJzerman, A.P.; Westen, G.J.P. van 2023
Because the drug-like chemical space available to search for novel molecules is vast, rational drug design often starts from specific scaffolds to which side chains or substituents are added or modified. With the rapid growth of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work, we proposed DrugEx, a method based on multi-objective deep reinforcement learning that can be applied in polypharmacology. However, the previous version was trained under fixed objectives and did not allow users to input any prior information (e.g., a desired scaffold). To improve its general applicability, we updated DrugEx to design drug molecules based on user-provided scaffolds consisting of multiple fragments. Here, a Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder that receives scaffolds as input and a decoder that generates molecules as output. To handle the graph representation of molecules, a novel positional encoding for each atom and bond based on an adjacency matrix was proposed, extending the Transformer architecture. The graph Transformer model contains growing and connecting procedures for generating molecules from a given fragment-based scaffold. Moreover, the generator was trained under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, the method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods.
The results show that 100% of the generated molecules are valid and that most of them have a high predicted affinity towards A2AAR with the given scaffolds.
Schoenmaker, L.; Béquignon, O.J.M.; Jespers, W.; Westen, G.J.P. van 2023
Generative deep learning models have emerged as a powerful approach for de novo drug design, as they help researchers find new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs produced by sequence-based de novo generators contains errors and therefore cannot be progressed. Here, we propose to fix these invalid outputs post hoc. Transformer models from the field of natural language processing have been shown to be very effective in similar tasks. Therefore, a transformer was trained here to translate invalid Simplified Molecular-Input Line-Entry System (SMILES) strings into valid representations. The performance of this SMILES corrector was evaluated on four representative methods of de novo generation: a recurrent neural network (RNN), a target-directed RNN, a generative adversarial network (GAN), and a variational autoencoder (VAE). This study found that the percentage of invalid outputs from these specific generative models ranges between 4 and 89%, with different models having different error-type distributions. Post hoc correction of SMILES was shown to increase model validity. The SMILES corrector trained with one error per input alters 60-90% of invalid generator outputs and fixes 35-80% of them. However, transformer models trained with multiple errors per input detected more errors and performed better; the best of these models was able to correct 60-95% of invalid generator outputs. Further analysis showed that these fixed molecules are comparable to the correct molecules from the de novo generators in terms of novelty and similarity. Additionally, the SMILES corrector can be used to expand the number of interesting new molecules within the targeted chemical space.
Introducing different errors into existing molecules yields novel analogs with a uniqueness of 39% and a novelty of approximately 20%. The results of this research demonstrate that SMILES correction is a viable post hoc extension and can enhance the search for better drug candidates.
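Uniqueness and novelty are commonly computed as sketched below; the exact definitions used in this study may differ. The sketch assumes the SMILES strings are already canonicalised (for example with a cheminformatics toolkit), since one molecule can be written as many different SMILES strings. It also shows one hypothetical, simplistic way to introduce an error into a SMILES string to create (invalid, valid) training pairs for a corrector; a single deletion does not always yield an invalid SMILES (deleting a C from "CCO" gives the valid "CO").

```python
import random


def uniqueness(generated: list[str]) -> float:
    """Fraction of generated strings that are distinct."""
    return len(set(generated)) / len(generated)


def novelty(generated: list[str], training: list[str]) -> float:
    """Fraction of distinct generated strings that do not occur in the training set."""
    unique = set(generated)
    return len(unique - set(training)) / len(unique)


def delete_random_character(smiles: str, rng: random.Random) -> str:
    """Hypothetical error type: drop one randomly chosen character."""
    i = rng.randrange(len(smiles))
    return smiles[:i] + smiles[i + 1:]


generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
print(uniqueness(generated))                  # 0.75
print(round(novelty(generated, ["CCO"]), 3))  # 0.667
print(delete_random_character("c1ccccc1", random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the corruption reproducible, which helps when regenerating a training set.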
Béquignon, O.J.M.; Bongers, B.J.; Jespers, W.; IJzerman, A.P.; Water, B. van de; Westen, G.J.P. van 2023
With the ongoing rapid growth of publicly available ligand-protein bioactivity data, there is a trove of valuable data that can be used to train a plethora of machine-learning algorithms. However, not all data is equal in terms of size and quality, and a significant portion of researchers' time is needed to adapt the data to their needs. On top of that, finding the right data for a research question can often be a challenge on its own. To meet these challenges, we have constructed the Papyrus dataset. Papyrus comprises around 60 million data points. This dataset combines multiple large publicly available datasets, such as ChEMBL and ExCAPE-DB, with several smaller datasets containing high-quality data. The aggregated data has been standardised and normalised in a manner that is suitable for machine learning. We show how the data can be filtered in a variety of ways and also present examples of quantitative structure-activity relationship analyses and proteochemometric modelling. Our ambition is that this pruned data collection constitutes a benchmark set that can be used for constructing predictive models, while also providing an accessible data source for research.
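Aggregating repeated measurements and filtering the result are typical preparation steps for bioactivity data of this kind. The sketch below is a hypothetical illustration and not the Papyrus schema or its accompanying tooling: each record is assumed to be a (compound_id, target_id, pchembl_value) tuple, replicate measurements for a compound-target pair are collapsed to their median, and the result is then filtered by target.

```python
from collections import defaultdict
from statistics import median


def aggregate(records: list[tuple[str, str, float]]) -> dict[tuple[str, str], float]:
    """Collapse replicate activities per (compound, target) pair to their median."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for compound, target, value in records:
        grouped[(compound, target)].append(value)
    return {pair: median(values) for pair, values in grouped.items()}


def filter_by_target(data: dict[tuple[str, str], float],
                     target: str) -> dict[str, float]:
    """Keep only the activities measured against one target, keyed by compound."""
    return {compound: value for (compound, t), value in data.items() if t == target}


# Hypothetical identifiers and values.
records = [("cpd1", "T1", 6.0), ("cpd1", "T1", 7.0),
           ("cpd2", "T1", 5.5), ("cpd1", "T2", 8.0)]
print(filter_by_target(aggregate(records), "T1"))  # {'cpd1': 6.5, 'cpd2': 5.5}
```

The median is used here because it is robust to a single outlying replicate; the aggregation rule applied in Papyrus itself is not described in this summary.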
Hollander, L.S. den; Béquignon, O.J.M.; Wang, X.; Wezel, K. van; Broekhuis, J.; Gorostiola González, M.; ... ; Heitman, L.H. 2022
CC chemokine receptor 2 (CCR2), a G protein-coupled receptor, plays a role in many cancer-related processes such as metastasis formation and immunosuppression. Since ~20% of human cancers contain mutations in G protein-coupled receptors, ten cancer-associated CCR2 mutants obtained from the Genomic Data Commons were investigated for their effect on receptor functionality and antagonist binding. Mutations were selected based on either their vicinity to CCR2's orthosteric or allosteric binding sites or their presence in conserved amino acid motifs. One of the mutant receptors, S101P (2.63, Ballesteros-Weinstein numbering), with a mutation near the orthosteric binding site, was not expressed on the cell surface. All other studied mutants showed decreased or absent G protein activation in response to the main endogenous CCR2 ligand CCL2, but no change in potency was observed. Furthermore, INCB3344 and LUF7482 were chosen as representative orthosteric and allosteric antagonists, respectively. No change in potency was observed in a functional assay, but mutations at F116 (3.28) significantly impacted orthosteric antagonist binding, while allosteric antagonist binding was abolished for the L134Q (3.46) and D137N (3.49) mutants. As CCR2 is an attractive drug target in cancer, the negative effect of these mutations on receptor functionality and druggability should be considered in the drug discovery process.
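The labels attached to the CCR2 mutants (2.63, 3.28, 3.46, 3.49) follow Ballesteros-Weinstein numbering: the first number is the transmembrane helix, and the second is the residue's position relative to the most conserved residue in that helix, which is numbered 50. The small helper below is hypothetical and simply splits a mutation name and its label into these parts.

```python
import re

# Wild-type residue, sequence position, optional mutant residue, e.g. "S101P" or "F116".
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])?$")


def describe(mutation: str, bw_label: str) -> dict:
    """Split a point mutation and its Ballesteros-Weinstein label into components."""
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation}")
    helix, index = (int(part) for part in bw_label.split("."))
    return {
        "wild_type": match.group(1),
        "position": int(match.group(2)),
        "mutant": match.group(3),  # None when only the residue is given
        "helix": helix,
        # Negative: toward the N-terminal end of the helix; positive: toward the C-terminal end.
        "offset_from_most_conserved": index - 50,
    }


print(describe("D137N", "3.49"))
```

For D137N at 3.49 this gives helix 3 and an offset of -1, i.e. the residue immediately preceding the most conserved position 3.50 of that helix.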