
Identification of Allosteric Modulators of Metabotropic Glutamate 7 Receptor Using Proteochemometric Modeling.

Gary Tresadern, Andres A Trabanco, Laura Pérez-Benito, John P Overington, Herman W T van Vlijmen, Gerard J P van Westen.

Abstract

Proteochemometric modeling (PCM) is a computational approach that can be considered an extension of quantitative structure-activity relationship (QSAR) modeling, in which a single model incorporates information for a family of targets and all associated ligands instead of modeling activity against a single target. This is especially useful when bioactivity data exist for similar proteins but are scarce for the protein of interest. Here we demonstrate the application of PCM to identify allosteric modulators of metabotropic glutamate (mGlu) receptors. Given our long-running interest in modulating mGlu receptor function, we compiled a matrix of compound-target bioactivity data. Some members of the mGlu family are well explored both internally and in the public domain, while far fewer ligands are known for other targets such as the mGlu7 receptor. Using a PCM approach, mGlu7 receptor hits were found. In comparison to conventional single target modeling, the identified hits were more diverse, had a better confirmation rate, and provided starting points for further exploration. We conclude that the robust structure-activity relationships from well-explored target family members translated into better quality hits for PCM compared with virtual screening (VS) based on a single target.

Year: 2017 PMID: 29172488 PMCID: PMC5755953 DOI: 10.1021/acs.jcim.7b00338

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Introduction

One difficult aspect of drug discovery is simultaneous multiparametric optimization (target affinity, selectivity, ADME, toxicology, etc.). Properties like absorption, distribution, metabolism, excretion, and toxicology have been studied for some time; however, the systematic prediction and prevention of off-target effects is relatively recent. The advent of chemogenomic and proteochemometric approaches has provided computational tools for exploring drug activity space across multiple targets rather than a single one.[1] The importance of activity across multiple targets (bioactivity spectra) rather than single target activity is particularly relevant in the field of G protein-coupled receptors (GPCRs) and viral inhibitors.[2−4] Additionally, recent ligand-based similarity metrics have confirmed the existence of common ligands across protein families and even classes.[5,6] Proteochemometric modeling (PCM) uses statistical approaches (machine learning) to predict the bioactivity of molecules against groups of targets.[7,8] PCM is founded on the same principles as quantitative structure–activity relationship (QSAR) modeling but introduces an explicit protein (target) descriptor based on its sequence. Hence PCM differs from ligand-based approaches (such as chemogenomic methods), in which the similarity between proteins is inferred from the similarity between their ligands or bioactivity data alone. Indeed, the protein similarity information added to the model is complementary to the ligand information. The protein descriptor is commonly obtained via the physicochemical description of aligned protein sequences.[9,10] The descriptors can be derived from either the full sequence or just the binding pocket.
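As a minimal illustration of how a PCM data set differs from a QSAR one, the Python sketch below describes every (compound, target) pair by the compound descriptors concatenated with the target's protein descriptors, so that a single model is trained across all targets. The helper names `pcm_row` and `build_pcm_matrix`, and all identifiers and descriptor values, are hypothetical placeholders rather than part of the workflow described here.

```python
from typing import Dict, List, Sequence, Tuple


def pcm_row(ligand_desc: Sequence[float], protein_desc: Sequence[float]) -> List[float]:
    """One PCM feature vector: ligand descriptors followed by protein descriptors."""
    return list(ligand_desc) + list(protein_desc)


def build_pcm_matrix(
    pairs: Sequence[Tuple[str, str, bool]],
    ligand_table: Dict[str, Sequence[float]],
    protein_table: Dict[str, Sequence[float]],
) -> Tuple[List[List[float]], List[bool]]:
    """Turn (compound, target, is_active) records into one multitarget data set."""
    X: List[List[float]] = []
    y: List[bool] = []
    for compound_id, target_id, is_active in pairs:
        X.append(pcm_row(ligand_table[compound_id], protein_table[target_id]))
        y.append(is_active)
    return X, y


# Hypothetical toy data: two compounds measured on two related targets.
ligands = {"cpd1": [1.0, 0.0], "cpd2": [0.0, 1.0]}
proteins = {"targetA": [0.5, -0.2, 0.1], "targetB": [0.4, -0.1, 0.3]}
pairs = [("cpd1", "targetA", True), ("cpd1", "targetB", False), ("cpd2", "targetB", True)]
X, y = build_pcm_matrix(pairs, ligands, proteins)
```

In a single-target QSAR model only the ligand part of each row would be used, with one model per target; here the same compound can appear in several rows that differ only in their protein part, which is how information is shared between related targets.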
As the protein descriptor captures aspects of target similarity, PCM can also predict the activity of known ligands against new sequences based on the similarity of these proteins.[11] PCM has been applied to diverse targets (including Class A GPCRs, viral enzymes, kinases, and transporter proteins) and ligands (small molecules and peptides).[12] The metabotropic glutamate (mGlu) receptor family consists of 8 class C GPCRs subdivided into three groups according to sequence similarity and signaling pharmacology: group I (mGlu1 and mGlu5), group II (mGlu2 and mGlu3), and group III (mGlu4, mGlu6, mGlu7, and mGlu8).[13,14] They are important drug discovery targets, and despite many reported synthetic orthosteric agonists and antagonists, allosteric modulation is arguably the preferred means to modulate mGlu receptor function.[15] Allosteric modulators function in the presence of orthosteric agonists and typically either increase (positive allosteric modulators, PAMs) or decrease (negative allosteric modulators, NAMs) receptor response. Silent allosteric modulators (SAMs) are also known; they bind but have little or no apparent functional effect. While glutamate binds in the large extracellular N-terminal domain, most allosteric modulators of mGlu receptors are understood to bind in the 7-transmembrane (7-TM) domain.[16−18] Some mGlu receptors are more explored from a drug discovery point of view than others (Figure 1A). Over the last 15 years many groups, including our own laboratories, have explored allosteric modulators of mGlu5,[19,20] mGlu2,[21−25] and mGlu1.[26,27] The abundance of mGlu family bioactivity data at Janssen is consistent with the trends in the public domain (Figure 1A).
The group III mGlu7 receptor is one of the least explored of the family, although reports suggest it may be relevant for cognition.[28] Very few reference compounds are reported for this target: MMPIP is a known mGlu7 NAM, or allosteric antagonist,[29] and AMN-082[30] is a PAM that also has monoaminergic GPCR activity, which is detrimental to its use as a tool compound[31] (Figure 1B). This target is a challenge for computational virtual screening (VS). Crystal structures of the 7-TM domain are only available for the group I mGlu1 and mGlu5 receptors in the inactive state, so a structure-based VS approach would be high-risk; meanwhile, there are insufficient mGlu7 active compounds to develop a pharmacophore. Given our interest in mGlu receptor allosteric modulators, we created a platform of assays to measure activation or inhibition of signaling for all 8 receptors. Multiple mGlu active chemical series were tested against this panel of assays. This data set supports VS with PCM, using the mGlu bioactivity data to find new hits for less explored receptors such as mGlu7. Here we describe our hit generation strategy for the mGlu7 target, which combined a gene-family screening approach for the mGlu receptors with building and applying mGlu receptor PCMs, leading to the identification of new mGlu7 allosteric modulator hits.

Figure 1

(A) Pie chart showing the reported ligands for mGlu receptors in the Thomson Reuters Integrity database.[32] The most explored are mGlu5, mGlu2, and mGlu1. Extracted on March 15th 2017. (B) Known mGlu7 receptor reference compounds: allosteric antagonist/NAM MMPIP and agonist/PAM AMN-082.

Methods

Data Set

Input data came from two sources: Janssen internal mGlu family screening and ChEMBL (release 19).[33] The Janssen biological data comprised approximately 2500 compounds tested in the mGlu receptor functional assays described in Table 1 (experimental methods are provided in the Supporting Information). Activity of a molecule at a given mGlu receptor was classified as true or false. A compound was defined as inactive in an mGlu agonist, antagonist, or PAM assay if the pEC50 (or pIC50) from a concentration response study was <5.0 (EC50 or IC50 > 10 μM). In addition, molecules without concentration response data but with a single concentration screen (EMAX) < 20% were also defined as inactive. A compound was defined as active only if its pEC50 was >5.0. A high single point % EMAX without an attempted concentration response was not considered sufficient to count as active, and these molecules were discarded from further consideration. Details of the data set are provided in Table 2. The matrix of 2455 compounds and 18 assays (agonism and antagonism in all 8 mGlu receptors and PAM for mGlu2 and mGlu5) corresponded to 33445 compound and receptor bioactivity pairs (that is, a measurement of compound activity or inactivity in one mGlu receptor assay). Meanwhile, the data from ChEMBL consisted of 3211 unique compound and receptor bioactivity pairs. For duplicate pairs the mean was used; in total 2716 compounds and 15 mGlu receptor–species combinations were covered. pChEMBL values >6 were considered active, and pChEMBL values <5 were considered inactive. Intermediate values were removed to avoid confounding weakly active close analogues with inactive molecules and to ensure that true actives lie above a stringent micromolar threshold.
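The labeling rules above can be written as two small functions. This is a sketch under stated assumptions: the function names are hypothetical, a pEC50 of exactly 5.0 is treated as discarded because the text assigns it to neither class, and single-point results of 20% EMAX or more without a concentration response are discarded, generalizing the stated rule for high single-point values.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def p_value_from_molar(concentration_molar: float) -> float:
    """pEC50/pIC50 = -log10(concentration in mol/L); 10 uM corresponds to 5.0."""
    return -math.log10(concentration_molar)


def label_janssen(pec50: Optional[float], single_point_emax: Optional[float]) -> Optional[bool]:
    """True = active, False = inactive, None = discarded (hypothetical helper)."""
    if pec50 is not None:
        if pec50 > 5.0:
            return True
        if pec50 < 5.0:
            return False
        return None  # exactly 5.0: not assigned by the stated rules (assumption: discard)
    if single_point_emax is not None and single_point_emax < 20.0:
        return False  # single-concentration screen with EMAX < 20%
    # Assumption: any other single-point result without a concentration response is discarded.
    return None


def label_chembl(pchembl_values: Sequence[float]) -> Optional[bool]:
    """Average duplicate measurements, then apply the >6 active / <5 inactive cutoffs."""
    value = mean(pchembl_values)
    if value > 6.0:
        return True
    if value < 5.0:
        return False
    return None  # intermediate values (5 to 6) are removed
```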
From the total set of 5755 actives and 30901 inactives, 5 different balanced sets were created through stratified random selection per receptor (using 5 different seeds) each containing approximately 4500 active data points and 4500 inactive data points (see Table S1 for a typical example). Molecules were prepared for modeling in Pipeline Pilot using components to strip salts, standardize molecules, and add hydrogens and were ionized at pH 7.4 as was done previously.[34−36]
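One possible reading of the per-receptor stratified selection is to undersample the larger class within each receptor so that actives and inactives are equal in number, repeating this with 5 seeds. The sketch below implements that assumption with hypothetical names; it is not the original Pipeline Pilot protocol. Applied to the per-receptor counts in Table 2 (active pairs versus the remaining inactive pairs), this rule would retain 4536 actives and 4536 inactives, which is consistent with, though not proof of, the approximately 4500 of each reported above.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, bool]  # (compound_id, receptor_id, is_active)


def balanced_subset(records: List[Record], seed: int) -> List[Record]:
    """Per receptor, keep equal numbers of actives and inactives by randomly
    undersampling the larger class (assumed reading of the text)."""
    rng = random.Random(seed)
    by_receptor: Dict[str, Dict[bool, List[Record]]] = defaultdict(lambda: {True: [], False: []})
    for record in records:
        by_receptor[record[1]][record[2]].append(record)
    subset: List[Record] = []
    for receptor in sorted(by_receptor):
        groups = by_receptor[receptor]
        n = min(len(groups[True]), len(groups[False]))
        subset.extend(rng.sample(groups[True], n))
        subset.extend(rng.sample(groups[False], n))
    return subset


# Hypothetical toy data: 10 actives / 20 inactives on one receptor, 2 / 18 on another.
toy = [(f"c{i}", "mGlu2", i % 3 == 0) for i in range(30)] + \
      [(f"c{i}", "mGlu7", i < 2) for i in range(20)]
balanced_sets = [balanced_subset(toy, seed) for seed in range(5)]  # five seeds
```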

Table 1

Details of Janssen Bioactivity Assays Used in This Studyᵃ

mGlu receptor	species	assay details
1	H	agonist and antagonist Ca²⁺ response
2	H	PAM GTPγS, PAM Ca²⁺, Ag GTPγS, and antagonist Ca²⁺
3	H	agonist and antagonist Ca²⁺ response
4	H	agonist and antagonist GTPγS response
5	H	PAM and agonist Ca²⁺ response
6	R	agonist and antagonist GTPγS response
7	H	agonist and antagonist Ca²⁺ response
8	H	agonist and antagonist Ca²⁺ response

Abbreviations: human (H), rat (R), positive allosteric modulator (PAM), guanosine 5′-O-[gamma-thio]triphosphate (GTPγS), calcium (Ca²⁺).

Table 2

Details of the Full Data Set in This Studyᵃ

mGlu receptor	species	total bioactivity pairs	bioactivity pairs from ChEMBL	bioactivity pairs from Janssen	total “active” bioactivity pairs
1	H	4552	375	4177	391
	M	15	15	0	15
	R	342	342	0	316
2	H	5946	305	5641	2234
	R	244	244	0	240
3	H	3732	18	3714	269
	R	32	32	0	29
4	H	4029	99	3930	99
	R	32	32	0	32
5	H	5164	1027	4137	1422
	M	2	2	0	2
	R	690	690	0	644
6	H	5	5	0	1
	R	4094	0	4094	5
7	H	3997	0	3997	23
	R	20	20	0	20
8	H	3760	5	3755	13
total	17	36656	3211	33445	5755

Abbreviations: human (H), rat (R), mouse (M).


Binding Site Amino Acids

All Janssen in vitro biological data were generated on the human mGlu receptors, except mGlu6, for which the rat clone was used. Data from ChEMBL originated from human and rat mGlu receptors in all cases except mGlu7, which was only from rat, and mGlu1 and mGlu5, which also included mouse data. Previously we demonstrated that human and rat GPCR paralogs can be successfully combined in a single PCM model.[4] Sequence identity between the 7-TM domains of mGlu receptors in the same group (I, II, or III) was typically 75–85%, whereas between members of different groups it was approximately 45–50% (Figure S1). The high identity permitted a facile alignment (Figure S2). The recently solved crystal structures of NAMs bound in the 7-TM domains of mGlu1 and mGlu5 receptors allowed us to identify the relevant allosteric binding site amino acids (Table 3 and Figure 2). A manual selection of 34 amino acids was made within a 5 Å radius around the ligands in the mGlu1 and mGlu5 crystal structures. The selection was extended to the other mGlu receptors using the equivalent positions in the sequence alignment (Table S2).
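The binding site selection was made manually, but the two steps involved (picking residues within 5 Å of a co-crystallized ligand, then reading the equivalent residues of other receptors from the same alignment columns) can be sketched as follows. Coordinates are assumed to be already parsed from the crystal structures into plain tuples, residue numbering is assumed to start at 1 for the aligned sequence, and all names and toy inputs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pocket_residues(
    residue_atoms: Dict[int, List[Coord]], ligand_atoms: Sequence[Coord], cutoff: float = 5.0
) -> List[int]:
    """Residue numbers with at least one atom within `cutoff` Å of any ligand atom."""
    return sorted(
        resnum
        for resnum, atoms in residue_atoms.items()
        if any(math.dist(atom, lig) <= cutoff for atom in atoms for lig in ligand_atoms)
    )


def residue_to_column(aligned_seq: str, first_resnum: int = 1) -> Dict[int, int]:
    """Map residue numbers of the ungapped sequence to alignment column indices."""
    mapping: Dict[int, int] = {}
    resnum = first_resnum
    for column, aa in enumerate(aligned_seq):
        if aa != "-":
            mapping[resnum] = column
            resnum += 1
    return mapping


def transfer_positions(ref_aligned: str, other_aligned: str, ref_residues: Sequence[int]) -> str:
    """Read another receptor's amino acids at the reference pocket columns ('-' = gap)."""
    columns = residue_to_column(ref_aligned)
    return "".join(other_aligned[columns[r]] for r in ref_residues)
```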

Table 3

mGlu Receptor Allosteric Modulator Binding Site Amino Acids Used for PCMᵃ

TM2	TM3	TM4	ECL2ᵇ	TM5	TM6	TM7
2.46a.42c	3.28a.32c	4.53a.43c	45.50	5.40a.40c	6.44a.46c	7.35a.29c
2.49a.45c	3.29a.33c		45.52	5.43a.43c	6.47a.49c	7.38a.32c
2.50a.46c	3.32a.36c			5.44a.44c	6.48a.50c	7.41a.35c
2.53a.49c	3.33a.37c			5.47a.47c	6.51a.53c	7.42a.36c
2.56a.52c	3.35a.39c			5.51a.51c	6.55a.57c	7.45a.39c
2.60a.56c	3.36a.40c					7.46a.40c
	3.39a.43c					7.49a.43c
	3.40a.44c

Amino acids are identified by their adapted Ballesteros-Weinstein numbering according to recent recommendations.[37]

Based on loop naming nomenclature from http://gpcrdb.org/.

Figure 2

(A) Nonsequential alignment of chosen binding site amino acids, coloring is based on Clustal X similarity. (B) mGlu1 and mGlu5 7-TM crystal structures showing NAMs and binding site amino acids. (C) An example of mGlu7 7-TM model receptor generated based on the sequence alignment and showing the same corresponding allosteric binding site amino acids.


mGlu PCM Model Building

Models were built using the R statistics randomForest (RF) component available in Pipeline Pilot.[36,38] We have used RF previously as the method of choice in PCM modeling with good results. As this method is nonlinear, no cross-term descriptors are required.[35,39] Models used 500 trees, class sizes were equalized, and at each split a random 30% of the descriptors was sampled to identify the best separation at that point, and out-of-bag validation was used.
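The Random Forest settings can be related to two simple quantities, sketched below in plain Python rather than with the R randomForest package used in the study (where they correspond to the `ntree` and `mtry` arguments). With the 768 + 105 + 34 = 907 descriptors listed in the next subsection, sampling 30% per split gives about 272 candidate descriptors per node, and each tree's bootstrap sample leaves roughly 37% of the rows out-of-bag, which is what makes out-of-bag validation possible without a separate hold-out set. The function names are our own.

```python
import random
from typing import List, Tuple


def features_per_split(n_descriptors: int, fraction: float = 0.30) -> int:
    """Number of descriptors sampled at each split (the `mtry` idea)."""
    return max(1, int(fraction * n_descriptors))


def bootstrap_with_oob(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw one tree's bootstrap sample; rows never drawn are that tree's out-of-bag rows."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


rng = random.Random(0)
_, oob = bootstrap_with_oob(10_000, rng)
oob_fraction = len(oob) / 10_000           # close to (1 - 1/n)**n, i.e. about 0.368
mtry = features_per_split(768 + 105 + 34)  # 907 descriptors in total
```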

Compound and Target Descriptors

Various trial models were built to test the RF model input parameters as well as model performance with different protein and molecule descriptors. These trials consisted of tests on subsets of the input data and different subsets of descriptors, for instance, comparing validation statistics such as sensitivity and specificity for models built with 50% of the available data and applied to the remaining data. From this work, the best target descriptors were found to be 3 Z-scales per amino acid, together with an added average measure for the full binding pocket sequence. The Z-scale descriptors capture the diversity of amino acids, as they are the first three uncorrelated components from a principal component analysis of physicochemical properties (experimental and calculated) of amino acids. This set of descriptors was shown to perform optimally in previous GPCR PCM studies.[10,39] Protein descriptors were calculated for the binding site amino acid positions. A distance matrix with calculated Euclidean distances between the different receptors using the Z-scale based descriptors is given in Table S3. For the small molecules, chemical fingerprints were combined with physicochemical properties. Based on occurrence frequency, 768 fingerprint bits were selected using the Pipeline Pilot component ‘Fingerprints to Properties’. The main advantage of this approach is that model interpretation allows linking back to the original substructure that each bit encodes. The target presence frequency for the selected bits was 50% of the compounds, which avoids features with low information content because they are present in nearly all or in very few compounds. Frequency-based selection was preferred over Bayesian selection, as the latter performs poorly in the context of multitarget models. Functional-class fingerprints (FCFP6) were found to outperform extended connectivity fingerprints (ECFP6).[40] The physicochemical properties used can be found in Table S4.
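Frequency-based bit selection can be illustrated with a short sketch that keeps the fingerprint features whose presence frequency across the compound set is closest to the 50% target. This is a generic stand-in written under our own assumptions; it does not reproduce the internals of the Pipeline Pilot ‘Fingerprints to Properties’ component, and the feature identifiers in the example are hypothetical integers rather than real FCFP6 hashes.

```python
from collections import Counter
from typing import List, Sequence, Set


def select_bits_by_frequency(
    fingerprints: Sequence[Set[int]], n_bits: int, target: float = 0.5
) -> List[int]:
    """Keep the `n_bits` features whose presence frequency is closest to `target`."""
    counts = Counter(bit for fp in fingerprints for bit in fp)
    n = len(fingerprints)
    # Ties are broken by feature identifier so the result is deterministic.
    ranked = sorted(counts, key=lambda bit: (abs(counts[bit] / n - target), bit))
    return sorted(ranked[:n_bits])


def to_bit_vector(fp: Set[int], selected: Sequence[int]) -> List[int]:
    """Fixed-length 0/1 descriptor row over the selected features."""
    return [1 if bit in fp else 0 for bit in selected]


# Feature 1 is present in every compound (100%), feature 2 in half of them (50%).
fps = [{1, 2, 3}, {1, 2}, {1, 4}, {1}]
selected = select_bits_by_frequency(fps, n_bits=2)
```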
In summary, each data point was described by 768 (FCFP6) + 105 (protein) + 34 (small molecule physicochemical) descriptors. Subsequently these descriptors were used in the various external validation and prospective applications.
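The protein part of the descriptor vector can be sketched as follows, assuming that the “average measure for the full binding pocket” is one mean per z-scale across the pocket: 34 residues × 3 z-scales + 3 averages gives 105 values, which matches the count above, and 768 + 105 + 34 = 907 descriptors per data point. The z-scale values in the example are placeholders chosen for arithmetic clarity, not the published z-scale table, and the function name is hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

ZScales = Dict[str, Tuple[float, float, float]]


def pocket_descriptor(pocket_seq: str, zscales: ZScales) -> List[float]:
    """3 z-scales per pocket residue, followed by the pocket average of each z-scale."""
    per_residue = [z for aa in pocket_seq for z in zscales[aa]]
    averages = [mean(zscales[aa][k] for aa in pocket_seq) for k in range(3)]
    return per_residue + averages


# Hypothetical placeholder values, NOT the published z-scale table.
toy_z: ZScales = {"A": (1.0, 0.0, -1.0), "L": (-1.0, 2.0, 0.0)}
desc_x = pocket_descriptor("AL", toy_z)
desc_y = pocket_descriptor("LL", toy_z)
receptor_distance = math.dist(desc_x, desc_y)  # Euclidean distance between two pockets

# 768 fingerprint bits + protein descriptors for a 34-residue pocket + 34 properties.
n_total = 768 + len(pocket_descriptor("A" * 34, toy_z)) + 34
```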

Ligand Based Similarity Search

ECFP6 fingerprints were used to identify close analogues of known mGlu7 actives only (a single target approach) from the Janssen compound collection. In a classic ligand-centric approach, the initial focus is on identifying the closest structural analogues, and hence ECFP fingerprints were preferred because they use actual atom and bond types and capture substructures. The added value of the protein descriptors within the PCM was assessed separately in the Descriptor Set Contribution Validation section.
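For comparison with the PCM, the single-target analogue search reduces to ranking library compounds by Tanimoto similarity to known actives. The sketch below represents each fingerprint as a set of feature identifiers (in practice these would come from a cheminformatics toolkit's ECFP6 implementation) and uses the Tanimoto > 0.5 cutoff applied in the prospective comparison; names and toy fingerprints are hypothetical.

```python
from typing import Dict, FrozenSet


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A ∩ B| / |A ∪ B| on sets of fingerprint feature identifiers."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def analogue_search(
    queries: Dict[str, FrozenSet[int]],
    library: Dict[str, FrozenSet[int]],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Library compounds whose best similarity to any query exceeds `threshold`."""
    hits: Dict[str, float] = {}
    for lib_id, fp in library.items():
        best = max(tanimoto(fp, q) for q in queries.values())
        if best > threshold:
            hits[lib_id] = best
    return hits


queries = {"q1": frozenset({1, 2, 3, 4})}
library = {"a": frozenset({1, 2, 3}), "b": frozenset({1, 9}), "c": frozenset({1, 2, 5, 6})}
```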

Structure Based Docking

As mentioned in the Introduction, the situation does not favor a structure-based approach, given the lack of bioactive molecules for docking validation and the absence of an mGlu7 receptor structure. We have previously reported modeling of mGlu receptors, but usually in tandem with experiment.[41] Here a model of the mGlu7 7-TM domain was built based on the mGlu family sequence alignments and the mGlu1 and mGlu5 receptor structures. Ligands were retained during model building to keep the 7-TM binding cavity open. Known active and inactive molecules were then docked into the 7-TM binding cavity using Glide SP.[42] Small molecules and protein were prepared using the appropriate ligand preparation (LigPrep) and protein preparation tools. Default settings were used for docking.

Results and Discussion

Learning Curve External Validation

A learning curve was created by sampling model performance using 30%, 50%, and 70% of the data as training set and the remainder as test set (Table 4). This was done in duplicate with differing seeds. Performance of the full model, along with the best and worst performing receptors at a 70% training and 30% test split, is shown in Figure 3A. At the 70% split the models had an average sensitivity (sens) of 0.90 ± 0.00 (mean and standard deviation), where sens = TP/(TP + FN) and TP and FN refer to the numbers of true positives and false negatives. The specificity (spec) was 0.91 ± 0.00, where spec = TN/(TN + FP) and TN and FP refer to the numbers of true negatives and false positives. The Matthews Correlation Coefficient (MCC) was 0.81 (±0.00).[43] ROC scores (area under the receiver operating characteristic curve, which plots the FP rate on the x-axis against the TP rate on the y-axis) for the mean performance and the best and worst performing receptors are given in Figure 3A. Performance for the rat mGlu5 receptor is the worst. This is likely caused by a discrepancy between chemical and sequence similarity (a high sequence similarity is not matched by a high chemical similarity of the compounds tested). For rat mGlu5, the distance to the rest of the training set based on the compound structures (1 minus the Tanimoto similarity) is the highest of the receptors with enough data for the learning curve (0.45, where the average is 0.19). Conversely, the distance to the training set is rather low when it is calculated from the protein descriptors (0.81, where the average is 0.87); see Figure S3. We speculate that this mismatch causes the poor performance, which would mean that the chemical space modeled for rat mGlu5 is partially outside the applicability domain.
However, it should also be noted that few inactives were present for rat mGlu5 (46 of 690 bioactivity pairs; Table 2), so balancing the data discards much information, making the modeling more difficult given the differences in chemical space.
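The three classification statistics used throughout can be computed from a confusion matrix as below; the MCC follows the standard definition (TP·TN − FP·FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The counts in the example are illustrative only, not taken from the study: a balanced test set with sens 0.90 and spec 0.91 gives an MCC of about 0.81, and the second call shows that the same sens and spec on a set with only 4% actives give a much lower MCC.

```python
import math
from typing import Tuple


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, Matthews correlation coefficient)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc


# Balanced test set: 100 actives, 100 inactives.
balanced = confusion_metrics(tp=90, fp=9, tn=91, fn=10)
# Same sens and spec, but 100 actives versus 2400 inactives (4% actives).
imbalanced = confusion_metrics(tp=90, fp=216, tn=2184, fn=10)
```

This dependence of the MCC on class balance is one way to read the lower MCC values reported for the heavily imbalanced ensemble external validation.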

Table 4

Statistics of the Models Used in the Various External Validation Applicationsᵃ

	learning curve external validation			model ensemble external validation
	30% model 1	50% model 1	70% model 1	model 1	model 3	consensus
active data points (training)	1336	2310	3222	4549	4531	4843
inactive data points (training)	1271	2207	3103	4502	4580	10588
active data points (validation)	3210	2239	1327	1206	1224	912
inactive data points (validation)	3205	2295	1337	26399	26321	20313
OoB sensitivity	0.89	0.90	0.92	0.92	0.92	n/a
OoB specificity	0.88	0.89	0.90	0.91	0.91	n/a
OoB ROC AUC	0.94	0.96	0.96	0.97	0.97	n/a
ExtVal sensitivity	0.89	0.91	0.90	0.88	0.90	0.91
ExtVal specificity	0.88	0.90	0.91	0.94	0.95	0.94
ExtVal MCC	0.77	0.81	0.81	0.57	0.62	0.58
ExtVal ROC AUC	0.94	0.96	0.96	0.97	0.97	0.97

Figure 3

PCM model random learning curve external validation. (A) External validation ROC plot for overall performance (0.96 yellow), the best performing receptor (human mGlu4, 0.99 in blue), and the worst performing receptor (rat mGlu5, 0.81 in orange). (B) Performance of learning curves with increasing training sets specifically on human mGlu7. As the training set size increases the ROC is seen to increase from 0.79 for 30% (blue), through 0.83 for 50% (yellow), to 0.88 for 70% (orange) training set size, respectively.

Overview of representative models created in the external validation. Shown are one of each of the learning curve models (30%, 50%, 70%), 2 of the 5 models created for ensemble model screening (model 1 and model 3), and finally the performance of the consensus model used for prospective application. Abbreviations: External Validation (ExtVal), Out-of-Bag (OoB), Matthews Correlation Coefficient (MCC; see main text for details), Receiver Operating Characteristic (ROC), Area Under the Curve (AUC). Sensitivity is defined as true positives divided by the sum of true positives and false negatives; specificity is defined as true negatives divided by the sum of true negatives and false positives. No OoB parameters are given for the consensus application, as this method consists of 5 separate OoB-validated models, for which data for 2 are shown.

Specifically, for the human mGlu7 receptor, sens was 0.71 (±0.06), spec was 0.88 (±0.18), and MCC was 0.61 (±0.28). ROC curves for the 30% (0.79), 50% (0.83), and 70% (0.88) splits are given in Figure 3B. We conclude that the human mGlu7 receptor performed slightly below average but well above the worst receptor performance of rat mGlu5.

Model Ensemble External Validation

For screening purposes an ensemble of 5 models was used because of the highly imbalanced training set. The 5 models were generated on balanced partitions of the training set, together capturing all information on the active and inactive compounds (Table S1). The partitions contained approximately 80% of the actives in the training set (∼4500) and about 20% (∼1200) in the test set (Table 4), with a similar number of inactive compounds in the training set and the remainder in the test set. For this application the average out-of-bag validated sens was 0.92 and spec was 0.91, with an ROC of 0.97 (Table 4 and Figure S4). External validation gave slightly worse average performance, with sens at 0.91, spec at 0.94, and MCC at 0.58, and an associated ROC score of 0.96 (Table 4 and Figure S4). Consensus model performance was also tested and shown to be slightly better via external validation: sens was 0.91, spec was 0.94, and MCC was 0.58, with an ROC of 0.97 (Table 4). The worse performance compared with the learning curve on the test set is likely due to the large imbalance in the external validation (Table S1), where only about 4% of the data points are active, compared with the approximately 50:50 split used for model training.
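The text does not state how the 5 ensemble members were combined into the consensus. Two common options, averaging the predicted probability of activity or taking a majority vote, are sketched below under that caveat; models are represented as hypothetical callables returning a probability, and the example shows a case in which the two rules disagree.

```python
from statistics import mean
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]  # returns a predicted probability of activity


def consensus_probability(models: Sequence[Model], x: Sequence[float]) -> float:
    """Mean predicted probability across ensemble members."""
    return mean(model(x) for model in models)


def consensus_active(models: Sequence[Model], x: Sequence[float], threshold: float = 0.5) -> bool:
    """Average-probability rule."""
    return consensus_probability(models, x) >= threshold


def majority_vote_active(models: Sequence[Model], x: Sequence[float], threshold: float = 0.5) -> bool:
    """Majority-vote rule: more than half of the members must call the row active."""
    votes = sum(1 for model in models if model(x) >= threshold)
    return votes > len(models) / 2


def constant_model(p: float) -> Model:
    """Hypothetical stand-in for a trained model that always predicts `p`."""
    return lambda x: p


# Two confident members and three weakly negative ones.
ensemble = [constant_model(p) for p in (1.0, 1.0, 0.4, 0.4, 0.4)]
```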

Descriptor Set Contribution Validation

We also investigated the added value of the different descriptors by randomizing the FCFP6 bits, the physicochemical compound descriptors, the protein descriptors, or the response variable, or by leaving out compound or protein descriptors completely. In addition, a random model (where the modeled class was obtained by a random number generator and active labels were assigned when this number was >0.5) and an inactive-biased random model (where active labels were assigned when the number was >0.7, reflecting the large activity imbalance) were included (Figure S5). In these cases the training set was scrambled, but the validation set was kept true. These tests demonstrated that model sensitivity, specificity, and MCC improved with the presence of each of the included descriptors. The MCC ranges from −1 (anticorrelation) through 0 (random model) to 1 (perfect model); because of this larger range, the MCC shows the biggest deterioration compared with sens and spec. From these results we conclude that the improved performance of the PCM is due not solely to the addition of more molecules and their associated bioactivity but is also attributable to the binding site similarity linking the data.
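The control models described above can be sketched in a few lines: a random classifier that labels a row active when a uniform random number exceeds 0.5 (or 0.7 for the inactive-biased variant), and a column permutation that breaks the link between one descriptor and the activity labels. In the study these controls were applied within the RF workflow; the sketch below, with hypothetical names and simulated labels at about 4% actives, only shows that such baselines score an MCC near 0.

```python
import math
import random
from typing import List, Sequence


def random_baseline(n: int, threshold: float, seed: int = 0) -> List[bool]:
    """Label a row active when a uniform random number exceeds `threshold`."""
    rng = random.Random(seed)
    return [rng.random() > threshold for _ in range(n)]


def scramble_column(rows: List[List[float]], col: int, seed: int = 0) -> List[List[float]]:
    """Permute one descriptor column across rows, leaving the other columns intact."""
    rng = random.Random(seed)
    values = [row[col] for row in rows]
    rng.shuffle(values)
    return [row[:col] + [v] + row[col + 1:] for row, v in zip(rows, values)]


def mcc(pred: Sequence[bool], true: Sequence[bool]) -> float:
    """Matthews correlation coefficient between predicted and true labels."""
    tp = sum(p and t for p, t in zip(pred, true))
    tn = sum(not p and not t for p, t in zip(pred, true))
    fp = sum(p and not t for p, t in zip(pred, true))
    fn = sum(not p and t for p, t in zip(pred, true))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


truth = random_baseline(10_000, threshold=0.96, seed=1)  # simulated labels, about 4% actives
unbiased = random_baseline(10_000, threshold=0.5, seed=2)
inactive_biased = random_baseline(10_000, threshold=0.7, seed=3)
```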

External Validation of the PCM Model

The PCM model was further validated by testing its performance on Janssen in-house mGlu1 and mGlu2 data sets. With inactives representing diverse chemical structures from previous high throughput screens (HTS) and actives taken from a mixture of diverse HTS hits and lead-optimization programs, this represented a realistic and challenging test for the model. First, application to the mGlu1 receptor data set (588 actives and 207857 inactives) revealed good early enrichment over the first 2–5% of the database (Figure 4A), with 35 of the known actives found in the top 2000 ranked molecules and 25.5% of actives identified after searching 10% of the database. This corresponded to sens and spec of 0.12 and 0.98, respectively, after searching 2% of the database, and 0.19 and 0.95 after searching 5%. Meanwhile, for the mGlu2 data set (3412 actives and 206090 inactives) performance was worse (Figure 4B), and only 12.4% of actives were retrieved after searching 10% of the database. This corresponded to sens and spec of 0.04 and 0.99, respectively, after searching 2% of the database, and 0.08 and 0.98 after searching 5%. This is due to the diversity of the mGlu2 actives, which arise from multiple HTS campaigns and many structurally different lead series. In contrast, the Janssen mGlu1 actives are predominantly from the same reported chemical class, offering a better chance for the model to identify them.
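Sensitivity and specificity “after searching x% of the database” amount to calling the top-ranked x% of compounds active. The sketch below computes both values for a given fraction; the function name and toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec_at_fraction(
    scores: Sequence[float], labels: Sequence[bool], fraction: float
) -> Tuple[float, float]:
    """Call the top `fraction` of the ranked database active and score that call."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = round(fraction * len(ranked))
    top, rest = ranked[:n_top], ranked[n_top:]
    tp = sum(1 for _, is_active in top if is_active)
    fp = len(top) - tp
    fn = sum(1 for _, is_active in rest if is_active)
    tn = len(rest) - fn
    return tp / (tp + fn), tn / (tn + fp)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [True, False, True, False, False, False, False, False, False, True]
sens20, spec20 = sens_spec_at_fraction(scores, labels, 0.2)
```

Because actives are rare in these HTS sets, almost all of the top x% are inactive and specificity is close to 1 − x almost by construction (hence values near 0.98 and 0.95 at 2% and 5%), while a random ranking gives a sensitivity close to x. Sensitivity is therefore the informative number: 0.12 at 2% for mGlu1 is about six times the random expectation, whereas 0.04 for mGlu2 is about twice.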

Figure 4

Enrichment curves showing the retrieval of known actives versus % of database searched for Janssen internal mGlu1 (A) and mGlu2 (B) data sets.

The PCM was further tested by applying it to new mGlu7 PAM screening data generated after model building. The set contained 1088 unique molecules: 110 actives and 978 inactives. The resulting sens and spec were 0.25 and 0.72, respectively, a reasonable true positive rate for prospective VS. The data set contained many close analogues from an internal mGlu7 PAM medicinal chemistry program, some active and others inactive; this was a challenge for the model and increased the number of false positives. Classifying such small structural changes from lead optimization is beyond the scope of the model. To further contextualize model performance, we compared it with docking into an mGlu7 7-TM receptor model. The same 110 actives and a larger set of 7855 HTS inactives were docked with Glide SP. A VS of this type would usually be performed on hundreds of thousands of molecules, with the top 2–5% recommended for in vitro screening. Hence, we compared sens and spec after searching 2% of the database (0.05 and 0.98, respectively) and 5% of the database (0.08 and 0.95). This is in a similar performance range to the worst-case PCM validation on the mGlu2 HTS data set. For the true prospective application, final PCM models were trained on all data and applied to select compounds targeting the mGlu7 receptor.

Prospective VS with PCM To Identify mGlu7 PAMs

Our focus was hit finding for a difficult target, allosteric modulators of the mGlu7 receptor, based on gene family mGlu receptor screening followed by PCM for VS. The PCM was used for VS of the Janssen R&D corporate compound collection. First, Janssen compounds were filtered for stock availability. Restrictive physicochemical property filters were applied to identify only CNS-lead-like hits: compounds with MW >400, number of H-bond donors >2, molecular polar surface area >70 Å², AlogP >6, nitrogen plus oxygen count >7, or number of rotatable bonds >10 were removed. Undesirable substructures and compounds previously tested against mGlu7 were also removed. Approximately 200,000 compounds remained. Compound descriptors were calculated for each molecule and combined with the protein descriptors of the mGlu7 7-TM binding site, and the likelihood of activity was predicted using the model. In total 2130 molecules were predicted to have mGlu7 activity. The top ranked 394 were selected for screening (Table 5).
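The property filter can be expressed as a simple predicate that removes a compound if any single criterion is exceeded. In this sketch the property values are assumed to be precomputed (for example MW, polar surface area, and AlogP from a cheminformatics package), and the class and function names are hypothetical; compounds exactly at a limit are kept, since the text removes only values above it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundProps:
    mw: float       # molecular weight
    hbd: int        # number of H-bond donors
    psa: float      # molecular polar surface area, Å^2
    alogp: float    # calculated logP
    n_plus_o: int   # nitrogen plus oxygen atom count
    rot_bonds: int  # number of rotatable bonds


def passes_cns_leadlike(p: CompoundProps) -> bool:
    """Keep a compound only if it violates none of the removal criteria."""
    return (
        p.mw <= 400
        and p.hbd <= 2
        and p.psa <= 70
        and p.alogp <= 6
        and p.n_plus_o <= 7
        and p.rot_bonds <= 10
    )
```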

Table 5

Summary of mGlu7 PAM VS and Resulting Hits

method	PS compds testedᵃ	PS activeᵇ	PS hit rate	conf compds testedᶜ	pEC₅₀ > 4.52, ag or PAMᵈ	active in autofluorescence	no. of confirmed actives	final confirmed hit rate
single target approach: fingerprint analogues of only mGlu₇ actives	202	27	13%	25	17	12	5	2.5%
multitarget approach: select molecules based on likelihood to be mGlu₇ active from PCM	394	42	11%	41	18	1	17	4.3%

PS compds tested refers to compounds tested in the primary screen.

>50% effect at 3 or 10 μM in either agonist or PAM assay.

conf compds tested refers to number of compounds tested in confirmation assays.

Ag refers to agonist.

In addition, a comparison was performed against a typical single target ligand-based VS to identify close analogues. Molecules were selected from the Janssen compound collection based on their similarity to in-house mGlu7 actives. Previous in-house in vitro mGlu7 screens had delivered hits with pEC50 up to 6.5 in mGlu7 agonist/PAM assays. In total 92 diverse active compounds were identified from our existing internal data that either had a measurable mGlu7 receptor pEC50 or EMAX > 40%; for further details see Figure S6. Each compound was used as a query for ECFP6 fingerprint searches, and analogues from the Janssen collection with Tanimoto similarity >0.5 were retained. The molecules were subjected to the same filters as described above for the PCM VS: physicochemical property filters were applied, and undesirable substructures were removed. This represents a typical approach within a single target ligand-based modeling paradigm, given the scarcity of data for the target and the low activity of the reference compounds. In total 202 compounds were identified and recommended for biological screening. First, all compounds were tested in primary mGlu7 assays to assess their likelihood of activity. At the time, the mechanism of action of AMN-082 was not fully understood, and it possibly acts as a dual agonist/PAM. Hence, we did not want to discard the chance of finding CNS-drug-like (non-amino-acid-like) allosteric agonists as well as PAMs. Therefore, the initial primary screen (PS) was performed with two assays in a low throughput manner, testing for >50% effect at 3 or 10 μM in either the agonist or the PAM assay. This resulted in 42 weak hits from PCM and 27 from the single target fingerprint approach, of which 41 and 25, respectively, were progressed to confirmation (Table 5 and Figure 5).
These compounds were then assessed in concentration response. The primary screening hits from the PCM model were more diverse than those from the fingerprint approach (Figure 5). Subsequently, 18 hits from PCM and 17 from fingerprints showed confirmed activity better than 30 μM (pEC50 > 4.52).
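The cutoffs and rates in Table 5 follow from simple arithmetic, sketched below: 30 μM corresponds to pEC50 = −log10(30 × 10⁻⁶) ≈ 4.52, and the final confirmed hit rate is reproduced by subtracting autofluorescent compounds from the pEC50-confirmed actives and dividing by the number of compounds tested in the primary screen. That reading of the table is our own, and the function names are hypothetical.

```python
import math
from typing import Tuple


def pec50_from_micromolar(ec50_um: float) -> float:
    """Convert an EC50 in micromolar to pEC50 = -log10(EC50 in mol/L)."""
    return -math.log10(ec50_um * 1e-6)


def confirmed_hit_rate(
    n_primary_tested: int, n_confirmed_pec50: int, n_autofluorescent: int
) -> Tuple[int, float]:
    """Confirmed actives (pEC50-confirmed minus autofluorescent) and their rate."""
    confirmed = n_confirmed_pec50 - n_autofluorescent
    return confirmed, confirmed / n_primary_tested


fingerprint = confirmed_hit_rate(202, 17, 12)  # single-target analogue search
pcm = confirmed_hit_rate(394, 18, 1)           # PCM selection
```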

Figure 5

Stochastic proximity embedding (SPE) diversity map capturing the substructural diversity of the primary screening hits. Primary screen hits from PCM are shown in red, and hits from fingerprint analogues of mGlu7 actives only are shown in blue. The plot highlights the diversity of the PCM hits (red) compared with the initial fingerprint queries (green) and the resulting fingerprint hits (blue). ECFP4 fingerprints were used as descriptors. SPE generates low-dimensional Euclidean embeddings that preserve the similarities between the chemical structures.[46] Confirmed hits from fingerprints are molecules 1 and 2, whose structures are shown at the top left; their location in the diversity map is within the blue circle. Hits from PCM are numbered 3 to 6; their structures are shown at the bottom of the figure, and their locations in the diversity map are circled in red. The hits from PCM extend into a region of chemical space beyond those of the fingerprint queries and hits.

The fingerprint search resulted in a larger proportion of false positives due to autofluorescence, 12 out of 17 compounds (Table 5). Table 2 showed there were very few mGlu7 ligands at the start of the project, and these were only weakly active with relatively high logP (details in Figure S6). The queries themselves were not characterized as autofluorescent, but their low activities make them suboptimal for similarity searches. In contrast, only one of the PCM hits was discarded because of autofluorescence. The PCM was built from a larger and more robust data set, which helps avoid promiscuous molecules that fail in confirmation assays.[44] For example, much of the data originated from long-running discovery programs for targets such as mGlu2 and mGlu5, which contributed many of the active compounds in the PCM data set. This is not a weakness of the fingerprint method but a result of performing ligand-based VS on a novel target without robust queries.
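Stochastic proximity embedding is cited above without detail. The sketch below is a minimal version of the basic SPE update rule as we understand it from the general literature: repeatedly pick a random pair of points and move both along their connecting line by a fraction λ of the mismatch between embedded and target distance, with λ decreasing over cycles. The schedule, step counts, and the absence of a neighborhood cutoff are our assumptions, not the settings used for Figure 5; in a chemical application the target distances would be 1 − Tanimoto on ECFP4 fingerprints rather than the toy geometric distances used here.

```python
import math
import random
from typing import List, Sequence


def spe(
    target: Sequence[Sequence[float]],
    dim: int = 2,
    cycles: int = 50,
    steps_per_cycle: int = 1000,
    lam_start: float = 1.0,
    lam_end: float = 0.01,
    eps: float = 1e-10,
    seed: int = 0,
) -> List[List[float]]:
    """Basic stochastic proximity embedding of a symmetric distance matrix."""
    n = len(target)
    rng = random.Random(seed)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    lam = lam_start
    lam_step = (lam_start - lam_end) / max(1, cycles - 1)  # linear learning-rate decay
    for _ in range(cycles):
        for _ in range(steps_per_cycle):
            i, j = rng.sample(range(n), 2)
            d = math.dist(coords[i], coords[j])
            # Move both points along their connecting line to shrink |d - target|.
            factor = lam * 0.5 * (target[i][j] - d) / (d + eps)
            for k in range(dim):
                delta = factor * (coords[i][k] - coords[j][k])
                coords[i][k] += delta
                coords[j][k] -= delta
        lam -= lam_step
    return coords


def stress(coords: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Normalized mismatch between embedded and target distances."""
    num = den = 0.0
    for i in range(len(target)):
        for j in range(i + 1, len(target)):
            num += (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
            den += target[i][j] ** 2
    return math.sqrt(num / den)


# Toy example: embed the corners of a 3 x 3 square from their distance matrix alone.
corners = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)]
target = [[math.dist(p, q) for q in corners] for p in corners]
initial = spe(target, cycles=0)  # random starting coordinates only
embedded = spe(target)
```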
The results show that the PCM model delivered hit compounds with greater structural diversity and a lower proportion of false positives. Among the final confirmed hits, the fingerprint hits were all analogues from the same chemical series. In contrast, the PCM hits contained diverse chemical scaffolds and hold more promise for future work (Figure 5). The hit rate in the prospective VS was lower than in the validation studies. We attribute this to two factors: the low activity of the known actives used for model building, and the restrictive physicochemical property filters used to select compounds, which together made this a very challenging validation. As a result, various high-ranking PCM molecules were lower-MW substructures of known actives, but these substructures were insufficient to act as mGlu7 allosteric modulators (see examples in Figure S7). This is a byproduct of performing VS with few potent reference compounds, in this case with pIC50 values from 6 to 6.6. VS hits are typically less active because they are unoptimized "off-the-shelf" molecules. Hence, with a 10 μM screening cutoff (pEC50 5, only 1 to 1.6 log units below the reference compounds), there is only a small window in which to find new actives. The distance of the compounds recommended for screening to the training set offers a further explanation for the varying performance. Their average FCFP_6 Tanimoto distance was 0.57 (±0.20), which was higher than the mean distance for compounds in the data set in general (0.19) and for those tested on mGlu7 (0.01). These observations suggest that the applicability domain of the model cannot extend too far from the structural similarity space of the active ligands. Overall, therefore, the model was trying to predict activity at the limits of its applicability domain.
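The distance-to-training-set argument can be made concrete with Tanimoto distance, 1 − |A ∩ B| / |A ∪ B|. The following is a minimal Python sketch under several assumptions. Fingerprints are stored as sets of feature IDs; computing real FCFP_6 features requires a cheminformatics toolkit and is not shown. "Distance to the training set" is taken to mean the nearest-neighbour distance, because the text does not say whether nearest-neighbour or average distance was used. The ± value is treated as a sample standard deviation. The `in_domain` helper and its 0.4 cutoff are hypothetical.

```python
import statistics
from typing import FrozenSet, Sequence, Tuple

Fingerprint = FrozenSet[int]  # set of "on" feature IDs (e.g. hashed FCFP_6 features)


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto distance = 1 - |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def nearest_training_distance(query: Fingerprint,
                              training: Sequence[Fingerprint]) -> float:
    """Distance from a query compound to its most similar training compound."""
    return min(tanimoto_distance(query, t) for t in training)


def summarize_distances(queries: Sequence[Fingerprint],
                        training: Sequence[Fingerprint]) -> Tuple[float, float]:
    """Mean and sample standard deviation of nearest-neighbour distances (>= 2 queries)."""
    dists = [nearest_training_distance(q, training) for q in queries]
    return statistics.mean(dists), statistics.stdev(dists)


def in_domain(query: Fingerprint, training: Sequence[Fingerprint],
              max_distance: float = 0.4) -> bool:
    """Hypothetical applicability-domain check; the 0.4 cutoff is illustrative only."""
    return nearest_training_distance(query, training) <= max_distance


training_set = [frozenset({1, 2, 3, 4}), frozenset({7, 8})]
screened = [frozenset({1, 2, 3}), frozenset({2, 3, 4, 5})]
mean_d, sd_d = summarize_distances(screened, training_set)
```

A compound far from every training compound gets a large nearest-neighbour distance. Predictions for such compounds are extrapolations, which is the applicability-domain concern raised above.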
Allosteric modulators have previously been found to occupy a slightly different chemical space from orthosteric compounds: in general, they are more lipophilic and more rigid, and they bind with lower absolute affinity.[45] Active hits from the VS were sourced from the Janssen corporate compound collection, and no new synthesis was performed at this stage. Batch purity information is provided in the Supporting Information (Table S5). The selectivity of hits 1 to 4 was assessed by in vitro screening in the same panel of mGlu1 to mGlu8 receptor activation or inactivation assays. No activity was seen up to the 10 μM concentration cutoff for compounds 1, 2, and 4, while molecule 3 showed activity with a pEC50 of 6.2 in the mGlu3 and mGlu4 agonism assays. Compound 3 was therefore active not only at mGlu7 but also at other mGlu receptors. Hit 3 showed visual similarity to the reference compound AMN-082 (Figure ). It contains a distal benzhydryl motif, but with more attractive alternative substructures that break the symmetry of AMN-082. Further substructure and analogue searches did not yield hits more active than 3. However, chemistry around this hit, based on synergies with AMN-082, led to rapid improvement of potency to a 10 nM mGlu7 PAM, which will be disclosed in an upcoming report.

Conclusion

In conclusion, we have described a hit generation approach for the mGlu7 receptor. Using mGlu receptor family screening followed by PCM, we identified new allosteric modulators of mGlu7, one of the less explored receptors in the mGlu family. Classical target-oriented approaches were challenging because no receptor structure was available and very few ligands had been reported. A docking approach showed a low true positive retrieval rate. Hence, this was an ideal scenario to benefit from abundant data for similar targets in the same family. We performed multiple rounds of PCM model validation. Performance ranged from a high true positive retrieval rate in internal cross-validation to intermediate or low values when the model was applied to external data sets (HTS and newly screened compounds) or to the prospective study. Cross-validation showed that the PCM model benefited from the protein descriptors, so there was value in using the multitarget and intertarget descriptors. In the prospective study, the diversity of the initial screening hits was higher for the multitarget PCM than for single-target fingerprint similarity. Of particular interest was the better confirmation rate of the PCM hits, which were selected using information from the robust SAR of similar targets, compared with the hits selected using the weakly active singletons known for the single target. Our results illustrate the value of PCM-based VS in cases where limited chemical information is available for the target of interest but target family members have been explored more extensively. Future work will describe the follow-up of these hits and additional mGlu7 PAM chemical series.
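The finding that the model benefited from protein descriptors follows from how PCM represents its data. Each training example is a (compound, target) pair whose feature vector combines a ligand descriptor with a protein descriptor, so one model covers the whole receptor family. The following is a minimal Python sketch of that input construction only. The descriptor values and activities are hypothetical placeholders, the optional ligand-by-protein cross-terms are one common PCM variant (not necessarily the one used here), and no learning algorithm is shown.

```python
from typing import Dict, List, Sequence, Tuple


def pcm_features(ligand: Sequence[float], protein: Sequence[float],
                 cross_terms: bool = False) -> List[float]:
    """Build one PCM input vector for a (compound, target) pair.

    Ligand and protein descriptors are concatenated; optionally the flattened
    ligand-by-protein outer product is appended as cross-terms.
    """
    features = list(ligand) + list(protein)
    if cross_terms:
        features += [lv * pv for lv in ligand for pv in protein]
    return features


def build_pcm_matrix(
    activities: Dict[Tuple[str, str], float],
    ligand_desc: Dict[str, Sequence[float]],
    protein_desc: Dict[str, Sequence[float]],
) -> Tuple[List[List[float]], List[float]]:
    """One row per measured (compound, target) pair, so all targets share one model."""
    X: List[List[float]] = []
    y: List[float] = []
    for (compound, target), value in sorted(activities.items()):
        X.append(pcm_features(ligand_desc[compound], protein_desc[target]))
        y.append(value)
    return X, y


# hypothetical descriptors and activities, for illustration only
ligands = {"cpd_A": [1.0, 0.0, 1.0]}                    # e.g. fingerprint bits
proteins = {"mGlu2": [0.3, -1.2], "mGlu7": [0.1, 0.8]}  # e.g. sequence-derived values
acts = {("cpd_A", "mGlu2"): 6.1, ("cpd_A", "mGlu7"): 4.9}
X, y = build_pcm_matrix(acts, ligands, proteins)
```

The two rows share the ligand part and differ only in the protein part. That protein part is what lets data measured on well explored targets such as mGlu2 inform predictions for mGlu7.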

39 in total

1. Stochastic proximity embedding.

Authors: Dimitris K Agrafiotis
Journal: J Comput Chem Date: 2003-07-30 Impact factor: 3.376

2. Activation of Metabotropic Glutamate Receptor 7 Is Required for Induction of Long-Term Potentiation at SC-CA1 Synapses in the Hippocampus.

Authors: Rebecca Klar; Adam G Walker; Dipanwita Ghose; Brad A Grueter; Darren W Engers; Corey R Hopkins; Craig W Lindsley; Zixiu Xiang; P Jeffrey Conn; Colleen M Niswender
Journal: J Neurosci Date: 2015-05-13 Impact factor: 6.167

3. Chemistry: Chemical con artists foil drug discovery.

Authors: Jonathan Baell; Michael A Walters
Journal: Nature Date: 2014-09-25 Impact factor: 49.962

4. Structure of a class C GPCR metabotropic glutamate receptor 1 bound to an allosteric modulator.

Authors: Huixian Wu; Chong Wang; Karen J Gregory; Gye Won Han; Hyekyung P Cho; Yan Xia; Colleen M Niswender; Vsevolod Katritch; Jens Meiler; Vadim Cherezov; P Jeffrey Conn; Raymond C Stevens
Journal: Science Date: 2014-03-06 Impact factor: 47.728

Review 5. Metabotropic glutamate receptors: physiology, pharmacology, and disease.

Authors: Colleen M Niswender; P Jeffrey Conn
Journal: Annu Rev Pharmacol Toxicol Date: 2010 Impact factor: 13.820

6. Scaffold hopping from pyridones to imidazo[1,2-a]pyridines. New positive allosteric modulators of metabotropic glutamate 2 receptor.

Authors: Gary Tresadern; Jose María Cid; Gregor J Macdonald; Juan Antonio Vega; Ana Isabel de Lucas; Aránzazu García; Encarnación Matesanz; María Lourdes Linares; Daniel Oehlrich; Hilde Lavreysen; Ilse Biesmans; Andrés A Trabanco
Journal: Bioorg Med Chem Lett Date: 2009-11-10 Impact factor: 2.823

7. Discovery of 1-butyl-3-chloro-4-(4-phenyl-1-piperidinyl)-(1H)-pyridone (JNJ-40411813): a novel positive allosteric modulator of the metabotropic glutamate 2 receptor.

Authors: José María Cid; Gary Tresadern; Guillaume Duvey; Robert Lütjens; Terry Finn; Jean-Philippe Rocher; Sonia Poli; Juan Antonio Vega; Ana Isabel de Lucas; Encarnación Matesanz; María Lourdes Linares; José Ignacio Andrés; Jesús Alcazar; José Manuel Alonso; Gregor J Macdonald; Daniel Oehlrich; Hilde Lavreysen; Abdelah Ahnaou; Wilhelmus Drinkenburg; Claire Mackie; Stefan Pype; David Gallacher; Andrés A Trabanco
Journal: J Med Chem Date: 2014-07-28 Impact factor: 7.446

8. A pharmacological organization of G protein-coupled receptors.

Authors: Henry Lin; Maria F Sassano; Bryan L Roth; Brian K Shoichet
Journal: Nat Methods Date: 2013-01-06 Impact factor: 28.547

9. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets.

Authors: Gerard Jp van Westen; Remco F Swier; Jörg K Wegner; Adriaan P Ijzerman; Herman Wt van Vlijmen; Andreas Bender
Journal: J Cheminform Date: 2013-09-23 Impact factor: 5.514

10. Chemical, target, and bioactive properties of allosteric modulation.

Authors: Gerard J P van Westen; Anna Gaulton; John P Overington
Journal: PLoS Comput Biol Date: 2014-04-03 Impact factor: 4.475
