Literature DB >> 35982407

A machine learning model for classifying G-protein-coupled receptors as agonists or antagonists.

Jooseong Oh1, Hyi-Thaek Ceong2, Dokyun Na3, Chungoo Park4.   

Abstract

BACKGROUND: G-protein coupled receptors (GPCRs) sense and transmit extracellular signals into the intracellular machinery by regulating G proteins. GPCR malfunctions are associated with a variety of signaling-related diseases, including cancer and diabetes; at least a third of the marketed drugs target GPCRs. Thus, characterization of their signaling and regulatory mechanisms is crucial for the development of effective drugs.
RESULTS: In this study, we developed a machine learning model to identify GPCR agonists and antagonists. We designed two-step prediction models: the first model identified the ligands binding to GPCRs and the second model classified the ligands as agonists or antagonists. Using 990 selected subset features from 5270 molecular descriptors calculated from 4590 ligands deposited in two drug databases, our model classified non-ligands, agonists, and antagonists of GPCRs, and achieved an area under the ROC curve (AUC) of 0.795, sensitivity of 0.716, specificity of 0.744, and accuracy of 0.733. In addition, we verified that 70% (44 out of 63) of FDA-approved GPCR-targeting drugs were correctly classified into their respective groups.
CONCLUSIONS: Studies of ligand-GPCR interaction recognition are important for the characterization of drug action mechanisms. Our GPCR-ligand interaction prediction model can be employed in the pharmaceutical sciences for the efficient virtual screening of putative GPCR-binding agonists and antagonists.
© 2022. The Author(s).

Entities:  

Keywords:  G-protein-coupled receptors; GPCR agonists and antagonists; GPCR–ligand interactions; Machine learning; Two-step random forest classification

Mesh:

Substances:

Year:  2022        PMID: 35982407      PMCID: PMC9389651          DOI: 10.1186/s12859-022-04877-7

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.307


Background

G-protein coupled receptors (GPCRs) belong to membrane protein families that sense and transmit extracellular signals to the intracellular region by regulating G proteins. GPCRs are involved in diverse signaling pathways triggered by hormones and neurotransmitters, and participate in cell growth, differentiation, vision, olfaction, and gustatory system [1]. When a ligand binds to a GPCR, the receptor undergoes a conformational change that can either activate (called an agonist) or inhibit (called an antagonist) signal transduction pathways [2]. Approximately one-third of the drugs on the market target GPCRs [2, 3] and are used to treat various human diseases including cardiac malfunction, asthma, and migraines [4]. In 2017, Hauser et al. reported that approximately 34% (475 drugs) of all US FDA (Food and Drug Administration)-approved drugs act on GPCR targets, and that most agents in clinical trials target novel GPCRs [5]. Owing to recent technological advances in receptor pharmacology, new avenues for GPCR drug discovery have emerged that diverge from the traditional view of signal transduction as a linear chain of events involving the heterotrimeric G proteins. However, GPCR drug discovery has long been focused on the identification of new compounds targeting GPCRs and their ligand binding sites. The classification of the agonist and antagonist properties of existing and newly discovered ligands is needed to optimize drug efficacy and develop appropriate therapeutic strategies that selectively activate or block relevant pathways. Using a support vector machine (SVM) learning algorithm with 4884 chemical descriptors as input, Bushdid et al. [6] virtually screened 258 chemical compounds and determined agonists for the human G-protein-coupled odorant receptor (OR) 51E1 as well as human receptors OR1A1 and OR2W1, and mouse receptor MOR256-3. The predicted novel agonists were identified with a hit rate of 39–50%. Two newly identified agonists for OR51E1 were functionally validated through in vitro assays. In addition, to predict ligands and their roles in the human olfactory receptor OR1G1, Jabeen and Ranganathan [7] built classification models (SVM, random forest, naïve bayes, and neural networks) based on 13 relevant features for a dataset of 74 agonists and 74 antagonists. The area under the ROC curve (AUC) was 0.652–0.827. Using over 200,000 compounds, the best performing classifier, naïve bayes model, predicted 37 compounds as agonists for OR1G1 with > 80% probability score. In this study, we developed a ligand-based machine learning model to identify novel human GPCR agonists and antagonists, irrespective of GPCR types. Using the existing knowledge-base to predict ligand activity according to similarities/dissimilarities of known active ligands, we designed two-step machine learning models that first identify the ligands binding to GPCRs and then classify the ligands as agonists or antagonists. GPCR ligand information from the International Union of Basic and Clinical Pharmacology (IUPHAR)/British Pharmacological Society (BPS) Guide to PHARMACOLOGY database (GtoPdb) [8] and Context-Oriented Directed Associations (CODA) [9] database were used to train two random forest (RF) models that will act independently but successively to classify query components into non-ligands, agonists, and antagonists of GPCRs. The optimal performance parameters for the integrated two-step models were AUC = 0.795, accuracy = 0.733, sensitivity = 0.716, and specificity = 0.744. Hence, our model allowed us to understand the molecular mechanisms of GPCR–ligand interactions. This model can be employed in the pharmaceutical sciences to screen novel drugs and therapeutic agents.

Results and discussion

Data collection and preprocessing

Out of 14,659 initially available human ligand-target interactions, 4590 ligand-target pairs were analyzed. We obtained 1058 and 1150 ligands that act as agonists (hereafter called GPCR-agonist) and antagonists (hereafter called GPCR-antagonist), respectively; the remaining 2382 ligands were classified as non-ligands of GPCRs (hereafter called GPCR-nontarget). To eliminate potentially redundant ligands, ligands were clustered with their ECFP4 (extended connectivity fingerprints of bond diameter 4) fingerprints encoding the ligand’s structural characteristics as a vector [10] using an agglomerative hierarchical clustering method. This algorithm iteratively merges subclusters based on their similarity (above 0.8 in this study [11]) considering interconnectivity and closeness of the clusters [12]. Only representative ligands in each cluster were used for training and test dataset. Consequently, 758 GPCR-agonists, 950 GPCR-antagonists, and 2206 GPCR-nontargets were selected for further analysis.

Molecular descriptor calculation and feature selection

We calculated 5270 molecular descriptors using Dragon software, and they were used for feature selection. Using Boruta algorithm that performs the comparison of the real predictor features with those of random (so-called shadow) variables, 990 selected predictor features (Additional file 1) with significantly larger importance values were taken as inputs for machine learning classifiers (Fig. 1A).
Fig. 1

Overall workflows of A feature selection process and B two-step binary-class RF models

Overall workflows of A feature selection process and B two-step binary-class RF models

Machine learning model construction and evaluation

We designed two-step binary-class classifiers, given their superior accuracy estimates compared to multi-class classifiers [13]. The first model (T-model) predicted GPCR-target or GPCR-nontarget; the second model (A-model) predicted GPCR-agonist or GPCR-antagonist (Fig. 1B). Specifically, when a query molecule is input, the T-model predicts whether or not the molecule is a GPCR ligand. If not, it is classified as a GPCR-nontarget molecule. If classified as a GPCR-target, the A-model predicts whether it acts as a GPCR agonist or antagonist. For the T-model, 1708 GPCR-target (from 758 GPCR-agonists and 950 GPCR-antagonists) and 2206 GPCR-nontarget were used in the training dataset. Because no statistical model functions at 100% accuracy, some of the GPCR-nontarget classified molecules could potentially interact with GPCRs; thus, we used all of the available data to minimize the data imbalance [14, 15]. For the A-model, we used 758 GPCR-agonists and 950 GPCR-antagonists in the training data set. The T-model and A-model were built separately using the RF classifier and were evaluated using the leave-one-out cross-validation (LOO-CV) method. The T-model and A-model achieved an AUC of 0.787 and 0.823, respectively. The final integrated two-step model produced an AUC of 0.795 (accuracy = 0.733, sensitivity = 0.716, and specificity = 0.744) (Table 1).
Table 1

Performance parameters of the two-step binary-class models

ModelAccuracySensitivitySpecificityPPV1NPV2F13MCC4AUC5
T-model0.7260.6990.7440.6520.7830.6750.4390.787
A-model0.7580.7800.7440.6470.8490.7070.5100.823
Integrated60.7330.7160.7440.6510.7970.6820.4540.795

1Positive predictive value

2Negative predictive value

3Harmonic means of PPV and sensitivity

4Matthews correlation coefficient

5Area under the ROC curve

6The performance values were measured with micro-average

Performance parameters of the two-step binary-class models 1Positive predictive value 2Negative predictive value 3Harmonic means of PPV and sensitivity 4Matthews correlation coefficient 5Area under the ROC curve 6The performance values were measured with micro-average

Model validation with FDA-approved GPCR drugs

To validate our model under different experimental conditions, we used FDA-approved GPCR-targeting drugs. Data for 134 drugs were collected, of which data for 63 drugs with ligand-binding types and SMILES (simplified molecular input line entry system) descriptors were used for the model validation procedures. Our T-model predicted that 52 of 63 (82.5%) drugs could interact with GPCRs. According to the A-model, 44 of the 52 GPCR-target drugs (84.6%) were correctly categorized as agonists or antagonists. Consequently, 44 out of 63 (69.8%) FDA-approved GPCR-targeting drugs were correctly classified into their respective groups (GPCR-agonist, GPCR-antagonist, GPCR-nontarget) (Table 2). In addition to the positive data, our T-model was also tested on negative dataset. To this end, we collected 1278 GPCR-nontarget drugs (out of 14,594 drugs) from DrugBank database. After excluding ligands for which descriptors were not calculated by Dragon software, we retained 982 drugs as GPCR-nontarget drugs. 808 of 982 GPCR-nontarget drugs (82.3%) were correctly predicted by our T-model (Additional file 2). Though our study considered a relatively small sample size, our results clearly showed that the integrated two-step RF models had a high and balanced prediction accuracy. Future studies should consider the practical compatibility of virtual screening with larger sample size datasets and more complex models associated with signaling pathways.
Table 2

Model evaluation using FDA-approved GPCR drugs

Drug nameTargetFDA-approved actionT-model-predicted actionA-model-predicted actionReferences
Beclometasone dipropionateGlucocorticoid receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
AdenosineAdenosine receptor A1GPCR-agonistGPCR-nontargetGPCR-agonist[16]
RegadenosonAdenosine receptor A2aGPCR-agonistGPCR-targetGPCR-agonist[16]
NicardipineAlpha-1A adrenergic receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
OxymetazolineAlpha-2B adrenergic receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
PrazosinAlpha-1A adrenergic receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
ApraclonidineAlpha-2A adrenergic receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
DexmedetomidineAlpha-2A adrenergic receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
AcebutololBeta-1 adrenergic receptorGPCR-agonistGPCR-nontargetGPCR-antagonist[16]
MirabegronBeta-3 adrenergic receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
CandesartanType-1 angiotensin II receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
PentagastrinGastrin/cholecystokinin type B receptorGPCR-agonistGPCR-nontargetGPCR-agonist[16]
MaravirocC–C chemokine receptor type 5GPCR-antagonistGPCR-targetGPCR-antagonist[16]
BiperidenMuscarinic acetylcholine receptor M1GPCR-antagonistGPCR-targetGPCR-antagonist[16]
PropanthelineMuscarinic acetylcholine receptor M1GPCR-antagonistGPCR-targetGPCR-antagonist[16]
UmeclidiniumMuscarinic acetylcholine receptor M1GPCR-antagonistGPCR-targetGPCR-antagonist[16]
NabiloneCannabinoid receptor 2GPCR-agonistGPCR-targetGPCR-agonist[16]
ZafirlukastCysteinyl leukotriene receptor 1GPCR-antagonistGPCR-targetGPCR-antagonist[16]
DopamineDopamine D2 receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
AmbrisentanEndothelin-1 receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
BosentanEndothelin-1 receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
VorapaxarProteinase-activated receptor 1GPCR-antagonistGPCR-targetGPCR-antagonist[16]
BaclofenGamma-aminobutyric acid type B receptor subunit 2GPCR-agonistGPCR-nontargetGPCR-agonist[16]
EstradiolEstrogen receptor alphaGPCR-agonistGPCR-nontargetGPCR-agonist[16]
LevodopaDopamine D1 receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
DronabinolCannabinoid receptor 1GPCR-agonistGPCR-nontargetGPCR-agonist[16]
BumetanideSolute carrier family 12 member 1GPCR-antagonistGPCR-targetGPCR-agonist[16]
Nicotinic acidHydroxycarboxylic acid receptor 3GPCR-agonistGPCR-targetGPCR-agonist[16]
SuvorexantOrexin receptor type 1GPCR-antagonistGPCR-targetGPCR-antagonist[16]
CetirizineHistamine H1 receptorGPCR-antagonistGPCR-targetGPCR-agonist[16]
BetazoleHistamine H2 receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
ClozapineDopamine D2 receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
Frovatriptan5-hydroxytryptamine receptor 1DGPCR-agonistGPCR-targetGPCR-agonist[16]
Eletriptan5-hydroxytryptamine receptor 1DGPCR-agonistGPCR-targetGPCR-agonist[16]
Ergotamine5-hydroxytryptamine receptor 1DGPCR-agonistGPCR-targetGPCR-antagonist[16]
AmoxapineSodium-dependent serotonin transporterGPCR-antagonistGPCR-nontargetGPCR-antagonist[16]
LurasidoneDopamine D2 receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
ChloroquineGlutathione S-transferase A2GPCR-antagonistGPCR-targetGPCR-agonist[16]
TasimelteonMelatonin receptor type 1AGPCR-agonistGPCR-targetGPCR-agonist[16]
NiclosamideDNAGPCR-antagonistGPCR-nontargetGPCR-antagonist[16]
LevocabastineHistamine H1 receptorGPCR-antagonistGPCR-targetGPCR-agonist[16]
NaltrexoneDelta-type opioid receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
AnileridineMu-type opioid receptorGPCR-agonistGPCR-targetGPCR-antagonist[16]
AlfentanilMu-type opioid receptorGPCR-agonistGPCR-targetGPCR-antagonist[16]
CangrelorP2Y purinoceptor 12GPCR-antagonistGPCR-targetGPCR-antagonist[16]
TreprostinilProstacyclin receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
IndomethacinProstaglandin G/H synthase 2GPCR-antagonistGPCR-targetGPCR-agonist[16]
Prostaglandin E1Prostaglandin E2 receptor EP2 subtypeGPCR-agonistGPCR-targetGPCR-agonist[16]
Prostaglandin E2Prostaglandin E2 receptor EP2 subtypeGPCR-agonistGPCR-nontargetGPCR-agonist[16]
MisoprostolProstaglandin E2 receptor EP3 subtypeGPCR-agonistGPCR-targetGPCR-agonist[16]
LatanoprostProstaglandin F2-alpha receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
EpoprostenolP2Y purinoceptor 12GPCR-agonistGPCR-targetGPCR-agonist[16]
SonidegibSmoothened homologGPCR-antagonistGPCR-targetGPCR-antagonist[16]
AprepitantNeurokinin 1 receptorGPCR-antagonistGPCR-targetGPCR-antagonist[16]
IloprostProstacyclin receptorGPCR-agonistGPCR-targetGPCR-agonist[16]
DroxidopaAlpha-1A adrenergic receptorGPCR-agonistGPCR-targetGPCR-agonist[5]
NaloxegolMu-type opioid receptorGPCR-antagonistGPCR-nontargetGPCR-agonist[5]
NetupitantNeurokinin 1 receptorGPCR-antagonistGPCR-targetGPCR-antagonist[5]
OlodaterolBeta-2 adrenergic receptorGPCR-agonistGPCR-targetGPCR-agonist[5]
RolapitantNeurokinin 1 receptorGPCR-antagonistGPCR-targetGPCR-antagonist[5]
SelexipagProstacyclin receptorGPCR-agonistGPCR-targetGPCR-agonist[5]
Pimavanserin5-hydroxytryptamine receptor 2AGPCR-agonistGPCR-targetGPCR-agonist[5]
NaldemedineMu-type opioid receptorGPCR-antagonistGPCR-nontargetGPCR-antagonist[5]

Note that incorrectly predicted events are shown in bold

Model evaluation using FDA-approved GPCR drugs Note that incorrectly predicted events are shown in bold

Conclusion

Because the GPCRs are involved in diverse cellular signaling transductions and therefore play essential and important roles in pharmaceutical research, they have long been considered as prime targets for drug discovery. However, unlike other cellular proteins, experimental screening of GPCR structure–function and ligand-identification is expensive and time-consuming. Machine learning-based approaches have recently gained popularity in GPCR-based virtual drug discovery. In this study, we developed in-silico models to predict GPCR-agonists and GPCR-antagonists with reasonably high accuracy. The key contribution of this work is two folds: first one is presenting a GPCR-type independent classification model that could classify both GPCR agonists and antagonists together, regardless of the GPCR types, and second is using over 14,000 of publicly available ligand-target interaction data that could make the model more accurate and could be used in future similar studies. Although our prediction models require further testing, they could be applied in drug discovery technologies to predict putative GPCR-binding ligands from millions of unlabeled chemical compounds.

Methods

Data acquisition

We acquired pharmacological datasets relating to ligand-activity-target relationships from the GtoPdb (https://www.guidetopharmacology.org) [8], including data for over 1700 drug targets with over 9000 related ligands, and the CODA network database [9], including drug–drug target associations with related molecular, phenomic, and anatomical variables. Out of 14,659 human ligand-target interactions, 4590 ligand-target pairs were analyzed in this study and included both a ligand-binding type (e.g., agonist and antagonist) and a SMILES descriptor. We collated a list of the FDA-approved GPCR-targeting drugs [5, 16] and screened the DrugBank database [17] for ligand-binding types and SMILES descriptors related to these drugs. ECFP4 fingerprints were calculated using Dragon software (version 7.0.10) [18], and the Tanimoto index [19] was used to determine the similarity between ligands.

Feature selection

Dragon software (version 7.0.10) [18] was used to calculate the chemical and physical properties (molecular descriptors) of chemicals from their SMILES as an input. These chemoinformatic properties include 1D descriptors, such as the number of atom types and structural fragments of the molecule, and 2D descriptors, such as structural features, logP, and connectivity indices [18, 20]. We applied the Boruta packages (version 7.0.0) [21] with default parameters to obtain the best subset of descriptors. To screen the key features in each class, the FSelector package [22] in R software was used.

Machine learning model and performance evaluation

We applied a RF machine learning model, using the randomForest function in the R randomForest package [23]. For the main two parameters, the number of random explanatory variables for splitting each tree node, mtry, and the number of trees, ntree, were set at number of features and 100, respectively. To validate the constructed RF model, we used the LOO-CV for method selection [24]. To obtain the performance measurement values (true positive, TP; true negative, TN; false positive, FP; false negative, FN) of the integrated two-step models, a micro-average calculation [25] was used. Additional file 1: Selected 990 features after applying Boruta algorithm. Additional file 2: Model evaluation using GPCR-nontarget drugs from DrugBank and UniProt database.
  17 in total

1.  Mold(2), molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics.

Authors:  Huixiao Hong; Qian Xie; Weigong Ge; Feng Qian; Hong Fang; Leming Shi; Zhenqiang Su; Roger Perkins; Weida Tong
Journal:  J Chem Inf Model       Date:  2008-06-20       Impact factor: 4.956

Review 2.  The GPCR Network: a large-scale collaboration to determine human GPCR structure and function.

Authors:  Raymond C Stevens; Vadim Cherezov; Vsevolod Katritch; Ruben Abagyan; Peter Kuhn; Hugh Rosen; Kurt Wüthrich
Journal:  Nat Rev Drug Discov       Date:  2012-12-14       Impact factor: 84.694

Review 3.  The essential role of G protein-coupled receptor (GPCR) signaling in regulating T cell immunity.

Authors:  Dashan Wang
Journal:  Immunopharmacol Immunotoxicol       Date:  2018-02-12       Impact factor: 2.730

Review 4.  The structure and function of G-protein-coupled receptors.

Authors:  Daniel M Rosenbaum; Søren G F Rasmussen; Brian K Kobilka
Journal:  Nature       Date:  2009-05-21       Impact factor: 49.962

Review 5.  Trends in GPCR drug discovery: new agents, targets and indications.

Authors:  Alexander S Hauser; Misty M Attwood; Mathias Rask-Andersen; Helgi B Schiöth; David E Gloriam
Journal:  Nat Rev Drug Discov       Date:  2017-10-27       Impact factor: 84.694

6.  Agonists of G-Protein-Coupled Odorant Receptors Are Predicted from Chemical Features.

Authors:  C Bushdid; C A de March; S Fiorucci; H Matsunami; J Golebiowski
Journal:  J Phys Chem Lett       Date:  2018-04-17       Impact factor: 6.475

7.  Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?

Authors:  Dávid Bajusz; Anita Rácz; Károly Héberger
Journal:  J Cheminform       Date:  2015-05-20       Impact factor: 5.514

8.  DrugBank 5.0: a major update to the DrugBank database for 2018.

Authors:  David S Wishart; Yannick D Feunang; An C Guo; Elvis J Lo; Ana Marcu; Jason R Grant; Tanvir Sajed; Daniel Johnson; Carin Li; Zinat Sayeeda; Nazanin Assempour; Ithayavani Iynkkaran; Yifeng Liu; Adam Maciejewski; Nicola Gale; Alex Wilson; Lucy Chin; Ryan Cummings; Diana Le; Allison Pon; Craig Knox; Michael Wilson
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

9.  CODA: Integrating multi-level context-oriented directed associations for analysis of drug effects.

Authors:  Hasun Yu; Jinmyung Jung; Seyeol Yoon; Mijin Kwon; Sunghwa Bae; Soorin Yim; Jaehyun Lee; Seunghyun Kim; Yeeok Kang; Doheon Lee
Journal:  Sci Rep       Date:  2017-08-08       Impact factor: 4.379

10.  Predicting protein-ligand interactions based on bow-pharmacological space and Bayesian additive regression trees.

Authors:  Li Li; Ching Chiek Koh; Daniel Reker; J B Brown; Haishuai Wang; Nicholas Keone Lee; Hien-Haw Liow; Hao Dai; Huai-Meng Fan; Luonan Chen; Dong-Qing Wei
Journal:  Sci Rep       Date:  2019-05-22       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.