Pu Wang, Xiaotong Huang, Wangren Qiu, Xuan Xiao.
Abstract
BACKGROUND: G protein-coupled receptors (GPCRs) mediate a variety of important physiological functions, are closely related to many diseases, and constitute the most important target family of modern drugs. Research on GPCR analysis and GPCR ligand screening is therefore a hotspot of new drug development. Accurately identifying GPCR-drug interactions is one of the key steps in designing GPCR-targeted drugs. However, it is prohibitively expensive to ascertain the interactions of GPCR-drug pairs experimentally on a large scale, so it is of great significance to predict these interactions directly from the molecular sequences. With the accumulation of known GPCR-drug interaction data, it is feasible to develop sequence-based machine learning models for query GPCR-drug pairs.
Keywords: Bag-of-words; Discrete Fourier transform; GPCR-drug interaction; Machine learning
Year: 2020 PMID: 32312232 PMCID: PMC7171867 DOI: 10.1186/s12859-020-3488-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 ROC curves of ten-fold cross-validation on D92M with different amino acid indices for encoding GPCRs
Fig. 2 ROC curves of ten-fold cross-validation on D92M while representing drugs with primary molecular fingerprint (without DFT) or frequency amplitudes (with DFT)
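The "frequency amplitudes" in the Fig. 2 caption refer to the magnitudes of a discrete Fourier transform (DFT) applied to the drug fingerprint. The following is a minimal sketch of that transform, not the authors' implementation: the function name `dft_amplitudes` and the 8-bit `toy_fingerprint` are hypothetical, and the record does not state the fingerprint length or any normalization used in the paper.

```python
import cmath


def dft_amplitudes(signal):
    """Return the magnitude of each DFT coefficient of a 1-D numeric sequence.

    Naive O(n^2) definition of the DFT, written out for clarity.
    """
    n = len(signal)
    amplitudes = []
    for k in range(n):
        # k-th coefficient: sum over t of x[t] * exp(-2*pi*i*k*t/n)
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        amplitudes.append(abs(coeff))
    return amplitudes


# Hypothetical 8-bit fingerprint, for illustration only (assumption).
toy_fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]
amplitudes = dft_amplitudes(toy_fingerprint)

# The zero-frequency term equals the number of set bits (4 here).
print(amplitudes[0])
```

For real fingerprints of realistic length, a fast Fourier transform such as `numpy.fft.fft` followed by `numpy.abs` would be the practical choice. For a real-valued input the amplitude spectrum is symmetric, so only about half of the amplitudes carry independent information.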
Fig. 3 ROC curves of ten-fold cross-validation on D92M with different feature representations of GPCRs
Fig. 4 AUCs of ten-fold cross-validation on D92M with different K values in DWKNN
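Fig. 4 varies the number of neighbours K in a distance-weighted K-nearest-neighbour classifier (DWKNN). The record does not give the weighting scheme used in the paper, so the sketch below assumes one common variant, inverse-distance weighting with Euclidean distance. The function name `dwknn_scores`, the `eps` smoothing term, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def dwknn_scores(train_x, train_y, query, k=3, eps=1e-9):
    """Return normalized class scores from inverse-distance-weighted votes.

    Assumption: weight = 1 / (distance + eps). Other weighting schemes exist.
    """
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = defaultdict(float)
    for distance, label in neighbours[:k]:
        votes[label] += 1.0 / (distance + eps)
    total = sum(votes.values())
    return {label: weight / total for label, weight in votes.items()}


# Toy feature vectors: label 1 = interacting pair, 0 = non-interacting pair.
train_x = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
train_y = [0, 0, 1, 1]

knn_scores = dwknn_scores(train_x, train_y, query=(0.2, 0.1), k=3)
positive_score = knn_scores.get(1, 0.0)
predicted_label = 1 if positive_score >= 0.5 else 0
print(knn_scores, predicted_label)
```

Closer neighbours contribute more to the score, so a single distant neighbour of the opposite class has little influence. The positive-class score can then be compared against a discrimination threshold such as 0.5.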
Performance comparisons between base learner and ensemble models on D92M over leave-one-out cross-validation. All results are obtained with 0.5 as the default discrimination threshold for generating the prediction label, except the Maximum MCC values, which are obtained at the thresholds that maximize MCC
| Metrics | Base learner | Ensemble model (2) | Ensemble model (4) | Ensemble model (6) | Ensemble model (8) | Ensemble model (10) |
|---|---|---|---|---|---|---|
| Acc (%) | 83.60 | 84.09 | 85.05 | 84.25 | 84.41 | 84.73 |
| Sn (%) | 81.42 | 80.31 | 81.1 | 80.47 | 80.63 | 80.94 |
| Sp (%) | 84.73 | 86.04 | 87.1 | 86.2 | 86.37 | 86.69 |
| MCC | 0.64 | 0.65 | 0.67 | 0.66 | 0.66 | 0.66 |
| Maximum MCC | 0.65 | 0.68 | 0.68 | 0.69 | 0.69 | 0.69 |
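The metrics in this table follow from the confusion matrix at a given discrimination threshold. The sketch below uses the standard definitions of sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC), and finds a maximum MCC by sweeping the threshold over the observed scores. The function names and the toy labels and scores are hypothetical, and the `score >= threshold` convention is an assumption.

```python
import math


def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, FP, TN, FN when predicting 1 for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(labels, scores, threshold=0.5):
    """Return Sn, Sp, Acc and MCC as fractions (not percentages)."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / len(labels)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def max_mcc(labels, scores):
    """Sweep thresholds over the observed scores; return (best MCC, threshold)."""
    return max(
        (metrics(labels, scores, t)["MCC"], t) for t in sorted(set(scores))
    )


# Toy data, for illustration only (assumption).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.45, 0.4, 0.3, 0.6, 0.1, 0.7]

default_metrics = metrics(labels, scores, threshold=0.5)
best_mcc, best_threshold = max_mcc(labels, scores)
print(default_metrics, best_mcc, best_threshold)
```

Because the sweep considers every observed score as a candidate threshold, the maximum MCC is the best value any threshold can reach on this data and is never lower than the MCC at 0.5 unless 0.5 separates the scores in a way no candidate does. This is consistent with the Maximum MCC row matching or exceeding the MCC row in every column.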
Performance comparisons of different methods on D92M over leave-one-out cross-validation. The best results for each metric are in bold
| Method | Sn (%) | Sp (%) | Acc (%) | Str (%) | MCC |
|---|---|---|---|---|---|
| iGPCR-Drug | 78.3 | 91.4 | 86.9 | 84.9 | 0.71 |
| OET-KNN | 77.8 | 88.7 | 85.0 | 83.3 | 0.67 |
| QuickRBF | 74.8 | 92.4 | 86.4 | 83.6 | 0.69 |
| SVM | 74.2 | 92.7 | 86.3 | 83.5 | 0.69 |
| RF | 76.5 | | 87.3 | 84.7 | 0.71 |
| RF + PPP | 79.7 | 92.8 | | | |
| Proposed(Base learner) | | 84.7 | 83.6 | 83.1 | 0.64 |
| Proposed(Ensemble model with | 81.1 | 87.1 | 85.1 | 84.1 | 0.67 |
Performance comparisons of different methods on the independent test dataset check390. The best results for each metric are in bold
| Method | Sn (%) | Sp (%) | Acc (%) | Str (%) | MCC | Threshold |
|---|---|---|---|---|---|---|
| iGPCR-Drug | 80.8 | 66.9 | 71.6 | 73.9 | 0.45 | N/A |
| OET-KNN | 67.7 | | 78.7 | 76.9 | 0.52 | 0.5 |
| QuickRBF | 76.2 | 77.7 | 77.2 | 77.6 | 0.52 | 0.45 |
| SVM | 76.2 | 78.9 | 78.0 | 77.6 | 0.53 | 0.42 |
| RF | 78.5 | 78.1 | 78.2 | 78.3 | 0.54 | 0.51 |
| RF + PPP | 83.1 | 79.6 | 80.8 | 81.3 | 0.60 | 0.51 |
| Proposed(Base learner) | | 80.0 | 81.3 | 81.9 | 0.61 | 0.5 |
| Proposed(Ensemble model with | 83.1 | 82.7 | | | | 0.5 |
Fig. 5 Flowchart of creating GPCR wordbook
Fig. 6 Fragments of length 2 sampled from the GPCR sequences encoded by hydropathy property. Sampled fragments belonging to the same cluster are drawn in the same color and shape. The black asterisks are the clustering centers
Fig. 7 Framework of the proposed basic method
Fig. 8 Framework of the proposed ensemble method
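The wordbook idea described in the captions of Figs. 5 and 6 (encode each sequence with a physicochemical index, sample fixed-length fragments, cluster them, and treat the cluster centers as "words") can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the Kyte-Doolittle scale stands in for the unspecified hydropathy index, plain k-means stands in for the unspecified clustering method, and the sequences, the wordbook size `k=3`, and all function names are hypothetical.

```python
import random

# Kyte-Doolittle hydropathy values (assumption: the paper's index may differ).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fragments(sequence, length=2):
    """Encode a sequence numerically and return all overlapping fragments."""
    values = [HYDROPATHY[residue] for residue in sequence]
    return [tuple(values[i:i + length]) for i in range(len(values) - length + 1)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; an empty cluster keeps its previous center."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return centers


def bag_of_words(sequence, centers, length=2):
    """Normalized histogram of the nearest wordbook entry for each fragment."""
    frags = fragments(sequence, length)
    counts = [0] * len(centers)
    for f in frags:
        counts[nearest(f, centers)] += 1
    return [c / len(frags) for c in counts]


# Toy sequences, for illustration only (not real GPCRs).
toy_sequences = ["MKTLLVAGFLA", "ACDEFGHIKLMNPQRSTVWY", "GGSSTTNNQQ"]
all_fragments = [f for s in toy_sequences for f in fragments(s)]

wordbook = kmeans(all_fragments, k=3)
histogram = bag_of_words(toy_sequences[0], wordbook)
print(wordbook, histogram)
```

The resulting histogram has a fixed length equal to the wordbook size regardless of sequence length, which is what makes a bag-of-words representation convenient as input to a classifier.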