Chang Lu1,2, Zhe Liu1,2, Enju Zhang1,2, Fei He3,4, Zhiqiang Ma5,6, Han Wang7,8.
Abstract
Membrane proteins (MPs) are involved in many essential biomolecular mechanisms and play a pivotal role in transporting small molecules and signals between the two sides of the biological membrane, which is why a large portion of modern medicinal drugs target MPs. Accurately identifying membrane protein-ligand binding sites (MPLs) would therefore significantly improve drug discovery. In this paper, we propose a sequence-based MPLs predictor called MPLs-Pred, in which evolutionary profiles, topology structure, physicochemical properties, and primary sequence segment descriptors are combined as features for a random forest classifier, and an under-sampling scheme is used to improve classification on imbalanced samples. Additional ligand-specific models were used to refine the prediction. On the independent testing dataset, the method achieved an overall MCC (Matthews correlation coefficient) of 0.63, and MCC values of 0.604, 0.7, and 0.692 for the three main types of ligands: drugs, metal ions, and biomacromolecules, respectively. MPLs-Pred is freely accessible at http://icdtools.nenu.edu.cn/.
Keywords: binding site prediction; ligand-specific model; membrane protein; protein-ligand
Year: 2019 PMID: 31247932 PMCID: PMC6651575 DOI: 10.3390/ijms20133120
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The relative composition of (a) universal ligand-binding residues, (b) drug-like compound-binding residues, (c) metal-binding residues, and (d) biomacromolecule-binding residues, based on the background distribution of all residues in the corresponding datasets.
Figure 2. Two-sample logos of universal ligand-binding residues against non-binding residues.
Figure 3. Heat map of the Pearson correlation coefficient between features and labels. Red represents positive correlations and blue represents negative correlations. The darker the color, the stronger the correlation.
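The correlation shown in Figure 3 can be illustrated with a minimal sketch of the Pearson coefficient between one feature column and the binary binding labels. The function name and the toy values are hypothetical and are not taken from the paper; with a 0/1 label, this quantity is the point-biserial correlation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

# Hypothetical feature values for five residues and their binding labels.
feature = [0.9, 0.8, 0.2, 0.1, 0.3]
labels = [1, 1, 0, 0, 0]
r = pearson(feature, labels)
print(round(r, 3))
```

A positive value corresponds to a red cell in such a heat map and a negative value to a blue cell.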
The performance of different combinations of features.
| Feature Combination | ACC | Spe | Sen | MCC |
|---|---|---|---|---|
| PSSM | | | 0.386 | 0.582 |
| TOPO | 0.548 | 0.309 | 0.788 | 0.113 |
| PCP | 0.973 | 0.998 | 0.216 | 0.413 |
| SeqSeg | 0.973 | 0.999 | 0.189 | 0.389 |
| PSSM+TOPO | 0.98 | 0.998 | 0.415 | 0.603 |
| PSSM+PCP | 0.979 | 0.998 | 0.421 | 0.599 |
| PSSM+SeqSeg | 0.98 | 0.998 | 0.415 | 0.601 |
| TOPO+PCP | 0.973 | 0.998 | 0.22 | 0.416 |
| TOPO+SeqSeg | 0.904 | 0.981 | 0.139 | 0.201 |
| PCP+SeqSeg | 0.973 | 0.998 | 0.213 | 0.41 |
| PSSM+TOPO+PCP | 0.979 | 0.998 | 0.422 | 0.6 |
| PSSM+TOPO+SeqSeg | 0.98 | 0.998 | 0.416 | 0.603 |
| PSSM+PCP+SeqSeg | 0.979 | 0.998 | 0.421 | 0.601 |
| TOPO+PCP+SeqSeg | 0.973 | 0.998 | 0.216 | 0.414 |
| PSSM+TOPO+PCP+SeqSeg | 0.971 | 0.997 | **0.464** | **0.627** |
Bold text in the table indicates the highest value achieved for each evaluation index.
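The four indices reported in these tables can all be derived from a residue-level confusion matrix. The sketch below assumes the standard definitions of accuracy, specificity, sensitivity, and MCC, with binding residues as the positive class. The function name and the example counts are illustrative and are not taken from the paper.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Return ACC, Spe, Sen, and MCC from confusion-matrix counts.

    Binding residues are treated as the positive class (an assumption).
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Spe": spe, "Sen": sen, "MCC": mcc}

# Hypothetical counts with 1 binding residue per 150 non-binding ones.
m = binary_metrics(tp=40, tn=14950, fp=50, fn=60)
print({name: round(value, 3) for name, value in m.items()})
```

With this degree of class imbalance, accuracy and specificity stay close to 1 even when most binding residues are missed, which is why MCC is the more informative summary index.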
Figure 4. The trend of the MCC value as the ratio of non-binding to binding residues increases in (a) the 10-fold cross-validation test and (b) the independent validation test.
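The ratio varied in Figure 4 can be illustrated with a minimal sketch of random under-sampling. The sampling procedure is not specified here, so uniform random selection of non-binding residues, the function name, and the toy labels are all assumptions.

```python
import random

def undersample(labels, ratio, seed=0):
    """Return sorted indices that keep every binding residue (label 1) and
    at most `ratio` randomly chosen non-binding residues (label 0) per
    binding residue."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept_neg = rng.sample(neg, n_keep)
    return sorted(pos + kept_neg)

# Toy data: 10 binding residues and 1500 non-binding residues (1:150).
labels = [1] * 10 + [0] * 1500
idx = undersample(labels, ratio=3)
print(len(idx))  # 10 binding + 30 non-binding = 40
```

Sweeping `ratio` over a range of values and retraining at each step would reproduce the kind of curve the figure describes.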
Comparison of random forests (RF) with other classifiers.
| Method | ACC | Spe | Sen | MCC |
|---|---|---|---|---|
| SVM | 0.9578 | 0.98 | | 0.347 |
| Naïve Bayes | 0.844 | 0.85 | 0.67 | 0.246 |
| AdaBoost * | | 0.995 | 0.334 | 0.472 |
| RF | 0.971 | **0.997** | 0.464 | **0.627** |
* The AdaBoost classifier used a decision tree as its base classifier. Bold text in the table indicates the highest value achieved for each evaluation index.
Performance of the membrane protein-ligand binding site predictor (MPLs-Pred) on the training dataset with the universal model and ligand-specific models over 10-fold cross-validation.
| Model | ACC | Spe | Sen | MCC |
|---|---|---|---|---|
| Universal | 0.971 | 0.997 | 0.464 | 0.627 |
| Drug | 0.973 | 1.0 | 0.153 | 0.366 |
| Metal | 0.984 | 0.997 | 0.589 | 0.704 |
| Biomacromolecule | 0.936 | 0.993 | 0.481 | 0.629 |
Performance of MPLs-Pred on the independent testing dataset with the universal model and ligand-specific models.
| Model | ACC | Spe | Sen | MCC |
|---|---|---|---|---|
| Universal | 0.996 | 0.998 | 0.618 | 0.63 |
| Drug | 0.997 | 1.0 | 0.397 | 0.604 |
| Metal | 0.996 | 0.998 | 0.759 | 0.7 |
| Biomacromolecule | 0.997 | 0.999 | 0.596 | 0.692 |
Figure 5. Visualization of membrane protein-ligand binding sites and the corresponding prediction results generated by MPLs-Pred with the universal model and ligand-specific models. Examples: (a) P00959, a metal-binding MP with four zinc ion-binding residues, and (b) Q43133, a drug-binding MP with six dimethylallyl diphosphate-binding residues.
Comparison of MPLs-Pred with the previous study on the independent dataset.
| Method | ACC | Spe | Sen | MCC |
|---|---|---|---|---|
| Tm-lig | 0.857 | 0.857 | | 0.126 |
| MPLs-Pred | **0.996** | **0.998** | 0.618 | **0.63** |
Bold text in the table indicates the highest value achieved for each evaluation index.
Figure 6. Gene ontology enrichment of Homo sapiens ligand-binding proteins: (a) biological process enrichment; (b) cell component enrichment; (c) molecular function enrichment. The graphs were made with OmicsBean.
Figure 7. KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment of Homo sapiens ligand-binding proteins. The pictures were made with OmicsBean.
The detailed composition of the newly built standard datasets.
| Dataset | Training: No. of Proteins | Training: No. of Residues (binding, non-binding) | Training: Ratio ¹ | Testing: No. of Proteins | Testing: No. of Residues (binding, non-binding) | Testing: Ratio ¹ |
|---|---|---|---|---|---|---|
| Universal | 2500 ² | (10,143, 1,524,372) | 1:150 | 234 | (836, 164,792) | 1:197 |
| Drug | 655 | (1839, 386,979) | 1:210 | 45 | (121, 25,193) | 1:208 |
| Metal | 1375 | (5734, 804,610) | 1:140 | 117 | (503, 85,437) | 1:170 |
| Biomacromolecule | 857 | (2505, 435,298) | 1:174 | 67 | (161, 35,022) | 1:218 |
¹ The ratio is that of binding to non-binding residues. ² The datasets overlap because some proteins interact with two or more types of ligands.
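The reported ratios can be checked directly against the residue counts in the table. The sketch below assumes that each pair of counts is ordered as (binding, non-binding), which is consistent with every ratio listed. The dictionary keys and the function name are illustrative.

```python
# (binding, non-binding) residue counts copied from the table above.
counts = {
    "Universal (training)": (10143, 1524372),
    "Drug (training)": (1839, 386979),
    "Metal (training)": (5734, 804610),
    "Biomacromolecule (training)": (2505, 435298),
    "Universal (testing)": (836, 164792),
    "Drug (testing)": (121, 25193),
    "Metal (testing)": (503, 85437),
    "Biomacromolecule (testing)": (161, 35022),
}

def imbalance_ratio(binding, non_binding):
    """Non-binding residues per binding residue, rounded to an integer."""
    return round(non_binding / binding)

for name, (binding, non_binding) in counts.items():
    print(f"{name}: 1:{imbalance_ratio(binding, non_binding)}")
```

Every dataset has well over 100 non-binding residues per binding residue, which is the imbalance the under-sampling scheme is meant to address.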