Xiuzhen Hu, Zhenxing Feng, Xiaojin Zhang, Liu Liu, Shan Wang.
Abstract
Many proteins carry out their specific functions by binding metal ion ligands during a cell's life cycle. The ability to correctly identify metal ion ligand-binding residues is valuable for human health and for molecular drug design. Precisely identifying these residues, however, remains challenging. We present an improved computational approach for predicting the binding residues of 10 metal ion ligands (Zn2+, Cu2+, Fe2+, Fe3+, Co2+, Ca2+, Mg2+, Mn2+, Na+, and K+) by adding reclassified relative solvent accessibility (RSA) as a feature. The best accuracies in fivefold cross-validation were higher than 77.9%, about 16% higher than the previous results on the same dataset. We found that different reclassifications of the RSA information contribute differently to the identification of binding residues for specific ligands. Our study provides additional understanding of the effect of RSA on the identification of metal ion ligand-binding residues.
Keywords: binding residues; metal ion ligand; position weight matrix; relative solvent accessibility; secondary structure
Year: 2020; PMID: 32265982; PMCID: PMC7096583; DOI: 10.3389/fgene.2020.00214
Source DB: PubMed; Journal: Front Genet; ISSN: 1664-8021; Impact factor: 4.599
The benchmark datasets of 10 metal ion ligands.
| Metal ion ligand | Number of chains | P | N | L |
| Zn2+ | 1428 | 6408 | 405113 | 7 |
| Cu2+ | 117 | 485 | 33948 | 13 |
| Fe2+ | 92 | 382 | 29345 | 9 |
| Fe3+ | 217 | 1057 | 68829 | 9 |
| Co2+ | 194 | 875 | 55050 | 11 |
| Ca2+ | 1237 | 6789 | 396957 | 9 |
| Mg2+ | 1461 | 5212 | 480307 | 9 |
| Mn2+ | 459 | 2124 | 156625 | 7 |
| Na+ | 78 | 489 | 27408 | 9 |
| K+ | 57 | 535 | 18777 | 11 |
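
The P and N columns above imply a strong class imbalance. As a rough illustration, assuming P and N are the numbers of positive (binding) and negative (non-binding) residues, the following sketch computes the negative-to-positive ratio for a few ligands. The names `benchmark` and `imbalance_ratio` are hypothetical and introduced only for this example.

```python
# Residue counts copied from the benchmark table above: (P, N) per ligand.
# Assumption: P = positive (binding) residues, N = negative (non-binding) residues.
benchmark = {
    "Zn2+": (6408, 405113),
    "Ca2+": (6789, 396957),
    "Mg2+": (5212, 480307),
    "K+": (535, 18777),
}


def imbalance_ratio(p: int, n: int) -> float:
    """Return the number of negative residues per positive residue."""
    return n / p


for ligand, (p, n) in benchmark.items():
    print(f"{ligand}: {imbalance_ratio(p, n):.1f} negatives per positive")
```

Ratios of this size are worth keeping in mind when reading accuracy and MCC values computed on unbalanced data.
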
FIGURE 1. Flowchart of the method for the identification of metal ion ligand-binding residues.
Predicted results for K+ ligand-binding residues.
| Feature parameter | Sn (%) | Sp (%) | FPR (%) | Acc (%) | MCC |
| WA | 60.7 | 60.2 | 39.8 | 60.5 | 0.209 |
| WA + QS | 63.2 | 60.2 | 39.8 | 61.7 | 0.234 |
| WA + QS + DH | 65.4 | 61.9 | 38.1 | 63.6 | 0.273 |
| WA + QS + DH + SS | 73.8 | 58.5 | 41.5 | 66.2 | 0.327 |
| WA + QS + DH + SS + SA_2 | 80.2 | 76.3 | 23.7 | 78.2 | 0.565 |
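
The columns in the table above follow the usual confusion-matrix vocabulary. This extract does not reproduce the paper's formulas, so the sketch below assumes the standard definitions of sensitivity, specificity, false positive rate, accuracy, and the Matthews correlation coefficient. In every row above, FPR equals 100 minus Sp, and Acc equals the mean of Sn and Sp to within rounding, which is what equal-sized positive and negative sets would produce. That balance is an inference from the numbers, not something stated here. The counts passed to `binary_metrics` (a hypothetical helper name) are invented to reproduce the Sn and Sp of the last row.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard confusion-matrix metrics; all in percent except MCC."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "Sn": 100 * sn,
        "Sp": 100 * sp,
        "FPR": 100 * (1 - sp),
        "Acc": 100 * acc,
        "MCC": mcc,
    }


# Hypothetical balanced counts (1000 positives, 1000 negatives), chosen so that
# Sn = 80.2% and Sp = 76.3% as in the WA + QS + DH + SS + SA_2 row.
m = binary_metrics(tp=802, fn=198, tn=763, fp=237)
print({name: round(value, 3) for name, value in m.items()})
```

Under these assumed counts the accuracy is 78.25% and the MCC rounds to 0.565, in line with the reported 78.2 and 0.565.
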
FIGURE 2. The statistical distribution of relative solvent accessibility in the positive and negative sets for the K+ ligand. Note: the abscissa is the relative solvent accessibility value, and the ordinate is the number of amino acids corresponding to each predicted value. The solid red line represents the positive set, and the dotted blue line represents the negative set.
Predicted results of K+ ligand-binding residues.
| SA classification | Sn (%) | Sp (%) | FPR (%) | Acc (%) | MCC |
| SA_2 | 80.2 | 76.3 | 23.7 | 78.2 | 0.565 |
| SA_4 | 85.4 | 81.9 | 18.1 | 83.6 | 0.673 |
| SA_V | 87.5 | 85.0 | 15.0 | 86.3 | 0.725 |
| SA_P | 81.7 | 77.4 | 22.6 | 79.5 | 0.591 |
The optimal predicted results of 10 metal ion ligand-binding residues and corresponding specific classifications of relative solvent accessibility.
| Ligand | SA classification | Sn (%) | Sp (%) | FPR (%) | Acc (%) | MCC |
| Zn2+ | SA_4 | 92.6 | 90.3 | 9.7 | 91.5 | 0.829 |
| Cu2+ | SA_4 | 94.0 | 94.2 | 5.8 | 94.1 | 0.883 |
| Fe2+ | SA_4 | 99.2 | 100 | 0 | 99.6 | 0.992 |
| Fe3+ | SA_V | 88.6 | 91.4 | 8.6 | 90.0 | 0.801 |
| Co2+ | SA_V | 79.8 | 89.6 | 10.4 | 84.7 | 0.697 |
| Ca2+ | SA_2 | 76.6 | 79.2 | 20.8 | 77.9 | 0.558 |
| Mg2+ | SA_4 | 91.6 | 91.5 | 8.5 | 91.6 | 0.831 |
| Mn2+ | SA_P | 81.3 | 88.3 | 11.7 | 84.8 | 0.698 |
| Na+ | SA_V | 85.9 | 84.0 | 16.0 | 85.0 | 0.700 |
| K+ | SA_V | 87.5 | 85.0 | 15.0 | 86.3 | 0.725 |
The features rejected by using the Boruta feature selection algorithm.
| Metal ion ligand | Rejected features |
| Zn2+ | WA6, DH4, DH12, DH13, DH14 |
| Cu2+ | WA2, WA5, WA6, WA8, WA15, WA18, WA19, WA20, WA21, WA22, WA23, WA24, QS1, QS2, QS3, QS4, QS5, QS6, QS7, QS8, QS9, QS10, QS11, QS15, QS16, QS17, QS18, QS19, QS20, QS22, QS23, QS24, QS25, DH1, DH2, DH3, DH4, DH5, DH6, DH7, DH8, DH9, DH10, DH11, DH12, DH15, DH16, DH17, DH18, DH19, DH20, DH21, DH22, DH23, DH24, DH25, DH26, SS1, SS2, SS3, SS4, SS5, SS6, SS8, SS13, SS14, SS15, SS16, SS17, SS21, SS22, SS26, SA1, SA2, SA3, SA4, SA5, SA6, SA7, SA8, SA9, SA10, SA12, SA21, SA23, SA24, SA25, SA26 |
| Fe2+ | WA1, WA2, WA4, WA8, WA12, WA13, WA14, WA15, WA17, QS1, QS2, QS3, QS4, QS5, QS6, QS7, QS8, QS9, QS10, QS11, QS13, QS14, QS17, QS18, DH1, DH2, DH3, DH4, DH5, DH6, DH7, DH8, DH11, DH12, DH13, DH14, DH15, DH16, SS2, SS4, SS9, SS10, SS11, SS12, SS13, SS15, SS16, SS17, SS18, SA11, SA12, SA18 |
| Fe3+ | WA1, WA2, WA5, WA8, WA11, WA12, WA13, WA14, WA15, WA17, WA18, QS1, QS2, QS5, QS7, QS8, QS11, QS12, QS13, QS14, QS16, QS17, QS18, DH1, DH2, DH5, DH6, DH11, DH13, DH14, DH15, DH16, SA18 |
| Co2+ | WA1, WA2, WA4, WA6, WA10, WA13, WA14, WA15, WA16, WA17, WA18, WA19, WA20, WA21, WA22, QS1, QS2, QS3, QS4, QS7, QS8, QS9, QS13, QS14, QS15, QS16, QS17, QS18, QS19, QS20, QS21, QS22, DH1, DH2, DH3, DH4, DH5, DH6, DH7, DH8, DH9, DH10, DH13, DH15, DH16, DH17, DH18, DH19, DH20, DH21, DH22, SS19, SS20, SS21, SA1, SA2, SA16, SA19, SA20, SA21, SA22 |
| Mn2+ | WA10, WA12, WA13, QS2, QS4, QS9, QS10, QS11, QS12, DH9, DH11, DH12, DH14 |
| Na+ | WA3, WA4, WA5, WA6, WA7, WA8, WA10, QS2, QS3, QS4, QS5, QS6, QS7, QS8, QS10, QS12, QS13, QS15, QS16, QS17, QS18, DH1, DH2, DH3, DH4, DH5, DH6, DH7, DH8, DH11, DH12, DH13, DH14, DH15, DH16, DH17, DH18, SS1, SS3, SS5, SS8, SS9, SS10, SS16, SS17, SS18, SA1, SA2, SA3, SA4, SA5, SA6, SA13, SA18 |
| K+ | WA1, WA2, WA3, WA5, WA6, WA7, WA8, WA9, WA10, WA13, WA14, WA18, WA19, WA21, WA22, QS1, QS2, QS3, QS4, QS5, QS6, QS7, QS8, QS9, QS10, QS13, QS14, QS15, QS17, QS18, QS19, QS20, QS21, QS22, DH1, DH2, DH3, DH4, DH5, DH6, DH7, DH8, DH9, DH13, DH14, DH15, DH16, DH17, DH18, DH19, DH20, DH21, DH22, SS1, SS2, SS3, SS4, SS5, SS6, SS21, SS22, SA1, SA2, SA4, SA5, SA6, SA8, SA10, SA13, SA14, SA17, SA18, SA19, SA20, SA21, SA22 |
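
The rejected-feature lists can be cross-checked against the feature dimensions reported in the full-versus-Boruta comparison table: the Boruta dimension should equal the full dimension minus the number of rejected features. The sketch below does this for two ligands with short lists. `count_rejected` is a hypothetical helper, and the full dimensions (70 for both Zn2+ and Mn2+) are taken from that comparison table.

```python
def count_rejected(cell: str) -> int:
    """Count the comma-separated feature names in a 'Rejected features' cell."""
    return len([name for name in cell.split(",") if name.strip()])


# Rejected-feature cells copied from the table above.
rejected = {
    "Zn2+": "WA6, DH4, DH12, DH13, DH14",
    "Mn2+": "WA10, WA12, WA13, QS2, QS4, QS9, QS10, QS11, QS12, "
            "DH9, DH11, DH12, DH14",
}

# Full feature dimensions as reported in the full-versus-Boruta comparison table.
full_dimension = {"Zn2+": 70, "Mn2+": 70}

retained = {
    ligand: full_dimension[ligand] - count_rejected(cell)
    for ligand, cell in rejected.items()
}
print(retained)
```

The results, 65 for Zn2+ and 57 for Mn2+, agree with the Boruta feature dimensions in the comparison table.
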
FIGURE 3. The feature importance for the Zn2+ ligand, indicated by the MeanDecreaseAccuracy value (A) and the MeanDecreaseGini value (B) from Random Forest. Note: the larger the MeanDecreaseAccuracy and MeanDecreaseGini values, the higher the importance of the feature parameter. WA1-WA18 are the amino acid features, QS1-QS18 are the hydrophobicity features, DH1-DH18 are the charge features, SS1-SS18 are the secondary structure features, and SA1-SA18 are the relative solvent accessibility features.
Comparison of predicted results based on the full feature set and the Boruta-selected feature set.
| Ligand | Feature selection | Feature dimension | Sn (%) | Sp (%) | Acc (%) | MCC |
| Zn2+ | Full | 70 | 92.6 | 90.3 | 91.5 | 0.829 |
| Zn2+ | Boruta | 65 | 92.7 | 89.1 | 90.9 | 0.818 |
| Cu2+ | Full | 130 | 94.0 | 94.2 | 94.1 | 0.883 |
| Cu2+ | Boruta | 42 | 93.4 | 93.8 | 93.6 | 0.872 |
| Fe2+ | Full | 90 | 99.2 | 100 | 99.6 | 0.992 |
| Fe2+ | Boruta | 40 | 96.1 | 96.1 | 96.1 | 0.921 |
| Fe3+ | Full | 90 | 88.6 | 91.4 | 90.0 | 0.801 |
| Fe3+ | Boruta | 57 | 88.0 | 90.7 | 89.4 | 0.787 |
| Co2+ | Full | 110 | 79.8 | 89.6 | 84.7 | 0.697 |
| Co2+ | Boruta | 49 | 79.5 | 89.1 | 84.3 | 0.690 |
| Ca2+ | Full | 90 | 76.6 | 79.2 | 77.9 | 0.558 |
| Ca2+ | Boruta | 90 | 76.6 | 79.2 | 77.9 | 0.558 |
| Mg2+ | Full | 90 | 91.6 | 91.5 | 91.6 | 0.831 |
| Mg2+ | Boruta | 90 | 91.6 | 91.5 | 91.6 | 0.831 |
| Mn2+ | Full | 70 | 81.3 | 88.3 | 84.8 | 0.698 |
| Mn2+ | Boruta | 57 | 81.4 | 88.0 | 84.7 | 0.695 |
| Na+ | Full | 90 | 85.9 | 84.0 | 85.0 | 0.700 |
| Na+ | Boruta | 36 | 83.6 | 82.4 | 83.0 | 0.661 |
| K+ | Full | 110 | 87.5 | 85.0 | 86.3 | 0.725 |
| K+ | Boruta | 34 | 83.7 | 82.2 | 83.0 | 0.660 |
The statistics of the training dataset and the independent testing dataset.
| Ligand | Training chains | Training P | Training N | Testing chains | Testing P | Testing N |
| Zn2+ | 1142 | 5145 | 321,161 | 286 | 1263 | 83,952 |
| Cu2+ | 93 | 377 | 27,548 | 24 | 108 | 6400 |
| Fe2+ | 73 | 301 | 23,824 | 19 | 81 | 5521 |
| Fe3+ | 173 | 859 | 54,945 | 44 | 198 | 13,884 |
| Co2+ | 155 | 707 | 44,300 | 39 | 168 | 10,750 |
| Ca2+ | 989 | 5256 | 312,876 | 248 | 1533 | 84,081 |
| Mg2+ | 1168 | 4069 | 384,365 | 293 | 1143 | 95,942 |
| Mn2+ | 367 | 1685 | 124,543 | 92 | 439 | 32,082 |
| Na+ | 62 | 408 | 22,411 | 16 | 81 | 4997 |
| K+ | 45 | 410 | 14,882 | 12 | 125 | 3895 |
Comparison of our independent test results with previous results.
| Ligand | L | Method | Sn (%) | Sp (%) | Acc (%) | MCC |
| Zn2+ | 7 | This work | 78.1 | 82.7 | 82.7 | 0.1865 |
| Zn2+ | 7 | Cao et al. | 94.1 | 84.3 | | |
| Cu2+ | 13 | This work | 74.1 | 76.8 | 76.7 | 0.1519 |
| Cu2+ | 13 | Cao et al. | 91.7 | 82.9 | | |
| Fe2+ | 9 | This work | 96.3 | 91.8 | | |
| Fe2+ | 9 | Cao et al. | 90.1 | 73.6 | 73.9 | 0.1708 |
| Fe3+ | 9 | This work | 90.9 | 83.5 | | |
| Fe3+ | 9 | Cao et al. | 87.9 | 72.7 | 72.9 | 0.1584 |
| Co2+ | 11 | This work | 76.8 | 83.6 | | |
| Co2+ | 11 | Cao et al. | 73.2 | 82.3 | 82.2 | 0.1760 |
| Ca2+ | 9 | This work | 60.0 | 79.3 | | |
| Ca2+ | 9 | Cao et al. | 59.5 | 79.2 | 78.9 | 0.1251 |
| Mg2+ | 9 | This work | 75.7 | 84.0 | | |
| Mg2+ | 9 | Cao et al. | 50.2 | 81.9 | 81.6 | 0.0871 |
| Mn2+ | 7 | This work | 76.8 | 80.2 | | |
| Mn2+ | 7 | Cao et al. | 76.5 | 79.8 | 79.8 | 0.1599 |
| Na+ | 9 | This work | 43.2 | 84.5 | | |
| Na+ | 9 | Cao et al. | 33.3 | 78.2 | 77.5 | 0.0348 |
| K+ | 11 | This work | 51.2 | 73.1 | | |
| K+ | 11 | Cao et al. | 45.6 | 62.8 | 62.3 | 0.0301 |
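
The MCC values in the independent test are far lower than those in cross-validation even where Sn and Sp are comparable. A likely explanation is the class imbalance of the testing set, and the sketch below illustrates the effect. It rebuilds approximate confusion counts from Sn, Sp, and the class sizes, using the standard accuracy and MCC definitions (assumed, since the formulas are not given in this extract). The inputs are the Sn and Sp printed in the first Zn2+ row above and the Zn2+ testing-set sizes (P = 1263, N = 83,952). `metrics_from_rates` is a hypothetical helper, and the 1000/1000 balanced set is invented for contrast.

```python
import math


def metrics_from_rates(sn: float, sp: float, p: int, n: int) -> tuple:
    """Approximate accuracy and MCC from Sn, Sp (as fractions) and class sizes."""
    tp = sn * p          # expected true positives
    fn = p - tp          # expected false negatives
    tn = sp * n          # expected true negatives
    fp = n - tn          # expected false positives
    acc = (tp + tn) / (p + n)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Zn2+ independent testing set: heavily imbalanced.
acc, mcc = metrics_from_rates(0.781, 0.827, p=1263, n=83952)

# Same Sn and Sp on a hypothetical balanced set, for contrast.
balanced_acc, balanced_mcc = metrics_from_rates(0.781, 0.827, p=1000, n=1000)

print(f"imbalanced: Acc={100 * acc:.1f}%, MCC={mcc:.3f}")
print(f"balanced:   Acc={100 * balanced_acc:.1f}%, MCC={balanced_mcc:.3f}")
```

With the imbalanced sizes the MCC comes out at roughly 0.19, the same order as the 0.1865 in the table (the gap is plausibly due to rounding of Sn and Sp). On the balanced set the same rates give an MCC of roughly 0.61.
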
Comparison of our optimal predicted results in fivefold cross-validation with previous results.
| Ligand | Method | Sn (%) | Sp (%) | Acc (%) | MCC |
| Zn2+ | This work | 92.6 | 90.3 | 91.5 | 0.829 |
| Zn2+ | Wang et al. | 94.2 | 84.2 | 89.2 | 0.789 |
| Zn2+ | Cao et al. | 99.8 | 99.5 | | |
| Cu2+ | This work | 94.0 | 94.2 | 94.1 | 0.883 |
| Cu2+ | Wang et al. | 91.3 | 86.8 | 89.0 | 0.782 |
| Cu2+ | Cao et al. | 95.5 | 97.1 | | |
| Fe2+ | This work | 99.2 | 100 | 99.6 | 0.992 |
| Fe2+ | Wang et al. | 90.1 | 81.9 | 86.0 | 0.722 |
| Fe2+ | Cao et al. | 91.9 | 90.7 | 91.3 | 0.826 |
| Fe3+ | This work | 88.6 | 91.4 | 90.0 | 0.801 |
| Fe3+ | Wang et al. | 86.2 | 85.5 | 85.9 | 0.717 |
| Fe3+ | Cao et al. | 86.9 | 88.7 | 87.8 | 0.756 |
| Co2+ | This work | 79.8 | 89.6 | 84.7 | 0.697 |
| Co2+ | Wang et al. | 75.3 | 86.4 | 80.9 | 0.621 |
| Co2+ | Cao et al. | 80.8 | 85.1 | 83.0 | 0.660 |
| Ca2+ | This work | 76.6 | 79.2 | 77.9 | 0.558 |
| Ca2+ | Wang et al. | 68.8 | 75.3 | 72.1 | 0.443 |
| Ca2+ | Cao et al. | 71.3 | 79.1 | 74.8 | 0.502 |
| Mg2+ | This work | 91.6 | 91.5 | 91.6 | 0.831 |
| Mg2+ | Wang et al. | 71.1 | 73.1 | 72.1 | 0.442 |
| Mg2+ | Cao et al. | 76.6 | 73.9 | 75.3 | 0.505 |
| Mn2+ | This work | 81.3 | 88.3 | 84.8 | 0.698 |
| Mn2+ | Wang et al. | 82.0 | 83.9 | 83.0 | 0.659 |
| Mn2+ | Cao et al. | 82.1 | 84.4 | 83.2 | 0.664 |
| Na+ | This work | 85.9 | 84.0 | 85.0 | 0.700 |
| Na+ | Wang et al. | 68.9 | 74.0 | 71.0 | 0.430 |
| Na+ | Cao et al. | 82.2 | 76.2 | 79.4 | 0.586 |
| K+ | This work | 87.5 | 85.0 | 86.3 | 0.725 |
| K+ | Wang et al. | 71.6 | 64.5 | 68.0 | 0.362 |
| K+ | Cao et al. | 77.3 | 83.2 | 80.3 | 0.607 |
FIGURE 4. Comparison of the prediction performance of several machine learning methods based on the same features, using the fivefold cross-validation test.