Xiaoyong Cao1, Xiuzhen Hu1, Xiaojin Zhang1, Sujuan Gao1,2, Changjiang Ding1, Yonge Feng2, Weihua Bao1.
Abstract
The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html.
Year: 2017 PMID: 28854211 PMCID: PMC5576659 DOI: 10.1371/journal.pone.0183756
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The statistics of the dataset using the sequence segment of length 17 for the ten metal ions.
| Metal ion | Chains | Binding segments | Non-binding segments |
|---|---|---|---|
| Zn2+ | 1428(142) | 6408 | 405113 |
| Cu2+ | 117(110) | 485 | 33948 |
| Fe2+ | 92(227) | 382 | 29345 |
| Fe3+ | 217(103) | 1057 | 68829 |
| Co2+ | 194(0) | 875 | 55050 |
| Mn2+ | 459(379) | 2124 | 156625 |
| Ca2+ | 1237(179) | 6789 | 396957 |
| Mg2+ | 1461(103) | 5212 | 480307 |
| K+ | 57(53) | 535 | 18777 |
| Na+ | 78(78) | 489 | 27408 |
Chains is the number of protein chains. The number in parentheses is the number of proteins in the dataset of Hu et al.
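The segment counts above come from cutting each chain into fixed-length windows. A minimal sketch of that step follows, assuming one window of length 17 per residue, centred on that residue, with chain ends padded by a placeholder character. The padding character `X`, the function name `extract_segments`, and the example sequence are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Set, Tuple


def extract_segments(sequence: str, binding_positions: Set[int],
                     window: int = 17, pad: str = "X") -> List[Tuple[str, int]]:
    """Return one (segment, label) pair per residue.

    The segment is centred on the residue; label is 1 if the centre
    residue is annotated as binding, otherwise 0. Positions are 0-based.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that a centre residue exists")
    half = window // 2
    # Pad both ends so that terminal residues also get a full-length window
    # (assumed policy; the source does not state how chain ends are handled).
    padded = pad * half + sequence + pad * half
    segments = []
    for i in range(len(sequence)):
        segment = padded[i:i + window]
        label = 1 if i in binding_positions else 0
        segments.append((segment, label))
    return segments


# Hypothetical 18-residue chain with binding residues at positions 2 and 3.
example = extract_segments("MKCHHEDGSTLLAVCCPQ", {2, 3})
binding = [seg for seg, label in example if label == 1]
non_binding = [seg for seg, label in example if label == 0]
```

Under this scheme every chain contributes as many segments as it has residues, which is in line with the non-binding segments greatly outnumbering the binding ones in the table.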
The statistics of the training dataset and the independent test dataset.
| Ligand | Chains (training) | P (training) | N (training) | Chains (test) | P (test) | N (test) |
|---|---|---|---|---|---|---|
| Zn2+ | 1142 | 5145 | 321161 | 286 | 1263 | 83952 |
| Cu2+ | 93 | 377 | 27548 | 24 | 108 | 6400 |
| Fe2+ | 73 | 301 | 23824 | 19 | 81 | 5521 |
| Fe3+ | 173 | 859 | 54945 | 44 | 198 | 13884 |
| Ca2+ | 989 | 5256 | 312876 | 248 | 1533 | 84081 |
| Mg2+ | 1168 | 4069 | 384365 | 293 | 1143 | 95942 |
| Mn2+ | 367 | 1685 | 124543 | 92 | 439 | 32082 |
| Na+ | 62 | 408 | 22411 | 16 | 81 | 4997 |
| K+ | 45 | 410 | 14882 | 12 | 125 | 3895 |
| Co2+ | 155 | 707 | 44300 | 39 | 168 | 10750 |
P is the number of positive (binding) samples; N is the number of negative (non-binding) samples.
Fig 1. Schematic diagram of the proposed method.
Fig 2. Illustration of position-specific conservation of amino acid residues in the binding and non-binding sequence segments for ions of (A) Ca2+, (B) Mg2+, (C) K+, (D) Na+, (E) Zn2+ and (F) Cu2+.
The larger residues are more conserved than the smaller ones. Each subfigure of (A), (B), (C), (D), (E), and (F) contains two figures, where the left one indicates the position-specific conservation in positive sequence segments and the right one indicates the position-specific conservation in negative sequence segments.
Fig 3. Statistical analysis of the amino acid composition in positive and negative segments for Na+, K+, Mg2+, Ca2+, Zn2+, and Cu2+.
Hydrophilic-hydrophobic classification of amino acids.
| Classification | Amino Acids | Classification | Amino Acids |
|---|---|---|---|
| strongly hydrophilic | R, D, E, N, Q, K, H | Proline | P |
| strongly hydrophobic | L, I, V, A, M, F | Glycine | G |
| weakly hydrophilic | S, T, Y, W | Cysteine | C |
The polarization charge property of amino acids.
| Classification | Amino Acids |
|---|---|
| positive charged | K, R, H |
| negative charged | D, E |
| uncharged | N, Q, P, L, I, V, A, M, F, S, T, Y, W, C, G |
Fig 4. The distribution of relative solvent accessibilities for binding and non-binding residues of (A) Fe3+ ligand and (B) Mn2+ ligand.
Performance of PWSM by 5-fold cross-validation.
| Ligand | Optimal windows (W) | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| Zn2+ | 7 | ||||
| Cu2+ | 13 | ||||
| Fe2+ | 9 | ||||
| Fe3+ | 9 | ||||
| Co2+ | 11 | ||||
| Mn2+ | 7 | 87.3 | 63.6 | 75.9 | 0.526 |
| Ca2+ | 9 | 57.9 | 80.6 | 69.2 | 0.395 |
| Mg2+ | 9 | 55.6 | 80.9 | 68.3 | 0.378 |
| K+ | 11 | 61.3 | 72.0 | 66.6 | 0.335 |
| Na+ | 9 | 30.1 | 95.3 | 62.7 | 0.335 |
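To make the idea behind a position weight scoring matrix concrete, here is a minimal sketch of a generic log-odds position weight matrix classifier. It is an illustration under stated assumptions, not the authors' exact PWSM: the uniform background, the pseudocount of 1, the padding symbol `X`, the decision rule (compare the score under the positive matrix with the score under the negative matrix), and all function names and toy segments are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

# 20 standard residues plus X for padding (the padding symbol is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"


def build_pwm(segments: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build per-position log-odds scores against a uniform background."""
    width = len(segments[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(width):
        counts = Counter(seg[pos] for seg in segments)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        matrix.append(column)
    return matrix


def score(segment: str, matrix: List[Dict[str, float]]) -> float:
    """Sum the position-specific scores of a segment."""
    return sum(column[aa] for aa, column in zip(segment, matrix))


def predict(segment: str, pos_matrix, neg_matrix) -> int:
    """Label a segment as binding (1) if the positive matrix scores it higher."""
    return 1 if score(segment, pos_matrix) > score(segment, neg_matrix) else 0


# Toy segments of width 5, invented purely for illustration.
positives = ["ACHCA", "GCHCG", "ACHCT"]
negatives = ["ALLVA", "GAVLG", "TLAVS"]
pos_pwm = build_pwm(positives)
neg_pwm = build_pwm(negatives)
label_binding = predict("SCHCA", pos_pwm, neg_pwm)
label_non_binding = predict("ALAVG", pos_pwm, neg_pwm)
```

A matrix of this kind uses only position-specific residue conservation, which may be one reason the method does well for the ions the abstract describes as sensitive to conservation and less well for Ca2+, Mg2+, K+ and Na+ in the table above.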
The performance of SVM(S(P)+ID(AA)) by 5-fold cross-validation.
| Ligand | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| Mn2+ | 73.4 | 83.9 | 78.7 | 0.577 |
| Ca2+ | 71.1 | 58.0 | 70.8 | 0.422 |
| Mg2+ | 64.2 | 73.9 | 69.0 | 0.382 |
| K+ | 72.2 | 67.5 | 69.8 | 0.397 |
| Na+ | 73.6 | 70.1 | 71.9 | 0.438 |
Recognition results of ligand binding residues for K+ ion.
| Algorithm (Parameter) | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| PWSM(P) | 61.3 | 72.0 | 66.6 | 0.335 |
| SVM(ID(AA)+S(P)) | 72.2 | 67.5 | 69.8 | 0.397 |
| SVM(ID(AA)+S(P)+SS+S(SS)) | 74.2 | 67.3 | 70.7 | 0.416 |
| SVM(ID(AA)+S(P)+SS+S(SS)+S(H)) | 78.5 | 72.7 | 75.6 | 0.513 |
| SVM(ID(AA)+S(P)+SS+S(SS)+S(H)+S(C)) | 70.2 | 88.1 | 79.2 | 0.593 |
| SVM(ID(AA)+S(P)+SS+S(SS)+S(H)+S(C)+S(SA)) | 77.3 | 83.2 | 80.3 | 0.607 |
ID(AA) represents the ID values of amino acid composition, S(P) represents the scoring values of position-specific amino acid conservation information, SS represents the scoring values of the frequency of secondary structure, S(SS) represents the scoring values of secondary structure information, S(H) represents the scoring values of hydrophobicity and hydrophilicity information, S(C) represents the scoring values of polarization charge information, and S(SA) represents the scoring values of solvent accessibility information.
The performance of Ca2+ and Mn2+ by 5-fold cross-validation with feature tuning.
| Ligand | Algorithm (Parameter) | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| Ca2+ | SVM(ID(AA)+S(P)+SS+S(SS)) | 69.0 | 75.7 | 72.3 | 0.448 |
| | SVM(ID(AA)+S(P)+SS+S(SS)+S(H)) | 68.3 | 76.5 | 72.4 | 0.450 |
| | SVM(ID(AA)+S(P)+SS+S(SS)+S(H)+S(C)+S(SA)) | 69.7 | 82.0 | 75.8 | 0.521 |
| | SVM(ID(AA)+S(P)+SS+S(SS)+S(C)+S(SA)) | 71.3 | 79.1 | 74.8 | 0.502 |
| Mn2+ | SVM(ID(AA)+S(P)+SS+S(SS)+S(H)) | 77.6 | 84.2 | 80.8 | 0.618 |
| | SVM(ID(AA)+S(P)+SS+S(SS)+S(H)+S(C)) | 78.2 | 83.9 | 81.1 | 0.622 |
| | SVM(ID(AA)+S(P)+SS+S(SS)+S(H)+S(C)+S(SA)) | 82.1 | 84.4 | 83.2 | 0.664 |
| | SVM(ID(AA)+S(P)+SS+S(SS)+S(H)+S(SA)) | 82.0 | 84.8 | 83.4 | 0.667 |
The performance of the metal-ion-binding-residue prediction of SVM using 5-fold cross-validation.
| Ligand | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| Zn2+ | 99.8 | 99.5 | 99.7 | 0.993 |
| Cu2+ | 95.5 | 97.1 | 96.3 | 0.926 |
| Fe2+ | 91.9 | 90.7 | 91.3 | 0.826 |
| Fe3+ | 86.9 | 88.7 | 87.8 | 0.756 |
| Ca2+ | 71.3 | 79.1 | 74.8 | 0.502 |
| Mg2+ | 76.6 | 73.9 | 75.3 | 0.505 |
| Mn2+ | 82.1 | 84.4 | 83.2 | 0.664 |
| Na+ | 82.2 | 76.2 | 79.4 | 0.586 |
| K+ | 77.3 | 83.2 | 80.3 | 0.607 |
| Co2+ | 80.8 | 85.1 | 83.0 | 0.660 |
Comparison of our independent test results with IonSeq.
| Ligand | L | Method | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|---|
| Zn2+ | 13 | IonSeq | 43.56 | 99.75 | 99.21 | 0.5043 |
| | 7 | Ours | 94.1 | 84.3 | 84.4 | 0.2528 |
| Cu2+ | 15 | IonSeq | 50.65 | 99.69 | 99.01 | 0.5772 |
| | 13 | Ours | 91.7 | 82.9 | 83.0 | 0.2458 |
| Fe2+ | 9 | IonSeq | 54.08 | 99.51 | 98.84 | 0.6370 |
| | 9 | Ours | 90.1 | 73.6 | 73.9 | 0.1708 |
| Fe3+ | 11 | IonSeq | 52.27 | 99.81 | 99.21 | 0.2111 |
| | 9 | Ours | 87.9 | 72.7 | 72.9 | 0.1584 |
| Ca2+ | 9 | IonSeq | 22.72 | 99.04 | 98.18 | 0.1825 |
| | 9 | Ours | 59.5 | 79.2 | 78.9 | 0.1251 |
| Mg2+ | 15 | IonSeq | 5.57 | 99.98 | 99.49 | 0.4553 |
| | 9 | Ours | 50.2 | 81.9 | 81.6 | 0.0871 |
| Mn2+ | 11 | IonSeq | 31.07 | 99.82 | 99.01 | 0.1516 |
| | 7 | Ours | 76.5 | 79.8 | 79.8 | 0.1599 |
| Na+ | 13 | IonSeq | 77.14 | 74.04 | 74.09 | 0.2283 |
| | 9 | Ours | 33.3 | 78.2 | 77.5 | 0.0348 |
| K+ | 11 | IonSeq | 8.52 | 99.88 | 97.32 | 0.2283 |
| | 11 | Ours | 45.6 | 62.8 | 62.3 | 0.0301 |
| Co2+ | - | IonSeq | - | - | - | - |
| | 11 | Ours | 73.2 | 82.3 | 82.2 | 0.176 |
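The independent-test MCC values for our method are much lower than the cross-validation values, even where Sn and Sp remain high. A short calculation suggests that class imbalance accounts for much of this gap. The sketch below rebuilds a confusion matrix from Sn, Sp and the class sizes, then computes MCC from it. The function name `mcc_from_rates` is hypothetical. The Zn2+ inputs are taken from the tables in this record (Sn 94.1%, Sp 84.3%, 1263 positive and 83952 negative test samples). The balanced case is a hypothetical comparison, not a result reported in the paper.

```python
import math


def mcc_from_rates(sn: float, sp: float, n_pos: int, n_neg: int) -> float:
    """Matthews correlation coefficient from sensitivity, specificity and class sizes."""
    tp = sn * n_pos      # true positives
    fn = n_pos - tp      # false negatives
    tn = sp * n_neg      # true negatives
    fp = n_neg - tn      # false positives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Zn2+ on the independent test set: about 66 negatives per positive.
zn_test = mcc_from_rates(0.941, 0.843, 1263, 83952)

# Same Sn and Sp with equal class sizes (hypothetical comparison).
zn_balanced = mcc_from_rates(0.941, 0.843, 1263, 1263)

print(f"imbalanced MCC: {zn_test:.3f}, balanced MCC: {zn_balanced:.3f}")
```

With the independent-test class sizes, the reconstructed MCC is close to the 0.2528 reported for Zn2+, while the same Sn and Sp on balanced classes would give a far higher MCC. The false positives drawn from the very large negative class dominate the predicted-positive set, which lowers MCC without changing Sn or Sp.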