Sixi Hao1,2, Xiuzhen Hu1,2, Zhenxing Feng1,2, Kai Sun1,2, Xiaoxiao You1,2, Ziyang Wang1,2, Caiyun Yang1,2.
Abstract
Proteins must interact with ligands to perform their functions, and metal ions are among the most important of these ligands. Predicting the residues at which a protein binds a metal ion ligand remains a challenge. In this study, we selected the Zn2+, Cu2+, Fe2+, Fe3+, Co2+, Mn2+, Ca2+ and Mg2+ metal ion ligands from the BioLip database as research objects. Building on the amino acids, the physicochemical properties and the predicted structural information, we introduced the disorder value as a feature parameter. In addition, based on the component information, position weight matrix and information entropy, we introduced the propensity factor as a prediction parameter. We then used a deep neural network algorithm for the prediction. Furthermore, we optimized the hyper-parameters of the deep learning algorithm and obtained better results than the previous IonSeq method.
Keywords: binding residues; deep neural network algorithm; disorder value; metal ion ligand; propensity factors
Year: 2022 PMID: 36035120 PMCID: PMC9402973 DOI: 10.3389/fgene.2022.969412
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
The non-redundant data set for eight metal ion ligands.
| Ligands | L | Chains | P | N |
|---|---|---|---|---|
| Zn2+ | 13 | 1,428 | 6,408 | 405,113 |
| Cu2+ | 15 | 117 | 485 | 33,948 |
| Fe2+ | 9 | 92 | 382 | 29,345 |
| Fe3+ | 11 | 217 | 1,057 | 68,829 |
| Co2+ | 11 | 194 | 875 | 55,050 |
| Mn2+ | 11 | 459 | 2,124 | 156,625 |
| Ca2+ | 9 | 1,237 | 6,789 | 396,957 |
| Mg2+ | 15 | 1,461 | 5,212 | 480,307 |
Note: Ligands denotes the metal ion ligand; L is the sequence fragment length; Chains is the number of protein chains; P is the number of binding residues; N is the number of non-binding residues.
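The L column gives the sequence fragment length used for each ligand, and every value in the table is odd. The following is a minimal sketch of how such fragments might be cut from a chain, assuming a sliding window centred on each residue with placeholder padding at the termini. The function name `extract_fragments`, the pad symbol `X`, and the example sequence are illustrative assumptions, not details taken from the study.

```python
def extract_fragments(sequence, window, pad="X"):
    """Return one fragment of length `window` centred on each residue.

    Assumption: termini are padded with `pad` so that the first and last
    residues also receive a full-length fragment.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that a central residue exists")
    half = window // 2
    padded = pad * half + sequence + pad * half
    # Fragment i is centred on sequence[i], which sits at padded[i + half].
    return [padded[i:i + window] for i in range(len(sequence))]


# Hypothetical five-residue chain with a window of 3 for readability;
# the table lists windows of 9 to 15 residues.
fragments = extract_fragments("MKHCE", 3)
for centre, fragment in zip("MKHCE", fragments):
    print(centre, fragment)
```

Each fragment would then be labelled as binding (P) or non-binding (N) according to its central residue, which is how a chain of n residues yields n samples.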
FIGURE 1. Distribution of the disorder values of the binding and non-binding residues of the Ca2+ and Cu2+ ligands. Note: The ordinate is the probability of the disorder value; P represents the binding residues; N represents the non-binding residues.
FIGURE 2. Classification of the charge features and hydrophilic-hydrophobic features of amino acids. Note: (A) shows the 3 categories of charge features; (B) shows the 6 categories of hydrophilic-hydrophobic features.
FIGURE 3. Statistical analysis of the propensity factors of binding and non-binding residues. Note: The ordinate represents the value of the propensity factors; P and N represent binding and non-binding residues, respectively. (A) and (B) show the propensity factors of the amino acids for the Ca2+ and Cu2+ ligands, respectively; the abscissa represents the 20 amino acids. (C) and (D) show the propensity factors of the charge features for the Ca2+ and Cu2+ ligands, respectively; the abscissa represents the three charge classifications. (E) and (F) show the propensity factors of the hydrophilic-hydrophobic features for the Ca2+ and Cu2+ ligands, respectively; the abscissa represents the six hydrophilic-hydrophobic classifications.
Comparison of 5-fold cross-validation results.
| Ligand | Algorithm | Hidden layers | Hidden neurons | Batch size | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|---|---|---|
| Zn2+ | DNNa | 2 | 64 | 64 | 26.65 | 99.34 | 98.21 | 0.3147 |
| | DNNb | 2 | 64 | 64 | 31.49 | 99.51 | 98.45 | 0.3923 |
| | DNNc | 2 | 16 | 16 | 33.33 | 99.73 | 98.69 | 0.4630 |
| | IonSeq | — | — | — | 43.56 | 99.21 | 99.75 | 0.5043 |
| Cu2+ | DNNa | 2 | 64 | 64 | 38.97 | 98.62 | 97.78 | 0.3237 |
| | DNNb | 2 | 64 | 64 | 42.06 | 99.07 | 98.27 | 0.3982 |
| | DNNc | 4 | 64 | 16 | 49.90 | 99.38 | 98.68 | 0.5070 |
| | IonSeq | — | — | — | 50.65 | 99.01 | 99.69 | 0.5868 |
| Fe2+ | DNNa | 2 | 64 | 64 | 29.32 | 98.74 | 97.85 | 0.2504 |
| | DNNb | 2 | 64 | 64 | 33.25 | 99.15 | 98.30 | 0.3264 |
| | DNNc | 2 | 16 | 16 | 35.84 | 99.27 | 98.45 | 0.3659 |
| | IonSeq | — | — | — | 54.08 | 99.51 | 98.84 | 0.5772 |
| Fe3+ | DNNa | 2 | 64 | 64 | 27.27 | 99.47 | 98.32 | 0.3254 |
| | DNNb | 2 | 64 | 64 | 29.29 | 99.49 | 98.39 | 0.3452 |
| | DNNc | 2 | 16 | 16 | 32.08 | 99.51 | 98.49 | 0.3953 |
| | IonSeq | — | — | — | 52.27 | 99.81 | 99.21 | 0.6370 |
| Co2+ | DNNa | 2 | 64 | 64 | 11.53 | 99.18 | 97.81 | 0.1354 |
| | DNNb | 2 | 64 | 64 | 16.00 | 99.36 | 98.06 | 0.2051 |
| | DNNc | 4 | 16 | 16 | 17.83 | 99.37 | 98.10 | 0.2254 |
| | IonSeq | — | — | — | — | — | — | — |
| Mn2+ | DNNa | 2 | 64 | 64 | 15.74 | 99.71 | 98.60 | 0.2462 |
| | DNNb | 2 | 64 | 64 | 17.62 | 99.70 | 98.61 | 0.277 |
| | DNNc | 3 | 16 | 32 | 18.17 | 99.74 | 98.65 | 0.2933 |
| | IonSeq | — | — | — | 31.07 | 99.82 | 99.01 | 0.4553 |
| Ca2+ | DNNa | 2 | 64 | 64 | 20.42 | 98.52 | 97.20 | 0.1831 |
| | DNNb | 2 | 64 | 64 | 26.46 | 98.68 | 97.42 | 0.2315 |
| | DNNc | 2 | 32 | 32 | 28.14 | 98.72 | 97.62 | 0.2664 |
| | IonSeq | — | — | — | 22.72 | 99.04 | 98.18 | 0.2111 |
| Mg2+ | DNNa | 2 | 64 | 64 | 22.85 | 96.38 | 96.67 | 0.1852 |
| | DNNb | 2 | 64 | 64 | 32.85 | 98.33 | 97.61 | 0.2291 |
| | DNNc | 4 | 64 | 32 | 34.82 | 98.52 | 97.83 | 0.2565 |
| | IonSeq | — | — | — | 5.57 | 99.98 | 99.49 | 0.1825 |
Note: DNNa is the prediction result without hyper-parameter optimization and without the disorder value and propensity factor; DNNb is the prediction result without hyper-parameter optimization but with the disorder value and propensity factor added; DNNc is the prediction result with hyper-parameter optimization and with the disorder value and propensity factor added; the IonSeq results are taken from Hu et al. (2016b).
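The four evaluation indexes in the table can be computed from a confusion matrix. The sketch below assumes the standard definitions of sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC), which this excerpt does not spell out; the function name `binary_metrics` and the trivial classifier used in the example are hypothetical. The counts come from the Zn2+ row of the data set table (P = 6,408, N = 405,113).

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sn, Sp, Acc and MCC from confusion-matrix counts (standard definitions)."""
    sn = tp / (tp + fn)                      # fraction of binding residues found
    sp = tn / (tn + fp)                      # fraction of non-binding residues kept
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical classifier that labels every Zn2+ residue as non-binding.
trivial = binary_metrics(tp=0, fn=6408, tn=405113, fp=0)
for name, value in trivial.items():
    print(f"{name}: {value:.4f}")
```

Because non-binding residues outnumber binding residues by roughly 60 to 1, this trivial classifier already reaches an accuracy above 98% while finding no binding residue at all. This is why Sn and MCC, rather than Sp and Acc, separate the methods in the table.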
FIGURE 4. Results of 5-fold cross-validation for the Ca2+ (A) and Cu2+ (B) ligands. Note: The abscissa shows the four evaluation indexes, and the ordinate is the value of each evaluation index. The blue bar represents the prediction results of the basic feature parameters, the yellow bar represents the prediction results of 1 + propensity factor, the green bar represents the prediction results of 1 + disorder value, and the red bar represents the prediction results of 2 + disorder value.
Value range of hyper-parameters.
| Hyper-parameters | Value range |
|---|---|
| Hidden layers | 1,2,3,4,5,6,7,8 |
| Hidden layer nodes | 2,4,8,16,32,64,128 |
| Batch size | 2,4,8,16,32,64,128 |
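The value ranges above define a search space of 8 × 7 × 7 = 392 configurations. The following is a minimal sketch of enumerating that space and picking the best configuration. It assumes an exhaustive grid search, which the excerpt does not confirm (the hyper-parameters may instead have been varied one at a time). The names `build_grid` and `best_config` and the toy scoring function are hypothetical; in practice the score would be a cross-validated MCC.

```python
from itertools import product

# Value ranges copied from the table above.
HIDDEN_LAYERS = [1, 2, 3, 4, 5, 6, 7, 8]
HIDDEN_NODES = [2, 4, 8, 16, 32, 64, 128]
BATCH_SIZES = [2, 4, 8, 16, 32, 64, 128]


def build_grid():
    """Enumerate every combination of the three hyper-parameters."""
    return [
        {"hidden_layers": layers, "hidden_nodes": nodes, "batch_size": batch}
        for layers, nodes, batch in product(HIDDEN_LAYERS, HIDDEN_NODES, BATCH_SIZES)
    ]


def best_config(configs, score_fn):
    """Return the configuration with the highest score."""
    return max(configs, key=score_fn)


def toy_score(config):
    # Placeholder objective for illustration only; a real run would train
    # the network with `config` and return its cross-validated MCC.
    return -(
        abs(config["hidden_layers"] - 2)
        + abs(config["hidden_nodes"] - 32)
        + abs(config["batch_size"] - 32)
    )


configs = build_grid()
print(len(configs))
print(best_config(configs, toy_score))
```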
FIGURE 5. Curves of the MCC and Sn values for the Ca2+ ligand as the hyper-parameters vary. Note: The abscissas of (A-C) represent the three hyper-parameters, respectively. The ordinate is the value of MCC and Sn, which are the evaluation indexes.