Yanfen Lyu, Xinqi Gong.
Abstract
The study of interface residue pairs is important for understanding the interactions between monomers inside a trimer protein-protein complex. We developed a two-layer support vector machine (SVM) ensemble classifier that considers the physicochemical and geometric properties of amino acids and the influence of surrounding amino acids. Different descriptors, and different combinations of them, may give different prediction results, so we propose feature combination engineering based on correlation coefficients and F-values. The accuracy of our method is 65.38% on the independent test set, indicating biological significance. Our predictions are consistent with the experimental results, which shows the effectiveness and reliability of our method for predicting interface residue pairs of protein trimers.
Keywords: a two-layer SVM ensemble-classifier; feature combination engineering; trimer protein–protein complexes
Year: 2020 PMID: 32977371 PMCID: PMC7582526 DOI: 10.3390/molecules25194353
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
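The abstract names correlation coefficients and F-values as the basis for feature combination engineering but does not spell out the procedure. The following is a minimal sketch, assuming a greedy filter: features are ranked by a two-class one-way ANOVA F-value, and a feature is dropped when its absolute Pearson correlation with an already kept feature exceeds a threshold. The function names, the toy data, and the 0.9 threshold are hypothetical and are not taken from the paper.

```python
from statistics import mean

def f_value(pos, neg):
    """One-way ANOVA F statistic for one feature split into two classes."""
    n_pos, n_neg = len(pos), len(neg)
    grand = mean(pos + neg)
    m_pos, m_neg = mean(pos), mean(neg)
    # Between-class sum of squares (1 degree of freedom for two classes).
    between = n_pos * (m_pos - grand) ** 2 + n_neg * (m_neg - grand) ** 2
    # Within-class sum of squares (n_pos + n_neg - 2 degrees of freedom).
    within = sum((x - m_pos) ** 2 for x in pos) + sum((x - m_neg) ** 2 for x in neg)
    return between / (within / (n_pos + n_neg - 2))

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def select_features(columns, labels, max_corr=0.9):
    """Keep high-F features that are not strongly correlated with kept ones.

    columns: dict mapping feature name -> list of values
    labels:  list of 0/1 class labels (1 = interface residue pair)
    """
    scores = {}
    for name, col in columns.items():
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        scores[name] = f_value(pos, neg)
    kept = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept

# Toy data (hypothetical): f_b is largely redundant with f_a.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "f_a": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    "f_b": [10.0, 12.0, 14.0, 2.0, 4.0, 9.0],
    "f_c": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],
}
selected = select_features(columns, labels)
```

In this toy run `f_a` has the highest F-value, `f_b` is discarded as redundant with it, and `f_c` survives because it is only weakly correlated with `f_a`, even though its own F-value is zero. A real pipeline would probably also apply a minimum F-value cutoff.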
Figure 1. Key steps of the method.
Detailed information of the training set and testing set.
| Data Set Name | PDB Code | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training Set | 1A12 | 1AHS | 1AWI | 1B77 | 1BGX | 1CJD | 1CUN | 1DKG | 1EER | 1EL6 | 1F6F | 1FNS | 1FPO |
| 1G2X | 1HWG | 1IDP | 1IK9 | 1J5S | 1JPS | 1JRH | 1KI9 | 1KKE | 1L5A | 1LW1 | 2ADV | 2AZE | |
| 2B2Y | 2B4I | 2BSD | 2CU5 | 2DJ6 | 2E2A | 2E4M | 2FB5 | 2FM8 | 2FVH | 2FZ1 | 2GDG | 2GMI | |
| 2I15 | 2P90 | 2PBQ | 3CC0 | 3EMF | 3F5C | 3G65 | 3GI9 | 3N4G | 3NAP | 3O2D | 3R1G | 3VA2 | |
| Testing Set | 1OSP | 1OY3 | 1P32 | 1Q5X | 1QB3 | 1S7O | 1SG2 | 1STZ | 1SY6 | 1W9Z | 1WDJ | 1YNB | 1ZA7 |
| 2IG8 | 2IUM | 2IY0 | 2IZW | 2MS2 | 2R3U | 2WR5 | 3DLI | 3FFD | 3M6N | 3OWT | 3P5J | 3QKS | |
Five physicochemical properties (seven descriptors, Φ1 to Φ7, including three hydrophobicity descriptors) for the 20 amino acids.
| Amino Acid | Φ1 | Φ2 | Φ3 | Φ4 | Φ5 | Φ6 | Φ7 |
|---|---|---|---|---|---|---|---|
| A | 0.62 | 0.046 | 8.1 | −1.302 | 1.57 | 0.17 | 0.50 |
| C | 0.29 | 0.128 | 5.5 | 0.465 | −1.02 | −0.24 | −0.02 |
| D | −0.9 | 0.105 | 13 | 0.302 | −0.259 | 1.23 | 3.64 |
| E | −0.74 | 0.151 | 12.3 | −1.453 | 0.113 | 2.02 | 3.63 |
| F | 1.19 | 0.29 | 5.2 | −0.59 | −0.397 | −1.13 | −1.71 |
| G | 0.48 | 0 | 9 | 1.652 | 1.045 | 0.01 | 1.15 |
| H | −0.4 | 0.23 | 10.4 | −0.417 | −1.474 | 0.96 | 2.33 |
| I | 1.38 | 0.186 | 5.2 | −0.547 | 0.393 | −0.31 | −1.12 |
| K | −1.5 | 0.219 | 11.3 | −0.561 | −0.277 | 0.99 | 2.80 |
| L | 1.06 | 0.186 | 4.9 | −0.987 | 1.266 | −0.56 | −1.25 |
| M | 0.64 | 0.221 | 5.7 | −1.524 | −1.005 | −0.23 | −0.67 |
| N | −0.78 | 0.134 | 11.6 | 0.828 | −0.169 | 0.42 | 0.85 |
| P | 0.12 | 0.131 | 8 | 2.081 | 0.421 | 0.45 | 0.14 |
| Q | −0.85 | 0.18 | 10.5 | −0.179 | −0.503 | 0.58 | 0.77 |
| R | −2.53 | 0.291 | 10.5 | −0.055 | 0.44 | 0.81 | 1.81 |
| S | −0.18 | 0.062 | 9.2 | 1.399 | 0.67 | 0.13 | 0.46 |
| T | −0.05 | 0.108 | 8 | 0.326 | 0.908 | 0.14 | 0.25 |
| V | 1.08 | 0.14 | 5.9 | −0.279 | 1.242 | 0.07 | −0.46 |
| W | 0.81 | 0.409 | 5.4 | 0.009 | −2.128 | −1.85 | −2.09 |
| Y | 0.26 | 0.298 | 6.2 | 0.83 | −0.838 | −0.94 | −0.71 |
Φ1, Φ2, Φ3, Φ4, Φ5, Φ6, and Φ7 denote hydrophobicity 1, polarizability, polarity, secondary structure, codon diversity, hydrophobicity 2, and hydrophobicity 3, respectively, for the 20 amino acids.
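To illustrate how the table above can be used as a lookup, here is a minimal sketch that maps residues to their seven descriptors. The values for A, D, and W are copied from the table. Concatenating the two residues' descriptors into a single 14-value vector is an assumption made for illustration; the source does not state how a residue pair is encoded, and `encode_pair` is a hypothetical name.

```python
# Descriptors (Φ1..Φ7) for three residues, copied from the table above.
PHI = {
    "A": (0.62, 0.046, 8.1, -1.302, 1.57, 0.17, 0.50),
    "D": (-0.9, 0.105, 13.0, 0.302, -0.259, 1.23, 3.64),
    "W": (0.81, 0.409, 5.4, 0.009, -2.128, -1.85, -2.09),
}

def encode_pair(res_a, res_b):
    """Concatenate the seven descriptors of two residues (assumed encoding)."""
    return PHI[res_a] + PHI[res_b]

vec = encode_pair("A", "D")  # 14 values: Φ1..Φ7 of A, then Φ1..Φ7 of D
```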
The five geometric features and their calculation tools.
| Features | Abbreviation | Software or Researchers |
|---|---|---|
| Accessible surface area | ASA | Naccess V2.1.1 |
| Relative accessible surface area | RASA | Naccess V2.1.1 |
| Exterior contact area | ECA | Qcontacts |
| Interior contact area | ICA | Qcontacts |
| Exterior void area | EVA | Naccess V2.1.1, Qcontacts |
Forty-eight characteristics to describe a residue.
| Characteristics of a Residue of a Protein Monomer | | | | | |
|---|---|---|---|---|---|
| ASA_A | Φ4 | AAFIPF1(5) | AABIPF2(4) | AAFIPF4(5) | AAFIPF5(3) |
| RASA_A | Φ5 | AABIPF1(1) | AABIPF3(1) | AABIPF4(1) | AAFIPF5(4) |
| ECA_A | Φ6 | AABIPF1(2) | AABIPF3(2) | AABIPF4(2) | AAFIPF5(5) |
| ICA_A | Φ7 | AABIPF1(3) | AABIPF3(5) | AABIPF4(3) | AABIPF5(1) |
| EVA_A | AAFIPF1(1) | AABIPF1(4) | AAFIPF4(1) | AABIPF4(4) | AABIPF5(2) |
| Φ1 | AAFIPF1(2) | AABIPF1(5) | AAFIPF4(2) | AABIPF4(5) | AABIPF5(3) |
| Φ2 | AAFIPF1(3) | AAFIPF2(2) | AAFIPF4(3) | AAFIPF5(1) | AABIPF5(4) |
| Φ3 | AAFIPF1(4) | AABIPF2(2) | AAFIPF4(4) | AAFIPF5(2) | AABIPF5(5) |
Φ1, Φ2, Φ3, Φ4, Φ5, Φ6, and Φ7 represent basic first-order sequence features. We use these 48 characteristics to describe a residue in the second set of feature vectors.
Figure 2. Flow chart of the two-layer support vector machine (SVM) ensemble-classifier method.
Two evaluation indexes of the testing set prediction results.
| Protein Name | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 1osp | 2 | 3 | 2 | 3 | 2 | 5 | 3 | 10 |
| 1oy3 | 3 | 4 | 3 | 5 | 3 | 6 | 3 | 9 |
| 1p32 | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 4 |
| 1q5x | 3 | 5 | 3 | 5 | 3 | 6 | 3 | 10 |
| 1qb3 | 3 | 4 | 3 | 4 | 3 | 4 | 3 | 9 |
| 1s7o | 3 | 7 | 3 | 9 | 3 | 11 | 3 | 14 |
| 1sg2 | 2 | 3 | 3 | 5 | 3 | 7 | 3 | 9 |
| 1stz | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1sy6 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 5 |
| 1w9z | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 7 |
| 1wdj | 2 | 2 | 2 | 4 | 2 | 6 | 2 | 8 |
| 1ynb | 2 | 2 | 2 | 3 | 2 | 4 | 2 | 4 |
| 1za7 | 1 | 4 | 2 | 6 | 3 | 9 | 3 | 12 |
| 2ig8 | 1 | 1 | 1 | 2 | 2 | 3 | 3 | 5 |
| 2ium | 1 | 2 | 3 | 4 | 3 | 6 | 3 | 9 |
| 2iy0 | 3 | 10 | 3 | 14 | 3 | 16 | 3 | 19 |
| 2izw | 1 | 2 | 2 | 3 | 2 | 4 | 3 | 9 |
| 2ms2 | 3 | 5 | 3 | 8 | 3 | 8 | 3 | 10 |
| 2r3u | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 2 |
| 2wr5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3dli | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 6 |
| 3ffd | 3 | 4 | 3 | 4 | 3 | 8 | 3 | 10 |
| 3m6n | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 3owt | 1 | 3 | 2 | 4 | 2 | 5 | 2 | 8 |
| 3p5j | 0 | 0 | 2 | 2 | 2 | 3 | 3 | 4 |
| 3qks | 0 | 0 | 1 | 1 | 2 | 2 | 3 | 6 |
Accuracy of the testing set prediction results.
| t | | | | |
|---|---|---|---|---|
| z | | | | |
| | 34.62% | 42.31% | 46.15% | 65.38% |
| | 53.85% | 73.08% | 80.77% | 84.62% |
| | 76.92% | 84.62% | 88.46% | 92.31% |
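The percentages in this table appear to follow from the per-protein results for the 26 testing-set trimers. The sketch below rests on an assumption, because the column and row labels did not survive: the first number in each pair of per-protein columns is taken to be the number of interfaces (out of three) with at least one correctly predicted residue pair, and each accuracy row is taken to be the share of trimers with at least z such interfaces. Under that reading, the first per-protein column reproduces the first column of this table.

```python
# First value of the first column pair for the 26 testing-set trimers,
# in table order (1osp ... 3qks). Interpretation as "interfaces with a
# correct prediction" is an assumption, not stated in the source.
hits_first_t = [2, 3, 1, 3, 3, 3, 2, 0, 2, 3, 2, 2, 1,
                1, 1, 3, 1, 3, 0, 0, 3, 3, 0, 1, 0, 0]

def accuracy(interface_hits, z):
    """Percentage of trimers with at least z correctly predicted interfaces."""
    return 100.0 * sum(h >= z for h in interface_hits) / len(interface_hits)

acc = {z: round(accuracy(hits_first_t, z), 2) for z in (3, 2, 1)}
# acc == {3: 34.62, 2: 53.85, 1: 76.92}
```

The three values match the first column of the table from top to bottom, which suggests that the rows correspond to progressively looser requirements on the number of correctly predicted interfaces.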
Figure 3. Experimental three-dimensional structures of the 3ffd and 1s7o trimer protein complexes. Panels (a) and (b) show the three-dimensional structures of the 3ffd and 1s7o protein trimers, respectively. The three protein monomers are colored pink, blue, and green. The numbered markers in the black circles indicate the positions of correctly predicted interface residue pairs on the two protein monomers.
Two evaluation indexes of the testing set 2 prediction results.
| Protein Name | | | | | | |
|---|---|---|---|---|---|---|
| 1osp | 3 | 5 | 3 | 6 | 3 | 10 |
| 1oy3 | 1 | 3 | 1 | 4 | 1 | 5 |
| 1p32 | 3 | 6 | 3 | 6 | 3 | 10 |
| 1q5x | 3 | 4 | 3 | 5 | 3 | 9 |
| 1qb3 | 3 | 11 | 3 | 13 | 3 | 17 |
| 1s7o | 3 | 7 | 3 | 9 | 3 | 10 |
| 1sg2 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1stz | 2 | 2 | 3 | 3 | 3 | 5 |
| 1sy6 | 2 | 4 | 2 | 5 | 2 | 7 |
| 1w9z | 1 | 2 | 2 | 3 | 2 | 4 |
| 1wdj | 3 | 8 | 3 | 11 | 3 | 12 |
| 1ynb | 1 | 1 | 1 | 2 | 2 | 4 |
| 1za7 | 3 | 6 | 3 | 6 | 3 | 8 |
| 2ig8 | 3 | 12 | 3 | 14 | 3 | 21 |
| 2ium | 2 | 3 | 2 | 4 | 3 | 9 |
| 2iy0 | 3 | 6 | 3 | 8 | 3 | 9 |
| 2izw | 0 | 0 | 0 | 0 | 1 | 1 |
| 2ms2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2r3u | 3 | 3 | 3 | 4 | 3 | 7 |
| 2wr5 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3dli | 2 | 5 | 2 | 6 | 2 | 7 |
| 3ffd | 0 | 0 | 2 | 2 | 2 | 3 |
| 3m6n | 2 | 3 | 3 | 5 | 3 | 8 |
Accuracy of the testing set and testing set 2 prediction results.
| t | | | | | | |
|---|---|---|---|---|---|---|
| z | Result1 | Result2 | Result1 | Result2 | Result1 | Result2 |
| | 42.31% | 43.48% | 46.15% | 52.17% | 65.38% | 56.52% |
| | 73.08% | 65.22% | 80.77% | 73.91% | 84.62% | 78.26% |
| | 84.62% | 82.61% | 88.46% | 86.96% | 92.31% | 91.30% |
a. Result1 is the accuracy on the testing set. b. Result2 is the accuracy on testing set 2.
All residue pairs and positive interface residue pairs in each protein–protein interface.
| Protein–Protein Interface | Positive Residue Pairs | All Residue Pairs |
|---|---|---|
| 1osp_H_L | 160 | 46,652 |
| 1osp_H_O | 30 | 54,718 |
| 1osp_L_O | 26 | 53,714 |
| 1oy3_B_C | 53 | 15,120 |
| 1oy3_B_D | 38 | 24,640 |
| 1oy3_C_D | 86 | 29,700 |
| 1p32_A_B | 93 | 31,122 |
| 1p32_A_C | 95 | 32,032 |
| 1p32_B_C | 94 | 30,096 |
| 1q5x_A_B | 42 | 25,238 |
| 1q5x_A_C | 37 | 24,800 |
| 1q5x_B_C | 41 | 24,490 |
| 1qb3_A_B | 7 | 13,447 |
| 1qb3_A_C | 172 | 12,317 |
| 1qb3_B_C | 15 | 12,971 |
| 1s7o_A_B | 12 | 11,130 |
| 1s7o_A_C | 21 | 11,448 |
| 1s7o_B_C | 78 | 11,340 |
| 1sg2_A_B | 44 | 15,369 |
| 1sg2_A_C | 72 | 20,022 |
| 1sg2_B_C | 50 | 15,478 |
| 1stz_A_B | 53 | 100,453 |
| 1stz_A_C | 135 | 100,453 |
| 1stz_B_C | 45 | 96,721 |
| 1sy6_A_H | 37 | 36,792 |
| 1sy6_A_L | 15 | 35,784 |
| 1sy6_H_L | 164 | 46,647 |
| 1w9z_A_B | 150 | 66,049 |
| 1w9z_A_C | 149 | 65,278 |
| 1w9z_B_C | 152 | 65,278 |
| 1wdj_A_B | 153 | 28,272 |
| 1wdj_A_C | 45 | 34,596 |
| 1wdj_B_C | 9 | 28,272 |
| 1ynb_A_B | 197 | 27,889 |
| 1ynb_A_C | 25 | 27,889 |
| 1ynb_B_C | 84 | 27,889 |
| 1za7_A_B | 43 | 24,915 |
| 1za7_A_C | 31 | 24,915 |
| 1za7_B_C | 33 | 27,225 |
| 2ig8_A_B | 98 | 20,306 |
| 2ig8_A_C | 104 | 20,164 |
| 2ig8_B_C | 102 | 20,306 |
| 2ium_A_B | 86 | 44,521 |
| 2ium_A_C | 88 | 44,521 |
| 2ium_B_C | 88 | 44,521 |
| 2iy0_A_B | 90 | 17,176 |
| 2iy0_A_C | 28 | 35,256 |
| 2iy0_B_C | 13 | 11,856 |
| 2izw_A_B | 88 | 31,862 |
| 2izw_A_C | 80 | 37,024 |
| 2izw_B_C | 90 | 37,232 |
| 2ms2_A_B | 39 | 16,641 |
| 2ms2_A_C | 36 | 16,641 |
| 2ms2_B_C | 41 | 16,641 |
| 2r3u_A_B | 88 | 39,390 |
| 2r3u_A_C | 84 | 37,370 |
| 2r3u_B_C | 97 | 36,075 |
| 2wr5_A_B | 169 | 235,225 |
| 2wr5_A_C | 176 | 235,225 |
| 2wr5_B_C | 162 | 235,225 |
| 3dli_A_B | 86 | 48,841 |
| 3dli_A_C | 86 | 48,841 |
| 3dli_B_C | 84 | 48,841 |
| 3ffd_A_B | 150 | 45,570 |
| 3ffd_A_P | 40 | 3,780 |
| 3ffd_B_P | 26 | 3,906 |
| 3m6n_A_B | 425 | 70,752 |
| 3m6n_A_C | 68 | 69,696 |
| 3m6n_B_C | 74 | 70,752 |
| 3owt_A_B | 8 | 21,904 |
| 3owt_A_C | 59 | 2,960 |
| 3owt_B_C | 31 | 2,960 |
| 3p5j_A_B | 67 | 44,802 |
| 3p5j_A_C | 108 | 30,392 |
| 3p5j_B_C | 214 | 19,836 |
| 3qks_A_B | 267 | 35,621 |
| 3qks_A_C | 29 | 4,179 |
| 3qks_B_C | 24 | 3,759 |
| mean | 83.0641 | 40,919.63 |
The Protein–Protein Interface column denotes the interface between two chains of a protein trimer; for example, 1osp_H_L is the interaction interface between the H chain and the L chain of the 1osp protein trimer.
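The table above shows how rare positive pairs are among all residue pairs. As a rough illustration, the sketch below computes the positive fraction for three interfaces copied from the table and the ratio of the two reported means. The helper name is hypothetical, and the ratio of means is not the same quantity as the mean of per-interface ratios.

```python
# (positive residue pairs, all residue pairs), copied from the table above.
interfaces = {
    "1osp_H_L": (160, 46652),
    "1osp_H_O": (30, 54718),
    "1qb3_A_B": (7, 13447),
}

def positive_fraction(positives, total):
    """Share of residue pairs in an interface that are true interface pairs."""
    return positives / total

fractions = {name: positive_fraction(p, n) for name, (p, n) in interfaces.items()}
rarest = min(fractions, key=fractions.get)

# Ratio of the reported means (83.0641 positives per 40,919.63 pairs).
overall = 83.0641 / 40919.63  # roughly 0.2% of all pairs are positive
```

With only about two positives per thousand candidate pairs, a uniformly random pick is almost always wrong, which is the baseline the comparison with random results builds on.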
Comparison of our method with that of random results.
| Accuracy Rate | | | | |
|---|---|---|---|---|
| | 0.20288% | 0.20290% | 0.20292% | 0.20298% |
| | 76.92% | 84.62% | 88.46% | 92.31% |
| | 0.001500% | 0.001648% | 0.001797% | 0.00002094% |
| | 53.85% | 73.08% | 80.77% | 84.62% |
| | 0.0000008351% | 0.0000008354% | 0.0000008357% | 0.0000008363% |
| | 34.62% | 42.31% | 46.15% | 65.38% |
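One way to read this comparison is as a fold improvement over chance. The sketch below assumes that the first row holds the random-selection accuracy and the second row holds the method's accuracy for the same settings; the row labels were lost, so this pairing is an inference from the values matching the accuracy table, not something the source states.

```python
# First two rows of the comparison table, in percent (pairing is assumed).
random_pct = [0.20288, 0.20290, 0.20292, 0.20298]
method_pct = [76.92, 84.62, 88.46, 92.31]

# Fold improvement of the method over random selection, column by column.
fold = [m / r for m, r in zip(method_pct, random_pct)]
```

Under this pairing the method is several hundred times more accurate than random selection in every column.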