Chi-Hua Tung, Chi-Wei Chen, Ren-Chao Guo, Hui-Fuang Ng, Yen-Wei Chu.
Abstract
Background. The quaternary structure of a protein is closely related to gene regulation, signal transduction, and many other biological functions. This study proposes a new feature extraction method, termed block composition, which is based on the composition of conserved protein motifs in block format. Results. QuaBingo, a system for predicting protein quaternary assembly states, combines block composition with functional domain composition. It is built from three layers of classifiers and categorizes the quaternary structural attributes of monomers, homooligomers, and heterooligomers. The first layer uses support vector machines (SVMs) trained on the blocks and functional domains of proteins, a second SVM layer processes the outputs of the first layer, and a Random Forest in the third layer determines the final result. We compared the effectiveness of models that combine block composition, functional domain composition, and pseudo-amino acid composition. Across 11 functional protein families, QuaBingo achieved a Matthews correlation coefficient (MCC) 23% higher than that of the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for the quaternary structural attributes of proteins.
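The abstract does not spell out how a block composition vector is built. As a rough sketch only, assuming each protein has already been scanned against a block database and reduced to a list of matched block identifiers, one plausible encoding counts the matches per block over a fixed vocabulary. The function name `block_composition`, the count-based encoding, and the sample hits are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def block_composition(hits: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Return one match count per vocabulary block, in vocabulary order.

    Hypothetical encoding: hits that are not in the vocabulary are ignored.
    """
    counts = Counter(hits)
    return [counts[block_id] for block_id in vocabulary]


# Illustrative vocabulary and hits (assumed, not taken from the study's data).
# "IPB999999X" is a made-up identifier standing in for an out-of-vocabulary hit.
vocabulary = ["IPB000817A", "IPB002347A", "IPB002347D", "IPB004045"]
protein_hits = ["IPB002347D", "IPB000817A", "IPB002347D", "IPB999999X"]

vector = block_composition(protein_hits, vocabulary)
print(vector)  # [1, 0, 2, 0]
```

A vector of this kind could then be passed to a first-layer classifier; whether the published method uses counts, binary presence, or a normalized frequency is not stated in this record.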
Year: 2016 PMID: 27610389 PMCID: PMC5005774 DOI: 10.1155/2016/9480276
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Performance of using different features with SVM in 10-fold cross-validation for monomer classification on the Oli8444 dataset.
| Feature | Sn (%) | Sp (%) | ACC (%) | MCC |
|---|---|---|---|---|
| Block | 79.07 | 78.75 | 78.91 | 0.579 |
| FunD | 93.68 | 57.80 | 75.75 | 0.552 |
| PseASA | 70.15 | 56.33 | 63.24 | 0.268 |
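The tables report sensitivity (Sn), specificity (Sp), accuracy (ACC), and the Matthews correlation coefficient (MCC). This record does not restate their definitions, so the sketch below assumes the standard confusion-matrix formulas. The function name `binary_metrics` and the counts are hypothetical and are not taken from the Oli8444 dataset.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sn, Sp, ACC (all in %) and MCC from binary confusion counts.

    Assumes at least one positive and one negative example.
    """
    sn = 100.0 * tp / (tp + fn)              # sensitivity (recall on positives)
    sp = 100.0 * tn / (tn + fp)              # specificity (recall on negatives)
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is taken as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts for a balanced test set.
metrics = binary_metrics(tp=80, fp=20, tn=80, fn=20)
print({name: round(value, 3) for name, value in metrics.items()})
# {'Sn': 80.0, 'Sp': 80.0, 'ACC': 80.0, 'MCC': 0.6}
```

Unlike ACC, MCC uses all four confusion counts, which is why a feature with high Sn but low Sp can score a similar ACC to a balanced one yet a lower MCC.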
Performance of using different features with SVM in 10-fold cross-validation for homo- and heterooligomer classification on the Oli8444 dataset.
| Feature / oligomer | Homo Sn (%) | Homo Sp (%) | Homo ACC (%) | Homo MCC | Hetero Sn (%) | Hetero Sp (%) | Hetero ACC (%) | Hetero MCC |
|---|---|---|---|---|---|---|---|---|
| Block | | | | | | | | |
| Dimer | 83.18 | 82.83 | 83.00 | 0.660 | 66.30 | 97.00 | 86.73 | |
| Trimer | 89.25 | 99.76 | 95.48 | 0.909 | 83.65 | 97.50 | 91.93 | |
| Tetramer | 75.32 | 97.53 | 90.12 | 0.776 | 85.03 | 98.03 | 92.82 | |
| Pentamer | 100.00 | 96.67 | 98.57 | 0.973 | 83.33 | 95.00 | 89.17 | |
| Hexamer | 89.03 | 98.52 | 94.27 | 0.887 | 82.50 | 97.00 | 90.42 | |
| Octamer | 95.71 | 72.62 | 84.17 | 0.709 | 90.00 | 85.33 | 87.67 | |
| Decamer | 91.67 | 100.00 | 95.83 | 0.928 | 100.00 | 100.00 | 100.00 | |
| Dodecamer | 95.50 | 98.00 | 96.75 | 0.941 | 86.00 | 94.67 | 90.33 | |
| Overall | | | 92.27 | 0.848 | | | 91.13 | |
| FunD | | | | | | | | |
| Dimer | 52.49 | 90.26 | 71.37 | 0.462 | 74.74 | 90.16 | 85.01 | |
| Trimer | 93.73 | 87.83 | 90.22 | 0.806 | 69.94 | 88.25 | 80.87 | |
| Tetramer | 60.64 | 96.61 | 84.62 | 0.647 | 71.96 | 95.08 | 85.79 | |
| Pentamer | 75.00 | 86.67 | 80.48 | 0.649 | 53.33 | 100.00 | 76.67 | |
| Hexamer | 64.94 | 100.00 | 84.27 | 0.712 | 85.75 | 83.89 | 84.86 | |
| Octamer | 48.81 | 100.00 | 74.40 | 0.566 | 44.67 | 100.00 | 72.33 | |
| Decamer | 63.33 | 100.00 | 81.67 | 0.691 | 45.00 | 100.00 | 72.50 | |
| Dodecamer | 63.00 | 100.00 | 81.50 | 0.680 | 69.00 | 100.00 | 84.50 | |
| Overall | | | 81.07 | 0.652 | | | 80.32 | |
| PseASA | | | | | | | | |
| Dimer | 66.95 | 46.74 | 56.85 | 0.140 | 12.62 | 93.07 | 66.18 | |
| Trimer | 39.16 | 85.93 | 66.93 | 0.288 | 36.08 | 82.75 | 63.98 | |
| Tetramer | 30.11 | 91.52 | 71.05 | 0.280 | 33.37 | 83.49 | 63.34 | |
| Pentamer | 64.17 | 70.00 | 67.62 | 0.343 | 86.67 | 66.67 | 76.67 | |
| Hexamer | 65.76 | 60.37 | 62.78 | 0.262 | 61.63 | 80.37 | 71.85 | |
| Octamer | 66.90 | 60.48 | 63.69 | 0.285 | 86.00 | 71.50 | 78.75 | |
| Decamer | 81.67 | 93.33 | 87.50 | 0.785 | 90.00 | 85.00 | 87.50 | |
| Dodecamer | 73.50 | 68.00 | 70.75 | 0.429 | 66.83 | 85.50 | 76.17 | |
| Overall | | | 68.40 | 0.352 | | | 73.05 | |
Figure 1. Flowchart of the three-layer architecture of classifiers.
Performance comparison of model combination with SVM in 10-fold cross-validation for oligomer classification in the second layer.
| Oligomer type | ACC (%) | MCC | ACC (%) | MCC | ACC (%) | MCC | ACC (%) | MCC |
|---|---|---|---|---|---|---|---|---|
| Monomer | 75.75 | 0.552 | 78.91 | 0.579 | 82.64 | 0.663 | 82.64 | 0.663 |
| Homooligomer | | | | | | | | |
| Dimer | 71.01 | 0.456 | 83.00 | 0.660 | 83.00 | 0.660 | 83.00 | 0.660 |
| Trimer | 90.22 | 0.806 | 95.48 | 0.909 | 95.48 | 0.909 | 95.47 | 0.907 |
| Tetramer | 84.21 | 0.638 | 90.12 | 0.776 | 93.41 | 0.854 | 93.41 | 0.854 |
| Pentamer | 80.48 | 0.649 | 98.57 | 0.973 | 98.57 | 0.973 | 98.57 | 0.973 |
| Hexamer | 84.27 | 0.712 | 94.27 | 0.887 | 96.94 | 0.939 | 96.94 | 0.939 |
| Octamer | 74.40 | 0.566 | 84.17 | 0.709 | 85.60 | 0.743 | 84.17 | 0.709 |
| Decamer | 85.83 | 0.759 | 95.83 | 0.928 | 98.33 | 0.971 | 98.33 | 0.971 |
| Dodecamer | 81.50 | 0.680 | 96.75 | 0.941 | 99.00 | 0.982 | 99.00 | 0.982 |
| Heterooligomer | | | | | | | | |
| Dimer | 85.01 | 0.659 | 86.73 | 0.697 | 88.89 | 0.767 | 88.89 | 0.767 |
| Trimer | 80.87 | 0.600 | 91.93 | 0.835 | 91.93 | 0.835 | 92.07 | 0.836 |
| Tetramer | 85.79 | 0.706 | 92.82 | 0.853 | 94.48 | 0.889 | 94.48 | 0.889 |
| Pentamer | 77.50 | 0.577 | 89.17 | 0.799 | 93.33 | 0.886 | 93.33 | 0.886 |
| Hexamer | 84.86 | 0.698 | 90.42 | 0.812 | 90.42 | 0.812 | 93.72 | 0.875 |
| Octamer | 72.67 | 0.484 | 87.67 | 0.782 | 87.67 | 0.782 | 87.67 | 0.782 |
| Decamer | 92.50 | 0.873 | 97.50 | 0.958 | 100.00 | 1.000 | 97.50 | 0.958 |
| Dodecamer | 84.50 | 0.733 | 90.33 | 0.823 | 95.33 | 0.916 | 95.33 | 0.916 |
Performance comparison of classification algorithms in 10-fold cross-validation and self-consistency test.
| Algorithm | Cross-validation CCI (%) | Cross-validation Kappa | | Self-consistency CCI (%) | Self-consistency Kappa | |
|---|---|---|---|---|---|---|
| Bayes | | | | | | |
| Bayes net | 64.80 | 0.5017 | 0.608 | 65.02 | 0.5053 | 0.611 |
| Naïve Bayes | 39.91 | 0 | 0.228 | 39.91 | 0 | 0.228 |
| Functions | | | | | | |
| LibSVM | 67.40 | 0.5288 | 0.616 | 68.60 | 0.5464 | 0.632 |
| Logistic | 67.28 | 0.5285 | 0.615 | 67.57 | 0.5326 | 0.619 |
| Multilayer perceptron | 64.01 | 0.4893 | 0.598 | 69.97 | 0.5694 | 0.657 |
| Lazy | | | | | | |
| IB1 | 51.91 | 0.3513 | 0.515 | 87.18 | 0.8218 | 0.869 |
| IBk | 58.45 | 0.4126 | 0.551 | 90.38 | 0.8682 | 0.902 |
| KStar | 62.43 | 0.463 | 0.581 | 88.10 | 0.8362 | 0.877 |
| Meta | | | | | | |
| AdaBoostM1 | 59.80 | 0.3909 | 0.493 | 59.80 | 0.3909 | 0.493 |
| Bagging | 66.27 | 0.5147 | 0.605 | 69.88 | 0.5671 | 0.65 |
| Rules | | | | | | |
| Conjunctive rule | 59.80 | 0.3909 | 0.493 | 59.80 | 0.3909 | 0.493 |
| Decision table | 66.99 | 0.5189 | 0.601 | 67.22 | 0.5218 | 0.601 |
| DTNB | 67.12 | 0.5225 | 0.606 | 67.38 | 0.5268 | 0.61 |
| Tree | | | | | | |
| J48 | 66.45 | 0.5161 | 0.607 | 69.81 | 0.5639 | 0.646 |
| Random forest | 58.91 | 0.4306 | 0.566 | 90.02 | 0.8651 | 0.899 |
| Random tree | 54.65 | 0.3817 | 0.537 | 90.38 | 0.8682 | 0.902 |
CCI denotes the percentage of correctly classified instances.
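The Kappa column is presumably Cohen's kappa, which corrects raw agreement (CCI) for the agreement expected by chance. The sketch below assumes that standard definition; the function name `cohen_kappa` and both confusion matrices are hypothetical. It also shows one way a kappa of 0 can arise alongside a nonzero CCI, as in the Naïve Bayes row: a classifier that always predicts the same class.

```python
from typing import Sequence


def cohen_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows: true, columns: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the true and predicted class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: a single class for both truth and prediction
    return (observed - expected) / (1.0 - expected)


# Hypothetical three-class confusion matrices with 100 instances each.
always_first = [[40, 0, 0], [35, 0, 0], [25, 0, 0]]  # every instance predicted as class 0
mixed = [[30, 5, 5], [5, 25, 5], [5, 5, 15]]

print(cohen_kappa(always_first))  # 0.0, although 40% of instances are correct
print(round(cohen_kappa(mixed), 4))
```

This also illustrates why kappa is a useful companion to CCI when the classes are unevenly sized.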
Comparison of results of different functional categories of proteins on QuaBingo and QuatIdent.
| Protein category | QuaBingo Sn (%) | QuaBingo Sp (%) | QuaBingo ACC (%) | QuaBingo MCC | QuatIdent Sn (%) | QuatIdent Sp (%) | QuatIdent ACC (%) | QuatIdent MCC |
|---|---|---|---|---|---|---|---|---|
| Immunity system | 40.46 | 96.28 | 68.37 | 0.367 | 20.61 | 96.66 | 58.64 | 0.199 |
| Enzyme | 57.21 | 97.33 | | 0.545 | 38.18 | 97.98 | 68.08 | 0.426 |
| Cell cycle | 44.44 | 96.53 | 70.49 | 0.410 | 14.82 | 97.80 | 56.31 | 0.176 |
| Chaperone | 45.95 | 96.62 | 71.28 | 0.426 | 20.27 | 98.99 | 59.63 | 0.313 |
| Gene regulation | 58.36 | 97.40 | | 0.558 | 21.75 | 98.19 | 59.97 | 0.276 |
| Transport proteins | 57.80 | 97.36 | 77.58 | 0.552 | 21.67 | 97.86 | 59.77 | 0.258 |
| Signal transduction | 59.16 | 97.45 | | 0.566 | 11.97 | 98.42 | 55.19 | 0.167 |
| Viral protein | 42.73 | 96.42 | 69.57 | 0.391 | 10.00 | 98.75 | 54.38 | 0.156 |
| Membrane protein | 57.81 | 97.36 | | 0.552 | 16.41 | 98.49 | 57.45 | 0.229 |
| Molecular binding | 63.37 | 97.71 | | 0.611 | 27.11 | 98.47 | 62.79 | 0.351 |
| Hormone | 36.08 | 96.01 | 66.04 | 0.321 | 28.87 | 97.29 | 63.08 | 0.305 |
| Others | 60.03 | 97.50 | 78.77 | 0.575 | 17.18 | 98.61 | 57.89 | 0.247 |
| Overall | 51.95 | 97.00 | 74.47 | 0.490 | 20.74 | 98.13 | 59.43 | 0.259 |
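The abstract's "23%" figure appears to correspond to the difference between the two Overall MCC values in this table (0.490 versus 0.259). The short sketch below reproduces that arithmetic and the per-category gaps from the MCC columns as printed here. Reading the 23% as an absolute MCC difference is an interpretation, not something this record states, and the variable names are illustrative.

```python
# (QuaBingo MCC, QuatIdent MCC) per protein category, copied from the table.
mcc = {
    "Immunity system": (0.367, 0.199),
    "Enzyme": (0.545, 0.426),
    "Cell cycle": (0.410, 0.176),
    "Chaperone": (0.426, 0.313),
    "Gene regulation": (0.558, 0.276),
    "Transport proteins": (0.552, 0.258),
    "Signal transduction": (0.566, 0.167),
    "Viral protein": (0.391, 0.156),
    "Membrane protein": (0.552, 0.229),
    "Molecular binding": (0.611, 0.351),
    "Hormone": (0.321, 0.305),
    "Others": (0.575, 0.247),
}
overall = (0.490, 0.259)

# Absolute MCC gap (QuaBingo minus QuatIdent), rounded to three decimals.
gaps = {name: round(qb - qi, 3) for name, (qb, qi) in mcc.items()}
overall_gap = round(overall[0] - overall[1], 3)

largest = max(gaps, key=gaps.get)
smallest = min(gaps, key=gaps.get)
print(f"Overall gap: {overall_gap}")              # Overall gap: 0.231
print(f"Largest gap: {largest} ({gaps[largest]})")
print(f"Smallest gap: {smallest} ({gaps[smallest]})")
```

On these values the gap is positive in every category, widest for signal transduction and narrowest for hormones.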
Top five features of block composition of oligomers.
| Oligomer type | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 |
|---|---|---|---|---|---|
| Monomer | IPB002225A | IPB002347A | IPB000817A | IPB002347D | IPB013549A |
| Homooligomer | | | | | |
| Dimer | IPB000817A | IPB004045 | IPB013572B | IPB001647 | IPB003449A |
| Trimer | IPB007691D | IPB006052A | IPB006056A | IPB006175A | IPB006175B |
| Tetramer | IPB002347D | IPB003560D | IPB002198B | IPB002347B | IPB002347E |
| Pentamer | IPB007334A | IPB001931A | IPB013124E | IPB008681A | IPB012599D |
| Hexamer | IPB001564C | IPB001753C | IPB001980A | IPB001564A | IPB001564B |
| Octamer | IPB001354C | IPB013341B | IPB002682 | IPB001354A | IPB001354B |
| Decamer | IPB000866A | IPB000866B | IPB013740 | IPB003394A | IPB002587G |
| Dodecamer | IPB002177A | IPB002177B | IPB008331B | IPB014035B | IPB007664A |
| Heterooligomer | | | | | |
| Dimer | IPB003026B | IPB008386B | IPB000315A | IPB000219A | IPB012565 |
| Trimer | IPB002353B | IPB012565 | IPB003990A | IPB001003B | IPB003026B |
| Tetramer | IPB003026B | IPB012565 | IPB010004A | IPB001664D | IPB002398F |
| Pentamer | IPB001280E | IPB003484D | IPB012420 | IPB004333C | IPB006711D |
| Hexamer | IPB002919A | IPB003038 | IPB008019A | IPB001591A | IPB001762 |
| Octamer | IPB007659A | IPB004977B | IPB006574B | IPB002971G | IPB003539A |
| Decamer | IPB013124E | IPB002662B | IPB003417A | IPB000732A | IPB000817A |
| Dodecamer | IPB002682 | IPB000353B | IPB001003B | IPB003597B | IPB006217A |