Hafida Bouziane, Belhadri Messabih, Abdallah Chouarfia.
Abstract
Machine learning techniques have been widely applied to the problem of predicting protein secondary structure from the amino acid sequence, and they have achieved substantial success in this research area. Many methods have been used, including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which have attracted attention recently. Today, the main goal remains to improve the prediction quality of the secondary structure elements. Prediction accuracy has improved continuously over the years, especially through hybrid or ensemble methods and through evolutionary information in the form of profiles extracted from alignments of multiple homologous sequences. In this paper, we investigate how best to combine k-NNs, ANNs and Multi-class SVMs (M-SVMs) to improve secondary structure prediction of globular proteins. An ensemble method that combines the outputs of two feed-forward ANNs, a k-NN and three M-SVM classifiers has been applied. Ensemble members are combined using two variants of the majority voting rule. A heuristic-based filter has also been applied to refine the prediction. To investigate how much improvement the general ensemble method gives over the individual classifiers that make up the ensemble, we evaluated the proposed system on two widely used benchmark datasets, RS126 and CB513, using cross-validation tests with PSI-BLAST position-specific scoring matrix (PSSM) profiles as inputs. The experimental results reveal that the proposed system yields significant performance gains when compared with the best individual classifier.
Keywords: Multi-class Support Vector Machines (M-SVMs); Position-Specific Scoring Matrix (PSSM) profiles; ensemble method; feed-forward Neural Networks; k-Nearest Neighbors; protein secondary structure prediction
Year: 2011 PMID: 22058650 PMCID: PMC3204938 DOI: 10.4137/EBO.S7931
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
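The abstract states that ensemble members are combined by majority voting. The following is a minimal Python sketch of a plain and a weighted per-residue vote, assuming each classifier emits one of the three states H (helix), E (strand) or C (coil) for every residue. The function names, the tie-breaking order and the example weights are illustrative assumptions; this record does not specify the authors' voting variants, and the sketch is not their implementation.

```python
from collections import Counter

# The three conformational states; the order doubles as the tie-break
# order in this sketch (an arbitrary assumption, not from the paper).
STATES = ("H", "E", "C")


def simple_majority_vote(votes):
    """Return the state chosen by the most classifiers for one residue."""
    counts = Counter(votes)
    # max() returns the first maximal item, so ties fall back to STATES order.
    return max(STATES, key=lambda s: counts[s])


def weighted_majority_vote(votes, weights):
    """Return the state with the largest summed classifier weight."""
    totals = {s: 0.0 for s in STATES}
    for state, weight in zip(votes, weights):
        totals[state] += weight
    return max(STATES, key=lambda s: totals[s])


def combine(predictions, weights=None):
    """Combine per-classifier prediction strings residue by residue.

    predictions: list of equal-length strings over the alphabet H/E/C,
                 one string per classifier.
    weights:     optional list with one (hypothetical) weight per classifier;
                 if omitted, a simple majority vote is used.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    if weights is not None and len(weights) != len(predictions):
        raise ValueError("need exactly one weight per classifier")

    combined = []
    for column in zip(*predictions):  # one column = all votes for one residue
        if weights is None:
            combined.append(simple_majority_vote(column))
        else:
            combined.append(weighted_majority_vote(column, weights))
    return "".join(combined)
```

For example, `combine(["HHEC", "HEEC", "CHEC"])` takes a simple vote over three classifiers, while passing `weights=[1, 1, 3]` lets the third classifier outvote the other two when they disagree with it. One plausible choice of weights is each member's accuracy on held-out data, but that is an assumption here.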
Performance comparison of the six individual classifiers for RS126 dataset.
| M-SVM | 78.11 | 77.36 | 65.80 | 84.46 | 0.724 | 0.624 | 0.610 | 73.0 | 65.7 | 70.2 | 70.1 |
| M-SVM | 77.79 | 77.17 | 64.85 | 84.35 | 0.710 | 0.621 | 0.611 | 72.9 | 66.0 | 71.4 | 70.9 |
| C-SVC | 72.58 | 66.91 | 55.30 | 84.55 | 0.630 | 0.522 | 0.534 | 64.0 | 61.1 | 66.5 | 64.1 |
| RBFNN | 77.05 | 75.92 | 62.49 | 84.72 | 0.709 | 0.601 | 0.595 | 71.0 | 62.0 | 68.1 | 67.1 |
| MLP | 74.42 | 73.25 | 61.24 | 81.47 | 0.662 | 0.554 | 0.562 | 64.0 | 60.9 | 63.9 | 62.7 |
| k-NN | 72.69 | 66.88 | 50.00 | 87.33 | 0.605 | 0.521 | 0.557 | 64.0 | 53.2 | 98.2 | 63.2 |
Performance comparison of the six individual classifiers for RS126 dataset after applying the filter to the predictions.
| M-SVM | 78.35 | 76.56 | 64.93 | 85.92 | 0.735 | 0.624 | 0.612 | 76.9 | 66.6 | 71.3 | 73.4 |
| M-SVM | 77.94 | 76.49 | 63.75 | 85.64 | 0.719 | 0.619 | 0.611 | 76.0 | 66.9 | 72.2 | 73.7 |
| C-SVC | 72.67 | 66.12 | 54.24 | 85.78 | 0.642 | 0.521 | 0.530 | 67.0 | 61.9 | 67.4 | 67.5 |
| RBFNN | 77.21 | 74.78 | 62.16 | 85.98 | 0.718 | 0.604 | 0.593 | 75.0 | 65.8 | 70.6 | 72.3 |
| MLP | 74.83 | 72.50 | 60.68 | 83.11 | 0.678 | 0.555 | 0.564 | 71.9 | 63.6 | 68.4 | 70.1 |
| k-NN | 72.99 | 66.57 | 48.03 | 89.10 | 0.633 | 0.518 | 0.551 | 71.3 | 52.9 | 67.4 | 66.7 |
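The filter applied to the predictions in the table above is described in this record only as heuristic based; its rules are not given. As an illustration of what such a post-processing step can look like, here is a Python sketch that relabels implausibly short helix and strand segments as coil. The rule itself and the minimum lengths (3 for helix, 2 for strand) are assumptions chosen for the example, not the authors' filter.

```python
from itertools import groupby

# Hypothetical minimum segment lengths; states not listed are never relabeled.
DEFAULT_MIN_LEN = {"H": 3, "E": 2}


def filter_short_segments(prediction, min_len=None):
    """Replace helix/strand runs shorter than a minimum length with coil.

    prediction: string over the alphabet H/E/C, one character per residue.
    min_len:    optional dict mapping a state to its minimum run length.
    """
    if min_len is None:
        min_len = DEFAULT_MIN_LEN

    filtered = []
    # groupby yields maximal runs of identical consecutive characters.
    for state, run in groupby(prediction):
        run_length = len(list(run))
        if run_length < min_len.get(state, 1):
            filtered.append("C" * run_length)  # too short: relabel as coil
        else:
            filtered.append(state * run_length)
    return "".join(filtered)
```

Applied to `"CCHHCCEECCCHHHHCEC"`, the two-residue helix and the single-residue strand become coil, while the longer strand and helix are kept. The output always has the same length as the input, so it can be scored against the observed structure directly.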
Performance comparison of the six individual classifiers for CB513 dataset.
| M-SVM | 76.08 | 76.95 | 64.48 | 81.51 | 0.704 | 0.614 | 0.568 | 71.4 | 65.1 | 67.9 | 69.7 |
| M-SVM | 76.11 | 76.97 | 64.96 | 81.33 | 0.706 | 0.614 | 0.568 | 71.0 | 65.3 | 67.8 | 69.5 |
| C-SVC | 73.32 | 72.31 | 55.78 | 83.43 | 0.673 | 0.564 | 0.523 | 67.3 | 58.1 | 64.8 | 65.2 |
| RBFNN | 76.04 | 77.39 | 62.46 | 82.15 | 0.700 | 0.611 | 0.572 | 70.7 | 64.8 | 69.1 | 70.1 |
| MLP | 72.97 | 74.15 | 62.09 | 77.77 | 0.645 | 0.560 | 0.532 | 62.8 | 62.5 | 64.7 | 63.3 |
| k-NN | 72.82 | 81.06 | 57.29 | 74.39 | 0.642 | 0.549 | 0.536 | 72.4 | 59.6 | 64.0 | 66.9 |
Performance comparison of the six individual classifiers for CB513 dataset after applying the filter to the predictions.
| M-SVM | 76.34 | 76.13 | 64.36 | 82.85 | 0.714 | 0.619 | 0.569 | 73.1 | 66.5 | 69.4 | 72.7 |
| M-SVM | 76.34 | 76.16 | 64.79 | 82.61 | 0.715 | 0.618 | 0.569 | 73.1 | 66.6 | 69.4 | 72.8 |
| C-SVC | 73.55 | 71.35 | 55.34 | 84.97 | 0.681 | 0.568 | 0.526 | 70.3 | 59.2 | 65.5 | 68.2 |
| RBFNN | 76.30 | 76.56 | 62.04 | 83.64 | 0.710 | 0.615 | 0.573 | 73.1 | 65.9 | 70.0 | 72.9 |
| MLP | 73.69 | 73.87 | 62.19 | 79.63 | 0.666 | 0.570 | 0.536 | 71.0 | 65.4 | 67.9 | 70.7 |
| k-NN | 73.18 | 80.40 | 56.74 | 76.05 | 0.656 | 0.551 | 0.533 | 73.5 | 60.8 | 66.6 | 70.1 |
Performance comparison of the three combination schemes for RS126 dataset.
| SMV | M-SVM | 77.84 | 79.07 | 66.63 | 82.36 | 0.711 | 0.625 | 0.612 | 74.0 | 66.5 | 70.1 | 70.7 |
| | M-SVM | 78.18 | 77.38 | 65.10 | 84.93 | 0.722 | 0.625 | 0.615 | 72.7 | 66.0 | 71.2 | 70.7 |
| | M-SVM | 77.94 | 78.72 | 66.51 | 82.85 | 0.716 | 0.624 | 0.611 | 73.7 | 66.4 | 69.1 | 69.9 |
| | M-SVM | 77.53 | 76.06 | 64.02 | 84.93 | 0.710 | 0.614 | 0.605 | 71.7 | 65.4 | 70.1 | 69.6 |
| | M-SVM | 78.08 | 77.59 | 65.24 | 84.52 | 0.715 | 0.626 | 0.615 | 74.0 | 66.5 | 70.4 | 70.8 |
| IMV | M-SVM | 78.11 | 77.36 | 65.80 | 84.46 | 0.724 | 0.624 | 0.610 | 73.0 | 65.7 | 70.2 | 70.1 |
| | M-SVM | 78.15 | 77.24 | 65.86 | 84.60 | 0.726 | 0.625 | 0.611 | 73.1 | 65.7 | 70.8 | 70.5 |
| | M-SVM | 78.15 | 77.31 | 65.84 | 84.56 | 0.726 | 0.624 | 0.611 | 73.2 | 65.7 | 70.7 | 70.5 |
| | M-SVM | 77.93 | 76.27 | 65.84 | 84.78 | 0.720 | 0.624 | 0.607 | 73.3 | 65.9 | 70.8 | 70.9 |
| | M-SVM | 78.21 | 76.78 | 65.91 | 85.00 | 0.726 | 0.626 | 0.612 | 73.9 | 65.8 | 71.1 | 71.1 |
| WMV | M-SVM | 78.11 | 77.36 | 65.80 | 84.46 | 0.724 | 0.624 | 0.610 | 73.8 | 65.7 | 70.2 | 70.1 |
| | M-SVM | 78.24 | 77.24 | 65.47 | 84.98 | 0.726 | 0.626 | 0.614 | 73.1 | 66.0 | 71.7 | 71.1 |
| | M-SVM | 78.24 | 77.27 | 65.45 | 84.97 | 0.726 | 0.626 | 0.613 | 73.2 | 66.1 | 71.8 | 71.1 |
| | M-SVM | 77.80 | 76.27 | 64.39 | 85.20 | 0.720 | 0.616 | 0.607 | 73.2 | 65.5 | 70.8 | 70.5 |
| | M-SVM | 78.41 | 76.72 | 65.14 | 85.85 | 0.726 | 0.630 | 0.618 | 73.8 | 66.6 | 72.0 | 71.8 |
Performance comparison of the three combination schemes for CB513 dataset.
| SMV | M-SVM | 76.34 | 76.16 | 64.79 | 82.61 | 0.715 | 0.618 | 0.569 | 73.1 | 66.6 | 69.4 | 72.8 |
| | M-SVM | 76.19 | 77.0 | 64.59 | 81.69 | 0.706 | 0.616 | 0.570 | 71.4 | 65.3 | 68.1 | 69.9 |
| | M-SVM | 76.27 | 78.64 | 65.78 | 79.92 | 0.706 | 0.618 | 0.571 | 72.5 | 66.0 | 67.6 | 70.2 |
| | M-SVM | 76.34 | 77.82 | 63.51 | 82.58 | 0.709 | 0.615 | 0.575 | 71.8 | 64.9 | 68.9 | 70.4 |
| | M-SVM | 76.39 | 78.64 | 64.18 | 81.03 | 0.706 | 0.618 | 0.575 | 73.1 | 65.5 | 68.5 | 70.7 |
| IMV | M-SVM | 76.11 | 76.97 | 64.96 | 81.33 | 0.706 | 0.614 | 0.568 | 71.0 | 65.3 | 67.8 | 69.5 |
| | M-SVM | 76.15 | 76.92 | 64.97 | 81.45 | 0.707 | 0.614 | 0.569 | 71.4 | 65.3 | 67.8 | 69.8 |
| | M-SVM | 76.19 | 77.47 | 62.99 | 82.14 | 0.705 | 0.612 | 0.573 | 71.8 | 65.2 | 69.4 | 70.9 |
| | M-SVM | 76.27 | 76.73 | 65.02 | 81.86 | 0.710 | 0.615 | 0.571 | 71.7 | 65.4 | 68.4 | 70.2 |
| | M-SVM | 76.26 | 76.94 | 64.93 | 81.71 | 0.709 | 0.616 | 0.571 | 71.7 | 65.4 | 68.3 | 70.1 |
| WMV | M-SVM | 76.11 | 76.97 | 64.96 | 81.33 | 0.706 | 0.614 | 0.568 | 71.0 | 65.3 | 67.8 | 69.5 |
| | M-SVM | 76.20 | 76.92 | 64.68 | 81.71 | 0.707 | 0.615 | 0.570 | 71.4 | 65.3 | 68.2 | 70.0 |
| | M-SVM | 76.21 | 76.92 | 64.70 | 81.73 | 0.707 | 0.615 | 0.570 | 71.4 | 65.3 | 68.2 | 70.1 |
| | M-SVM | 76.33 | 76.71 | 63.74 | 82.68 | 0.710 | 0.614 | 0.574 | 71.8 | 64.9 | 69.1 | 70.5 |
| | M-SVM | 76.41 | 77.58 | 63.98 | 82.04 | 0.708 | 0.618 | 0.575 | 72.5 | 66.0 | 69.2 | 71.2 |
Performance comparison of the three combination schemes after applying the filter to the predictions for RS126 dataset.
| SMV | M-SVM | 78.14 | 78.46 | 65.91 | 83.75 | 0.721 | 0.626 | 0.613 | 77.6 | 67.7 | 71.9 | 74.2 |
| | M-SVM | 78.29 | 76.64 | 63.96 | 86.20 | 0.731 | 0.623 | 0.613 | 76.6 | 66.6 | 72.0 | 73.7 |
| | M-SVM | 78.19 | 77.89 | 65.99 | 84.18 | 0.727 | 0.625 | 0.611 | 76.8 | 67.8 | 71.8 | 73.9 |
| | M-SVM | 77.67 | 75.21 | 63.32 | 86.11 | 0.719 | 0.615 | 0.603 | 75.3 | 67.2 | 70.9 | 72.9 |
| | M-SVM | 78.34 | 77.03 | 64.37 | 85.85 | 0.728 | 0.625 | 0.615 | 76.8 | 67.5 | 71.8 | 73.8 |
| IMV | M-SVM | 78.34 | 77.24 | 64.93 | 85.44 | 0.733 | 0.624 | 0.612 | 77.2 | 66.6 | 71.0 | 73.2 |
| | M-SVM | 78.32 | 77.04 | 64.99 | 85.50 | 0.731 | 0.625 | 0.612 | 76.9 | 66.6 | 71.4 | 73.3 |
| | M-SVM | 78.29 | 77.06 | 65.03 | 85.42 | 0.732 | 0.625 | 0.611 | 76.7 | 66.6 | 71.1 | 73.1 |
| | M-SVM | 77.93 | 75.59 | 64.93 | 85.66 | 0.721 | 0.625 | 0.606 | 75.8 | 66.7 | 70.7 | 72.9 |
| | M-SVM | 78.41 | 76.36 | 65.16 | 86.06 | 0.731 | 0.627 | 0.615 | 76.7 | 66.8 | 71.0 | 73.3 |
| WMV | M-SVM | 78.35 | 76.56 | 64.93 | 85.92 | 0.735 | 0.624 | 0.612 | 76.9 | 66.6 | 71.3 | 73.4 |
| | M-SVM | 78.25 | 76.49 | 64.06 | 86.17 | 0.731 | 0.622 | 0.612 | 76.4 | 66.6 | 71.9 | 73.6 |
| | M-SVM | 78.26 | 76.52 | 64.12 | 86.14 | 0.731 | 0.622 | 0.612 | 76.5 | 66.7 | 72.0 | 73.7 |
| | M-SVM | 77.67 | 75.04 | 63.32 | 86.23 | 0.720 | 0.615 | 0.603 | 75.2 | 67.2 | 70.9 | 72.9 |
| | M-SVM | 78.44 | 75.85 | 63.77 | 87.12 | 0.731 | 0.626 | 0.617 | 76.3 | 67.3 | 72.0 | 73.7 |
Performance comparison of the three combination schemes after applying the filter to the predictions for CB513 dataset.
| SMV | M-SVM | 76.65 | 77.80 | 63.87 | 82.49 | 0.715 | 0.623 | 0.576 | 74.1 | 66.6 | 69.9 | 73.3 |
| | M-SVM | 76.59 | 77.46 | 64.35 | 82.36 | 0.715 | 0.622 | 0.574 | 73.9 | 66.7 | 70.1 | 73.3 |
| | M-SVM | 76.54 | 77.78 | 65.62 | 81.32 | 0.715 | 0.622 | 0.572 | 74.0 | 67.3 | 69.6 | 73.3 |
| | M-SVM | 76.59 | 77.14 | 63.54 | 83.05 | 0.715 | 0.621 | 0.575 | 73.4 | 66.6 | 69.9 | 73.2 |
| | M-SVM | 76.65 | 77.80 | 63.87 | 82.49 | 0.715 | 0.623 | 0.576 | 74.1 | 66.6 | 69.9 | 73.3 |
| IMV | M-SVM | 76.33 | 76.75 | 64.79 | 82.10 | 0.713 | 0.618 | 0.569 | 73.3 | 66.6 | 69.3 | 72.7 |
| | M-SVM | 76.33 | 76.62 | 64.82 | 82.17 | 0.712 | 0.618 | 0.569 | 73.1 | 66.6 | 69.4 | 72.7 |
| | M-SVM | 76.43 | 77.42 | 62.56 | 82.96 | 0.712 | 0.615 | 0.575 | 73.6 | 66.1 | 70.2 | 73.1 |
| | M-SVM | 76.51 | 78.17 | 62.61 | 82.53 | 0.713 | 0.616 | 0.576 | 73.9 | 66.2 | 70.1 | 73.2 |
| | M-SVM | 76.51 | 76.82 | 64.69 | 82.52 | 0.715 | 0.620 | 0.573 | 73.4 | 66.5 | 69.7 | 73.0 |
| WMV | M-SVM | 76.41 | 77.58 | 63.98 | 82.04 | 0.708 | 0.618 | 0.575 | 72.5 | 66.0 | 69.2 | 71.2 |
| | M-SVM | 76.59 | 77.32 | 64.48 | 82.41 | 0.716 | 0.622 | 0.574 | 73.8 | 66.8 | 70.1 | 73.3 |
| | M-SVM | 76.57 | 77.32 | 64.48 | 82.38 | 0.715 | 0.621 | 0.574 | 73.8 | 66.8 | 70.1 | 73.3 |
| | M-SVM | 76.64 | 77.92 | 63.61 | 82.51 | 0.716 | 0.621 | 0.576 | 73.9 | 66.7 | 70.1 | 73.4 |
| | M-SVM | 76.69 | 77.40 | 63.59 | 83.06 | 0.717 | 0.622 | 0.577 | 74.1 | 66.5 | 70.3 | 73.4 |
Figure 1. The Q3 (A) and SOV (B) scores for the three voting schemes on the RS126 dataset. QH/E/C and SOVH/E/C are, respectively, the predicted Q and SOV scores for each conformational state (helix, strand and coil).
Figure 2. The Q3 (A) and SOV (B) scores for the three voting schemes on the CB513 dataset. QH/E/C and SOVH/E/C are, respectively, the predicted Q and SOV scores for each conformational state (helix, strand and coil).
Figure 3. Comparison of prediction accuracies (y axis) between the three combination schemes and the individual classifiers (x axis), with and without applying the filter to the predictions, on both the RS126 and CB513 datasets.
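The figure captions refer to Q3 and to per-state Q scores for helix, strand and coil. As a sketch of how such scores are commonly computed, the Python below uses the standard definitions: Q3 is the percentage of residues whose state is predicted correctly, the per-state Q score is the percentage of observed residues in that state that are predicted correctly, and the Matthews correlation coefficient is computed per state from a one-vs-rest confusion count. The function names are illustrative, this record does not confirm the authors' exact formulas, and the segment-overlap (SOV) measure is not covered.

```python
import math


def q_scores(observed, predicted, states="HEC"):
    """Return (Q3, per-state Q) in percent for two equal-length H/E/C strings."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(zip(observed, predicted))
    q3 = 100.0 * sum(o == p for o, p in pairs) / len(pairs)

    per_state = {}
    for state in states:
        total = sum(o == state for o, _ in pairs)          # observed in state
        hits = sum(o == state and p == state for o, p in pairs)
        # Undefined when the state never occurs in the observed structure.
        per_state[state] = 100.0 * hits / total if total else float("nan")
    return q3, per_state


def matthews(observed, predicted, state):
    """Matthews correlation coefficient for one state versus the rest."""
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if o == state and p == state:
            tp += 1
        elif o != state and p == state:
            fp += 1
        elif o == state and p != state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used in this sketch: report 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For instance, with observed `"HHHEECCC"` and predicted `"HHEEECCH"`, six of eight residues match, so Q3 is 75.0, and every observed strand residue is recovered, so the strand score is 100.0.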