Qiaoying Huang, Zhuhong You, Xiaofeng Zhang, Yong Zhou.
Abstract
With the completion of the Human Genome Project, bioscience has entered the era of the genome and proteome, and research on protein-protein interactions (PPIs) is becoming increasingly important. PPIs are inseparable from life activities such as DNA synthesis, gene transcription activation and protein translation. Although many methods based on biological experiments and machine learning have been proposed, they all take a long time to learn and achieve limited accuracy. Predicting PPIs efficiently and accurately therefore remains a major challenge. To address it, we developed a new predictor that incorporates reduced amino acid alphabet (RAAA) information into the general form of pseudo-amino acid composition (PseAAC) and classifies with weighted sparse representation-based classification (WSRC). A notable advantage of the reduced amino acid alphabet is that it helps avoid the dimensionality disaster and overfitting problems in statistical prediction. Experiments also show that our method performs well in both low- and high-dimensional feature spaces. Among all experiments performed on the PPI data of Saccharomyces cerevisiae, the best one achieved 90.91% accuracy, 94.17% sensitivity, 87.22% precision and an 83.43% Matthews correlation coefficient (MCC) value. To evaluate the prediction ability of our method, extensive experiments were performed to compare it with a state-of-the-art technique, the support vector machine (SVM). The results show that the proposed approach is very promising for predicting PPIs and can be a helpful supplement for PPI prediction.
Keywords: protein–protein interactions; reduced amino acid alphabet; weighted sparse representation-based classification
Year: 2015 PMID: 25984606 PMCID: PMC4463679 DOI: 10.3390/ijms160510855
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Scheme for reduced amino acid alphabet based on protein blocks method.
| Cluster Profiles | Protein Blocks Method |
|---|---|
| CP(13) | G-IV-FYW-A-L-M-E-QRK-P-ND-HS-T-C |
| CP(11) | G-IV-FYW-A-LM-EQRK-P-ND-HS-T-C |
| CP(9) | G-IV-FYW-ALM-EQRK-P-ND-HS-TC |
| CP(8) | G-IV-FYW-ALM-EQRK-P-ND-HSTC |
| CP(5) | G-IVFYW-ALMEQRK-P-NDHSTC |
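The cluster profiles in the table above can be read as lookup tables that collapse the 20 standard amino acids into 5 to 13 groups. The following is a minimal sketch of that mapping, not the authors' implementation: the names `CLUSTER_PROFILES`, `build_mapping` and `reduce_sequence` are hypothetical, and skipping non-standard residues is an assumption made here for illustration.

```python
from typing import Dict, List

# Cluster profiles copied from the table above; "-" separates the groups.
CLUSTER_PROFILES: Dict[int, str] = {
    13: "G-IV-FYW-A-L-M-E-QRK-P-ND-HS-T-C",
    11: "G-IV-FYW-A-LM-EQRK-P-ND-HS-T-C",
    9: "G-IV-FYW-ALM-EQRK-P-ND-HS-TC",
    8: "G-IV-FYW-ALM-EQRK-P-ND-HSTC",
    5: "G-IVFYW-ALMEQRK-P-NDHSTC",
}


def build_mapping(profile: str) -> Dict[str, int]:
    """Map each amino acid letter to the index of the group it belongs to."""
    mapping: Dict[str, int] = {}
    for index, group in enumerate(profile.split("-")):
        for residue in group:
            mapping[residue] = index
    return mapping


def reduce_sequence(sequence: str, size: int) -> List[int]:
    """Rewrite a protein sequence as a list of group indices for CP(size)."""
    mapping = build_mapping(CLUSTER_PROFILES[size])
    # Assumption: letters outside the 20 standard residues (e.g. "X") are skipped.
    return [mapping[r] for r in sequence.upper() if r in mapping]


if __name__ == "__main__":
    print(reduce_sequence("GIVAP", 5))
```

Under CP(5), isoleucine and valine fall into the same group, so the two residues become indistinguishable after reduction. This is the information the method gives up in exchange for a smaller feature space.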
The dimension of feature vectors of n-peptide composition with different cluster profiles.
| n | CP(13) | CP(11) | CP(9) | CP(8) | CP(5) |
|---|---|---|---|---|---|
| 1 | 13 | 11 | 9 | 8 | 5 |
| 2 | 169 | 121 | 81 | 64 | 25 |
| 3 | 2197 | 1331 | 729 | 512 | 125 |
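Each dimension in the table is the alphabet size raised to the power n, since an n-peptide over k groups can take k^n distinct forms. The sketch below counts overlapping n-peptides in an already reduced sequence (a list of group indices). The function name `n_peptide_composition` is hypothetical, and normalising counts to frequencies is an assumption; the record above does not state how the authors scale the vector.

```python
from itertools import product
from typing import List


def n_peptide_composition(reduced: List[int], alphabet_size: int, n: int) -> List[float]:
    """Return the frequency of every possible n-peptide over the reduced alphabet.

    The result has alphabet_size ** n entries, one per possible n-peptide.
    """
    # Enumerate all n-peptides in a fixed order so every protein shares one layout.
    keys = list(product(range(alphabet_size), repeat=n))
    index = {key: i for i, key in enumerate(keys)}

    counts = [0] * len(keys)
    # Slide a window of length n over the sequence, one residue at a time.
    for start in range(len(reduced) - n + 1):
        counts[index[tuple(reduced[start:start + n])]] += 1

    total = sum(counts)
    # Assumption: counts are normalised to frequencies; short sequences give zeros.
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    vector = n_peptide_composition([0, 1, 1, 2, 3], alphabet_size=5, n=2)
    print(len(vector))
```

With CP(13) and n = 3 the vector already has 2197 entries, which illustrates why the smaller cluster profiles matter for keeping the feature space manageable.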
Five-fold cross-validation results of protein–protein interactions (PPIs) prediction. ACC, accuracy; SN, sensitivity; PE, precision; MCC, Matthews correlation coefficient. (The numbers in bold are the best results.)
| Criteria | CP(5) | | | CP(8) | | | CP(9) | | | CP(11) | | | CP(13) | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dimension | 5 | 25 | 125 | 8 | 64 | 512 | 9 | 81 | 729 | 11 | 121 | 1331 | 13 | 169 | 2197 |
| ACC (%) | 71.16 | 82.85 | 87.43 | 77.90 | 87.74 | 79.63 | 88.19 | 90.86 | 81.47 | 89.05 | 90.77 | 82.99 | 89.46 | 90.26 | |
| SN (%) | 71.26 | 84.21 | 89.88 | 78.02 | 90.02 | 79.92 | 90.72 | 94.15 | 81.78 | 91.52 | 83.75 | 92.23 | 93.32 | ||
| PE (%) | 70.94 | 80.86 | 84.38 | 77.70 | 84.90 | 79.14 | 85.10 | 87.13 | 80.99 | 86.08 | 86.91 | 81.88 | 86.19 | 86.71 | |
| MCC (%) | 58.96 | 71.55 | 77.98 | 65.57 | 78.45 | 67.56 | 79.14 | 83.34 | 69.81 | 80.47 | 83.19 | 71.76 | 81.11 | 82.37 | |
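The four criteria reported in these tables are usually derived from the counts of true and false positives and negatives. The sketch below uses the standard textbook definitions; the record above does not give the authors' exact formulas, so these definitions are an assumption and may not reproduce the reported MCC values exactly. The function name `evaluate` is hypothetical.

```python
import math
from typing import Dict


def evaluate(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute ACC, SN, PE and MCC from confusion-matrix counts.

    Standard definitions are assumed; ratios with a zero denominator return 0.0.
    """
    def safe_div(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else 0.0

    acc = safe_div(tp + tn, tp + tn + fp + fn)   # fraction of correct predictions
    sn = safe_div(tp, tp + fn)                   # sensitivity (recall)
    pe = safe_div(tp, tp + fp)                   # precision
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"ACC": acc, "SN": sn, "PE": pe, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the call.
    for name, value in evaluate(tp=90, tn=85, fp=15, fn=10).items():
        print(f"{name}: {100 * value:.2f}%")
```

In a five-fold cross-validation, a function like this would typically be applied to each fold and the per-fold values averaged.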
Figure 1. Receiver Operating Characteristic (ROC) curve.
Prediction results on the H. pylori dataset. (The numbers in bold are the best results.)
| Criteria | CP(5) | | | CP(8) | | | CP(9) | | | CP(11) | | | CP(13) | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dimension | 5 | 25 | 125 | 8 | 64 | 512 | 9 | 81 | 729 | 11 | 121 | 1331 | 13 | 169 | 2197 |
| ACC (%) | 66.63 | 76.34 | 81.43 | 72.89 | 81.68 | 81.90 | 73.19 | 81.79 | 80.81 | 74.43 | 82.60 | 78.50 | 76.19 | 77.29 | |
| SN (%) | 64.76 | 72.45 | 77.67 | 70.67 | 77.90 | 77.56 | 70.44 | 78.39 | 76.27 | 71.29 | 78.77 | 73.41 | 72.61 | 71.69 | |
| PE (%) | 72.65 | 85.12 | 88.47 | 78.18 | 88.51 | 89.94 | 79.89 | 88.03 | 89.49 | 81.67 | 89.41 | 89.72 | 84.03 | 89.75 | |
| MCC (%) | 55.14 | 63.29 | 69.49 | 60.28 | 69.75 | 70.02 | 60.40 | 69.96 | 68.53 | 61.56 | 71.00 | 65.38 | 63.29 | 63.73 | |
Comparisons of different feature extraction methods.
| Criteria | Auto Covariance | Conjoint Triad | Moran Autocorrelation | Geary Autocorrelation | Pseudo-Amino Acid Composition | Our Method |
|---|---|---|---|---|---|---|
| ACC (%) | 88.89 | 89.57 | 88.90 | 85.06 | 86.45 | 90.91 |
| SN (%) | 92.07 | 92.91 | 92.12 | 88.20 | 88.16 | 94.17 |
| PE (%) | 85.12 | 85.68 | 85.08 | 80.97 | 84.23 | 87.22 |
| MCC (%) | 80.19 | 81.26 | 80.21 | 74.51 | 76.56 | 84.43 |
Comparisons of the weighted sparse representation-based classification (WSRC) and support vector machine (SVM). (The numbers in bold are the best results.)
| Criteria | CP(5) | | | CP(8) | | | CP(9) | | | CP(11) | | | CP(13) | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dimension | 5 | 25 | 125 | 8 | 64 | 512 | 9 | 81 | 729 | 11 | 121 | 1331 | 13 | 169 | 2197 |
| ACC (%) | 67.27 | 74.80 | 81.44 | 72.05 | 82.81 | 88.17 | 72.86 | 83.32 | 89.23 | 74.58 | 84.89 | 90.81 | 75.66 | 86.27 | |
| SN (%) | 71.04 | 75.56 | 80.49 | 75.20 | 82.64 | 86.35 | 75.62 | 83.18 | 87.48 | 77.20 | 84.45 | 88.85 | 77.60 | 85.46 | |
| PE (%) | 66.06 | 74.44 | 82.06 | 70.76 | 82.95 | 89.61 | 71.69 | 83.43 | 90.66 | 73.36 | 85.22 | 92.48 | 74.71 | 86.88 | |
| MCC (%) | 55.84 | 62.29 | 69.78 | 59.64 | 71.53 | 79.12 | 60.39 | 72.21 | 80.76 | 62.03 | 74.34 | 83.30 | 63.14 | 76.31 | |
Comparisons of WSRC with other classifiers. (The numbers in bold are the best results.)
| Criteria | WSRC | SRC | SVM | Guo’s Work | Zhou’s Work | Yang’s Work |
|---|---|---|---|---|---|---|
| ACC (%) | 87.46 | 88.17 | 89.33 | 88.56 | 86.15 | |
| SN (%) | 89.93 | 86.55 | 89.93 | 87.37 | 81.03 | |
| PE (%) | 87.22 | 84.39 | 89.61 | 88.87 | 89.50 | |
| MCC (%) | 78.02 | 79.12 | N/A | 77.15 | N/A |