Yanzhi Guo, Lezheng Yu, Zhining Wen, Menglong Li.
Abstract
Compared with the number of protein sequences available for different organisms, the number of known protein-protein interactions (PPIs) is still very limited, so many computational methods have been developed to facilitate the identification of novel PPIs. Methods that use only protein sequence information are more universal than those that depend on additional information or predictions about the proteins. In this article, a sequence-based method is proposed that combines a new feature representation based on auto covariance (AC) with a support vector machine (SVM). AC accounts for the interactions between residues a certain distance apart in the sequence, so the method takes neighbouring effects into account. When applied to the PPI data of the yeast Saccharomyces cerevisiae, the method achieved very promising prediction results. An independent data set of 11,474 yeast PPIs was used to evaluate the prediction model, and the prediction accuracy was 88.09%. The performance of this method is superior to that of existing sequence-based methods, so it can be a useful supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://www.scucic.cn/Predict_PPI/index.htm.
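To make the idea of "interactions between residues a certain distance apart" concrete, the following is a minimal sketch of an auto covariance computation for a single numeric property along a sequence. It is not the authors' implementation: the abstract does not give the formula, so the mean-centering and the 1/(n - lag) normalization are assumptions based on one common definition of auto covariance, and `TOY_SCALE`, `encode` and `auto_covariance` are hypothetical names with made-up values.

```python
# Hypothetical per-residue property values (made up for illustration only).
TOY_SCALE = {"A": 0.5, "G": 0.1, "L": 1.2, "K": -1.0}


def encode(sequence, scale=TOY_SCALE):
    """Map each residue to its numeric property value."""
    return [scale[residue] for residue in sequence]


def auto_covariance(values, max_lag):
    """Return the auto covariance of `values` for lags 1..max_lag.

    Assumed definition: for each lag, average the product of the
    mean-centred values at positions i and i + lag over all valid i.
    """
    n = len(values)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    features = auto_covariance(encode("AGLKALGK"), max_lag=3)
    print(features)  # one value per lag
```

Under these assumptions, each protein yields a fixed-length vector (one value per lag and per property) regardless of its sequence length, which is what allows sequences of different lengths to be fed to a classifier such as an SVM.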
Year: 2008 PMID: 18390576 PMCID: PMC2396404 DOI: 10.1093/nar/gkn159
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Comparison of the prediction performance of the method on different negative data sets, using AC with an lg of 25 amino acids
| Negative data set | Psub | Prcp | 1-let | 2-let | 3-let |
|---|---|---|---|---|---|
| Sensitivity (%) | 85.22 | 41.76 | 79.29 | 69.81 | 60.74 |
| Precision (%) | 87.83 | 62.64 | 82.67 | 85.14 | 80.15 |
| Accuracy (%) | 86.23 ± 1.95 | 58.42 ± 1.68 | 79.25 ± 7.80 | 77.30 ± 12.38 | 70.25 ± 10.40 |
Psub is the negative data set of non-interacting pairs of non-co-localized proteins; Prcp is the negative data set derived from the method by Shen et al. (26). The three negative data sets, 1-let, 2-let and 3-let are obtained by shuffling the protein sequences with k-let counts, k = 1, 2, 3.
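As an illustration of the shuffled negative sets, the sketch below covers only the simplest case, k = 1, where shuffling preserves the single-residue composition of a sequence. The function name `shuffle_1let` and the `seed` parameter are hypothetical, and this is not the procedure used in the article; preserving 2-let or 3-let counts requires a more involved algorithm that is not attempted here.

```python
import random
from collections import Counter


def shuffle_1let(sequence, seed=None):
    """Return a random permutation of `sequence`.

    A plain permutation keeps the count of every single residue
    (the 1-let counts) but does not preserve 2-let or 3-let counts.
    """
    rng = random.Random(seed)  # local generator so results can be reproduced
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


if __name__ == "__main__":
    original = "MKTAYIAKQRQISFVKSHFSRQ"
    shuffled = shuffle_1let(original, seed=0)
    # The residue composition is unchanged by the shuffle.
    print(Counter(shuffled) == Counter(original))
```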
Figure 1. The average prediction accuracy of the method using AC with different lg values.
The prediction results of the test sets based on the negative data set Psub and lg of 30 amino acids
| | Test set | TP | FN | TN | FP | Sensitivity (%) | Precision (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| ACC | 1 | 2096 | 282 | 2226 | 152 | 88.14 | 93.24 | 90.87 |
| | 2 | 2282 | 96 | 1741 | 637 | 95.96 | 78.18 | 84.59 |
| | 3 | 2023 | 355 | 2291 | 87 | 85.07 | 95.88 | 90.71 |
| | 4 | 2181 | 197 | 2099 | 279 | 91.72 | 88.66 | 89.99 |
| | 5 | 2052 | 267 | 2194 | 184 | 88.77 | 91.98 | 90.52 |
| | Average | 2138 | 240 | 2110 | 268 | 89.93 | 88.87 | 89.33 ± 2.67 |
| AC | 1 | 2161 | 217 | 1944 | 434 | 90.87 | 83.28 | 86.31 |
| | 2 | 2215 | 163 | 1890 | 488 | 93.15 | 81.95 | 86.31 |
| | 3 | 2062 | 316 | 2153 | 225 | 86.71 | 90.16 | 88.63 |
| | 4 | 1890 | 488 | 2221 | 157 | 79.48 | 92.33 | 86.44 |
| | 5 | 2052 | 326 | 2185 | 193 | 86.29 | 91.40 | 89.10 |
| | Average | 2076 | 312 | 2079 | 299 | 87.30 | 87.82 | 87.36 ± 1.38 |
TP, true positive; FP, false positive; TN, true negative; FN, false negative; Psub is the negative data set of non-interacting pairs of non-co-localized proteins.
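The percentage columns can be reproduced from the four counts using the standard definitions of sensitivity, precision and accuracy. The sketch below assumes those standard definitions (the record itself does not spell them out) and uses a hypothetical helper name, `confusion_metrics`; the example counts are those of test set 1 in the ACC rows.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, precision, accuracy) as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)                # share of true interactions recovered
    precision = tp / (tp + fp)                  # share of predicted interactions that are true
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all pairs classified correctly
    return sensitivity, precision, accuracy


if __name__ == "__main__":
    # Counts for test set 1 of the ACC rows in the table above.
    sens, prec, acc = confusion_metrics(tp=2096, fn=282, tn=2226, fp=152)
    print(f"{sens * 100:.2f} {prec * 100:.2f} {acc * 100:.2f}")
    # prints "88.14 93.24 90.87"
```

The printed values agree with the sensitivity, precision and accuracy reported for that row.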