Jingchun Sun, Yan Sun, Guohui Ding, Qi Liu, Chuan Wang, Youyu He, Tieliu Shi, Yixue Li, Zhongming Zhao.
Abstract
BACKGROUND: Although many genomic features have been used in the prediction of protein-protein interactions (PPIs), frequently only one is used in a computational method. After realizing the limited power in the prediction using only one genomic feature, investigators are now moving toward integration. So far, there have been few integration studies for PPI prediction; one failed to yield appreciable improvement of prediction and the others did not conduct performance comparison. It remains unclear whether an integration of multiple genomic features can improve the PPI prediction and, if it can, how to integrate these features.
Year: 2007 PMID: 17963500 PMCID: PMC2238723 DOI: 10.1186/1471-2105-8-414
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Protein-protein interactions predicted by four methods
| Method | Number of PPIs | Number of proteins involved | Average degree | PPIs also covered by PPM | PPIs also covered by GCM | PPIs also covered by GFM | PPIs also covered by GNM |
| PPM | 45,437 | 2,124 | 21.4 | | | | |
| GCM | 2,437 | 2,102 | 1.2 | 449 | | | |
| GFM | 6,728 | 1,254 | 5.4 | 1,532 | 134 | | |
| GNM | 3,595 | 3,901 | 0.9 | 300 | 1,155 | 124 | |
| Total (a) | 54,911 | 4,040 | 13.6 | | | | |
(a) Number of non-redundant PPIs predicted by the four methods.
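The "Average degree" column can be checked with a few lines of arithmetic. The sketch below assumes the column is the number of PPIs divided by the number of proteins involved; this section does not state the definition, but that ratio reproduces every reported value when rounded to one decimal. The function name `ppis_per_protein` is illustrative only.

```python
# Counts copied from the table above: method -> (number of PPIs, number of proteins involved).
TABLE = {
    "PPM": (45437, 2124),
    "GCM": (2437, 2102),
    "GFM": (6728, 1254),
    "GNM": (3595, 3901),
    "Total": (54911, 4040),
}


def ppis_per_protein(n_ppis: int, n_proteins: int) -> float:
    """Ratio of predicted PPIs to proteins (assumed meaning of 'average degree')."""
    return n_ppis / n_proteins


# Round to one decimal, as in the table.
ratios = {method: round(ppis_per_protein(*counts), 1) for method, counts in TABLE.items()}

for method, value in ratios.items():
    print(f"{method}: {value}")
```

Note that a graph-theoretic average degree would count each interaction twice (2 x PPIs / proteins); the reported values match the simple ratio instead.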
Figure 1. Comparison of PPI prediction by the four methods using the KEGG, EcoCyc, and DIP datasets. Performance of the prediction was measured by AC value.
Figure 2. Comparison of PPI prediction by four individual methods and InPrePPI. The combined protein pairs in the KEGG, EcoCyc, and DIP datasets were used in the four methods and InPrePPI_high dataset was used in InPrePPI.
Figure 3. PPI prediction by InPrePPI with different k values.
Accuracy and coverage in three integrated methods
| Method | Number of PPIs | KEGG accuracy (%) | KEGG coverage (%) | EcoCyc accuracy (%) | EcoCyc coverage (%) | DIP accuracy (%) | DIP coverage (%) |
| Joint observation method (JOM) | | | | | | | |
| JOM4 (a) | 55 | 78.18 | 0.10 | 32.73 | 2.65 | 25.45 | 0.44 |
| JOM≥3 | 298 | 60.74 | 0.41 | 32.89 | 14.45 | 12.42 | 1.17 |
| JOM≥2 | 2,933 | 38.70 | 2.58 | 9.00 | 38.94 | 2.35 | 2.18 |
| JOM≥1 | 54,911 | 8.79 | 10.98 | 0.85 | 69.17 | 0.49 | 8.58 |
| STRING | | | | | | | |
| High (b) | 2,279 | 24.62 | 1.28 | 13.43 | 42.33 | 3.20 | 2.31 |
| Medium | 4,458 | 5.74 | 0.58 | 1.39 | 7.08 | 0.31 | 0.44 |
| Low | 9,970 | 2.18 | 0.49 | 0.17 | 2.21 | 0.11 | 0.35 |
| InPrePPI | | | | | | | |
| High (c) | 1,194 | 45.73 | 1.24 | 18.84 | 33.19 | 4.69 | 1.77 |
| Medium | 5,403 | 27.93 | 3.43 | 2.24 | 17.85 | 0.91 | 1.55 |
| Low | 48,314 | 5.73 | 6.30 | 0.25 | 18.14 | 0.34 | 5.25 |
(a) The predicted PPIs covered by at least one (JOM≥1), two (JOM≥2), three (JOM≥3) or four (JOM4) methods.
(b) The predicted PPIs in the high, medium and low confidence in STRING [22].
(c) The predicted PPIs in the high, medium and low confidence in InPrePPI (see Methods).
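The joint observation method described in footnote (a) amounts to counting how many of the four methods predict each pair. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that accuracy is the percentage of predicted pairs found in a reference dataset and that coverage is the percentage of reference pairs that were predicted; these definitions are not spelled out in this section. Protein names and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered protein pair, so (A, B) and (B, A) are the same interaction."""
    return frozenset((a, b))


def joint_observation(predictions: Dict[str, Set[Pair]], min_methods: int) -> Set[Pair]:
    """Pairs predicted by at least `min_methods` of the given methods (JOM>=k)."""
    counts = Counter(p for pairs in predictions.values() for p in pairs)
    return {p for p, n in counts.items() if n >= min_methods}


def accuracy_and_coverage(predicted: Set[Pair], positives: Set[Pair]) -> Tuple[float, float]:
    """Assumed definitions: accuracy = hits / predicted, coverage = hits / positives, in percent."""
    hits = len(predicted & positives)
    accuracy = 100.0 * hits / len(predicted) if predicted else 0.0
    coverage = 100.0 * hits / len(positives) if positives else 0.0
    return accuracy, coverage


# Toy data with made-up protein names; the method keys follow the table above.
toy_predictions = {
    "PPM": {pair("A", "B"), pair("A", "C")},
    "GCM": {pair("B", "A")},
    "GFM": {pair("C", "D")},
    "GNM": set(),
}
toy_positives = {pair("A", "B"), pair("C", "D"), pair("E", "F"), pair("G", "H")}

jom_ge1 = joint_observation(toy_predictions, 1)
jom_ge2 = joint_observation(toy_predictions, 2)
print(accuracy_and_coverage(jom_ge1, toy_positives))
print(accuracy_and_coverage(jom_ge2, toy_positives))
```

Raising the threshold shrinks the predicted set, which is the accuracy-for-coverage trade-off visible in the JOM rows of the table.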
Figure 4. Comparison of PPI prediction by InPrePPI and STRING using the KEGG, EcoCyc, and DIP datasets. The data were separated into three groups with the high, medium, and low confidence.
Figure 5. Comparison of PPI prediction by InPrePPI and STRING using the COG annotation data. A predicted pair is treated as a true positive when its two proteins are within the same COG well-characterized category.
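The true-positive rule in the Figure 5 caption can be sketched as a simple lookup. This is an illustration under stated assumptions: each protein maps to a set of COG functional category letters, the categories R (general function prediction only) and S (function unknown) are the ones treated as poorly characterized, and sharing at least one remaining category counts as "the same category". The mapping, protein names, and function name are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Assumption: COG categories R and S are excluded as poorly characterized.
POORLY_CHARACTERIZED: FrozenSet[str] = frozenset({"R", "S"})


def is_true_positive(
    protein_a: str,
    protein_b: str,
    cog_categories: Dict[str, Set[str]],
    excluded: FrozenSet[str] = POORLY_CHARACTERIZED,
) -> bool:
    """True when both proteins share at least one well-characterized COG category."""
    cats_a = cog_categories.get(protein_a, set()) - excluded
    cats_b = cog_categories.get(protein_b, set()) - excluded
    return bool(cats_a & cats_b)


# Made-up annotations for illustration only.
toy_cog = {
    "p1": {"C"},
    "p2": {"C", "E"},
    "p3": {"S"},
    "p4": {"S"},
}

print(is_true_positive("p1", "p2", toy_cog))  # share category C
print(is_true_positive("p3", "p4", toy_cog))  # share only an excluded category
```

Proteins without any annotation are never counted as true positives in this sketch; whether the original evaluation skipped such pairs instead is not stated here.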
Summary of the positive and negative control data
| Category | Number of protein pairs | Overlap with KEGG | Overlap with EcoCyc | Overlap with DIP | Source |
| KEGG | 43,937 | | | | KEGG [25] |
| EcoCyc | 678 | 506 | | | EcoCyc (8.0) [26] |
| DIP | 3,159 | 141 | 54 | | DIP (Ecoli20060116) [27] |
| Positives (a) | 47,105 | | | | KEGG + EcoCyc + DIP |
| Negatives | 376,874 | | | | KO [54] |
(a) The non-redundant pairs in the KEGG, EcoCyc, and DIP datasets. There is no overlap between negatives and positives.
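The two properties stated in the footnote (a non-redundant positive set and no overlap with the negatives) can be expressed with set operations. The sketch below assumes protein pairs are unordered and uses made-up identifiers; it does not reproduce the actual KEGG, EcoCyc, DIP, or KO data, and the function names are illustrative.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Treat (A, B) and (B, A) as one pair and drop exact duplicates."""
    return {frozenset(p) for p in pairs}


def build_positives(*datasets: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Non-redundant union of several reference datasets."""
    positives: Set[Pair] = set()
    for dataset in datasets:
        positives |= normalize(dataset)
    return positives


def overlap(a: Set[Pair], b: Set[Pair]) -> int:
    """Number of pairs shared by two pair sets."""
    return len(a & b)


# Toy stand-ins for the three positive datasets and the negative set.
kegg = [("A", "B"), ("C", "D")]
ecocyc = [("B", "A"), ("E", "F")]
dip = [("C", "D"), ("G", "H")]
negatives = normalize([("A", "Z"), ("Y", "X")])

positives = build_positives(kegg, ecocyc, dip)
print(len(positives))                  # size of the non-redundant union
print(overlap(positives, negatives))   # expected to be 0 for a clean control set
```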