Ashwini Patil, Haruki Nakamura.
Abstract
BACKGROUND: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large-scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies.
Year: 2005 PMID: 15833142 PMCID: PMC1127019 DOI: 10.1186/1471-2105-6-100
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Likelihood ratios for genomic features.
Likelihood ratio, sensitivity and specificity for the combination of different genomic features
| Genomic Feature(s) | Likelihood ratio (L) | Sensitivity (%) | Specificity (%) |
| d + g + h | 170.052 | 12.3 | 99.4 |
| d + g | 66.031 | 14.5 | 99.3 |
| d + h | 50.463 | 14.7 | 99.2 |
| d | 19.595 | 14.8 | 99.2 |
| g + h | 8.678 | 44.1 | 94.0 |
| g | 3.370 | 86.7 | 74.3 |
| h | 2.575 | 89.7 | 62.9 |
| none | 0.163 | 100 | 0 |
d: interacting Pfam domains; g: similar GO annotations; h: homologous interactions. A combination of genomic features is indicated by listing the features separated by a '+' sign.
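As an illustration of how the table can be used, the following minimal sketch looks up the likelihood ratio for a given set of genomic features and applies the cut-off stated with the species table (interactions with a likelihood ratio > 1 are predicted as true). The function and variable names are hypothetical and not taken from the paper; only the numeric values come from the table.

```python
# Likelihood ratios copied from the table; each key is the set of
# genomic features present (d, g, h as defined in the footnote).
LIKELIHOOD_RATIO = {
    frozenset("dgh"): 170.052,
    frozenset("dg"): 66.031,
    frozenset("dh"): 50.463,
    frozenset("d"): 19.595,
    frozenset("gh"): 8.678,
    frozenset("g"): 3.370,
    frozenset("h"): 2.575,
    frozenset(): 0.163,  # no genomic feature present
}


def likelihood_ratio(features):
    """Return the tabulated likelihood ratio for an iterable of feature codes."""
    return LIKELIHOOD_RATIO[frozenset(features)]


def predicted_true(features, threshold=1.0):
    """Apply the L > 1 rule; the threshold parameter is an illustrative addition."""
    return likelihood_ratio(features) > threshold


if __name__ == "__main__":
    for combo in ("dgh", "g", ""):
        label = "+".join(combo) or "none"
        print(label, likelihood_ratio(combo), predicted_true(combo))
```

Using sets as keys makes the lookup independent of the order in which the features are listed.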
Figure 2. ROC curve for the combination of genomic features using 10-fold cross-validation. The dotted line shows the empirical ROC curve, while the solid line shows the fitted ROC curve (obtained using JROCFIT). Each point on the ROC curve corresponds to the sensitivity and specificity for a single genomic feature or a combination of features. d: interacting Pfam domains; g: similar GO annotations; h: homologous interactions; none: no genomic features. A combination of genomic features is indicated by listing the features separated by a '+' sign.
Number of interactions in different ranges of likelihood ratios for high-throughput data sets of various species
| Likelihood ratio (L) | | | | |
| 0 – 1 | 541 | 16655 | 2925 | 5534 |
| 1 – 10 | 733 | 3119 | 852 | 5810 |
| 10 – 100 | 362 | 367 | 139 | 824 |
| 100 – 1000 | 50 | 260 | 99 | 506 |
| Total | 1686 | 20401 | 4015 | 12674 |
All interactions with a Likelihood ratio > 1 are predicted as true.
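The fraction of each data set predicted as true follows directly from the counts above: it is everything outside the 0 – 1 bin divided by the column total. The sketch below is an illustrative calculation only. The column order follows the table, the species labels are not reproduced here, and it assumes that the 0 – 1 bin holds exactly the interactions that fail the L > 1 rule.

```python
# Counts per likelihood-ratio bin (0-1, 1-10, 10-100, 100-1000),
# one list per data-set column, in the order the columns appear in the table.
COUNTS = [
    [541, 733, 362, 50],
    [16655, 3119, 367, 260],
    [2925, 852, 139, 99],
    [5534, 5810, 824, 506],
]
TOTALS = [1686, 20401, 4015, 12674]  # the "Total" row of the table


def fraction_predicted_true(column):
    """Share of interactions with L > 1, i.e. everything outside the first bin."""
    total = sum(column)
    return (total - column[0]) / total


if __name__ == "__main__":
    for column, total in zip(COUNTS, TOTALS):
        # Sanity check: the bins add up to the reported total.
        assert sum(column) == total
        share = 100 * fraction_predicted_true(column)
        print(f"{total:>6} interactions: {share:.1f}% predicted true")
```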
Figure 3. Percentage of interactions predicted true across different high-throughput data sets.
Figure 4. Percentage of interactions predicted true in high and low confidence interactions across different high-throughput data sets.
Figure 5. Some low confidence interactions predicted to be true by our method and confirmed by other publications. The Likelihood ratio for each interaction is indicated. Interactions with a Likelihood ratio greater than 100 are shown with a solid line, while those with a Likelihood ratio less than 10 are shown with a dashed line. (A) Interactions between proteins co-regulating the alternative splicing of Dscam exon 4 in D. melanogaster. (B) Interactions between proteins in the Lsm1-7 complex in S. cerevisiae confirmed by similar interactions found in H. sapiens.
Yeast high-throughput data sets
| Data set | Interactions | Type |
| Uetz | 1438 | Y2H |
| Ito | 4449 | Y2H |
| Gavin | 3757 | Co-IP (spoke model) |
| Ho | 3618 | Co-IP (spoke model) |
| Total unique interactions | 12674 | Binary |
Y2H: Yeast two-hybrid; Co-IP: Mass Spectrometry of coimmunoprecipitated complexes, converted to binary interactions using the spoke model.
Sources of Gold Standard Positive yeast protein interaction data
| Data set | Interactions | Type |
| MIPS interactions | 574 | Y2H |
| MIPS complexes | 490 | Co-IP (matrix model) |
| Small scale interactions from DIP and IntAct | 110 | Y2H |
| More than one high-throughput data set | 305 | Y2H [3, 4]; Co-IP (spoke model) [5, 6] |
| Total | 1479 | Binary |
Y2H: Yeast two-hybrid; Co-IP: Mass Spectrometry of coimmunoprecipitated complexes, expanded by spoke or matrix model as indicated.
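The spoke and matrix models mentioned in the footnotes differ in how a purified complex is expanded into binary interactions. In the usual definitions, the spoke model pairs the bait with each co-purified protein, whereas the matrix model pairs every member with every other member. The sketch below illustrates that difference; the function names and the protein identifiers are hypothetical placeholders, and the exact procedure used for the data sets above is not specified here.

```python
from itertools import combinations


def spoke_model(bait, preys):
    """Bait paired with each prey: n - 1 interactions for a complex of n proteins."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_model(members):
    """Every pair of complex members: n * (n - 1) / 2 interactions."""
    return {frozenset(pair) for pair in combinations(sorted(set(members)), 2)}


if __name__ == "__main__":
    bait, preys = "BAIT", ["P1", "P2", "P3"]  # placeholder identifiers
    spoke = spoke_model(bait, preys)
    matrix = matrix_model([bait, *preys])
    print(len(spoke), "spoke interactions;", len(matrix), "matrix interactions")
```

Every spoke interaction also appears in the matrix expansion of the same complex, so the matrix model yields a superset of the spoke pairs.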
Correlation coefficients of the genomic features for 100 random interactions
| Genomic Features | r | t(98) | p-value |
| Homologous Interactions – Similar GO annotations | -0.12605 | -1.2579 | 0.2401 |
| Homologous Interactions – Interacting Pfam Domains | 0.022501 | 0.222802 | 0.8826 |
| Similar GO annotations – Interacting Pfam Domains | -0.01817 | -0.17988 | 0.2868 |
r: Pearson's correlation coefficient; t(98): t-statistic with 98 degrees of freedom; p-value: probability. Since the p-values of all t-tests exceed the significance level of 0.05, the null hypothesis that the genomic features are not correlated cannot be rejected.
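The t-statistics in the table are consistent with the standard conversion of a Pearson correlation coefficient into a t-value, t = r·sqrt(n − 2) / sqrt(1 − r²), with n = 100 interactions and hence 98 degrees of freedom. The short sketch below reproduces that step; the function name is illustrative and the p-value calculation is deliberately left out.

```python
import math


def t_from_r(r, n):
    """t-statistic for testing a Pearson correlation r from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    # Correlation coefficients copied from the table (n = 100 random interactions).
    for label, r in [
        ("h vs g", -0.12605),
        ("h vs d", 0.022501),
        ("g vs d", -0.01817),
    ]:
        print(f"{label}: r = {r}, t(98) = {t_from_r(r, 100):.4f}")
```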