Abstract
BACKGROUND: Protein-interaction maps are powerful tools for suggesting the cellular functions of genes. Although large-scale protein-interaction maps have been generated for several invertebrate species, projects of a similar scale have not yet been described for any mammal. Because many physical interactions are conserved between species, it should be possible to infer information about human protein interactions (and hence protein function) using model organism protein-interaction datasets.
Year: 2004 PMID: 15345047 PMCID: PMC522870 DOI: 10.1186/gb-2004-5-9-r63
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
The number and accuracy of human protein interactions predicted by different model organism protein-interaction datasets
| Data source | Predicted human interactions | Interactions sharing GO terms (number) | Interactions sharing GO terms (%) |
| All | 71,496 | 12,724 | 24.9 |
| Yeast | 55,231 | 10,727 | 26.2 |
| Fly | 12,059 | 1,404 | 19.0 |
| Worm | 4,494 | 753 | 24.4 |
| All core | 11,487 | 3,133 | 38.1 |
| Core yeast | 6,061 | 2,146 | 45.4 |
| Core fly | 2,889 | 488 | 27.8 |
| Core worm | 2,701 | 597 | 32.3 |
| Two species | 288 | 154 | 74.8 |
| Two species (core) | 160 | 95 | 88.0 |
| Two methods | 2,166 | 829 | 60.6 |
| Random pairs | 71,496 | 6,053 | 14.6 |
The table lists the total number of interactions predicted by each interaction dataset, and the number of these interactions that connect proteins sharing at least one GO term (at level 3 or deeper in the GO hierarchy). The percentages are relative to the total number of non-self interactions in which both proteins have at least one GO annotation. All, all predicted human protein interactions; Yeast/worm/fly, interactions predicted by the yeast, worm or fly interaction maps; All core, all interactions predicted by the high-confidence subsets of each model organism interaction map (see Materials and methods); Two species, interactions predicted by more than one model organism interaction map; Two species (core), interactions predicted by the high-confidence subset of interactions from more than one model organism; Two methods, interactions predicted by data derived from more than one type of interaction assay; Random pairs, the data for a randomly generated interaction network.
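The percentage column described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `percent_sharing_go`, the input shapes, and the assumption that `go_terms` has already been restricted to terms at level 3 or deeper are all hypothetical. It shows why the denominator is smaller than the total number of predicted interactions: self interactions and pairs with an unannotated partner are excluded before the percentage is taken.

```python
from typing import Dict, Iterable, Set, Tuple


def percent_sharing_go(
    interactions: Iterable[Tuple[str, str]],
    go_terms: Dict[str, Set[str]],
) -> Tuple[int, int, float]:
    """Return (shared, eligible, percent) for a set of predicted interactions.

    Assumption: go_terms maps each protein to its GO terms, already
    filtered to the depth of interest (for example level 3 or deeper).
    """
    shared = 0
    eligible = 0
    seen = set()
    for a, b in interactions:
        if a == b:
            continue  # self interactions are excluded
        pair = frozenset((a, b))
        if pair in seen:
            continue  # count each unordered pair once
        seen.add(pair)
        terms_a = go_terms.get(a, set())
        terms_b = go_terms.get(b, set())
        if not terms_a or not terms_b:
            continue  # both partners must have at least one GO annotation
        eligible += 1
        if terms_a & terms_b:
            shared += 1
    percent = 100.0 * shared / eligible if eligible else 0.0
    return shared, eligible, percent


# Toy data (invented for illustration only).
toy_interactions = [("A", "B"), ("A", "A"), ("B", "C"), ("C", "D"), ("B", "A")]
toy_go = {"A": {"GO:1"}, "B": {"GO:1", "GO:2"}, "C": {"GO:3"}}
print(percent_sharing_go(toy_interactions, toy_go))
```

In the toy data only A-B and B-C are eligible pairs, and only A-B shares a term.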
Figure 1. Sources of predicted human protein interactions. (a) The number of human protein interactions predicted by the interaction maps from each model organism. (b) The number of human protein interactions predicted by the core higher-confidence interactions from each organism. As explained in the text, core interactions are those that were reconfirmed when retested (worm), had an interaction score greater than 0.5 (fly), or were identified more than once in a single assay (yeast, worm).
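The core criteria in this legend amount to a per-organism filter, which can be sketched as follows. The `Interaction` record and its field names are hypothetical, chosen only for illustration; the actual datasets use their own formats, and the thresholds are taken directly from the legend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """Hypothetical record for one model organism interaction."""
    organism: str                  # "yeast", "worm" or "fly"
    protein_a: str
    protein_b: str
    score: Optional[float] = None  # confidence score (fly dataset)
    times_observed: int = 1        # times seen in a single assay (yeast, worm)
    reconfirmed: bool = False      # reconfirmed when retested (worm)


def is_core(ix: Interaction) -> bool:
    """Apply the legend's core criteria for the interaction's organism."""
    if ix.organism == "fly":
        return ix.score is not None and ix.score > 0.5
    if ix.organism == "worm":
        return ix.reconfirmed or ix.times_observed > 1
    if ix.organism == "yeast":
        return ix.times_observed > 1
    return False  # unknown organisms are never treated as core


# Toy records (invented identifiers and values).
toy = [
    Interaction("fly", "f1", "f2", score=0.7),
    Interaction("fly", "f3", "f4", score=0.5),
    Interaction("worm", "w1", "w2", reconfirmed=True),
    Interaction("yeast", "y1", "y2", times_observed=1),
]
core = [ix for ix in toy if is_core(ix)]
print(len(core))
```

Note that a fly score of exactly 0.5 is excluded, because the legend specifies a score greater than 0.5.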
Figure 2. Filtering interaction datasets to improve their accuracy. (a) The percentages of interactions sharing GO terms at various depths in the GO hierarchy are compared for interactions predicted by the high-confidence interactions from each model organism (core yeast, core worm and core fly), as well as for the complete datasets from each organism (all yeast, all worm, all fly). For comparison, the percentage of shared GO terms is shown for a randomly generated network of the same size as the complete human network (random pairs). The x-axis indicates the depth in the GO hierarchy being considered, and the y-value the percentage of interaction partners (with known GO annotations) that share GO annotations at this depth or deeper. (b) The percentages of interactions sharing GO terms at different levels in the GO hierarchy are compared for interactions predicted by core interactions in two or more species (two species (core)), by interactions in the complete datasets of two or more species (two species), for interactions predicted by more than one experimental method in yeast (two methods), by any core interaction (all core), by any interaction (all), or by a randomly generated interaction network of the same size as the complete human interaction network (random pairs). All values shown are the percentage of non-self interactions between pairs of proteins that both have at least one associated GO term at the indicated depth in the GO hierarchy.
The number of interactions, genes, novel genes and disease genes in the complete and core human interaction networks
| Network | Interactions | Genes | Novel genes | Disease genes |
| Complete | 71,496 | 6,231 | 1,482 | 448 |
| Core | 11,487 | 3,872 | 864 | 292 |
The complete network consists of all human protein interactions predicted by model organism protein-interaction datasets. The core network consists of all the human interactions predicted by the high-confidence subsets of each interaction network (see Materials and methods). Novel genes are defined as those without GO annotations. Disease genes are defined by the OMIM database [25], available from Ensembl [16].
The approximate accuracy and coverage of GO terms predicted by the core and complete interaction networks
| Number of interactors with GO term | Core data: accuracy | Core data: coverage | Complete data: accuracy | Complete data: coverage |
| 1+ | 8 | 26 | 3 | 35 |
| 2+ | 22 | 11 | 8 | 19 |
| 3+ | 30 | 7 | 11 | 14 |
| 4+ | 36 | 5 | 15 | 11 |
| 5+ | 42 | 4 | 18 | 8 |
| 6+ | 45 | 3 | 20 | 7 |
The approximate accuracy and coverage of GO term predictions were calculated for every gene in the core or complete interaction networks with at least one known GO term. The GO terms of a gene are predicted using the GO terms of any of its interaction partners (1+), or GO terms shared by at least two to six of its interaction partners (2+ to 6+). Accuracy is calculated as the number of correctly predicted GO terms divided by the total number of predicted GO terms. Coverage is calculated as the number of correctly predicted GO terms divided by the total number of known GO terms associated with each gene. These values are similar for GO annotations at different levels of the GO hierarchy (see Additional data file 3).
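The accuracy and coverage definitions above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' implementation: the function names, the dictionary inputs, and the choice to pool counts over all genes before dividing are assumptions, since the caption does not say whether values are pooled or averaged per gene.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def predict_terms(
    gene: str,
    partners: Dict[str, Set[str]],
    go_terms: Dict[str, Set[str]],
    min_partners: int,
) -> Set[str]:
    """Predict GO terms carried by at least min_partners interaction partners."""
    counts: Counter = Counter()
    for partner in partners.get(gene, set()):
        if partner == gene:
            continue  # ignore self interactions
        counts.update(go_terms.get(partner, set()))
    return {term for term, n in counts.items() if n >= min_partners}


def accuracy_and_coverage(
    partners: Dict[str, Set[str]],
    go_terms: Dict[str, Set[str]],
    min_partners: int = 1,
) -> Tuple[float, float]:
    """Return (accuracy %, coverage %) pooled over genes with known GO terms."""
    correct = predicted = known = 0
    for gene in partners:
        known_terms = go_terms.get(gene, set())
        if not known_terms:
            continue  # only genes with at least one known GO term are scored
        pred = predict_terms(gene, partners, go_terms, min_partners)
        correct += len(pred & known_terms)
        predicted += len(pred)
        known += len(known_terms)
    accuracy = 100.0 * correct / predicted if predicted else 0.0
    coverage = 100.0 * correct / known if known else 0.0
    return accuracy, coverage


# Toy network (invented): gene G with three annotated partners.
toy_partners = {"G": {"P1", "P2", "P3"}}
toy_go = {
    "G": {"t1", "t2"},
    "P1": {"t1", "t3"},
    "P2": {"t1"},
    "P3": {"t3", "t4"},
}
for k in (1, 2):
    print(k, accuracy_and_coverage(toy_partners, toy_go, min_partners=k))
```

In the toy network, raising the threshold from 1+ to 2+ partners drops the unsupported term `t4` from the predictions, so accuracy rises while coverage cannot increase. The table shows the same trade-off: accuracy increases and coverage falls as more supporting partners are required.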
Sources of model organism protein-interaction data
| Dataset | Interactions | Type | Reference | URL |
| Fly | 20,020 | Two-hybrid | [5] | [26] |
| Worm | 4,605 | Two-hybrid | [6] | [27] |
| Yeast | 78,391 | Total | [11] | [11] |
| | 5,125 | Two-hybrid | | |
| | 49,313 | Complex purification | | |
| | 886 | Genetic | | |
| | 23,844 (23,399) | In silico (in silico only) | | |
The table lists the total number of interactions contained in each model organism dataset, together with the method used to identify interactions, the publication reference, and the website (URL) from which the interaction dataset was obtained. For each dataset, the non-redundant number of unique interactions between unambiguously identified proteins is shown. For the yeast interactions, the total number of interactions is shown, as well as the number of interactions identified using each detection method. In silico only interactions are those predicted solely by in silico methods, without any confirmation from the experimental datasets.