Carmen Sánchez Claros, Anna Tramontano.
Abstract
Comprehensive protein interaction maps can complement genetic and biochemical experiments and allow the formulation of new hypotheses to be tested in the system of interest. Computational analysis of these maps can help focus on interesting cases and thereby prioritize validation experiments appropriately. We show here that, by automatically comparing and analyzing structurally similar regions of proteins of known structure that interact with a common partner, it is possible to identify mutually exclusive interactions in the maps with a sensitivity of 70% and a specificity higher than 85%. Moreover, in about three fourths of the correctly identified complexes, we also correctly recognize at least one residue (five on average) belonging to the interaction interface. Given the current, and continuously increasing, number of proteins of known structure, the requirement that the structures of the interacting proteins be known does not substantially affect the coverage of our strategy, which we estimate at around 25%. We also introduce the Estrella server, which implements this strategy. It is designed for users interested in validating specific hypotheses about the functional role of a protein-protein interaction, and it also provides access to pre-computed data for seven organisms.
Year: 2012 PMID: 22715412 PMCID: PMC3370996 DOI: 10.1371/journal.pone.0038765
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Data used in the analysis and stored in the Estrella database.
| Number of sub-networks containing: | HS | SC | DM | MM | CE | RN | EC | Total |
| All proteins | 12294 | 6023 | 9570 | 3052 | 4934 | 838 | 695 | 37406 |
| At least three node proteins of known structure. | 5176 | 3971 | 156 | 13 | 65 | 78 | 171 | 9630 |
| At least three non-redundant node proteins of known structure. | 4598 | 3796 | 137 | 46 | 12 | 165 | 63 | 8817 |
| Complexes of known structure involving the hub protein and at least three node proteins. | 81 | 62 | 0 | 1 | 0 | 1 | 7 | 152 |
| Mutually exclusive interactions in complexes of known structure involving the hub protein and their node proteins. | 64 | 59 | 0 | 1 | 0 | 1 | 3 | 128 |
HS: Homo sapiens, SC: Saccharomyces cerevisiae, DM: Drosophila melanogaster, MM: Mus musculus, CE: Caenorhabditis elegans, RN: Rattus norvegicus, EC: Escherichia coli.
The last two rows show the data used for validation.
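The abstract estimates the coverage of the approach at around 25%. As a minimal sketch, assuming coverage is read as the share of sub-networks with at least three node proteins of known structure (the text does not spell out the exact definition), the share can be recomputed from the counts in the table. The function names below are hypothetical helpers, not part of Estrella.

```python
# Sub-network counts copied from the table above (one entry per organism).
ORGANISMS = ["HS", "SC", "DM", "MM", "CE", "RN", "EC"]
ALL_PROTEINS = [12294, 6023, 9570, 3052, 4934, 838, 695]
KNOWN_STRUCTURE = [5176, 3971, 156, 13, 65, 78, 171]
NON_REDUNDANT = [4598, 3796, 137, 46, 12, 165, 63]


def pooled_percent(part: list[int], whole: list[int]) -> float:
    """Percentage of all sub-networks (summed over organisms) that qualify."""
    return 100.0 * sum(part) / sum(whole)


def per_organism_percent(part: list[int], whole: list[int]) -> dict[str, float]:
    """Percentage of qualifying sub-networks within each organism."""
    return {org: 100.0 * p / w for org, p, w in zip(ORGANISMS, part, whole)}


if __name__ == "__main__":
    print(f"known structure: {pooled_percent(KNOWN_STRUCTURE, ALL_PROTEINS):.1f}%")
    print(f"non-redundant:   {pooled_percent(NON_REDUNDANT, ALL_PROTEINS):.1f}%")
    for org, pct in per_organism_percent(KNOWN_STRUCTURE, ALL_PROTEINS).items():
        print(f"  {org}: {pct:.1f}%")
```

Pooled over the seven organisms this gives about 25.7% (23.6% for non-redundant node proteins), in line with the abstract's estimate, although the per-organism shares vary widely, from about 66% for S. cerevisiae to below 2% for D. melanogaster, M. musculus and C. elegans.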
Figure 1. Illustration of how we compute the statistical parameters.
The upper left part of the figure shows the experimentally known situation: A1, A2 and A3 interact with the same region of the hub, so their interactions are mutually exclusive. The interactions of B1, B2 and B3 with the hub are also mutually exclusive, although they bind to a region different from that of the As. C1 binds to a region different from both the A and B binding sites. The example shows a possible set of sub-networks predicted as mutually exclusive by Estrella and the corresponding values for FP, TP, TN, FN, specificity (Sp) and sensitivity (Sn). In Cluster 1, the TP are A1, A2 and A3, the TN are B2 and B3, the FP are B1 and C1, and there are no FN. In Cluster 2, the TP are B2 and B3, the TN are A2, A3 and C1, the FP is A1 and the FN is B1. In Cluster 3, the TP are A1 and A3, the TN are B1, B2 and B3, the FP is C1 and the FN is A2. The overall specificity and sensitivity are computed as the average of the values for each identified cluster.
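The per-cluster bookkeeping in this legend can be reproduced with a few set operations. This is a minimal sketch under our own reading of the legend: each predicted cluster is compared with one experimental group (the As or the Bs), its TN are the partners that belong to neither, and the helper names (`confusion`, `sp_sn`, `average_sp_sn`) are hypothetical, not part of Estrella.

```python
# Experimental ground truth from the Figure 1 legend: the As share one hub
# site, the Bs share another, and C1 binds a third, separate site.
UNIVERSE = {"A1", "A2", "A3", "B1", "B2", "B3", "C1"}
SITE_A = {"A1", "A2", "A3"}
SITE_B = {"B1", "B2", "B3"}


def confusion(predicted: set[str], reference: set[str], universe: set[str]) -> dict[str, int]:
    """Count TP/TN/FP/FN for one predicted cluster against the experimental
    group of mutually exclusive partners it is compared with."""
    return {
        "TP": len(predicted & reference),
        "FP": len(predicted - reference),
        "FN": len(reference - predicted),
        "TN": len(universe - predicted - reference),
    }


def sp_sn(c: dict[str, int]) -> tuple[float, float]:
    """Specificity TN/(TN+FP) and sensitivity TP/(TP+FN) for one cluster."""
    return c["TN"] / (c["TN"] + c["FP"]), c["TP"] / (c["TP"] + c["FN"])


def average_sp_sn(clusters: list[tuple[set[str], set[str]]],
                  universe: set[str]) -> tuple[float, float]:
    """Overall Sp and Sn as the plain mean of the per-cluster values."""
    per_cluster = [sp_sn(confusion(p, r, universe)) for p, r in clusters]
    n = len(per_cluster)
    return (sum(sp for sp, _ in per_cluster) / n,
            sum(sn for _, sn in per_cluster) / n)


# Each cluster: (predicted members = TP + FP, experimental group it is matched to).
CLUSTERS = [
    ({"A1", "A2", "A3", "B1", "C1"}, SITE_A),  # Cluster 1
    ({"B2", "B3", "A1"}, SITE_B),              # Cluster 2
    ({"A1", "A3", "C1"}, SITE_A),              # Cluster 3
]

if __name__ == "__main__":
    for i, (p, r) in enumerate(CLUSTERS, start=1):
        sp, sn = sp_sn(confusion(p, r, UNIVERSE))
        print(f"Cluster {i}: Sp={sp:.2f} Sn={sn:.2f}")
    sp, sn = average_sp_sn(CLUSTERS, UNIVERSE)
    print(f"Overall: Sp={sp:.2f} Sn={sn:.2f}")
```

With these inputs the per-cluster specificities are 0.50, 0.75 and 0.75 and the sensitivities 1.00, 0.67 and 0.67, so the averaged values are about 0.67 (Sp) and 0.78 (Sn).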
Statistical parameters for the Estrella method applied to the sub-networks where the experimental structures of complexes between the hub protein and at least two partners are available.
| | All clusters | First ranking cluster |
| Correctly identified mutually exclusive node proteins (TP) | 4428 | 260 |
| Incorrectly identified mutually exclusive node proteins (FP) | 878 | 95 |
| Correctly identified non mutually exclusive node proteins (TN) | 5162 | 57 |
| Incorrectly identified non mutually exclusive node proteins (FN) | 1898 | 36 |
| Specificity = 100*TN/(TN+FP) | 85.5 | 63 |
| Sensitivity = 100*TP/(TP+FN) | 70.0 | 88 |
| Positive Predictive value = 100*TP/(TP+FP) | 83.4 | 82 |
| Negative Predictive Value = 100*TN/(TN+FN) | 73.1 | 72 |
| Accuracy = 100* (TP+TN)/(TP+FP+FN+TN) | 77.6 | 79 |
Values are averaged over all clusters of each sub-network (first data column) or computed considering only the first ranking cluster (second data column).
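The formulas in the table are standard confusion-matrix measures. The sketch below applies them to pooled counts; it illustrates the formulas and is not claimed to be how Estrella produced each column (the caption states the values are averages). The function name `metrics` is a hypothetical helper.

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Performance measures (in %) from confusion counts, using the
    formulas listed in the table."""
    return {
        "specificity": 100 * tn / (tn + fp),
        "sensitivity": 100 * tp / (tp + fn),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "accuracy": 100 * (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Counts from the "All clusters" column, in table order TP, FP, TN, FN.
    for name, value in metrics(4428, 878, 5162, 1898).items():
        print(f"{name}: {value:.1f}")
```

Applied to the all-clusters counts this gives 85.5, 70.0, 83.5, 73.1 and 77.6, matching the reported column to within rounding. The first ranking cluster counts do not reproduce their column in the same way (for instance 57/(57 + 95) is about 37.5, not 63), presumably because those values are averages over sub-networks rather than ratios of summed counts.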
Number of correctly identified interface residues in the correctly identified complexes.
| | All clusters | First ranking cluster |
| Number of correctly predicted complexes with a common interface | 1739 | 89 |
| Total number of residues at the interface | 34306 | 976 |
| Number of correctly identified interface residues | 9192 | 300 |
| Number of common interfaces where at least one interface residue is correctly identified | 1101 | 67 |
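The counts in this table can be turned into the kind of ratios quoted in the abstract. This is a minimal sketch; which numerator and denominator the abstract uses for "about three fourths" and "five on average" is not stated, so the ratios and the names below (`InterfaceCounts`, `summarize`) are our own hypothetical choices.

```python
from typing import NamedTuple


class InterfaceCounts(NamedTuple):
    """One column of the interface-residue table."""
    complexes: int           # correctly predicted complexes with a common interface
    interface_residues: int  # total residues at those interfaces
    correct_residues: int    # interface residues correctly identified
    with_one_correct: int    # interfaces with >= 1 correctly identified residue


ALL_CLUSTERS = InterfaceCounts(1739, 34306, 9192, 1101)
FIRST_RANKING = InterfaceCounts(89, 976, 300, 67)


def summarize(c: InterfaceCounts) -> dict[str, float]:
    """Derived ratios; the choice of denominators is ours, not the paper's."""
    return {
        "pct_with_one_correct": 100 * c.with_one_correct / c.complexes,
        "correct_per_complex": c.correct_residues / c.complexes,
        "pct_interface_recovered": 100 * c.correct_residues / c.interface_residues,
    }


if __name__ == "__main__":
    for label, counts in [("all", ALL_CLUSTERS), ("first", FIRST_RANKING)]:
        s = summarize(counts)
        print(f"{label}: {s['pct_with_one_correct']:.1f}% with >=1 correct, "
              f"{s['correct_per_complex']:.1f} correct residues/complex, "
              f"{s['pct_interface_recovered']:.1f}% of interface residues")
```

This gives about 63% of complexes with at least one correct residue over all clusters and about 75% for the first ranking cluster, the latter matching the abstract's "about three fourths"; over all clusters there are about 5.3 correctly identified residues per complex.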
Results of the Estrella procedure applied to sub-networks for which the experimental structure of the complexes is known.
| Clusters | % |
| With more than one missing partner | 8.72 |
| With one missing partner | 40.4 |
| Perfectly defined | 50.5 |
| With one extra partner | 0.23 |
| With more than one extra partner | 0.06 |
Data are shown for all clusters.
Figure 2. The output page of Estrella.