Qiangfeng Cliff Zhang, Donald Petrey, Lei Deng, Li Qiang, Yu Shi, Chan Aye Thu, Brygida Bisikirska, Celine Lefebvre, Domenico Accili, Tony Hunter, Tom Maniatis, Andrea Califano, Barry Honig.
Abstract
The genome-wide identification of pairs of interacting proteins is an important step in the elucidation of cell regulatory mechanisms. Much of our present knowledge derives from high-throughput techniques such as the yeast two-hybrid assay and affinity purification, as well as from manual curation of experiments on individual systems. A variety of computational approaches based, for example, on sequence homology, gene co-expression and phylogenetic profiles, have also been developed for the genome-wide inference of protein-protein interactions (PPIs). Yet comparative studies suggest that the development of accurate and complete repertoires of PPIs is still in its early stages. Here we show that three-dimensional structural information can be used to predict PPIs with an accuracy and coverage that are superior to predictions based on non-structural evidence. Moreover, an algorithm, termed PrePPI, which combines structural information with other functional clues, is comparable in accuracy to high-throughput experiments, yielding over 30,000 high-confidence interactions for yeast and over 300,000 for human. Experimental tests of a number of predictions demonstrate the ability of the PrePPI algorithm to identify unexpected PPIs of considerable biological interest. The surprising effectiveness of three-dimensional structural information can be attributed to the use of homology models combined with the exploitation of both close and remote geometric relationships between proteins.
Year: 2012 PMID: 23023127 PMCID: PMC3482288 DOI: 10.1038/nature11503
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Predicting protein-protein interactions using PrePPI
Given a pair of query proteins that potentially interact (QA, QB), representative structures for the individual subunits (MA, MB) are taken from the PDB, where available, or from homology model databases. For each subunit we find both close and remote structural neighbors. A “template” for the interaction exists whenever a PDB or PQS structure contains a pair of interacting chains (e.g. NA1-NB3) that are structural neighbors of MA and MB, respectively. A model is constructed by superposing the individual subunits, MA and MB, on their corresponding structural neighbors, NA1 and NB3. We assign five empirical structure-based scores to each interaction model (Figure S1) and then calculate a likelihood for each model to represent a true interaction by combining these scores using a Bayesian Network (Figure S2) trained on the HC and the N interaction reference sets. We finally combine the structure-derived score (SM) with non-structural evidence associated with the query proteins (e.g., co-expression, functional similarity) using a naïve Bayesian classifier.
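The final combination step described in this caption can be illustrated with a minimal sketch of a naïve Bayesian classifier expressed in terms of likelihood ratios. Under the naïve assumption that the evidence sources are conditionally independent, the combined likelihood ratio is the product of the individual ratios. The function names, the evidence labels, the numerical values and the prior odds below are all hypothetical choices for illustration; they are not taken from the paper.

```python
from math import prod


def combine_likelihood_ratios(likelihood_ratios):
    """Combine per-evidence likelihood ratios under the naive Bayes
    assumption of conditional independence: the result is their product."""
    return prod(likelihood_ratios)


def posterior_probability(combined_lr, prior_odds):
    """Convert a combined likelihood ratio and prior odds of interaction
    into a posterior probability via posterior_odds = LR * prior_odds."""
    posterior_odds = combined_lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratios for one protein pair (illustrative only).
evidence = {
    "structural_modeling": 50.0,   # stands in for the structure-derived score (SM)
    "co_expression": 4.0,
    "functional_similarity": 3.0,
}

combined = combine_likelihood_ratios(evidence.values())

# Hypothetical prior odds that a random protein pair interacts.
probability = posterior_probability(combined, prior_odds=1.0 / 600.0)
print(combined, probability)
```

Working with likelihood ratios rather than probabilities keeps each evidence source separable, so a source with no information for a given pair can simply be left out of the product (equivalent to a ratio of 1).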
Figure 2. ROC curve (A) and Venn diagram (B) for PrePPI predictions and high-throughput (HT) experiments for yeast
HT experiments are labeled with the first author of the relevant publication (Table S4). The number of interactions in each set is given after the set label in the Venn diagram.
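As a reminder of how a ROC curve of this kind is obtained, the sketch below sweeps a score threshold over predictions for a positive and a negative reference set and records the false-positive and true-positive rates at each threshold. It is a generic illustration, not the authors' evaluation code; the function name and the example scores are hypothetical.

```python
def roc_points(scores_positive, scores_negative):
    """Return (false-positive rate, true-positive rate) points obtained by
    lowering a score threshold over every distinct score, highest first."""
    thresholds = sorted(set(scores_positive) | set(scores_negative), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted
    for threshold in thresholds:
        true_pos = sum(score >= threshold for score in scores_positive)
        false_pos = sum(score >= threshold for score in scores_negative)
        points.append((false_pos / len(scores_negative),
                       true_pos / len(scores_positive)))
    return points


# Hypothetical scores for pairs in a positive and a negative reference set.
positive_scores = [900.0, 650.0, 30.0]
negative_scores = [700.0, 5.0, 1.0]

for fpr, tpr in roc_points(positive_scores, negative_scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

An experiment that reports interactions without a score contributes a single (FPR, TPR) point rather than a curve, since there is no threshold to vary.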
Figure 3. Models for the PPI formed between (A) PKD1 and PKCε, and (B) EF1δ and VHL using homology models and remote structural relationships
The same template complex of ubiquitin-conjugating enzyme E2D 3 and ubiquitin (PDB code: 2fuh, chains A and B, shown in blue and red, respectively) was used in both cases. The structures of the PH domain of PKD1 and the GNE domain of EF1δ (shown in green and purple) are homology models from ModBase; the structure of a C1 domain of PKCε (yellow) is a homology model from SkyBase; the structure of VHL (cyan) is from the PDB (1lm8, chain V). In each case, the relevant homology models are structurally superimposed on one of the two templates in the E2-ubiquitin complex.