Kevin Y Yip, Philip M Kim, Drew McDermott, Mark Gerstein.
Abstract
BACKGROUND: Proteins interact through specific binding interfaces that contain many residues in domains. Protein interactions thus occur on three different levels of a concept hierarchy: whole proteins, domains, and residues. Each level offers a distinct and complementary set of features for computationally predicting interactions, including functional genomic features of whole proteins, evolutionary features of domain families, and physical-chemical features of individual residues. The predictions at each level could benefit from using the features at all three levels. However, combining them is not trivial because the features are provided at different granularities.
Year: 2009 PMID: 19656385 PMCID: PMC2734556 DOI: 10.1186/1471-2105-10-241
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic illustration of multi-level learning concepts. (a) The three levels of interactions. Top: the PDB structure 1piw of the homo-dimer yeast NADP-dependent alcohol dehydrogenase 6. Middle: each chain contains two conserved Pfam domain instances, PF00107 (inner) and PF08240 (outer). The interaction interface is at PF00107. Bottom: two pairs of residues predicted by iPfam to interact: 283 (yellow) with 287 (cyan), and 285 (purple) with 285. (b) The three information flow architectures. i: independent levels, ii: unidirectional flow (illustrated by downward flow), iii: bidirectional flow. (c) Coupling mechanisms for passing information from one level to another. 1: passing training information to expand the training set of the next level, 2: passing predictions as an additional feature of the next level, 3: passing predictions to expand the training set of the next level.
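Coupling mechanism 2 in the caption above (passing predictions as an additional feature of the next level) can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the toy identifiers (`d1`, `d2`, `P1`, `P2`) are hypothetical and are not taken from the paper; the sketch only shows one plausible way a protein-level score could be appended to domain-level feature vectors.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def add_parent_prediction(
    child_features: Dict[Pair, List[float]],
    parent_of: Dict[str, str],
    parent_scores: Dict[Pair, float],
    default: float = 0.0,
) -> Dict[Pair, List[float]]:
    """Append the parent-level prediction score to each child pair's features.

    parent_scores is assumed to be keyed by alphabetically sorted pairs.
    """
    augmented = {}
    for (a, b), feats in child_features.items():
        # Look up the pair of parent objects (e.g. proteins) for this child pair.
        key = tuple(sorted((parent_of[a], parent_of[b])))
        augmented[(a, b)] = feats + [parent_scores.get(key, default)]
    return augmented


# Hypothetical toy data: two domain instances located on proteins P1 and P2.
domain_features = {("d1", "d2"): [0.3, 1.0]}
parent_of = {"d1": "P1", "d2": "P2"}
protein_scores = {("P1", "P2"): 0.9}

augmented = add_parent_prediction(domain_features, parent_of, protein_scores)
print(augmented)
```

Under the same assumptions, mechanism 3 would instead add confidently predicted pairs to the training set of the next level rather than extending the feature vector.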
Data features at the protein level.
| Feature | Feature of | Data type | Kernel |
| COG (version 7) phylogenetic profiles [ | Proteins | Binary vectors | Linear |
| Sub-cellular localization [ | Proteins | Binary vectors | Linear |
| Cell cycle gene expression [ | Proteins | Real vectors | Correlation (linear after standardization) |
| Environment response gene expression [ | Proteins | Real vectors | Correlation (linear after standardization) |
| Yeast two-hybrid [ | Protein pairs | Unweighted graph | Diffusion ( |
| TAP-MS [ | Protein pairs | Weighted graph | Diffusion ( |
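The linear and correlation kernels listed in the table can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are made up for the example, and the division by the vector length is an assumed scaling chosen so that the value equals the Pearson correlation.

```python
import math
from typing import List, Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def standardize(x: Sequence[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if sd == 0:
        raise ValueError("constant vector cannot be standardized")
    return [(v - mean) / sd for v in x]


def correlation_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear kernel on standardized vectors, scaled to the Pearson correlation."""
    return linear_kernel(standardize(x), standardize(y)) / len(x)


# Binary vectors (e.g. phylogenetic profiles) use the linear kernel directly.
print(linear_kernel([1, 0, 1], [1, 1, 0]))
# Real-valued expression profiles use the correlation kernel.
print(correlation_kernel([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```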
Data features at the domain level.
| Feature | Feature of | Data type | Kernel |
| Phylogenetic tree correlations [ | Domain family pairs | Real matrix | Empirical kernel map [ |
| In all species, number of proteins containing an instance of the domain family | Domain families | Integers | Polynomial (d = 3) |
| In all species, number of proteins containing domain instances only from the family | Domain families | Integers | Polynomial (d = 3) |
| Number of domain instances of parent protein | Domain instances | Integers | Polynomial (d = 3) |
| Fraction of non-yeast interacting protein pairs containing instances of the two domains, respectively, that are mediated by the domain instances* | Domain family pairs | Real matrix | Constant shift embedding [ |
| Fraction of protein pairs containing instances of the two domains, respectively, that are known to be interacting in the PPI training set* | Domain family pairs | Real matrix | Constant shift embedding |
*: These two features were used with the unidirectional and bidirectional flow architectures only since they involve information about the training set of the protein level.
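For the integer-valued features, the table specifies a polynomial kernel of degree 3. A minimal sketch is given below; the table states only d = 3, so the additive constant `c = 1` and the absence of any input scaling are assumptions, and the counts are invented for illustration.

```python
from typing import List, Sequence


def polynomial_kernel(x: float, y: float, d: int = 3, c: float = 1.0) -> float:
    """Polynomial kernel (x*y + c)**d on scalar features; c is an assumed offset."""
    return (x * y + c) ** d


def gram_matrix(values: Sequence[float]) -> List[List[float]]:
    """Kernel matrix over all pairs of feature values."""
    return [[polynomial_kernel(a, b) for b in values] for a in values]


# Hypothetical counts, e.g. number of domain instances in the parent protein.
counts = [1, 2, 3]
K = gram_matrix(counts)
print(K)
```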
Data features at the residue level.
| Feature | Feature of | Data type | Kernel |
| PSI-BLAST profiles | Residues and neighbors | Vectors of real vectors | Summation of linear |
| Predicted secondary structures | Residues and neighbors | Vectors of real vectors | Summation of linear |
| Predicted solvent accessible surface areas | Residues and neighbors | Vectors of real numbers | Summation of circular |
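The "summation of linear" kernel for residue features can be read as summing a linear kernel over aligned positions in a window made of a residue and its neighbors. The sketch below follows that reading; the window layout and the function name are assumptions made for illustration, and the circular kernel used for surface areas is not covered.

```python
from typing import Sequence

Window = Sequence[Sequence[float]]  # one feature vector per window position


def summed_linear_kernel(x: Window, y: Window) -> float:
    """Sum of inner products between vectors at matching window positions."""
    if len(x) != len(y):
        raise ValueError("windows must have the same number of positions")
    return sum(
        sum(a * b for a, b in zip(xi, yi))
        for xi, yi in zip(x, y)
    )


# Hypothetical two-position windows with two features per position.
window_a = [[1.0, 0.0], [0.0, 1.0]]
window_b = [[1.0, 1.0], [2.0, 3.0]]
print(summed_linear_kernel(window_a, window_b))
```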
Prediction accuracies (AUC) of the three levels with different information flow architectures and training levels.
| Independent levels | Unidirectional flow | Bidirectional flow | ||||||
| Level | PD | PR | DR | PD | PR | DR | PDR | |
| Proteins | 0.7153 | 0.7205 | 0.7227 | |||||
| Domains | 0.5214 | 0.5854 | 0.6796 | 0.6986 | ||||
| Residues | 0.5675 | 0.5296 | 0.5128 | 0.6581 | 0.6182 | |||
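The accuracies in the table are areas under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half. The scores and labels are invented for illustration, and the authors' evaluation code may differ.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: a perfect ranking and a chance-level ranking.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
print(auc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to random ranking, which puts values such as 0.5214 for domains with independent levels in context.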
Figure 2. Receiver operator characteristic (ROC) curves of protein interaction predictions with different frameworks and training levels.
Figure 3. Receiver operator characteristic (ROC) curves of domain interaction predictions with different frameworks and training levels.
Figure 4. Receiver operator characteristic (ROC) curves of residue interaction predictions with different frameworks and training levels.