Yanjun Qi, Fernanda Balem, Christos Faloutsos, Judith Klein-Seetharaman, Ziv Bar-Joseph.
Abstract
MOTIVATION: Protein complexes integrate multiple gene products to coordinate many biological functions. Given a graph representing pairwise protein interaction data, one can search for subgraphs representing protein complexes. Previous methods for performing such a search relied on the assumption that complexes form a clique in that graph. While this assumption is true for some complexes, it does not hold for many others. New algorithms are required in order to recover complexes with other types of topological structure.
Year: 2008 PMID: 18586722 PMCID: PMC2718642 DOI: 10.1093/bioinformatics/btn164
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Projection of selected yeast MIPS complexes on our PPI graph (weight thresholded). (a) Example of a clique. All nodes are connected by edges. (b) Example of a star shape, also referred to as the spoke model. (c) Example of a linear shape. (d) Example of a hybrid shape where small cliques are connected by a common node.
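The shapes in Fig. 1 can be compared with a small sketch that shows why a density-only (clique) criterion would miss star-shaped and linear complexes. The five node names and the `density` helper are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def density(nodes, edges):
    """Fraction of all possible node pairs that are joined by an edge."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1) / 2)

nodes = ["A", "B", "C", "D", "E"]  # hypothetical proteins

# (a) clique: every pair of nodes is connected
clique = set(combinations(nodes, 2))
# (b) star / spoke: one hub connected to every other node
star = {("A", other) for other in nodes[1:]}
# (c) linear: nodes connected in a chain
linear = set(zip(nodes, nodes[1:]))

for name, edges in [("clique", clique), ("star", star), ("linear", linear)]:
    print(f"{name}: {len(edges)} edges, density {density(nodes, edges):.2f}")
```

With five nodes, the clique has density 1.00 while the star and the chain both have density 0.40, so a method that only looks for dense subgraphs would rank the latter two poorly.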
Features for representing protein complex properties
| No | Group | Reference | Graph type | Num. features |
|---|---|---|---|---|
| 1 | Node size | Chakrabarti | Binary | 1 |
| 2 | Graph density | Chakrabarti | Binary | 1 |
| 3 | Degree statistics | Barabasi | Binary | 4 |
| 4 | Edge weight statistics | Chakrabarti | Weight | 4 |
| 5 | Density wrt. weight cutoffs | Chakrabarti | Weight | 7 |
| 6 | Degree correlation statistics | Stelzl | Binary | 3 |
| 7 | Clustering coefficient statistics | Barabasi | Binary | 3 |
| 8 | Topological coefficient statistics | Stelzl | Binary | 3 |
| 9 | First Eigenvalues | Chakrabarti | Binary | 3 |
| 10 | Protein weight/size statistics | Cherry | | 4 |
Each row represents a group of similar features. We use 33 features divided into 10 groups. See supporting website for more details. The second column lists the name of the feature group and the third column provides the references. The fourth column specifies which type of graph is used to derive the property.
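As a rough illustration of how a few of these feature groups could be derived from a weighted subgraph, the sketch below computes node size, density, degree statistics, edge weight statistics, clustering coefficients, and density at several weight cutoffs. The function names, the cutoff values, and the particular statistics chosen (mean, max, variance) are assumptions made for this example; the supporting website should be consulted for the authors' exact definitions.

```python
from statistics import mean, pvariance

def threshold_edges(weights, cutoff):
    """Binary graph: keep edges whose weight is at least `cutoff` (assumed rule)."""
    return {pair for pair, w in weights.items() if w >= cutoff}

def density(n_nodes, n_edges):
    return 0.0 if n_nodes < 2 else n_edges / (n_nodes * (n_nodes - 1) / 2)

def degrees(nodes, edges):
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def clustering_coefficient(node, edges):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    nbrs = {v for u, v in edges if u == node} | {u for u, v in edges if v == node}
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in edges if u in nbrs and v in nbrs)
    return 2 * links / (k * (k - 1))

def extract_features(nodes, weights, cutoff=0.5, density_cutoffs=(0.2, 0.5, 0.8)):
    edges = threshold_edges(weights, cutoff)
    deg = list(degrees(nodes, edges).values())
    cc = [clustering_coefficient(v, edges) for v in nodes]
    w = list(weights.values())
    return {
        "node_size": len(nodes),
        "density": density(len(nodes), len(edges)),
        "degree_mean": mean(deg),
        "degree_max": max(deg),
        "weight_mean": mean(w),
        "weight_var": pvariance(w),
        "clustering_mean": mean(cc),
        "density_at_cutoff": {
            c: density(len(nodes), len(threshold_edges(weights, c)))
            for c in density_cutoffs
        },
    }

# Hypothetical subgraph: a triangle A-B-C plus a weakly attached node D.
nodes = ["A", "B", "C", "D"]
weights = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7, ("C", "D"): 0.3}
features = extract_features(nodes, weights)
print(features)
```

Each edge is stored once as an unordered pair, which is what keeps the degree and clustering counts from double-counting.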
Fig. 2. A Bayesian probabilistic model for scoring a subgraph in our framework. The root node ‘Label’ is the binary indicator for complexes (1 if this subgraph is a complex, 0 otherwise). The second-level node ‘nodeSize’ represents the number of nodes in the subgraph. The remaining nodes are all located on the third level, and each represents a feature property described in Table 1.
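One way to read the three-level structure in Fig. 2 is as the factorization P(label) · P(nodeSize | label) · ∏ P(feature | nodeSize, label), scored as a log ratio between the complex and non-complex hypotheses. The sketch below follows that reading on discretized features. The dependency of each feature on both `nodeSize` and `Label` is an assumption drawn from the caption, and the add-one smoothing (`alpha`) is swapped in here to avoid zero probabilities; it is not the plain MLE mentioned in the algorithm table.

```python
import math
from collections import Counter

def fit(examples, n_bins, alpha=1.0):
    """examples: list of (label, size_bin, [feature_bin, ...]), all discretized."""
    prior, size_counts, feat_counts = Counter(), Counter(), Counter()
    for label, size_bin, feats in examples:
        prior[label] += 1
        size_counts[(label, size_bin)] += 1
        for i, f in enumerate(feats):
            feat_counts[(label, size_bin, i, f)] += 1
    return {"prior": prior, "size": size_counts, "feat": feat_counts,
            "n_bins": n_bins, "alpha": alpha}

def log_joint(model, label, size_bin, feats):
    a, k = model["alpha"], model["n_bins"]
    total = sum(model["prior"].values())
    # P(label), smoothed over the two classes
    lp = math.log((model["prior"][label] + a) / (total + 2 * a))
    # P(nodeSize | label)
    lp += math.log((model["size"][(label, size_bin)] + a)
                   / (model["prior"][label] + k * a))
    # P(feature_i | nodeSize, label) for every feature
    for i, f in enumerate(feats):
        num = model["feat"][(label, size_bin, i, f)] + a
        den = model["size"][(label, size_bin)] + k * a
        lp += math.log(num / den)
    return lp

def log_ratio_score(model, size_bin, feats):
    """Positive when the subgraph looks more like a complex than a non-complex."""
    return log_joint(model, 1, size_bin, feats) - log_joint(model, 0, size_bin, feats)

# Hypothetical training data with 3 bins per variable.
examples = [
    (1, 1, [2, 2]), (1, 1, [2, 1]), (1, 0, [2, 2]),   # complexes
    (0, 1, [0, 0]), (0, 1, [0, 1]), (0, 0, [0, 0]),   # non-complexes
]
model = fit(examples, n_bins=3)
print(log_ratio_score(model, 1, [2, 2]))  # complex-like features
print(log_ratio_score(model, 1, [0, 0]))  # non-complex-like features
```

On this toy data the first score is positive and the second is negative, which is the behaviour a threshold on the ratio score relies on.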
Protein complex identification algorithm
| Input: |
| - Weighted PPI matrix; |
| - A training set of complexes and non-complexes; |
| Output: |
| - Discovered list of protein complexes; |
| Steps: |
| - Extract property features from positive and negative training examples; |
| - Discretize the continuous features; |
| - Calculate the BN MLE parameters for the different feature properties under the multinomial distribution; |
| - Starting from the seeding subgraphs, apply simulated annealing search to expand and identify candidate complexes; |
| - Output subgraphs with ratio scores exceeding a certain threshold |
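The search step in the table above can be sketched as a simulated annealing loop that grows a seed subgraph one neighbouring node at a time. This is a minimal sketch under stated assumptions: the move set (add-only), the cooling schedule, the parameter names, and in particular `toy_score` are all hypothetical. In the paper's framework the score would be the ratio score from the Bayesian model, not the edge-count stand-in used here.

```python
import math
import random

def neighbors_of(cluster, adjacency):
    """Nodes adjacent to the cluster but not yet in it."""
    out = set()
    for v in cluster:
        out |= adjacency.get(v, set())
    return out - cluster

def anneal_expand(seed, adjacency, score, t0=1.0, cooling=0.9, max_steps=50, rng=None):
    rng = rng or random.Random(0)
    current = set(seed)
    current_score = score(current)
    best, best_score = set(current), current_score
    temperature = t0
    for _ in range(max_steps):
        candidates = sorted(neighbors_of(current, adjacency))
        if not candidates:
            break
        proposal = current | {rng.choice(candidates)}
        new_score = score(proposal)
        delta = new_score - current_score
        # Always accept improvements; accept worse moves with Metropolis probability.
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, new_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        temperature *= cooling
    return best, best_score

# Hypothetical graph: triangle A-B-C with a tail C-D-E.
adjacency = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}

def toy_score(cluster):
    """Stand-in score: internal edges minus a penalty on the number of node pairs."""
    n = len(cluster)
    internal = sum(1 for u in cluster for v in adjacency[u] if v in cluster) / 2
    return internal - 0.6 * n * (n - 1) / 2

best, best_score = anneal_expand({"A", "B"}, adjacency, toy_score)
print(sorted(best), round(best_score, 2))
```

Returning the best subgraph seen, rather than the final state, keeps an accepted downhill move late in the run from degrading the output. A final filter would then keep only subgraphs whose score exceeds the chosen threshold.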
Fig. 3. Histogram of the number of proteins in each of the three reference sets: ‘MIPS’, ‘TAP06’ and ‘Non-complexes’. Note that all resemble ‘power law’ distributions. The horizontal axis is the number of proteins. The vertical axis is the number of subgraphs (complexes).
Fig. 4. Distribution of the reference examples when projected onto the first three principal components after applying SVD to the features.
Performance comparison between our algorithm (‘SCI-BN’), an SVM with the same set of features (‘SCI-SVM’), a clique-based method using only the density feature (‘Density’) and the MCODE method (Bader et al., 2003b) (‘MCODE’)
| Train | Test | Method | Precision | Recall | F1 |
|---|---|---|---|---|---|
| MIPS | TAP06 | Density | 0.217 | 0.409 | 0.283 |
| MIPS | TAP06 | MCODE | 0.293 | 0.088 | 0.135 |
| MIPS | TAP06 | SCI-SVM | 0.247 | 0.377 | 0.298 |
| MIPS | TAP06 | SCI-BN | 0.312 | 0.489 | 0.381 |
| TAP06 | MIPS | Density | 0.143 | 0.515 | 0.224 |
| TAP06 | MIPS | MCODE | 0.146 | 0.063 | 0.088 |
| TAP06 | MIPS | SCI-SVM | 0.176 | 0.379 | 0.240 |
| TAP06 | MIPS | SCI-BN | 0.219 | 0.537 | 0.312 |
Evaluation is based on precision, recall and the F1 measure. Experiments were carried out with either MIPS as the positive training set and TAP06 as the test set, or vice versa.
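The F1 column is the harmonic mean of precision and recall, which can be checked against the table with a short sketch. Only the standard F1 formula is assumed here; how predicted complexes are matched to reference complexes when computing precision and recall is not covered.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# SCI-BN, trained on MIPS and tested on TAP06
print(round(f1_score(0.312, 0.489), 3))  # 0.381
# MCODE, trained on MIPS and tested on TAP06
print(round(f1_score(0.293, 0.088), 3))  # 0.135
```

Recomputing from the rounded precision and recall values can differ from the reported F1 in the last digit, since the table was presumably derived from unrounded values.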
Fig. 5. Projection of predicted complexes on our weighted PPI graph. The edge weights are thresholded and color coded. See the color legend (top right corner bar) for edge weights. Descriptions for each predicted complex are provided in the ‘Validation’ section.