Changning Liu, Jing Li, Yi Zhao.
Abstract
BACKGROUND: Developing effective strategies to reveal modular structures in protein interaction networks is crucial for a better understanding of the molecular mechanisms underlying biological processes. In this paper, we propose a new density-based algorithm (ADHOC) for clustering the vertices of a protein interaction network using a novel subgraph density measurement.
Year: 2010 PMID: 21143800 PMCID: PMC3005933 DOI: 10.1186/1471-2164-11-S4-S17
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Flowchart of the proposed algorithm. For a given density level k, all nodes are classified into four types and then clustered into different modules or the inter-module layer according to their types.
Figure 2. A simple demonstration of the ADHOC method. The four types of nodes are marked by different colors. The three identified clusters are circled by dashed lines.
Comparison of ADHOC to Competing Clustering Methods for DIP Yeast PPI Dataset
| Method | Cluster Number | Cluster Size | Discard (%) | GO MF (-log P-value) | GO BP (-log P-value) | GO CC (-log P-value) |
|---|---|---|---|---|---|---|
| ADHOC | 50 | 20.56 | 68.05 | 5.18 | 7.44 | 6.38 |
| Maximal Clique | 376 | 4.55 | 80.06 | 3.43 | 4.02 | 2.67 |
| IPCA | 253 | 4.64 | 80.39 | 3.61 | 4.09 | 2.89 |
| DPClus | 90 | 5.27 | 84.49 | 3.91 | 4.50 | 3.44 |
| MCODE | 29 | 23.76 | 75.21 | 4.20 | 5.23 | 4.89 |
| CFinder | 84 | 7.46 | 80.06 | 4.63 | 6.03 | 4.66 |
Cluster Number: the number of clusters identified by each method; Cluster Size: the average number of proteins in each cluster; Discard (%): the percentage of proteins not assigned to any cluster; GO: the average adjusted -log P-values of all detected clusters for the three Gene Ontology categories, molecular function (MF), biological process (BP), and cellular component (CC).
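The three structural columns defined above can be computed directly from a clustering. The following is a minimal sketch under stated assumptions, not the authors' code: the function name `clustering_summary`, its arguments, and the toy proteins are hypothetical, and clusters are allowed to overlap, so a protein counts as assigned if it appears in at least one cluster.

```python
from typing import Dict, Iterable, Set


def clustering_summary(clusters: Iterable[Set[str]],
                       all_proteins: Set[str]) -> Dict[str, float]:
    """Return Cluster Number, Cluster Size and Discard (%) for a clustering.

    Hypothetical helper for illustration; it follows the column
    definitions in the table note, nothing more.
    """
    if not all_proteins:
        raise ValueError("all_proteins must not be empty")
    cluster_list = [set(c) for c in clusters]
    # A protein is 'assigned' if it occurs in at least one cluster.
    assigned: Set[str] = set().union(*cluster_list)
    n = len(cluster_list)
    return {
        # Number of clusters found.
        "cluster_number": n,
        # Average number of proteins per cluster (0.0 if no clusters).
        "cluster_size": sum(len(c) for c in cluster_list) / n if n else 0.0,
        # Percentage of proteins that ended up in no cluster.
        "discard_pct": 100.0 * len(all_proteins - assigned) / len(all_proteins),
    }


# Toy network of ten proteins with two overlapping clusters.
proteins = set("ABCDEFGHIJ")
toy_clusters = [{"A", "B", "C"}, {"C", "D", "E", "F"}]
summary = clustering_summary(toy_clusters, proteins)
print(summary)
```

In this toy case six of the ten proteins are covered, so four are discarded; the shared protein C is counted once for coverage but in both cluster sizes.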
Robustness Analysis of ADHOC
| Noise | Cluster Number | GO MF (-log P-value) | GO BP (-log P-value) | GO CC (-log P-value) |
|---|---|---|---|---|
| 0% | 50 | 5.18 | 7.44 | 6.38 |
| 5% | 44.15 ± 2.28 | 5.19 ± 0.19 | 7.34 ± 0.31 | 6.52 ± 0.35 |
| 10% | 42.49 ± 2.50 | 5.13 ± 0.28 | 7.26 ± 0.33 | 6.50 ± 0.35 |
| 15% | 39.94 ± 2.69 | 5.20 ± 0.28 | 7.31 ± 0.44 | 6.51 ± 0.49 |
| 20% | 37.43 ± 3.40 | 5.28 ± 0.48 | 7.54 ± 0.62 | 6.66 ± 0.53 |
| 25% | 35.13 ± 2.98 | 5.29 ± 0.42 | 7.47 ± 0.54 | 6.59 ± 0.54 |
The Noise column gives the percentage of random noise added to the DIP yeast PPI dataset. For each noise level, we generated 100 random networks. Each cell reports the mean and standard variance over these networks.
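The replicate-and-summarize procedure behind this table can be sketched as follows. This is only an illustration under loud assumptions: the record does not say how noise was generated, so the sketch assumes noise means adding random edges that are not already present, and it reports the sample standard deviation as the spread. The names `add_noise`, `robustness`, and `metric` are hypothetical; `metric` stands in for whatever is measured on each noisy network (for example, the cluster number returned by a clustering method).

```python
import random
import statistics
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def add_noise(edges: Iterable[Tuple[str, str]], fraction: float,
              rng: random.Random) -> Set[Edge]:
    """Return the edge set plus round(fraction * |E|) random new edges.

    ASSUMED noise model (random edge addition); the source does not
    specify one. Enumerating all node pairs is fine for a toy graph only.
    """
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({v for e in edge_set for v in e})
    candidates = [frozenset(p) for p in combinations(nodes, 2)
                  if frozenset(p) not in edge_set]
    k = min(round(fraction * len(edge_set)), len(candidates))
    return edge_set | set(rng.sample(candidates, k))


def robustness(edges: Iterable[Tuple[str, str]], fraction: float,
               metric: Callable[[Set[Edge]], float],
               replicates: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Mean and sample standard deviation of `metric` over noisy replicates."""
    base = list(edges)
    rng = random.Random(seed)
    values = [metric(add_noise(base, fraction, rng)) for _ in range(replicates)]
    return statistics.mean(values), statistics.stdev(values)


# Toy usage: a 4-cycle, 25% noise, with the edge count as a stand-in metric.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
mean_value, spread = robustness(toy_edges, 0.25, len, replicates=10)
print(mean_value, spread)
```

Fixing the seed makes the replicates reproducible; in a real analysis `metric` would wrap the clustering and scoring steps, which are outside the scope of this sketch.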
Figure 3. The effect of k on clustering. The impact of different values of k (ranging from 3 to 8 in steps of 0.1) on the protein discard rate and Gene Ontology enrichment.
Figure 4. The modular structure of the yeast PPI network. A) The hierarchical and overlapping modular structure of the yeast PPI network. B) The module of peroxisomal membrane proteins. C) The COMPASS complex and the CPF complex.
Prediction for uncharacterized proteins (ordered by predicted functions)
| Protein | P-value | Predicted Function | Protein | P-value | Predicted Function |
|---|---|---|---|---|---|
| Q12156 | 12.28 | cytoskeleton organization and biogenesis | P16387 | 19.64 | transcription |
| Q05911 | 14.20 | nuclear organization and biogenesis | P16547 | 19.64 | transcription |
| P01097 | 19.56 | precursor metabolites and energy generation | P25659 | 19.64 | transcription |
| O13563 | 25.25 | protein catabolic process | P36139 | 19.64 | transcription |
| P36003 | 25.25 | protein catabolic process | P38301 | 19.64 | transcription |
| P50086 | 25.25 | protein catabolic process | P38352 | 19.64 | transcription |
| P53196 | 25.25 | protein catabolic process | P38717 | 19.64 | transcription |
| Q06665 | 25.25 | protein catabolic process | P38915 | 19.64 | transcription |
| Q05778 | 25.25 | protein catabolic process | P39113 | 19.64 | transcription |
| P39713 | 19.31 | protein catabolic process | P39533 | 19.64 | transcription |
| P42942 | 19.31 | protein catabolic process | P40560 | 19.64 | transcription |
| P53243 | 19.31 | protein catabolic process | P46954 | 19.64 | transcription |
| P53743 | 19.31 | protein catabolic process | P47005 | 19.64 | transcription |
| P53851 | 19.31 | protein catabolic process | P47120 | 19.64 | transcription |
| Q03935 | 19.31 | protein catabolic process | P53116 | 19.64 | transcription |
| Q06512 | 19.31 | protein catabolic process | P53878 | 19.64 | transcription |
| Q08018 | 19.31 | protein catabolic process | Q03899 | 19.64 | transcription |
| P53724 | 14.63 | protein catabolic process | Q04847 | 19.64 | transcription |
| P40462 | 14.08 | ribosome biogenesis and assembly | Q05947 | 19.64 | transcription |
| P43584 | 14.08 | ribosome biogenesis and assembly | Q06479 | 19.64 | transcription |
| P47019 | 14.08 | ribosome biogenesis and assembly | Q06640 | 19.64 | transcription |
| P53163 | 14.08 | ribosome biogenesis and assembly | Q07844 | 19.64 | transcription |
| Q02608 | 14.08 | ribosome biogenesis and assembly | Q08923 | 19.64 | transcription |
| Q03162 | 14.08 | ribosome biogenesis and assembly | Q12395 | 19.64 | transcription |
| P38254 | 13.03 | RNA metabolic process | Q12443 | 19.64 | transcription |
| P38768 | 13.03 | RNA metabolic process | P38182 | 23.45 | vesicle-mediated transport |
| P53094 | 13.03 | RNA metabolic process | Q12125 | 23.45 | vesicle-mediated transport |
| P53212 | 13.03 | RNA metabolic process | Q04562 | 20.50 | vesicle-mediated transport |
| P53952 | 13.03 | RNA metabolic process | Q12327 | 20.50 | vesicle-mediated transport |
The Protein column lists the Swiss-Prot ID of each protein, the P-value column gives the corresponding -log P-value (only proteins with -log P-value > 10 are shown), and the Predicted Function column gives the predicted function for each protein.
Figure 5. The topological characteristics of module hubs and inter-module hubs. A) The degree distribution, B) the betweenness distribution, C) the interactions between hub nodes, D) the clustering coefficient distribution.
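Two of the quantities named in this caption, node degree and the local clustering coefficient, have standard definitions that can be illustrated on a toy graph. The sketch below is not the authors' analysis; the helper names and the four-node example are hypothetical, and betweenness is omitted for brevity.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map, ignoring self-loops."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj: Dict[str, Set[str]], node: str) -> int:
    """Number of interaction partners of `node`."""
    return len(adj[node])


def clustering_coefficient(adj: Dict[str, Set[str]], node: str) -> float:
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0.0 by convention
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Toy graph: A is connected to B, C and D; B and C are also connected.
toy_adj = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
for name in sorted(toy_adj):
    print(name, degree(toy_adj, name), clustering_coefficient(toy_adj, name))
```

Here A has the highest degree but a low clustering coefficient, because only one of its three neighbor pairs is connected.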
GO annotation (Top3) for module hubs and inter-module hubs
| GO | Module Hubs | -log P-value | Inter-module Hubs | -log P-value |
|---|---|---|---|---|
| BP | Protein catabolic process | 7.64 | Signal transduction | 4.24 |
| BP | RNA metabolic process | 5.86 | Anatomical structure morphogenesis | 3.20 |
| BP | Nuclear organization and biogenesis | 3.95 | Cell budding | 3.08 |
| MF | RNA binding | 4.12 | Protein kinase activity | 3.26 |
| MF | Peptidase activity | 4.10 | Signal transducer activity | 2.33 |
| MF | Structural molecule activity | 1.48 | DNA binding | 1.45 |
| CC | Nucleus | 7.29 | Cell cortex | 1.69 |
| CC | Endomembrane system | 3.78 | Site of polarized growth | 1.46 |
| CC | Golgi apparatus | 2.71 | Cytoskeleton | 1.32 |
The numbers indicate the corresponding -log P-values.
The enrichment of lethal genes in different groups
| Type | Lethal | Viable | Unknown | Lethal % | -log P-value |
|---|---|---|---|---|---|
| | 119 | 131 | 11 | 45.59 | 7.09 |
| | 101 | 89 | 2 | 52.60 | 10.26 |
| | 18 | 42 | 9 | 26.09 | 0.10 |
| | 210 | 397 | 89 | 30.17 | 0.33 |
| | 386 | 1109 | 326 | 21.20 | 0.00 |
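The Lethal % values in this table can be recomputed from the counts. The short check below reproduces every reported percentage when the denominator is Lethal + Viable + Unknown, which suggests that genes of unknown phenotype are included in the total. The function name `lethal_percent` is hypothetical, and the -log P-values are not recomputed because the record does not state the enrichment test.

```python
from typing import List, Tuple

# (lethal, viable, unknown, reported Lethal %) copied from the table rows,
# in table order. The Type labels are missing from this record.
ROWS: List[Tuple[int, int, int, float]] = [
    (119, 131, 11, 45.59),
    (101, 89, 2, 52.60),
    (18, 42, 9, 26.09),
    (210, 397, 89, 30.17),
    (386, 1109, 326, 21.20),
]


def lethal_percent(lethal: int, viable: int, unknown: int) -> float:
    """Percentage of lethal genes among all genes in the group."""
    return 100.0 * lethal / (lethal + viable + unknown)


for lethal, viable, unknown, reported in ROWS:
    computed = lethal_percent(lethal, viable, unknown)
    print(f"{lethal:4d} {viable:5d} {unknown:4d}  "
          f"computed={computed:.2f}  reported={reported:.2f}")
```

The counts in the first row are the sums of the second and third rows (119 = 101 + 18, 131 = 89 + 42, 11 = 2 + 9), so the first group appears to be the union of the next two.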