Emad Ramadan, Ahmed Naef, Moataz Ahmed.
Abstract
BACKGROUND: Protein-protein interaction networks are receiving increased attention because of their importance in understanding life at the cellular level. A major challenge in systems biology is to understand the modular structure of such biological networks. Several clustering techniques have been proposed for protein-protein interaction networks, but they have drawbacks: applying earlier clustering techniques to these networks to predict protein complexes does not yield good results, owing to the networks' small-world and power-law properties.
Keywords: Genetic algorithms; Graph clustering; Protein complex detection; Protein–protein interaction network
Year: 2016 PMID: 27454228 PMCID: PMC4965715 DOI: 10.1186/s12859-016-1096-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Clustering algorithm flowchart
Fig. 2 Population initialization
Fig. 3 The mutation operation. (a) shows the selected node of cluster c. (b) shows cluster c after the mutation operator is applied
Comparison of clustering algorithms on the Collins network. The populations in our method are initialized using spectral and random clusterings
| Method | #Cls | CYC2008 R | CYC2008 P | CYC2008 F-measure | MIPS R | MIPS P | MIPS F-measure | Discard |
|---|---|---|---|---|---|---|---|---|
| MCODE | 54 | 0.66 | 0.59 | 0.63 | 0.27 | 0.48 | 0.35 | 40 % |
| MCL | 75 | 0.65 | 0.45 | 0.54 | 0.27 | 0.34 | 0.30 | 19 % |
| ClusterOne | 114 | 0.55 | 0.43 | 0.49 | 0.20 | 0.34 | 0.25 | 18 % |
| Our method using spectral initialization | | | | | | | | |
| 1) Density cut | 162 | 0.74 | 0.60 | 0.66 | 0.32 | 0.45 | 0.37 | 14 % |
| 2) Maxmin cut | 180 | 0.71 | 0.47 | 0.60 | 0.38 | 0.40 | 0.39 | 15 % |
| 3) Normalized cut | 193 | 0.67 | 0.50 | 0.57 | 0.39 | 0.37 | 0.37 | 20 % |
| 4) Ratio cut | 161 | 0.73 | 0.38 | 0.50 | 0.39 | 0.33 | 0.36 | 17 % |
| Our method using random initialization | | | | | | | | |
| 5) Density cut | 164 | 0.72 | 0.54 | 0.62 | 0.30 | 0.41 | 0.35 | 18 % |
| 6) Maxmin cut | 162 | 0.71 | 0.45 | 0.56 | 0.40 | 0.35 | 0.38 | 17 % |
| 7) Normalized cut | 138 | 0.66 | 0.57 | 0.61 | 0.36 | 0.44 | 0.41 | 19 % |
| 8) Ratio cut | 154 | 0.61 | 0.55 | 0.58 | 0.34 | 0.43 | 0.38 | 18 % |
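
The F-measure columns above appear consistent with the usual harmonic mean of recall (R) and precision (P). The following is a minimal sketch under that assumption; the function name `f_measure` is illustrative and does not come from the paper, and recomputing from the rounded R and P values in the table can differ from the tabulated F-measure in the second decimal place.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (assumed definition).

    Returns 0.0 when both inputs are zero to avoid dividing by zero.
    """
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Rounded R and P from the "1) Density cut" row (CYC2008 columns).
print(round(f_measure(0.74, 0.60), 2))  # prints 0.66
```

Because the harmonic mean is dominated by the smaller of the two values, a method with high recall but low precision (for example, the spectral "Ratio cut" row on CYC2008) still receives a modest F-measure.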
A few of the clusters in the Collins network with the lowest p-values with GO components
| # | Size | GO-ID | GO-Term | p-value | N% |
|---|---|---|---|---|---|
| 1 | 17 | GO:0030880 | RNA polymerase complex | 3.30986E-39 | 100.0 % |
| 2 | 8 | GO:0044428 | Nuclear part | 3.70274E-05 | 100.0 % |
| 3 | 7 | GO:0030126 | COPI vesicle coat | 1.37069E-21 | 100.0 % |
| 4 | 14 | GO:0044428 | Nuclear part | 7.23152E-10 | 100.0 % |
| 5 | 27 | GO:0005739 | Mitochondrion | 9.82318E-22 | 100.0 % |
| 7 | 18 | GO:0000502 | Proteasome complex | 1.76807E-40 | 100.0 % |
| 8 | 12 | GO:0005634 | Nucleus | 3.90352E-06 | 100.0 % |
| 9 | 7 | GO:0030008 | TRAPP complex | 1.02802E-20 | 100.0 % |
| 11 | 21 | GO:0005634 | Nucleus | 2.04087E-10 | 100.0 % |
| 12 | 10 | GO:0044425 | Membrane part | 4.18992E-10 | 100.0 % |
| 13 | 5 | GO:0035097 | Histone methyltransferase complex | 1.31389E-11 | 100.0 % |
| 14 | 5 | GO:0030126 | COPI vesicle coat | 1.18247E-14 | 100.0 % |
| 15 | 9 | GO:0016585 | Chromatin remodeling complex | 2.37606E-17 | 100.0 % |
| 16 | 15 | GO:0000502 | Proteasome complex | 2.20275E-33 | 100.0 % |
| 17 | 13 | GO:0043189 | Histone acetyltransferase complex | 1.21627E-39 | 100.0 % |
| 20 | 12 | GO:0016514 | SWI/SNF complex | 4.98150E-37 | 100.0 % |
| 21 | 60 | GO:0005634 | Nucleus | 2.15384E-32 | 100.0 % |
| 22 | 81 | GO:0043227 | Membrane-bound organelle | 4.87516E-23 | 100.0 % |
| 24 | 63 | GO:0044464 | Cell part | 3.42642E-05 | 98.4 % |
| 23 | 4 | GO:0031011 | INO80 complex | 4.13601E-07 | 75.0 % |
For each cluster, the GO component with the lowest p-value is listed. The number of proteins in the cluster that overlap with the GO component is given as a percentage of the cluster size (N%). The p-values, which are defined in the text, are also shown.
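
The paper's definition of the p-value is given in its main text and is not reproduced in this excerpt. A common choice for GO enrichment is the upper tail of the hypergeometric distribution, and the sketch below assumes that choice; it is not necessarily the authors' exact formula. The function names, parameter names, and the small numbers in the usage lines are all hypothetical, chosen only to illustrate the calculation of a p-value and of N%.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       cluster_size: int, overlap: int) -> float:
    """Hypergeometric upper tail P(X >= overlap) (assumed p-value definition).

    population   -- number of proteins in the network
    annotated    -- number of those proteins annotated with the GO component
    cluster_size -- number of proteins in the cluster
    overlap      -- cluster proteins annotated with the GO component
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    # Count the ways to draw a cluster containing k annotated proteins,
    # for every k from the observed overlap up to the maximum possible.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def overlap_percent(cluster_size: int, overlap: int) -> float:
    """N%: share of the cluster's proteins that fall in the GO component."""
    return 100.0 * overlap / cluster_size


# Toy numbers: 10 proteins, 4 annotated, a cluster of 3 that are all annotated.
print(enrichment_p_value(10, 4, 3, 3))
# A cluster of 4 proteins with 3 in the GO component gives N% = 75.0.
print(overlap_percent(4, 3))
```

Under this assumed definition, a smaller p-value means the observed overlap is less likely to arise from a randomly drawn set of the same size, which is why large clusters that fall entirely within one GO component have very small p-values.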