Xiujuan Lei¹, Huan Li², Aidong Zhang³, Fang-Xiang Wu⁴,⁵.
Abstract
BACKGROUND: Identifying protein complexes plays an important role in understanding cellular organization and functional mechanisms. Because much evidence indicates that dense sub-networks in a dynamic protein-protein interaction network (DPIN) usually correspond to protein complexes, protein complex identification can be formulated as density-based clustering.
Keywords: Density-based clustering; Glowworm swarm optimization algorithm (GSO); Ordering points to identify the clustering structure algorithm (OPTICS); Protein complex
Year: 2017 PMID: 29297344 PMCID: PMC5751787 DOI: 10.1186/s12920-017-0314-x
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 An example of the two distances
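Fig. 1 is not reproduced here. In OPTICS the two distances are normally the core-distance and the reachability-distance, and together they drive the cluster-ordering. The sketch below computes both from a precomputed pairwise distance matrix `dist` and builds the ordering with a priority queue. How distances between proteins in the DPIN are defined is not stated in this section, so `dist` is an assumption, as are all function names; MinPts is taken to count the point itself, which is one common convention.

```python
import heapq
import math
from typing import List, Sequence, Tuple

UNDEFINED = math.inf  # OPTICS "undefined" distance, modelled as +infinity


def core_distance(dist: Sequence[Sequence[float]], p: int, eps: float, min_pts: int) -> float:
    """Distance from p to its MinPts-th closest point within eps; UNDEFINED if p is not core.
    Assumption: p itself counts as the first point (distance 0 on the diagonal)."""
    within = sorted(d for d in dist[p] if d <= eps)
    return within[min_pts - 1] if len(within) >= min_pts else UNDEFINED


def reachability_distance(dist, o: int, p: int, eps: float, min_pts: int) -> float:
    """Reachability-distance of o with respect to p: max(core-distance(p), d(p, o))."""
    cd = core_distance(dist, p, eps, min_pts)
    return UNDEFINED if cd == UNDEFINED else max(cd, dist[p][o])


def optics_ordering(dist, eps: float, min_pts: int) -> Tuple[List[int], List[float], List[float]]:
    """Return the cluster-ordering plus reachability and core distances in that order."""
    n = len(dist)
    processed = [False] * n
    reach = [UNDEFINED] * n
    order: List[int] = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = [(UNDEFINED, start)]  # min-heap keyed on reachability distance
        while seeds:
            r, p = heapq.heappop(seeds)
            if processed[p] or r > reach[p]:
                continue  # stale entry, superseded by a smaller reachability
            processed[p] = True
            order.append(p)
            for o in range(n):
                if not processed[o] and dist[p][o] <= eps:
                    new_r = reachability_distance(dist, o, p, eps, min_pts)
                    if new_r < reach[o]:  # never true when p is not a core point
                        reach[o] = new_r
                        heapq.heappush(seeds, (new_r, o))
    core = [core_distance(dist, p, eps, min_pts) for p in order]
    return order, [reach[p] for p in order], core
```

On points at positions 0, 1, 2, 10, 11, 12, 30 with ε = 5 and MinPts = 2, the ordering visits the two groups in turn and the reachability jumps to infinity at the start of each group and at the isolated point.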
Fig. 2 Illustration of the cluster-ordering. (a) Reachability plot for part of the DIP data produced by OPTICS. (b) One cluster in the reachability plot.
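Clusters show up as valleys in a reachability plot such as Fig. 2. A minimal sketch of cutting the plot at a threshold ε′ ≤ ε, following the ExtractDBSCAN-Clustering procedure of the original OPTICS paper, is given below. It takes the reachability and core distances listed in cluster-order, as produced by any OPTICS implementation. The function name and the label −1 for noise are illustrative choices, and this is not necessarily how iOPTICS-GSO itself extracts clusters.

```python
import math
from typing import List, Sequence


def extract_clusters(reach: Sequence[float], core: Sequence[float], eps_prime: float) -> List[int]:
    """Cut the reachability plot at eps_prime (assumed <= the eps used for the ordering).

    reach[i] and core[i] belong to the i-th object of the cluster-ordering.
    Returns one label per position in the ordering; -1 marks noise.
    """
    labels: List[int] = []
    cluster_id = -1
    for r, c in zip(reach, core):
        if r > eps_prime:
            # A peak in the plot: start a new cluster only if this object is a core point.
            if c <= eps_prime:
                cluster_id += 1
                labels.append(cluster_id)
            else:
                labels.append(-1)
        else:
            # Still inside the current valley of the plot.
            labels.append(cluster_id)
    return labels
```

Lowering ε′ splits valleys into smaller clusters or turns their objects into noise, which is why the choice of ε matters so much for the results.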
Parameter values corresponding to the best result in each sub-network of the DIP data
| Timestamps | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ɛ | 0.62 | 0.50 | 0.52 | 0.59 | 0.60 | 0.58 | 0.62 | 0.51 | 0.55 | 0.64 | 0.60 | 0.60 |
| MinPts | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| precision | 0.7500 | 0.7421 | 0.8182 | 0.9000 | 0.8462 | 0.8889 | 0.9048 | 0.7805 | 0.6064 | 0.7419 | 0.6970 | 0.9524 |
| recall | 0.5263 | 0.5122 | 0.3971 | 0.4500 | 0.4400 | 0.2587 | 0.3115 | 0.5565 | 0.5089 | 0.5897 | 0.5349 | 0.5882 |
| f-measure | 0.6185 | 0.6000 | 0.5347 | 0.6000 | 0.5789 | 0.4324 | 0.4634 | 0.6497 | 0.5534 | 0.6571 | 0.6053 | 0.7273 |
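The f-measure row is the harmonic mean of precision and recall, F = 2PR/(P + R). For timestamp 1, 2 × 0.7500 × 0.5263 / (0.7500 + 0.5263) ≈ 0.6185. This section does not define how precision and recall are counted. The sketch below assumes the neighborhood-affinity matching NA(p, b) = |p ∩ b|² / (|p|·|b|) ≥ 0.2 that is widely used to evaluate predicted complexes. All function names and the threshold `omega` are assumptions.

```python
from typing import List, Set, Tuple


def overlap_score(a: Set[str], b: Set[str]) -> float:
    """Neighborhood affinity NA(a, b) = |a & b|^2 / (|a| * |b|)."""
    inter = len(a & b)
    return inter * inter / (len(a) * len(b))


def precision_recall(predicted: List[Set[str]], reference: List[Set[str]],
                     omega: float = 0.2) -> Tuple[float, float]:
    """Assumed matching rule: a pair matches when NA >= omega.
    Precision: share of predicted complexes matching some reference complex.
    Recall: share of reference complexes matched by some predicted complex."""
    matched_pred = sum(any(overlap_score(p, b) >= omega for b in reference) for p in predicted)
    matched_ref = sum(any(overlap_score(p, b) >= omega for p in predicted) for b in reference)
    return matched_pred / len(predicted), matched_ref / len(reference)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
```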
Fig. 3 The correspondence between the GSO and OPTICS algorithms
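Fig. 3 relates GSO to OPTICS. As a rough illustration of how glowworm swarm optimization can search a single OPTICS parameter such as ε, here is a basic one-dimensional GSO with a luciferin update, probabilistic movement toward brighter neighbours, and an adaptive local-decision range. The objective is a hypothetical stand-in for whatever clustering-quality score is maximized. The defaults (ρ = 0.4, γ = 0.6, β = 0.08, n_t = 5, ℓ₀ = 5) are commonly quoted GSO settings, not values from this paper. Treating a glowworm's position directly as ε is also an assumption.

```python
import random
from typing import Callable


def gso_maximize(objective: Callable[[float], float], low: float, high: float,
                 n_worms: int = 20, iters: int = 100, rho: float = 0.4, gamma: float = 0.6,
                 beta: float = 0.08, n_t: int = 5, step: float = 0.01,
                 l0: float = 5.0, seed: int = 0) -> float:
    """Basic glowworm swarm optimization over one real parameter in [low, high].
    Assumption: each glowworm position is a candidate value (e.g. eps for OPTICS)."""
    rng = random.Random(seed)
    r_s = (high - low) / 2                       # sensor range: upper bound on decision range
    x = [rng.uniform(low, high) for _ in range(n_worms)]
    luc = [l0] * n_worms                         # luciferin levels
    r_d = [r_s] * n_worms                        # local-decision ranges
    for _ in range(iters):
        # Luciferin update: decay plus reinforcement by the objective at the current position.
        luc = [(1 - rho) * l + gamma * objective(xi) for l, xi in zip(luc, x)]
        new_x = x[:]
        for i in range(n_worms):
            # Neighbours: brighter glowworms inside i's local-decision range.
            nbrs = [j for j in range(n_worms)
                    if j != i and abs(x[j] - x[i]) < r_d[i] and luc[j] > luc[i]]
            if nbrs:
                # Choose a neighbour with probability proportional to the luciferin difference.
                j = rng.choices(nbrs, weights=[luc[k] - luc[i] for k in nbrs])[0]
                diff = x[j] - x[i]
                if diff != 0:
                    # Move one fixed step toward the chosen neighbour, staying in bounds.
                    new_x[i] = min(high, max(low, x[i] + step * diff / abs(diff)))
            # Shrink the range when crowded, widen it when isolated.
            r_d[i] = min(r_s, max(0.0, r_d[i] + beta * (n_t - len(nbrs))))
        x = new_x
    return max(x, key=objective)
```

For example, `gso_maximize(lambda eps: -(eps - 0.6) ** 2, 0.0, 1.0)` should return a value near 0.6. In practice the objective would run OPTICS at the candidate ε and score the resulting clusters, which makes each evaluation far more expensive than this toy function.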
Number of proteins and interactions in each timestamp sub-network of the four datasets
| Dataset | Count | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DIP | Proteins | 797 | 941 | 796 | 623 | 610 | 530 | 493 | 944 | 1090 | 591 | 661 | 461 |
| DIP | Interactions | 981 | 1444 | 1188 | 745 | 750 | 646 | 573 | 1705 | 2185 | 856 | 974 | 526 |
| Krogan | Proteins | 336 | 379 | 320 | 256 | 206 | 189 | 202 | 580 | 626 | 304 | 330 | 250 |
| Krogan | Interactions | 334 | 464 | 331 | 234 | 210 | 184 | 213 | 1025 | 1081 | 314 | 373 | 258 |
| MIPS | Proteins | 737 | 897 | 781 | 583 | 570 | 531 | 470 | 839 | 1014 | 523 | 616 | 402 |
| MIPS | Interactions | 1097 | 1443 | 1183 | 754 | 684 | 642 | 504 | 1238 | 1637 | 878 | 1207 | 700 |
| Gavin | Proteins | 177 | 228 | 215 | 135 | 112 | 102 | 96 | 379 | 419 | 174 | 190 | 146 |
| Gavin | Interactions | 242 | 334 | 317 | 150 | 135 | 118 | 135 | 1019 | 1043 | 230 | 264 | 184 |
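Each column of the table above is one timestamp sub-network of the dynamic network. One simple way to obtain such sub-networks, assumed here rather than taken from this section, is to keep an interaction at timestamp t only when both proteins are active at t. The per-timestamp sets of active proteins are treated as given input, since how activity is decided (for instance from time-course expression data) is not described here. Proteins are counted only if they take part in at least one kept interaction, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def timestamp_subnetworks(edges: List[Edge], active: List[Set[str]]) -> List[List[Edge]]:
    """Assumed rule: sub-network t keeps an interaction only if both proteins are active at t."""
    return [[(u, v) for u, v in edges if u in act and v in act] for act in active]


def network_size(sub: List[Edge]) -> Tuple[int, int]:
    """(number of proteins, number of interactions) in one sub-network.
    Assumption: only proteins with at least one kept interaction are counted."""
    proteins = {u for u, _ in sub} | {v for _, v in sub}
    return len(proteins), len(sub)
```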
Fig. 4 The effect of different values of MinPts on the f-measure
Description of clusters predicted by several clustering algorithms
| Algorithm | Category | No. of clusters (DIP) | No. of clusters (Krogan) | No. of clusters (MIPS) | No. of clusters (Gavin) | Average size (DIP) | Average size (Krogan) | Average size (MIPS) | Average size (Gavin) |
|---|---|---|---|---|---|---|---|---|---|
| CMC […] | Density | 1263 | 907 | 168 | 486 | 4.39 | 4.56 | – | 5.55 |
| COACH […] | Core | 903 | 547 | 448 | 361 | 3.89 | 8.97 | – | 8.26 |
| MCL […] | Flow | 623 | 932 | – | 425 | 6.57 | 3.62 | – | 3.93 |
| MCODE […] | Density | 63 | 85 | 85 | 150 | 19.00 | 5.88 | – | 6.63 |
| ClusterOne […] | Graph | 372 | 373 | 256 | 312 | 4.90 | 4.29 | – | 6.35 |
| CFinder […] | Density | 609 | 88 | – | 137 | 6.18 | 12.73 | – | 9.6 |
| DBSCAN […] | Density | 492 | 96 | 130 | 24 | 6.26 | 34.43 | 14.3 | 12.7 |
| OPTICS | Density | 107 | 278 | 439 | 108 | 5.90 | 4.17 | 9 | 13.5 |
| OPTICS_PSO […] | Density | 76 | 119 | 98 | 84 | 5.59 | 8.05 | 33 | 8.45 |
| iOPTICS-GSO | Density | 99 | 143 | 86 | 101 | 5.76 | 5.62 | 26.5 | 8.14 |
Fig. 5 The performance comparisons of various algorithms on the four datasets
Comparison of the functional enrichment of the predicted protein complexes with that of other algorithms on the four datasets
| Dataset | Algorithm | <E-15 | [E-15, E-10] | [E-10, E-5] | [E-5, 0.01] | <0.01 (significant) | ≥0.01 (insignificant) |
|---|---|---|---|---|---|---|---|
| DIP | COACH | 33(6.96%) | 44(9.28%) | 205(43.25%) | 126(26.58%) | 408(86.08%) | 66(13.92%) |
| DIP | MCL | 19(1.80%) | 47(4.46%) | 183(17.38%) | 362(34.38%) | 611(58.02%) | 442(41.98%) |
| DIP | MCODE | 12(7.27%) | 17(10.30%) | 80(48.48%) | 38(23.03%) | 147(89.09%) | 18(10.91%) |
| DIP | ClusterOne | 21(3.66%) | 52(9.06%) | 177(30.84%) | 184(32.06%) | 434(75.61%) | 140(24.39%) |
| DIP | OPTICS | 7(7.87%) | 13(14.61%) | 40(44.94%) | 21(23.6%) | 81(91.01%) | 8(8.99%) |
| DIP | OPTICS_PSO | 5(6.85%) | 10(13.70%) | 27(36.99%) | 23(31.51%) | 65(89.04%) | 8(10.96%) |
| Krogan | COACH | 23(10.41%) | 37(16.74%) | 91(41.18%) | 54(24.43%) | 205(92.76%) | 16(7.24%) |
| Krogan | MCL | 16(3.97%) | 43(10.67%) | 103(25.56%) | 119(29.53%) | 281(69.73%) | 122(30.27%) |
| Krogan | MCODE | 8(5.00%) | 28(17.50%) | 68(42.50%) | 46(28.75%) | 150(93.75%) | 10(6.25%) |
| Krogan | ClusterOne | 13(3.26%) | 43(10.78%) | 98(24.56%) | 120(30.08%) | 274(68.67%) | 125(31.33%) |
| Krogan | OPTICS | 13(8.44%) | 26(16.88%) | 56(36.36%) | 31(20.13%) | 126(81.82%) | 28(18.18%) |
| Krogan | OPTICS_PSO | 9(9.47%) | 19(20.0%) | 41(43.16%) | 21(22.11%) | 90(94.74%) | 5(5.26%) |
| MIPS | COACH | 16(4.04%) | 46(11.62%) | 145(36.62%) | 149(37.63%) | 356(89.9%) | 40(10.10%) |
| MIPS | MCL | 5(0.83%) | 13(2.15%) | 94(15.51%) | 220(36.30%) | 332(54.79%) | 274(45.21%) |
| MIPS | MCODE | 5(3.70%) | 10(7.41%) | 70(51.58%) | 39(28.89%) | 124(91.85%) | 11(8.15%) |
| MIPS | ClusterOne | 7(1.88%) | 16(4.30%) | 117(31.45%) | 126(33.87%) | 266(71.51%) | 106(28.49%) |
| MIPS | OPTICS | 16(5.63%) | 6(2.11%) | 26(9.15%) | 74(26.06%) | 122(42.96%) | 162(57.04%) |
| MIPS | OPTICS_PSO | 10(11.76%) | 3(3.53%) | 28(32.94%) | 30(35.29%) | 71(83.53%) | 14(16.47%) |
| Gavin | COACH | 35(14.96%) | 39(16.67%) | 100(42.72%) | 55(23.50%) | 229(97.86%) | 5(2.14%) |
| Gavin | MCL | 22(9.69%) | 34(14.98%) | 88(38.77%) | 66(29.07%) | 110(92.51%) | 17(7.49%) |
| Gavin | MCODE | 12(7.74%) | 20(12.90%) | 80(51.61%) | 39(25.16%) | 151(97.42%) | 4(2.58%) |
| Gavin | ClusterOne | 31(10.62%) | 34(11.64%) | 118(40.41%) | 82(28.08%) | 292(90.75%) | 27(9.25%) |
| Gavin | OPTICS | 20(18.52%) | 13(12.04%) | 53(49.07%) | 19(17.59%) | 105(97.22%) | 3(2.78%) |
| Gavin | OPTICS_PSO | 15(18.07%) | 13(15.66%) | 38(45.78%) | 16(19.28%) | 82(98.80%) | 1(1.20%) |
The bold data in Table 4 are the results on our four datasets.
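The p-value bins in the enrichment table are typical of Gene Ontology enrichment analysis, where a complex's p-value is usually the hypergeometric probability of observing at least k proteins annotated with a GO term by chance. The sketch below assumes that test. It also assumes that a complex is binned by its smallest p-value over all GO terms and that bin edges are half-open. The tool and any multiple-testing correction used for the table are not stated in this section, and the function names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genome: int, n_term: int, n_complex: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n_genome background proteins, n_term of them
    annotated with the GO term, n_complex proteins in the complex, k of them annotated."""
    upper = min(n_term, n_complex)
    hits = sum(comb(n_term, i) * comb(n_genome - n_term, n_complex - i)
               for i in range(k, upper + 1))
    return hits / comb(n_genome, n_complex)


def p_value_bin(p: float) -> str:
    """Map a complex's p-value (assumed: the minimum over its GO terms) to a table bin.
    Assumption: bins are half-open, lower edge inclusive."""
    if p < 1e-15:
        return "<E-15"
    if p < 1e-10:
        return "[E-15, E-10]"
    if p < 1e-5:
        return "[E-10, E-5]"
    if p < 0.01:
        return "[E-5, 0.01]"
    return ">=0.01"
```

Under these assumptions, the p-values listed for the Gavin examples (1.22E-35 down to 2.45E-07) would fall into the three most significant bins.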
Some examples of predicted complexes with small p-values on the Gavin data
| No. | Predicted protein complex | Gene Ontology term | p-value | |
|---|---|---|---|---|
| 1 | | | 1.22E-35 | 0.44 |
| 2 | | | 5.46E-32 | 0.11 |
| 3 | | | 2.37E-26 | 0.47 |
| 4 | | | 1.67E-21 | 0.41 |
| 5 | | | 2.91E-17 | 0.12 |
| 6 | | | 6.89E-14 | 0.36 |
| 7 | | | 4.3E-11 | 0.17 |
| 8 | | | 2.45E-07 | 0.07 |
Proteins in bold match a known protein complex in the benchmark complex dataset.
Fig. 6 Visualization of a protein complex (ID1 in Table 5). YKL144C, YNR003C, YPR110C, YPR190C, YDL150W, YKR025W, YNL151C, YBR154C, YJL011C, YNL113W, YDR045C, YNL248C, YJR063W, YOR340C, YIL021W and YML010W are the names of the proteins in the complex.