Xiujuan Lei, Chao Ying, Fang-Xiang Wu, Jin Xu.
Abstract
Clustering is one of the main methods for identifying functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data that combines the firefly algorithm (FA) with the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform the clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects; however, this hierarchical search has difficulty finding the optimal synchronization neighborhood radius, and its efficiency is low. We therefore adopt the firefly algorithm to determine the optimal threshold of the synchronization neighborhood radius automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that the proposed algorithm is better than the traditional algorithms in precision, recall and f-measure.
Year: 2015 PMID: 25707632 PMCID: PMC4331806 DOI: 10.1186/1471-2164-16-S3-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1.
Figure 2. Flow chart of the improved SHC algorithm.
Comparison of the Pearson correlation coefficient between fval and f-measure for different values of ρ
| ρ | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pearson correlation coefficient | 50.9059 | 51.9654 | 52.0163 | 52.0588 | 52.0933 | 52.1201 | 52.1396 | 52.1521 | 52.1580 | 52.1574 | 52.1508 |
Average of the maximum objective function values over 20 clustering runs for different values of α
| α | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Average maximum objective function value | 94.4592 | 95.7940 | 95.4177 | 95.4546 | 95.7102 | 96.1161 | 95.2840 | 95.7408 | 95.7408 | 96.2709 | 95.7664 |
Experimental parameters of the PSO, GA and FA algorithms
| Algorithm | First parameter | Second parameter |
|---|---|---|
| PSO | Global acceleration coefficient | Local acceleration coefficient |
| GA | Crossover probability | Mutation probability |
| FA | Maximum attractiveness | Light absorption coefficient |
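The role of the FA parameters can be illustrated with a minimal sketch of the standard firefly update rule applied to a one-dimensional search, such as a search for a neighborhood-radius threshold. Everything named here is an assumption made for illustration: `firefly_maximize`, the conventional symbols `beta0` (maximum attractiveness), `gamma` (light absorption coefficient) and `alpha` (random step size), their default values, and the toy objective, which merely stands in for the paper's objective function (not given in this record).

```python
import math
import random


def firefly_maximize(objective, lower, upper, n_fireflies=20, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Maximize a 1-D objective on [lower, upper] with a basic firefly algorithm.

    beta0, gamma and alpha are illustrative defaults, not the paper's settings.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(lower, upper) for _ in range(n_fireflies)]
    best_x = max(xs, key=objective)
    best_val = objective(best_x)
    for _ in range(n_iter):
        light = [objective(x) for x in xs]  # brightness = objective value
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    # Attractiveness decays with the squared distance.
                    dist = abs(xs[j] - xs[i])
                    beta = beta0 * math.exp(-gamma * dist ** 2)
                    # Random perturbation scaled by alpha and the search range.
                    step = alpha * (rng.random() - 0.5) * (upper - lower)
                    moved = xs[i] + beta * (xs[j] - xs[i]) + step
                    xs[i] = min(upper, max(lower, moved))  # stay in bounds
                    light[i] = objective(xs[i])
                    if light[i] > best_val:
                        best_x, best_val = xs[i], light[i]
        alpha *= 0.97  # gradually shrink the random step
    return best_x, best_val


# Toy objective with a single peak at r = 0.3 (hypothetical stand-in).
def toy_objective(r):
    return -(r - 0.3) ** 2


best_x, best_val = firefly_maximize(toy_objective, 0.0, 1.0)
print(f"best radius ~ {best_x:.3f}, objective = {best_val:.5f}")
```

In this sketch each firefly moves toward every brighter one, so a larger `gamma` makes distant fireflies less influential, while `alpha` controls how much random exploration remains.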
Maximum objective function value in each of 10 runs of the PSO, GA and FA algorithms
| Algorithm | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| PSO | 96.2709 | 96.2709 | 91.6680 | 96.2709 | 96.2709 | 91.6680 | 91.6680 | 96.2709 | 96.2709 | 96.2709 |
| GA | 96.2709 | 95.8277 | 95.9618 | 96.1602 | 96.0500 | 95.9618 | 95.9655 | 96.0500 | 96.2709 | 96.2709 |
| FA | 96.2709 | 96.2709 | 96.2709 | 96.2709 | 96.2709 | 96.2709 | 96.2709 | 96.2709 | 96.2709 | 96.2709 |
Average of the maximum objective function values of the PSO, GA and FA algorithms over 10 runs
| Algorithm | PSO | GA | FA |
|---|---|---|---|
| Average | 94.8900 | 96.0790 | 96.2709 |
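The averages can be checked directly against the per-run values listed for the 10 runs. The following short sketch recomputes the mean for each algorithm, together with the minimum and maximum as a rough indication of run-to-run stability; the names `runs` and `summarize` are illustrative only.

```python
from statistics import mean

# Per-run maximum objective function values, copied from the 10-run table.
runs = {
    "PSO": [96.2709, 96.2709, 91.6680, 96.2709, 96.2709,
            91.6680, 91.6680, 96.2709, 96.2709, 96.2709],
    "GA": [96.2709, 95.8277, 95.9618, 96.1602, 96.0500,
           95.9618, 95.9655, 96.0500, 96.2709, 96.2709],
    "FA": [96.2709] * 10,
}


def summarize(values):
    """Return the mean, minimum and maximum of a list of run results."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


summary = {name: summarize(values) for name, values in runs.items()}
for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.4f} "
          f"min={stats['min']:.4f} max={stats['max']:.4f}")
```

Rounded to four decimals, the recomputed means match the tabulated averages (94.8900, 96.0790 and 96.2709). Only FA reaches the same maximum in every run.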
Figure 3. Plots of the optimal objective function value against the number of iterations for the FA, PSO and GA. (a) Comparison of precision value. (b) Comparison of recall value. (c) Comparison of f-measure value.
Figure 4. The improved algorithm compared with the spectral clustering algorithm and SHC in .
Comparison of precision, recall and f-measure among ISHC, SHC, SC and other algorithms
| Algorithm | Precision | Recall | F-measure |
|---|---|---|---|
| SHC | 0.4447 | 0.3430 | 0.3873 |
| SC | 0.3612 | 0.3555 | 0.3584 |
| MCL | 0.3569 | 0.3879 | 0.3717 |
| Newman | 0.4665 | 0.4186 | 0.4413 |
| RNSC | 0.4067 | 0.4696 | 0.4359 |
| ISHC | 0.6624 | 0.3620 | 0.4673 |
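The three reported measures are related. As a minimal sketch, assuming the f-measure here is the usual harmonic mean of precision and recall (this record does not state the definition), the last column can be recomputed from the first two. The function name `f_measure` and the `reported` dictionary are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f-measure) copied from the comparison table.
reported = {
    "SHC": (0.4447, 0.3430, 0.3873),
    "SC": (0.3612, 0.3555, 0.3584),
    "MCL": (0.3569, 0.3879, 0.3717),
    "Newman": (0.4665, 0.4186, 0.4413),
    "RNSC": (0.4067, 0.4696, 0.4359),
    "ISHC": (0.6624, 0.3620, 0.4673),
}

recomputed = {name: f_measure(p, r) for name, (p, r, _) in reported.items()}
for name, (p, r, f_reported) in reported.items():
    print(f"{name}: recomputed={recomputed[name]:.4f} reported={f_reported:.4f}")
```

Under this assumption the recomputed values agree with the reported ones to within 0.002. The small differences may come from rounding or from averaging over runs, which this record does not specify.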