| Literature DB >> 29348740 |
Yanhua Wang, Xiyu Liu, Laisheng Xiang.
Abstract
Ensemble clustering integrates multiple base clusterings into a single result, improving on the generalization ability of any single clustering algorithm and yielding a more robust partition; it has therefore become a focus of current clustering research. The goal is to find a consensus partition that agrees as much as possible with the base clusterings. The genetic algorithm is a highly parallel, stochastic, and adaptive search method modeled on natural selection and biological evolution. In this paper, an improved genetic algorithm is designed by refining the chromosome encoding. A new membrane evolutionary algorithm is then constructed that uses genetic mechanisms as evolution rules and combines them with the communication mechanism of a cell-like P system. The proposed algorithm optimizes the base clusterings and returns the optimal chromosome as the final ensemble clustering result. The global optimization ability of the genetic algorithm and the rapid convergence of the membrane system allow the membrane evolutionary algorithm to outperform several state-of-the-art techniques on six real-world UCI data sets.
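The pipeline the abstract describes, encoding each candidate consensus partition as a chromosome of cluster labels and scoring it by agreement with the base clusterings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the Rand-index fitness, truncation selection, one-point crossover, and single-label mutation are simplifications chosen for brevity.

```python
import random

def rand_index(a, b):
    """Fraction of object pairs on which two label vectors agree
    (both grouped together or both apart) -- the Rand index."""
    n = len(a)
    same = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (a[i] == a[j]) == (b[i] == b[j]):
                same += 1
    return same / total

def fitness(chrom, base_clusterings):
    """A consensus partition should agree with every base clustering,
    so score a chromosome by its mean Rand index against all of them."""
    return sum(rand_index(chrom, b) for b in base_clusterings) / len(base_clusterings)

def evolve(base_clusterings, k, generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    n = len(base_clusterings[0])
    # Chromosome = one cluster label per object.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda c: fitness(c, base_clusterings), reverse=True)
        pop = scored[: pop_size // 2]            # truncation selection
        while len(pop) < pop_size:
            p1, p2 = rng.sample(pop[: pop_size // 2], 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.3:               # mutation: relabel one object
                child[rng.randrange(n)] = rng.randrange(k)
            pop.append(child)
    return max(pop, key=lambda c: fitness(c, base_clusterings))

base = [[0, 0, 0, 1, 1, 1],   # three toy base clusterings of six objects
        [0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1]]
best = evolve(base, k=2)
print(best, fitness(best, base))
```

The Rand index is invariant to label permutation, so a chromosome is rewarded for reproducing the grouping structure of the base clusterings, not their arbitrary label names.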
Year: 2017 PMID: 29348740 PMCID: PMC5734009 DOI: 10.1155/2017/4367342
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. The ensemble clustering process.
Figure 2. A basic membrane structure.
Figure 3. The generation of microclusters.
Figure 4. The membrane structure for the GMEAEC.
Figure 5. Flow diagram of the proposed approach.
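The cell-like membrane structure of Figure 4 can be read as an island model: elementary membranes evolve sub-populations in parallel under genetic evolution rules, and communication rules send each membrane's best chromosome out to the skin membrane, which holds the global best. A minimal sketch of that control flow, using a toy one-max objective in place of the clustering fitness (all names here are illustrative assumptions, not from the paper):

```python
import random

def one_max(chrom):
    # Stand-in objective: count of 1-bits (the paper's fitness would
    # instead measure agreement with the base clusterings).
    return sum(chrom)

def evolve_inside(membrane, rng, n_bits):
    """Evolution rules applied inside one elementary membrane:
    uniform crossover + bit-flip mutation, replacing the worst member."""
    a, b = rng.sample(membrane, 2)
    child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
    child[rng.randrange(n_bits)] ^= 1
    worst = min(range(len(membrane)), key=lambda i: one_max(membrane[i]))
    if one_max(child) > one_max(membrane[worst]):
        membrane[worst] = child

def run(n_membranes=4, pop=6, n_bits=16, cycles=40, seed=1):
    rng = random.Random(seed)
    membranes = [[[rng.randrange(2) for _ in range(n_bits)] for _ in range(pop)]
                 for _ in range(n_membranes)]
    skin = []   # the skin membrane collects communicated chromosomes
    for _ in range(cycles):
        for m in membranes:
            for _ in range(pop):
                evolve_inside(m, rng, n_bits)
        # Communication rule: each membrane sends its current best
        # chromosome out toward the skin membrane.
        skin = [max(m, key=one_max) for m in membranes]
    return max(skin, key=one_max)

best = run()
print(one_max(best))
```

Because each membrane explores independently between communication steps, diversity is preserved while the skin membrane tracks the best chromosome found anywhere, which is the behavior the abstract credits for the method's rapid convergence.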
Characteristics of the data sets.
| Data sets | Source | Objects | Attributes | Classes |
|---|---|---|---|---|
| Balance | UCI | 625 | 4 | 2 |
| Iris | UCI | 150 | 4 | 3 |
| Pima | UCI | 768 | 8 | 2 |
| Wine | UCI | 178 | 13 | 3 |
| Magic04 | UCI | 19020 | 10 | 2 |
| Segmentation | UCI | 2100 | 19 | 7 |
Figure 6. GMEAEC versus the base clusterings.
Average performance (in terms of R) over 100 runs of the different ensemble clustering methods. In the original table, the three highest AVE scores and the three lowest VAR scores in each column are highlighted in bold; cells whose values could not be recovered from this record are marked "—".
| Method | Balance MAX | Balance MIN | Balance AVE | Balance VAR | Iris MAX | Iris MIN | Iris AVE | Iris VAR | Pima MAX | Pima MIN | Pima AVE | Pima VAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GMEAEC | — | 0.621 | — | — | 0.917 | 0.877 | — | — | — | 0.733 | — | — |
| CEGA | 0.699 | 0.542 | — | — | — | 0.755 | — | 0.00756 | 0.725 | 0.633 | 0.676 | — |
| CSPA | 0.711 | 0.520 | — | 0.00989 | 0.920 | 0.794 | — | 0.00482 | 0.820 | 0.712 | — | 0.00543 |
| HGPA | 0.655 | 0.578 | — | — | 0.842 | 0.702 | 0.815 | — | 0.830 | 0.648 | — | 0.01211 |
| MCLA | 0.633 | 0.456 | 0.594 | 0.01012 | 0.830 | 0.768 | 0.791 | — | 0.820 | 0.662 | 0.738 | 0.00378 |
| KCC | 0.694 | 0.377 | 0.544 | 0.01982 | 0.878 | 0.544 | 0.742 | 0.01351 | 0.735 | 0.698 | 0.716 | — |

| Method | Wine MAX | Wine MIN | Wine AVE | Wine VAR | Magic04 MAX | Magic04 MIN | Magic04 AVE | Magic04 VAR | Seg MAX | Seg MIN | Seg AVE | Seg VAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GMEAEC | — | 0.878 | — | — | 0.783 | 0.655 | — | — | 0.751 | 0.615 | — | — |
| CEGA | 0.930 | 0.840 | — | — | 0.712 | 0.542 | — | 0.00942 | 0.659 | 0.421 | 0.558 | 0.00983 |
| CSPA | 0.723 | 0.553 | 0.693 | — | — | 0.554 | — | 0.01564 | 0.456 | 0.235 | 0.373 | — |
| HGPA | 0.830 | 0.662 | 0.759 | 0.00756 | 0.577 | 0.432 | 0.520 | — | 0.658 | 0.423 | 0.504 | 0.01425 |
| MCLA | 0.879 | 0.320 | — | 0.09844 | 0.654 | 0.344 | 0.526 | 0.02121 | — | 0.684 | — | — |
| KCC | 0.886 | 0.226 | 0.717 | 0.11254 | 0.756 | 0.498 | 0.624 | — | 0.755 | 0.524 | — | 0.00997 |
Figure 7. The number of times each approach is ranked in the top (bottom) 3 across Table 2.
Figure 8. Average performance over 10 runs of the different methods as the ensemble size M varies.