Vito Di Gesú, Raffaele Giancarlo, Giosué Lo Bosco, Alessandra Raimondi, Davide Scaturro.
Abstract
BACKGROUND: Clustering is a key step in the analysis of gene expression data. Many classical clustering algorithms are used for this task, and more innovative ones have been designed and validated for it. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, in data analysis, very few clustering algorithms are based on the genetic paradigm, even though that paradigm has great potential for finding good heuristic solutions to a difficult optimization problem such as clustering.
Year: 2005 PMID: 16336639 PMCID: PMC1343581 DOI: 10.1186/1471-2105-6-289
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Convergence of GenClust. Experimental convergence of GenClust on each of the five data sets. The x-coordinate gives the number of iterations and the y-coordinate the value of the total internal variance (2). For each data set, the algorithm was asked to return a clustering solution with a number of clusters equal to the number of classes in the true solution.
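The quantity plotted on the y-axis can be illustrated with a short sketch. The formula referenced as (2) is not reproduced in this record, so the code below assumes a common definition: the sum, over all clusters, of the squared Euclidean distances between each element and its cluster centroid. The function name `total_internal_variance` and the toy data are hypothetical, and the authors' exact formula may differ (for example, in its normalization).

```python
def total_internal_variance(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    ASSUMPTION: this is a common definition of internal variance; it is not
    taken from the paper's equation (2), which is unavailable here.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid: coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Add the squared distance of every member to the centroid.
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


# Toy usage: two nearby points in one cluster, a single point in another.
toy_points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
toy_labels = [0, 0, 1]
print(total_internal_variance(toy_points, toy_labels))  # 2.0
```

Under this definition, a convergence curve like the one in the figure would be obtained by recording this value after each iteration of the clustering algorithm.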
Figure 2. Adjusted Rand Index. Experiments for the adjusted Rand index. For each data set and each algorithm, the index is displayed as a function of the number of clusters.
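For readers unfamiliar with the index, the following is a minimal sketch of the adjusted Rand index in its standard (Hubert and Arabie) form, which compares a clustering against a known true partition and corrects for chance agreement. The function name and the toy labels are illustrative only; the sketch is not the code used to produce the results reported here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same elements."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same elements")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: elements shared by each (true class, cluster) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of element pairs placed together in both, or in each, partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2          # largest attainable agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (maximum - expected)


# Identical partitions score 1.0 regardless of how the labels are named.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

A value of 1 indicates perfect agreement with the true solution, while values near 0 indicate agreement no better than chance; negative values are possible.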
RCNS Data Set. Performance of the algorithms at the number of classes (six) of the true solution for the RCNS Rat data set.
| 0.168 | 3.89 | |
| 0.144 | 3.81 | |
| 0.258 | 3.81 | |
| 0.12 | 3.98 | |
| 0.167 | 3.71 | |
| 0.19 | 4.05 | |
| 0.161 | 4.07 |
YCC. Performance of the algorithms at the number of classes (five) of the true solution for the YCC data set.
| 0.47 | 57.05 | |
| 0.44 | 57.05 | |
| 0.49 | 57.05 | |
| 0.529 | 56.66 | |
| 0.508 | 57.36 | |
| 0.559 | 58.78 | |
| 0.518 | 57.21 |
RYCC. Performance of the algorithms at the number of classes (five) of the true solution for the RYCC data set.
| 0.446 | 10.60 | |
| 0.359 | 10.69 | |
| 0.49 | 10.69 | |
| 0.49 | 10.84 | |
| 0.469 | 10.73 | |
| 0.46 | 11.50 | |
| 0.518 | 10.804 |
PBM. Performance of the algorithms at the number of classes (eighteen) of the true solution for the PBM data set.
| 0.51 | |
| 0.37 | |
| 0.429 | |
| 0.528 | |
| 0.58 | |
| 0.18 | |
| 0.51 |
RPBM. Performance of the algorithms at the number of classes (eighteen) of the true solution for the RPBM data set.
| 0.509 | 57.49 | |
| 0.378 | 55.73 | |
| 0.51 | 55.73 | |
| 0.679 | 50.21 | |
| 0.618 | 59.49 | |
| 0.517 | 62.27 | |
| 0.80 | 59.33 |
Adjusted Rand Index for Click. Performance of Click on the various data sets. The results in the clusters column give the number of clusters returned by Click, in addition to one class consisting of all the unclustered elements.
| Clusters | Adjusted Rand index |
| --- | --- |
| 3 + 1 | 0.183 |
| 18 + 1 | 0.767 |
| 6 + 1 | 0.658 |
| 7 + 1 | 0.510 |
| 6 + 1 | 0.479 |
Figure 3. FOM. Experiments for FOM. The index is displayed as a function of the number of clusters.
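The sketch below illustrates how such an index can be computed, assuming that FOM here denotes the figure of merit of Yeung et al.: each experimental condition is left out in turn, the data are clustered on the remaining conditions, and the root mean squared deviation of the left-out condition from its cluster means is accumulated. The names `fom_for_condition`, `aggregate_fom`, and `cluster_fn` are hypothetical, and the variant actually plotted (for example, an adjusted FOM) may differ.

```python
from math import sqrt


def fom_for_condition(data, labels, e):
    """Root mean squared deviation of condition e from its cluster means.

    `labels` must come from a clustering computed WITHOUT condition e.
    """
    # Collect the left-out measurements of each cluster.
    clusters = {}
    for row, label in zip(data, labels):
        clusters.setdefault(label, []).append(row[e])

    squared = 0.0
    for values in clusters.values():
        mean = sum(values) / len(values)
        squared += sum((v - mean) ** 2 for v in values)
    return sqrt(squared / len(data))


def aggregate_fom(data, cluster_fn, k):
    """Sum of the per-condition FOM values over all conditions.

    `cluster_fn(rows, k)` is a caller-supplied (hypothetical) function that
    returns one cluster label per row for a requested number of clusters k.
    """
    total = 0.0
    for e in range(len(data[0])):
        # Remove condition e before clustering.
        reduced = [row[:e] + row[e + 1:] for row in data]
        labels = cluster_fn(reduced, k)
        total += fom_for_condition(data, labels, e)
    return total


# Toy usage with a stand-in clustering rule (threshold on the first column).
toy_data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
toy_cluster_fn = lambda rows, k: [0 if r[0] < 5 else 1 for r in rows]
print(aggregate_fom(toy_data, toy_cluster_fn, 2))  # 1.0
```

Lower values indicate that the clusters predict the left-out condition better, so a curve of this kind is read by looking at how the index falls as the number of clusters grows.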