Je-Gun Joung, Soo-Jin Kim, Soo-Yong Shin, Byoung-Tak Zhang.
Abstract
BACKGROUND: Biclustering has been used to find functionally important patterns in biological problems. A bicluster is a submatrix, made up of a subset of the rows and a subset of the columns of a matrix, that contains a homogeneous pattern. Finding biclusters remains challenging because of the computational complexity of capturing patterns across two dimensions of features.
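The submatrix definition above can be made concrete in a few lines. This is a minimal sketch, assuming the expression data is a NumPy array with genes as rows and conditions as columns; the function name `extract_bicluster` and the toy matrix are hypothetical illustrations, not part of PCOBA.

```python
import numpy as np

def extract_bicluster(matrix: np.ndarray, rows: list[int], cols: list[int]) -> np.ndarray:
    """Return the submatrix formed by the selected rows (genes) and columns (conditions)."""
    # np.ix_ builds an open mesh, so every selected row is paired with every selected column.
    return matrix[np.ix_(rows, cols)]

# Hypothetical toy expression matrix: 4 genes x 5 conditions.
expr = np.array([
    [1.0, 2.0, 3.0, 9.0, 0.5],
    [2.0, 3.0, 4.0, 1.0, 7.0],
    [5.0, 6.0, 7.0, 4.0, 2.0],
    [8.0, 0.0, 3.0, 6.0, 1.0],
])

# Genes 0-2 under conditions 0-2 share the same shifted pattern (+0, +1, +2).
sub = extract_bicluster(expr, [0, 1, 2], [0, 1, 2])
print(sub)
```

Rows and columns need not be contiguous; any index lists work, which is why a bicluster is described by two subsets rather than a rectangular block.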
Year: 2012 PMID: 23282075 PMCID: PMC3521386 DOI: 10.1186/1471-2105-13-S17-S12
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of biclustering. The rows represent genes and the columns represent conditions. All elements of the bicluster are highlighted in gray.
Figure 2. Probabilistic coevolutionary biclustering algorithm. Pop(G) is the population for the gene set and Pop(C) is the population for the condition set. Individuals x and y are evaluated and the best ones are selected. In each iteration, the probability vectors P of the two populations are updated, and new populations are generated by sampling and mutation. The parameters are: δ, the cutoff of the residue score; μ and ν, the initial sizes of the gene and condition populations; two weights w controlling the variance and the volume; two weights w keeping a balance between genes and conditions; α and β, which control the update of the probability vectors; and S, the sets of best individuals selected from the gene and condition populations, respectively.
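The update-and-resample loop in Figure 2 resembles population-based incremental learning (PBIL). The sketch below assumes binary membership vectors (1 means the gene is included), the standard PBIL rule P ← (1 - α)P + α·(mean of the selected best individuals), and bit-flip mutation. The exact PCOBA update rule and mutation operator are not given in this excerpt, and the names `update_probability`, `sample_population`, and the mutation rate are hypothetical.

```python
import numpy as np

def update_probability(p: np.ndarray, best: np.ndarray, rate: float) -> np.ndarray:
    """Move each inclusion probability toward its frequency among the best individuals.

    Assumption: standard PBIL rule p <- (1 - rate) * p + rate * mean(best);
    the rule actually used by PCOBA is not given in this excerpt.
    """
    return (1.0 - rate) * p + rate * best.mean(axis=0)


def sample_population(p: np.ndarray, size: int, mutation_rate: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Draw `size` binary membership vectors from p, then flip each bit with probability mutation_rate."""
    # Bit k of an individual is 1 (item k included) with probability p[k].
    pop = (rng.random((size, p.size)) < p).astype(np.int8)
    # Bit-flip mutation (hypothetical operator; the excerpt only says "mutation").
    flips = rng.random(pop.shape) < mutation_rate
    return np.where(flips, 1 - pop, pop)


rng = np.random.default_rng(0)
n_genes = 8
p_genes = np.full(n_genes, 0.5)  # no initial preference for any gene
# Two hypothetical best gene-set individuals selected in this generation.
best_genes = np.array([[1, 1, 1, 0, 0, 0, 0, 0],
                       [1, 1, 0, 0, 0, 0, 0, 0]])
# Update rate 0.2 matches the alpha/beta values in the parameter table.
p_genes = update_probability(p_genes, best_genes, rate=0.2)
new_pop = sample_population(p_genes, size=100, mutation_rate=0.01, rng=rng)
print(p_genes)
```

Genes chosen by both best individuals drift toward inclusion, while genes chosen by neither drift toward exclusion. Under this sketch the condition population would keep its own probability vector and be updated the same way with its own rate.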
Parameter settings of PCOBA.
| Parameter | Artificial dataset | Real dataset |
|---|---|---|
| Population size for genes (μ) | 100 (1000) | 1000 |
| Population size for conditions (ν) | 50 | 100 |
| Maximum number of generations | 100 (200) | 500 |
| Cutoff of residue score (δ) | 20 (300) | 250 |
| Parameter controlling the variance | 0.5 | 0.5 |
| Parameter controlling the volume | 10 (30) | 30 |
| Balance between genes and conditions (first weight) | 0.9 (0.8) | 0.8 |
| Balance between genes and conditions (second weight) | 0.1 (0.2) | 0.2 |
| Update rates of the probability vectors (α, β) | 0.2, 0.2 | 0.2, 0.2 |
| Number of best individuals (genes, conditions) | 20, 10 (200, 10) | 200, 20 |
Values in parentheses correspond to the Ec dataset.
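The settings in the table can be collected into one configuration object per dataset. In this sketch only `mu`, `nu`, `delta`, `alpha`, and `beta` follow symbols defined in the Figure 2 caption; the other field names are hypothetical, and assigning `alpha` to the gene population and `beta` to the condition population is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PcobaParams:
    """Parameter settings from the table; defaults are the real-dataset column.

    Field names other than mu, nu, delta, alpha and beta are hypothetical.
    """
    mu: int = 1000              # population size for genes
    nu: int = 100               # population size for conditions
    max_generations: int = 500
    delta: float = 250.0        # cutoff of residue score
    w_variance: float = 0.5     # parameter controlling the variance
    w_volume: float = 30.0      # parameter controlling the volume
    w_balance_1: float = 0.8    # gene/condition balance, first weight
    w_balance_2: float = 0.2    # gene/condition balance, second weight
    alpha: float = 0.2          # update rate (assumed: gene probability vector)
    beta: float = 0.2           # update rate (assumed: condition probability vector)
    n_best_genes: int = 200
    n_best_conditions: int = 20

REAL = PcobaParams()
# Artificial dataset: override every field that differs from the real-dataset column.
ARTIFICIAL = PcobaParams(mu=100, nu=50, max_generations=100, delta=20.0,
                         w_volume=10.0, w_balance_1=0.9, w_balance_2=0.1,
                         n_best_genes=20, n_best_conditions=10)
# Ec dataset: the parenthesized values; unparenthesized artificial values apply otherwise.
EC = PcobaParams(nu=50, max_generations=200, delta=300.0, n_best_conditions=10)
```

A frozen dataclass keeps each configuration immutable, so a run cannot accidentally change its settings midway.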
Figure 3. Simulation results of PCOBA on the synthetic dataset. (a) Fitness over generations. (b) Mean residue score at each generation. (c) Variance over generations. (d) Change in volume over generations. Each plot shows the mean and variance over 100 runs.
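Panels (c) and (d) track the variance and volume of the bicluster. The excerpt does not define either quantity; the sketch below assumes row variance (the mean squared deviation of each value from its row mean) and volume as the number of cells, |I| × |J|. The function names are hypothetical, and how PCOBA combines these terms with the residue into one fitness value is not shown here.

```python
import numpy as np

def row_variance(sub: np.ndarray) -> float:
    """Mean squared deviation of every entry from its own row mean (assumed definition)."""
    deviations = sub - sub.mean(axis=1, keepdims=True)
    # All rows have equal length, so the global mean equals the mean of per-row variances.
    return float(np.mean(deviations ** 2))

def volume(sub: np.ndarray) -> int:
    """Number of cells in the bicluster: |I| genes times |J| conditions."""
    return sub.shape[0] * sub.shape[1]

sub = np.array([[1.0, 2.0, 3.0],
                [2.0, 3.0, 4.0],
                [5.0, 6.0, 7.0]])
print(row_variance(sub), volume(sub))
```

Under these definitions a bicluster of flat rows has zero row variance, which is why evolutionary biclustering methods typically reward variance: it discourages trivial constant patterns that would otherwise have a perfect residue.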
Comparison of the performance of PCOBA and other evolutionary algorithms.
| Datasets | Algorithms | Avg. Fitness | Avg. Residue | Avg. Variance | Avg. Volume |
|---|---|---|---|---|---|
| | GA | 11.96 ± 16.32 | 203.51 ± 323.67 | 19745 ± 9587.70 | 105.28 ± 54.28 |
| | CGA | 3.90 ± 6.99 | 36.63 ± 140.32 | 21220 ± 7202 | 72.39 ± 20.11 |
| | EDA | 5.80 ± 11.14 | 81.84 ± 220.84 | 23527 ± 6719.4 | 127.48 ± 21.64 |
| | PCOBA | | 0.05 ± 0.00 | 26254 ± 833.22 | 104.90 ± 8.49 |
| | GA | 5.59 ± 10.16 | 76.67 ± 201.51 | 18570 ± 7496.3 | 107.17 ± 38.87 |
| | CGA | 3.05 ± 5.02 | 20.03 ± 100.81 | 22489 ± 6876.7 | 75.49 ± 18.99 |
| | EDA | 5.12 ± 8.28 | 67.63 ± 163.60 | 20862 ± 6834.7 | 112.36 ± 44.52 |
| | PCOBA | | 2.74 ± 26.88 | 25199 ± 3295.9 | 99.66 ± 16.92 |
| | GA | 2.21 ± 0.02 | 262.63 ± 9.05 | 3807.20 ± 1068 | 470.96 ± 18.90 |
| | CGA | 2.20 ± 0.03 | 263.09 ± 7.55 | 3229.40 ± 1160.4 | 443.00 ± 19.07 |
| | EDA | 2.22 ± 0.05 | 263.94 ± 6.96 | 2359.70 ± 228.74 | 450.83 ± 50.57 |
| | PCOBA | | 265.01 ± 4.63 | 2473.50 ± 176.1 | 562.63 ± 47.43 |
Values are means ± standard deviations over 100 independent runs.
A lower score means that the expression values in the bicluster are more similar.
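The residue score is most likely the mean squared residue of Cheng and Church, H(I, J) = 1/(|I||J|) · Σ (a_ij - a_iJ - a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and overall means of the bicluster. This excerpt does not define the score, so treat that as an assumption. The sketch shows why a lower score means more similar values: a perfectly additive bicluster scores 0, and any deviation from the shared pattern raises the score.

```python
import numpy as np

def mean_squared_residue(sub: np.ndarray) -> float:
    """Cheng-Church mean squared residue of a bicluster (assumed residue score)."""
    row_means = sub.mean(axis=1, keepdims=True)   # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)   # a_Ij
    overall = sub.mean()                          # a_IJ
    residue = sub - row_means - col_means + overall
    return float(np.mean(residue ** 2))

# Each row is the same pattern shifted by a constant: perfectly coherent.
additive = np.array([[1.0, 2.0, 3.0],
                     [2.0, 3.0, 4.0],
                     [5.0, 6.0, 7.0]])
# Same matrix with one entry perturbed (7 -> 9).
noisy = np.array([[1.0, 2.0, 3.0],
                  [2.0, 3.0, 4.0],
                  [5.0, 6.0, 9.0]])
print(mean_squared_residue(additive), mean_squared_residue(noisy))
```

A threshold such as δ in the parameter table would then accept a candidate bicluster only when this score falls below the cutoff.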
Performance comparison between PCOBA and other biclustering algorithms.
| PCOBA | CC | OPSM |
|---|---|---|
| 219.15 ± 1.14 | 221.40 ± 8.99 | 447.72 ± 88.36 |
| 412.11 ± 17.62 | 404.67 ± 134.26 | 1224.89 ± 415.95 |
| 1321.30 ± 102.82 | 1369.18 ± 366.90 | 1365.40 ± 1642.85 |
| 92.40 ± 1.64 | 98.54 ± 21.89 | 265.10 ± 412.22 |
| 14.30 ± 0.48 | 12.18 ± 2.37 | 8.50 ± 3.02 |
Values are means ± standard deviations of the ten best biclusters from a single run.
A lower score means that the expression values in the bicluster are more similar.