Jia Zeng, Shanfeng Zhu, Alan Wee-Chung Liew, Hong Yan.
Abstract
BACKGROUND: Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints, such as gene expression, Gene Ontology (GO) annotations and gene network structure. How to integrate multiple constraints into an optimal clustering solution remains an unsolved problem.
Year: 2010 PMID: 20356386 PMCID: PMC3098054 DOI: 10.1186/1471-2105-11-164
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. (A) The consistent problem in Eq. (2), where the intersection set $C_0$ is nonempty. The circle is the initial solution. The thick black dot is the consistent solution, which lies in the intersection of the two sets for the gene expression and GO constraints, respectively. POCS (projection onto convex sets) ensures that the initial solution converges to the consistent solution after enough projections, represented by the arrows. (B) The inconsistent problem in Eq. (4), where the intersection set $C_0$ is empty. After enough simultaneous projections, represented by the arrows, the thick black dot is the approximate solution that minimizes a weighted set distance to the gene expression and GO constraint sets.
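
The two regimes in Figure 1 can be illustrated with a minimal Python sketch. It assumes two toy convex sets in the plane, a disc and a half-plane, standing in for the gene expression and GO constraint sets; the set shapes, weights and helper names (`disc_projector`, `halfplane_projector`, `pocs_sequential`, `pocs_simultaneous`) are hypothetical and are not the paper's constraint sets or code. Sequential POCS cycles through the projections (panel A); the simultaneous variant moves each iterate to a weighted average of its projections, whose fixed point in this sketch minimizes the weighted sum of squared distances to the sets (panel B). The paper's exact objective in Eq. (4) is not reproduced here.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Projector = Callable[[Point], Point]


def disc_projector(center: Point, radius: float) -> Projector:
    """Projection onto the closed disc ||x - center|| <= radius (a convex set)."""
    def project(p: Point) -> Point:
        dx, dy = p[0] - center[0], p[1] - center[1]
        dist = math.hypot(dx, dy)
        if dist <= radius:
            return p  # already inside: the projection is the identity
        scale = radius / dist
        return (center[0] + scale * dx, center[1] + scale * dy)
    return project


def halfplane_projector(a: Point, b: float) -> Projector:
    """Projection onto the half-plane a . x <= b (a convex set)."""
    norm2 = a[0] ** 2 + a[1] ** 2
    def project(p: Point) -> Point:
        excess = a[0] * p[0] + a[1] * p[1] - b
        if excess <= 0:
            return p
        return (p[0] - excess * a[0] / norm2, p[1] - excess * a[1] / norm2)
    return project


def pocs_sequential(x: Point, projectors: Sequence[Projector], iters: int = 100) -> Point:
    """Consistent case: cycle through the projections; converges to a point in the intersection."""
    for _ in range(iters):
        for project in projectors:
            x = project(x)
    return x


def pocs_simultaneous(x: Point, projectors: Sequence[Projector],
                      weights: Sequence[float], iters: int = 200) -> Point:
    """Inconsistent case: replace x by the weighted average of its projections.

    With weights summing to 1, a fixed point minimizes sum_i w_i * d(x, C_i)**2.
    """
    for _ in range(iters):
        images = [project(x) for project in projectors]
        x = (sum(w * q[0] for w, q in zip(weights, images)),
             sum(w * q[1] for w, q in zip(weights, images)))
    return x


if __name__ == "__main__":
    disc = disc_projector((0.0, 0.0), 1.0)          # hypothetical "expression" set
    near = halfplane_projector((-1.0, 0.0), -0.5)   # x >= 0.5, overlaps the disc
    far = halfplane_projector((-1.0, 0.0), -3.0)    # x >= 3, disjoint from the disc
    print(pocs_sequential((-2.0, 1.0), [disc, near]))
    print(pocs_simultaneous((0.0, 2.0), [disc, far], [0.5, 0.5]))
```

With equal weights and disjoint sets, the simultaneous iterate settles midway between the disc and the half-plane; shifting weight toward one set pulls the compromise point toward it, which mirrors the role of the constraint weights.
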
10 reference gene clusters from KEGG
| Cluster name | Number of genes |
|---|---|
| Amino acid metabolism | 197 |
| Carbohydrate metabolism | 189 |
| Metabolism of cofactors and vitamins | 47 |
| Energy metabolism | 66 |
| Glycan biosynthesis and metabolism | 21 |
| Lipid metabolism | 74 |
| Nucleotide metabolism | 103 |
| Metabolism of other amino acids | 50 |
| Metabolism of secondary metabolites | 18 |
| Xenobiotics biodegradation and metabolism | 19 |
26 reference gene clusters from yeast biochemical pathways
| Cluster name | Number of genes |
|---|---|
| TCA cycle, aerobic respiration | 24 |
| de novo biosynthesis of purine nucleotides | 32 |
| de novo biosynthesis of pyrimidine deoxyribonucleotides | 15 |
| de novo biosynthesis of pyrimidine ribonucleotides | 12 |
| ergosterol biosynthesis | 15 |
| fatty acid biosynthesis, initial steps | 12 |
| fatty acid oxidation pathway | 11 |
| folate biosynthesis | 24 |
| folate interconversions | 17 |
| folate polyglutamylation | 13 |
| folate transformations | 16 |
| gluconeogenesis | 17 |
| glycolysis | 14 |
| glyoxylate cycle | 12 |
| inositol phosphate biosynthesis | 14 |
| isoleucine degradation | 13 |
| lipid-linked oligosaccharide biosynthesis | 15 |
| pantothenate and coenzyme A biosynthesis | 11 |
| phenylalanine degradation | 12 |
| phosphatidylinositol phosphate biosynthesis | 21 |
| protein modifications | 12 |
| salvage pathways of adenine, hypoxanthine, and their nucleosides | 11 |
| sphingolipid metabolism | 23 |
| superpathway of glucose fermentation | 14 |
| tryptophan degradation | 12 |
| valine degradation | 11 |
Figure 2. GLL of the alpha dataset on the KEGG and SGD clusters when 10 projections are used.
Figure 3. GLL of the alpha dataset on the KEGG and SGD clusters for different weights.
Five-fold cross-validation of the GLL values on KEGG clusters
| Datasets | POCS | |||
|---|---|---|---|---|
| (a) | ||||
| alpha | -198 ± 8 | -354 ± 9 | -253 ± 14 | -238 ± 14 |
| cdc15 | -194 ± 7 | -372 ± 22 | -220 ± 8 | -212 ± 12 |
| cdc28 | -200 ± 9 | -340 ± 20 | -265 ± 8 | -244 ± 10 |
| elu | -199 ± 6 | -355 ± 14 | -253 ± 10 | -228 ± 10 |
| Hughes | -191 ± 4 | -329 ± 17 | -212 ± 5 | -196 ± 9 |
| (b) | ||||
| alpha | -184 ± 6 | -415 ± 32 | -282 ± 12 | -262 ± 16 |
| cdc15 | -182 ± 4 | -413 ± 28 | -278 ± 10 | -255 ± 15 |
| cdc28 | -189 ± 9 | -424 ± 18 | -294 ± 9 | -271 ± 11 |
| elu | -187 ± 9 | -410 ± 35 | -297 ± 11 | -291 ± 13 |
| Hughes | -180 ± 8 | -401 ± 9 | -262 ± 6 | -234 ± 10 |
| (c) | ||||
| alpha | -243 ± 10 | -461 ± 27 | -288 ± 11 | -254 ± 22 |
| cdc15 | -225 ± 10 | -460 ± 26 | -271 ± 8 | -246 ± 14 |
| cdc28 | -248 ± 8 | -478 ± 33 | -301 ± 9 | -270 ± 10 |
| elu | -259 ± 10 | -476 ± 35 | -304 ± 7 | -286 ± 13 |
| Hughes | -222 ± 6 | -455 ± 34 | -276 ± 9 | -239 ± 13 |
| (d) | ||||
| alpha | -304 ± 13 | -494 ± 26 | -363 ± 18 | -328 ± 13 |
| cdc15 | -302 ± 6 | -491 ± 41 | -369 ± 12 | -331 ± 17 |
| cdc28 | -298 ± 8 | -444 ± 37 | -363 ± 14 | -326 ± 17 |
| elu | -321 ± 7 | -535 ± 23 | -378 ± 9 | -342 ± 13 |
| Hughes | -284 ± 7 | -478 ± 19 | -351 ± 11 | -319 ± 11 |
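
The tables report five-fold cross-validation results as "value ± spread". This excerpt does not describe the protocol, so the following Python sketch rests on assumptions: items are shuffled and dealt into five disjoint folds, each fold is held out once, and the ± entries are the mean and sample standard deviation of the per-fold scores. The callback `score_fold` and the other names are hypothetical placeholders, not the authors' procedure.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle item indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n: int,
                   score_fold: Callable[[Sequence[int], Sequence[int]], float],
                   k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of the per-fold scores.

    score_fold(train, test) is a hypothetical callback, e.g. cluster using the
    training items' constraints, then score the held-out items.
    """
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fold(train, test))
    return statistics.mean(scores), statistics.stdev(scores)


def format_cell(mean: float, sd: float, digits: int = 0) -> str:
    """Render a result as 'mean ± sd', matching the table layout."""
    return f"{mean:.{digits}f} ± {sd:.{digits}f}"
```

Fixing the shuffle seed keeps the folds identical across methods, so per-fold scores of two methods stay matched.
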
Five-fold cross-validation of the NMI values on KEGG clusters
| Datasets | POCS | |||
|---|---|---|---|---|
| (a) | ||||
| alpha | 0.287 ± 0.008 | 0.234 ± 0.007 | 0.251 ± 0.005 | 0.265 ± 0.005 |
| cdc15 | 0.282 ± 0.003 | 0.222 ± 0.009 | 0.259 ± 0.002 | 0.268 ± 0.009 |
| cdc28 | 0.267 ± 0.009 | 0.226 ± 0.005 | 0.209 ± 0.003 | 0.236 ± 0.003 |
| elu | 0.263 ± 0.006 | 0.219 ± 0.004 | 0.215 ± 0.001 | 0.240 ± 0.006 |
| Hughes | 0.289 ± 0.006 | 0.238 ± 0.007 | 0.254 ± 0.007 | 0.271 ± 0.005 |
| (b) | ||||
| alpha | 0.310 ± 0.009 | 0.255 ± 0.010 | 0.260 ± 0.010 | 0.283 ± 0.007 |
| cdc15 | 0.305 ± 0.004 | 0.266 ± 0.004 | 0.278 ± 0.012 | 0.281 ± 0.001 |
| cdc28 | 0.301 ± 0.001 | 0.266 ± 0.009 | 0.263 ± 0.008 | 0.279 ± 0.001 |
| elu | 0.292 ± 0.007 | 0.234 ± 0.002 | 0.244 ± 0.006 | 0.264 ± 0.009 |
| Hughes | 0.322 ± 0.003 | 0.286 ± 0.001 | 0.285 ± 0.007 | 0.303 ± 0.008 |
| (c) | ||||
| alpha | 0.382 ± 0.005 | 0.331 ± 0.001 | 0.335 ± 0.007 | 0.361 ± 0.004 |
| cdc15 | 0.384 ± 0.002 | 0.339 ± 0.004 | 0.341 ± 0.003 | 0.367 ± 0.004 |
| cdc28 | 0.361 ± 0.003 | 0.322 ± 0.001 | 0.336 ± 0.009 | 0.350 ± 0.007 |
| elu | 0.354 ± 0.007 | 0.311 ± 0.002 | 0.325 ± 0.003 | 0.342 ± 0.003 |
| Hughes | 0.396 ± 0.009 | 0.326 ± 0.003 | 0.356 ± 0.005 | 0.376 ± 0.009 |
| (d) | ||||
| alpha | 0.348 ± 0.008 | 0.307 ± 0.008 | 0.321 ± 0.008 | 0.339 ± 0.007 |
| cdc15 | 0.353 ± 0.005 | 0.312 ± 0.002 | 0.309 ± 0.009 | 0.330 ± 0.009 |
| cdc28 | 0.351 ± 0.003 | 0.316 ± 0.009 | 0.302 ± 0.009 | 0.336 ± 0.006 |
| elu | 0.338 ± 0.007 | 0.290 ± 0.007 | 0.308 ± 0.002 | 0.325 ± 0.005 |
| Hughes | 0.358 ± 0.007 | 0.320 ± 0.004 | 0.323 ± 0.005 | 0.343 ± 0.004 |
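
NMI (normalized mutual information) scores how well a clustering agrees with the reference clusters: 1 means the partitions match up to relabeling, and 0 means they are independent. This excerpt does not state which normalization was used, so the Python sketch below assumes the geometric-mean form $I(P;R)/\sqrt{H(P)H(R)}$ and assumes each gene carries exactly one reference label.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def nmi(pred: Sequence[Hashable], ref: Sequence[Hashable]) -> float:
    """NMI = I(pred; ref) / sqrt(H(pred) * H(ref)), using natural logarithms."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(pred)
    count_p, count_r = Counter(pred), Counter(ref)
    joint = Counter(zip(pred, ref))
    # Mutual information from the joint and marginal label frequencies.
    mi = sum((c / n) * math.log(c * n / (count_p[p] * count_r[r]))
             for (p, r), c in joint.items())
    h_p = -sum((c / n) * math.log(c / n) for c in count_p.values())
    h_r = -sum((c / n) * math.log(c / n) for c in count_r.values())
    if h_p == 0 or h_r == 0:
        # A single-cluster partition carries no information.
        return 1.0 if h_p == h_r else 0.0
    return mi / math.sqrt(h_p * h_r)
```

For the same normalization, scikit-learn's `normalized_mutual_info_score` with `average_method="geometric"` should give matching values; its default averaging method differs.
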
Five-fold cross-validation of the GLL values on SGD clusters
| Datasets | POCS | |||
|---|---|---|---|---|
| (a) | ||||
| alpha | -49 ± 3 | -146 ± 8 | -66 ± 2 | -62 ± 2 |
| cdc15 | -47 ± 1 | -148 ± 13 | -67 ± 3 | -61 ± 3 |
| cdc28 | -50 ± 2 | -154 ± 14 | -79 ± 3 | -64 ± 3 |
| elu | -52 ± 3 | -152 ± 9 | -69 ± 4 | -61 ± 3 |
| Hughes | -43 ± 3 | -143 ± 11 | -65 ± 4 | -55 ± 3 |
| (b) | ||||
| alpha | -42 ± 3 | -171 ± 4 | -69 ± 1 | -64 ± 2 |
| cdc15 | -40 ± 1 | -172 ± 4 | -78 ± 4 | -59 ± 3 |
| cdc28 | -43 ± 3 | -169 ± 10 | -79 ± 3 | -64 ± 4 |
| elu | -43 ± 1 | -170 ± 13 | -80 ± 3 | -62 ± 3 |
| Hughes | -39 ± 3 | -167 ± 14 | -62 ± 4 | -53 ± 4 |
| (c) | ||||
| alpha | -71 ± 3 | -190 ± 8 | -86 ± 2 | -82 ± 2 |
| cdc15 | -74 ± 3 | -194 ± 16 | -89 ± 6 | -79 ± 5 |
| cdc28 | -67 ± 3 | -188 ± 14 | -87 ± 2 | -71 ± 2 |
| elu | -82 ± 6 | -197 ± 6 | -89 ± 2 | -88 ± 2 |
| Hughes | -64 ± 4 | -182 ± 11 | -81 ± 5 | -70 ± 4 |
| (d) | ||||
| alpha | -64 ± 2 | -216 ± 9 | -91 ± 2 | -78 ± 2 |
| cdc15 | -65 ± 4 | -213 ± 17 | -89 ± 6 | -80 ± 6 |
| cdc28 | -62 ± 3 | -216 ± 11 | -89 ± 2 | -77 ± 3 |
| elu | -72 ± 3 | -219 ± 14 | -93 ± 2 | -85 ± 4 |
| Hughes | -63 ± 5 | -204 ± 8 | -84 ± 5 | -67 ± 4 |
Five-fold cross-validation of the NMI values on SGD clusters
| Datasets | POCS | |||
|---|---|---|---|---|
| (a) | ||||
| alpha | 0.438 ± 0.008 | 0.383 ± 0.002 | 0.404 ± 0.002 | 0.408 ± 0.003 |
| cdc15 | 0.462 ± 0.001 | 0.389 ± 0.004 | 0.422 ± 0.003 | 0.429 ± 0.004 |
| cdc28 | 0.428 ± 0.005 | 0.387 ± 0.001 | 0.400 ± 0.002 | 0.411 ± 0.004 |
| elu | 0.432 ± 0.006 | 0.410 ± 0.004 | 0.411 ± 0.003 | 0.412 ± 0.004 |
| Hughes | 0.467 ± 0.004 | 0.414 ± 0.009 | 0.434 ± 0.003 | 0.439 ± 0.003 |
| (b) | ||||
| alpha | 0.533 ± 0.003 | 0.471 ± 0.006 | 0.507 ± 0.004 | 0.517 ± 0.004 |
| cdc15 | 0.572 ± 0.002 | 0.507 ± 0.005 | 0.528 ± 0.005 | 0.540 ± 0.003 |
| cdc28 | 0.552 ± 0.001 | 0.488 ± 0.004 | 0.524 ± 0.005 | 0.543 ± 0.003 |
| elu | 0.536 ± 0.008 | 0.466 ± 0.004 | 0.514 ± 0.003 | 0.525 ± 0.003 |
| Hughes | 0.566 ± 0.001 | 0.513 ± 0.007 | 0.549 ± 0.003 | 0.546 ± 0.005 |
| (c) | ||||
| alpha | 0.607 ± 0.003 | 0.551 ± 0.003 | 0.579 ± 0.004 | 0.583 ± 0.004 |
| cdc15 | 0.613 ± 0.001 | 0.543 ± 0.005 | 0.580 ± 0.003 | 0.587 ± 0.004 |
| cdc28 | 0.598 ± 0.002 | 0.551 ± 0.003 | 0.587 ± 0.004 | 0.586 ± 0.005 |
| elu | 0.593 ± 0.001 | 0.539 ± 0.006 | 0.567 ± 0.004 | 0.564 ± 0.003 |
| Hughes | 0.638 ± 0.004 | 0.576 ± 0.003 | 0.586 ± 0.005 | 0.591 ± 0.003 |
| (d) | ||||
| alpha | 0.649 ± 0.002 | 0.586 ± 0.004 | 0.636 ± 0.004 | 0.634 ± 0.006 |
| cdc15 | 0.648 ± 0.006 | 0.594 ± 0.005 | 0.621 ± 0.004 | 0.620 ± 0.005 |
| cdc28 | 0.661 ± 0.003 | 0.607 ± 0.005 | 0.630 ± 0.005 | 0.637 ± 0.006 |
| elu | 0.637 ± 0.004 | 0.607 ± 0.008 | 0.619 ± 0.006 | 0.621 ± 0.005 |
| Hughes | 0.667 ± 0.003 | 0.617 ± 0.009 | 0.637 ± 0.004 | 0.646 ± 0.005 |
P-values of pairwise t-tests comparing POCS and FCM
| Number of clusters | KEGG | SGD |
|---|---|---|
| 10 | 1.60e-3 | 1.10e-3 |
| 15 | 1.29e-4 | 1.25e-2 |
| 20 | 1.30e-3 | 8.10e-3 |
| 25 | 2.80e-3 | 1.00e-3 |
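
The excerpt does not state which scores were paired (for example, per-dataset or per-fold values) or whether the test was one- or two-sided. Assuming a paired test on matched scores, the t statistic is the mean of the differences divided by its standard error, with $n-1$ degrees of freedom. The Python sketch below computes only that statistic; the function name `paired_t` is hypothetical, and the inputs in any usage are illustrative rather than values from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for matched scores a[i], b[i]; returns (t, degrees of freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

To turn the statistic into a p-value, `scipy.stats.ttest_rel(a, b)` returns the same statistic together with a two-sided p-value.
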