Diego H Milone, Georgina Stegmayer, Mariana López, Laura Kamenetzky, Fernando Carrari.
Abstract
BACKGROUND: It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses about their behavior and the biological processes involved. This knowledge is therefore used only as a second step, after the data have been clustered according to their expression patterns alone. It could thus be very useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters.
Year: 2014 PMID: 24717120 PMCID: PMC4002909 DOI: 10.1186/1471-2105-15-101
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of SOM training using metabolic pathways (bSOM) with an artificial data set. a) ρ matrix; b) α=0.00; c) α=0.50; d) α=1.00 (the two dimensions in this simplified example could represent measures for two different treatments in real data). Each cluster found by the algorithm is indicated with a different color (red, green, cyan and purple). Groups of biologically related points are indicated with different markers (squares, diamonds, circles and triangles).
Figure 2. Biological internal connectivity of data sets. Corresponding ρ matrix for: a) Solanum lycopersicum, b) Arabidopsis thaliana data sets. Higher intensity in the gray scale indicates a higher connection value.
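The ρ matrix and the weight α in Figures 1 and 2 suggest a distance that mixes expression similarity with biological connectivity. The following is a minimal sketch of that idea, not the authors' formulation. It assumes that ρ can be approximated by the Jaccard overlap of pathway memberships, that the mix is a convex combination, and that α = 0 uses expression data only while α = 1 uses pathway knowledge only. The function names (`connectivity_matrix`, `blended_dissimilarity`, `nearest_other`) and the toy data are hypothetical.

```python
import numpy as np


def connectivity_matrix(membership):
    """Assumed stand-in for rho: Jaccard overlap of pathway memberships.

    membership is an (n_items, n_pathways) 0/1 array. Items without any
    pathway annotation get a connectivity of 0 with every item.
    """
    m = np.asarray(membership, dtype=float)
    shared = m @ m.T                              # pathways in common
    counts = m.sum(axis=1)
    union = counts[:, None] + counts[None, :] - shared
    return np.divide(shared, union, out=np.zeros_like(shared), where=union > 0)


def blended_dissimilarity(data, rho, alpha):
    """Convex mix of expression distance and (1 - rho); an assumption."""
    x = np.asarray(data, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    d_expr = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distance
    if d_expr.max() > 0:
        d_expr = d_expr / d_expr.max()            # rescale to [0, 1]
    d_bio = 1.0 - np.asarray(rho, dtype=float)    # high connectivity = close
    return (1.0 - alpha) * d_expr + alpha * d_bio


def nearest_other(dissimilarity, i):
    """Index of the item closest to item i, excluding i itself."""
    row = np.array(dissimilarity[i], dtype=float)
    row[i] = np.inf
    return int(np.argmin(row))


# Toy data: items 0 and 2 have similar measurements, but items 0 and 1
# share a pathway (and likewise items 2 and 3).
toy_data = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [1.0, 0.9]]
toy_membership = [[1, 0], [1, 0], [0, 1], [0, 1]]
toy_rho = connectivity_matrix(toy_membership)

for alpha in (0.0, 0.5, 1.0):
    d = blended_dissimilarity(toy_data, toy_rho, alpha)
    print(f"alpha={alpha:.2f}: nearest neighbour of item 0 is item {nearest_other(d, 0)}")
```

In this toy setting the nearest neighbour of item 0 moves from item 2 (similar measurements) at α = 0 to item 1 (shared pathway) at α = 1, which mirrors how the clusters in Figure 1 regroup around biologically related points as α grows.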
Validation measures for SOM training: standard (sSOM) and biological (bSOM) for the metabolic datasets
| | | | ||
|---|---|---|---|---|
| 0.65 | 0.69 | 0.73 | 0.79 | |
| 0.66 | 0.65 | 0.59 | 0.49 | |
| 9.56 | 31.00 | 13.90 | 19.05 | |
| 0.59 | 0.40 | 0.40 | 0.37 | |
| 3.58 | 2.74 | 2.58 | 2.08 | |
| 0.87 | 0.90 | 0.83 | 0.87 | |
| 0.48 | 0.55 | 0.64 | 0.65 | |
| 0.81 | 0.79 | 0.79 | 0.71 | |
| 10.45 | 8.86 | 490 | 60.66 | |
| 0.32 | 0.24 | 0.50 | 0.54 | |
| 3.84 | 2.93 | 2.17 | 1.56 | |
| 0.65 | 0.67 | 0.52 | 0.48 | |
Validation measures for SOM training: standard (sSOM) and biological (bSOM) for the full datasets
| | | |||
|---|---|---|---|---|
| 0.79 | 0.80 | 0.80 | 0.81 | |
| 0.68 | 0.67 | 0.66 | 0.64 | |
| 8.80 | 9.07 | 9.12 | 10.64 | |
| 0.18 | 0.14 | 0.13 | 0.26 | |
| 3.32 | 2.65 | 2.38 | 1.80 | |
| 1.09 | 0.63 | 0.59 | 0.52 | |
| 0.51 | 0.52 | 0.51 | 0.51 | |
| 1.00 | 1.00 | 1.00 | 1.00 | |
| 13.30 | 12.02 | 10.35 | 12.19 | |
| 0.16 | 0.15 | 0.16 | 0.13 | |
| 3.13 | 3.10 | 2.80 | 2.00 | |
| 0.68 | 0.41 | 0.43 | 0.32 | |
Detail of patterns and common pathways for sSOM vs. bSOM
| | sSOM | bSOM |
|---|---|---|
| Cluster A | Serine | Serine |
| | Threonine | Threonine |
| | Valine | Valine |
| | Glycine | Isoleucine |
| | Lysine | |
| Common | ko00260, ko00290 | ko00260, ko00290 |
| pathways | ko00970, map01060 | ko00970, map01060 |
| | ko02010 | ko02010 |
| | ko00460 | ko00966 |
| Cluster B | Arginine | Arginine |
| | Glycine | |
| | GABA | Lysine |
| Common | ko00330, ko00410 | ko00310, ko00970 |
| pathways | ko04080 | map01060, map01064 |
| | | ko02010 |
| Cluster C | LE31F17 | LE31F17 |
| | LE30O12 ∗ | LE16F20 |
| | LE26F02 ∗ | |
| Common | - | ko00052 |
| pathways | | ko00511, ko00531 |
| | | ko00600, ko00604 |
| Cluster D | Sucrose | Sucrose |
| | Aspartate | Glutamate |
| | 5-oxoproline | Proline |
| | | LE23B16 ∗ |
| | | LE23N08 ∗ |
| Common | ko02010 | ko02010 |
| pathways | ko00330, ko00970 |
∗ Does not participate in a well-known pathway.
Gene transcript codes. LE31F17: beta-galactosidase (GB acc# AAC25984); LE16F20: beta-galactosidase (GB acc# AAC25984); LE30O12: no data; LE26F02: component of oligomeric golgi complex, putative (GB acc# XP_00251994); LE23B16: CDPK-related protein kinase (GB acc# AAZ83348); LE23N08: no data.
Pathway codes. ko00260: Glycine, serine and threonine metabolism; ko00290: Valine, leucine and isoleucine biosynthesis; ko00970: Aminoacyl-tRNA biosynthesis; map01060: Biosynthesis of plant secondary metabolites; ko02010: ABC transporters; ko00460: Cyanoamino acid metabolism; ko00966: Glucosinolate biosynthesis; ko00330: Arginine and proline metabolism; ko00410: beta-Alanine metabolism; ko04080: Neuroactive ligand-receptor interaction; ko00310: Lysine degradation; map01064: Biosynthesis of alkaloids derived from ornithine, lysine and nicotinic acid; ko00052: Galactose metabolism; ko00511: Other glycan degradation; ko00531: Glycosaminoglycan degradation; ko00600: Sphingolipid metabolism; ko00604: Glycosphingolipid biosynthesis.