Rob Jelier, Peter A C 't Hoen, Ellen Sterrenburg, Johan T den Dunnen, Gert-Jan B van Ommen, Jan A Kors, Barend Mons.
Abstract
BACKGROUND: Comparative analysis of expression microarray studies is difficult due to the large influence of technical factors on experimental outcome. Still, the identified differentially expressed genes may hint at the same biological processes. However, manually curated assignment of genes to biological processes, such as pursued by the Gene Ontology (GO) consortium, is incomplete and limited. We hypothesised that automatic association of genes with biological processes through thesaurus-controlled mining of Medline abstracts would be more effective. Therefore, we developed a novel algorithm (LAMA: Literature-Aided Meta-Analysis) to quantify the similarity between transcriptomics studies. We evaluated our algorithm on a large compendium of 102 microarray studies published in the field of muscle development and disease, and compared it to similarity measures based on gene overlap and over-representation of biological processes assigned by GO.
Year: 2008 PMID: 18577208 PMCID: PMC2459190 DOI: 10.1186/1471-2105-9-291
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Distribution of the number of microarray studies in which a gene was found differentially expressed. A total of 102 studies were included, with 8282 unique differentially expressed genes.
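The distribution described in this caption can be tallied with a short script. The following is a minimal sketch, not the authors' code: the function name `study_count_distribution`, the input layout (study id mapped to a gene list), and the toy gene lists are all assumptions made for illustration.

```python
from collections import Counter


def study_count_distribution(studies):
    """Return a Counter mapping k -> number of genes found in exactly k studies.

    `studies` is assumed to map a study id to its list of differentially
    expressed genes (a hypothetical input layout, not the paper's format).
    """
    # Count each gene at most once per study, even if a list repeats it.
    per_gene = Counter(
        gene for genes in studies.values() for gene in set(genes)
    )
    # Histogram over the per-gene study counts.
    return Counter(per_gene.values())


# Hypothetical toy input, for illustration only.
toy_studies = {
    "study_a": ["MYH3", "TNNT2", "CDKN1A"],
    "study_b": ["MYH3", "CDKN1A"],
    "study_c": ["MYH3", "LEP"],
}
distribution = study_count_distribution(toy_studies)
print(sorted(distribution.items()))
```

In this toy input two genes occur in one study, one gene in two studies, and one gene in all three; summing the histogram values gives the number of unique genes.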
Figure 2. Performance for reproduction of the manual grouping by kappa (circles) and LAMA (squares). A star indicates a statistically significant difference according to the Wilcoxon ranks test at the 0.05 level.
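To make the gene-overlap baseline concrete, here is a minimal sketch that assumes "kappa" denotes Cohen's kappa computed over the presence or absence of each gene in two differentially expressed gene lists. The function name `cohen_kappa`, the explicit `universe` of measured genes, and the toy gene identifiers are assumptions; the record above does not specify how the authors computed the statistic.

```python
def cohen_kappa(list_a, list_b, universe):
    """Cohen's kappa for the agreement of two gene lists over a gene universe.

    Each gene in `universe` is treated as one item that each study either
    reports as differentially expressed or not (an assumed formulation).
    """
    genes = set(universe)
    if not genes:
        raise ValueError("universe must contain at least one gene")
    a = set(list_a) & genes
    b = set(list_b) & genes
    n = len(genes)

    both = len(a & b)            # reported by both studies
    neither = n - len(a | b)     # reported by neither study
    p_observed = (both + neither) / n

    # Agreement expected by chance from the marginal list sizes.
    p_a, p_b = len(a) / n, len(b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    if p_expected == 1.0:
        # Kappa is undefined when both lists are empty or both cover the
        # whole universe; returning 0.0 is a convention of this sketch.
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: 10 measured genes, two lists of 4 sharing 2 genes.
universe = [f"g{i}" for i in range(10)]
kappa_partial = cohen_kappa(universe[0:4], universe[2:6], universe)
kappa_identical = cohen_kappa(universe[0:4], universe[0:4], universe)
print(round(kappa_partial, 3), kappa_identical)
```

Unlike a raw overlap count, kappa corrects for the agreement expected by chance given the two list sizes, so the result depends on the choice of gene universe.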
Figure 3. Kappa-based hierarchical clustering and heatmap. The dotted pink line indicates the clustering cutoff used, and the identified clusters are indicated in addition to relevant subclusters. The dataset ids are shown between the tree and the heatmap. The colored bars provide background information on the datasets.
Figure 4. LAMA-based hierarchical clustering and heatmap. The dotted pink line indicates the clustering cutoff used, and the identified clusters are indicated in addition to relevant subclusters. The dataset ids are shown between the tree and the heatmap. The colored bars provide background information on the datasets.
Characterizing concepts for clusters identified through LAMA analysis.
| Cluster | Subcluster | Characteristic | Biological Concepts (Up) | Biological Concepts (Down) |
| --- | --- | --- | --- | --- |
| 1 | 1A | Atrophy – PABPN1 overexpression | Amino acyl tRNA synthetases, spermidine, polyamines, spermine, eukaryotic initiation factors | Platelet-derived growth factor, transforming growth factor-beta, insulin-like growth factor binding proteins |
| 1 | 1B | EOM-specific | Adipocytes, acyl CoA dehydrogenase | Cyclins, keratin, cyclin-dependent kinases |
| 2 | 2A | Dystrophin deficiency in EOM muscle | Troponin | - |
| 2 | 2B | Myositis | Chemokine, chemokine receptor | - |
| 2 | 2C | Regeneration | T-lymphocyte, phosphotransferases, phosphorylation, mitogen-activated protein kinases, integrins, cell cycle | Cullin proteins, mitogen-activated protein kinases, ligase |
| 2 | 2D | Differentiation | Troponin, tropomyosin, nemaline myopathies, sarcomeres, myosin heavy chain, calsequestrin | Inhibitor of differentiation proteins, E2F transcription factors, proteoglycan, cell cycle proteins |
| 2 | 2E | Ky-mutant/diverse | Leptin, desaturase, myosin heavy chains, neural cell adhesion molecules | Mitogen-activated protein kinases |
Concepts are shown separately for the down- and up-regulated gene lists. The column "Characteristic" describes the phenomena studied in each cluster.