MOTIVATION: With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most of the current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without prior explicit bias almost all of these clustering methods applied to gene expression data not only produce different results, but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified. RESULTS: Starting with a systematic comparison of the underlying theories behind clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique have strong similarities to those of self-organizing maps (SOMs). We discuss the advantages and the mathematical reasoning behind our approach.
MOTIVATION: With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most of the current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without prior explicit bias almost all of these clustering methods applied to gene expression data not only produce different results, but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified. RESULTS: Starting with a systematic comparison of the underlying theories behind clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique have strong similarities to those of self-organizing maps (SOMs). We discuss the advantages and the mathematical reasoning behind our approach.
Authors: Fiona H Blackhall; Melania Pintilie; Dennis A Wigle; Igor Jurisica; Ni Liu; Nikolina Radulovich; Michael R Johnston; Shaf Keshavjee; Ming-Sound Tsao Journal: Neoplasia Date: 2004 Nov-Dec Impact factor: 5.715
Authors: Sofian Berrouiguet; Mercedes M Perez-Rodriguez; Mark Larsen; Enrique Baca-García; Philippe Courtet; Maria Oquendo Journal: J Med Internet Res Date: 2018-01-03 Impact factor: 5.428
Authors: William L Noderer; Ross J Flockhart; Aparna Bhaduri; Alexander J Diaz de Arce; Jiajing Zhang; Paul A Khavari; Clifford L Wang Journal: Mol Syst Biol Date: 2014-08-28 Impact factor: 11.429