Mfuzz: a software package for soft clustering of microarray data.

Abstract

For the analysis of microarray data, clustering techniques are frequently used. Most such methods are based on hard clustering of data, wherein one gene (or sample) is assigned to exactly one cluster. Hard clustering, however, suffers from several drawbacks such as sensitivity to noise and information loss. In contrast, soft clustering methods can assign a gene to several clusters. They can overcome shortcomings of conventional hard clustering techniques and offer further advantages. Thus, we constructed an R package termed Mfuzz implementing soft clustering tools for microarray data analysis. The additional package Mfuzzgui provides a convenient Tcl/Tk-based graphical user interface. AVAILABILITY: The R packages Mfuzz and Mfuzzgui are available at http://itb1.biologie.hu-berlin.de/~futschik/software/R/Mfuzz/index.html. Their distribution is subject to the GPL version 2 license.

Keywords: gene expression; soft clustering; software

Year: 2007 PMID: 18084642 PMCID: PMC2139991 DOI: 10.6026/97320630002005

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Clustering methods are popular tools in data analysis. They can be used to reveal hidden patterns (clusters of objects) in large complex data sets. Most clustering methods assign one object to exactly one cluster. [1] While this so-called hard clustering approach is suitable for a variety of applications, it may be insufficient for microarray data analysis. Here, the detected clusters of co-expressed genes indicate co-regulation. However, genes are frequently not regulated in a simple ‘on’/‘off’ manner; instead, their expression levels are tightly regulated by a number of fine-tuned transcriptional mechanisms. This is reflected in expression data sets generated in microarray experiments. It is a common observation that many genes show expression profiles similar to several cluster patterns. [2,3] Ideally, clustering methods for microarray analysis should be capable of dealing with this complexity in an adequate manner. They should not only differentiate how closely a gene follows the main expression pattern of a cluster, but should also be capable of assigning genes to several clusters if their expression patterns are similar. Soft clustering can provide these favourable capacities. Recently we have shown that applying soft clustering to microarray data analysis leads to i) more adequate clusters with information-rich structures, ii) increased noise-robustness and iii) improved identification of regulatory sequence motifs. [4]

Methodology

Soft clustering has been implemented using the fuzzy c-means algorithm. [5] It is based on the iterative optimization of an objective function to minimize the variation of objects within clusters. Poorly clustered objects have decreased influence on the resulting clusters, making the clustering process less sensitive to noise. This is a valuable characteristic of the fuzzy c-means method, as microarray data tend to be inherently noisy. As a result, fuzzy c-means produces gradual membership values µij between 0 and 1, indicating the degree of membership of gene i in cluster j. This contrasts strongly with hard clustering, e.g. the commonly used k-means clustering, which generates only membership values µij of either 0 or 1. Thus, soft clustering can effectively reflect the strength of a gene's association with a cluster. Obtaining gradual membership values allows the definition of cluster cores of tightly co-expressed genes. Moreover, as soft clustering is more robust to noise, the commonly used procedure of filtering genes to reduce noise in microarray data can be avoided, and loss of potentially important information can be prevented. [4]
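The iteration described above can be illustrated with a minimal sketch of the standard fuzzy c-means updates in plain Python. It is not the Mfuzz implementation (Mfuzz is an R package); the function name `fuzzy_c_means`, its parameters, the random initialisation and the toy profiles are assumptions made for illustration only.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Soft-cluster `data` (a list of equal-length numeric lists) into c clusters.

    Returns (centers, u), where u[i][j] is the membership of object i in
    cluster j and every row of u sums to 1. Illustrative sketch only.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Start from a random partition matrix whose rows sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of all objects weighted by membership ** m,
        # so poorly clustered objects pull a centre only weakly.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / total for d in range(dim)]
            )

        # Membership update from the distances to the new centres.
        exponent = 2.0 / (m - 1.0)
        new_u = []
        for x in data:
            dist = [math.dist(x, ctr) for ctr in centers]
            if min(dist) == 0.0:
                # Object coincides with a centre: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append(
                    [1.0 / sum((dj / dk) ** exponent for dk in dist) for dj in dist]
                )

        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u


# Toy data (hypothetical): two tight groups plus one profile halfway between them.
profiles = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0], [0.5, 0.5]]
centers, u = fuzzy_c_means(profiles, c=2)
for profile, row in zip(profiles, u):
    print(profile, [round(v, 2) for v in row])
```

In this toy example the four outer profiles receive a membership close to 1 for one cluster, whereas the intermediate profile is shared between both clusters, which a hard assignment could not express.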

Software input

Like most other clustering software, the Mfuzz package requires as input the data to be clustered and the setting of clustering parameters. Microarray expression data can be entered either as a simple table or as a Bioconductor (i.e. exprSet) object. Whereas the table format is an easy and sufficient way to handle data for most experiments, Bioconductor data objects can be used for more complex experimental designs. [6] The format for tables is the same as for the standard clustering software Cluster [7], so that users can easily use both software packages without reformatting their input. Further, the number of clusters and the so-called fuzzification parameter m have to be chosen. By varying both parameters, users can probe the stability of the obtained clusters as well as the global clustering structure. [4]
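To give a feel for the fuzzification parameter m, the following sketch evaluates the standard fuzzy c-means membership formula for one profile and two fixed cluster centres at several values of m. The helper `memberships`, the centres and the profile are hypothetical and chosen only for illustration; the sketch assumes m > 1 and that the profile does not coincide with a centre.

```python
import math


def memberships(x, centers, m):
    """Standard fuzzy c-means membership of point x in each centre (m > 1)."""
    dist = [math.dist(x, c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dist) for dj in dist]


centers = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical cluster centres
gene = [0.3, 0.3]                   # hypothetical profile, closer to the first centre

for m in (1.1, 1.5, 2.0, 3.0):
    u = memberships(gene, centers, m)
    print(f"m={m}: " + ", ".join(f"{v:.3f}" for v in u))
```

As m approaches 1 the memberships approach a hard 0/1 assignment, while larger values of m spread membership more evenly across clusters. Repeating a clustering over a range of m values and cluster numbers is one way to see which clusters remain stable.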

Software output

As basic output, the partition matrix containing the complete set of membership values is supplied. This information can be used to define cluster cores consisting of highly correlated genes and to improve the subsequent detection of regulatory mechanisms. [4] Results of the cluster analysis can either be processed further within the Bioconductor framework or stored in simple table format. Several functions visualize the results, such as internal or global cluster structures. Figure 1 shows some examples of the graphical output.
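One simple way to derive cluster cores from a partition matrix is to keep, for each cluster, only the genes whose membership value reaches a chosen threshold. The sketch below assumes this thresholding approach; the function `cluster_cores`, the threshold of 0.7, the gene identifiers and the matrix values are all hypothetical and not taken from Mfuzz.

```python
def cluster_cores(partition, gene_ids, min_membership=0.7):
    """Return {cluster index: [gene ids]} for memberships >= min_membership.

    `partition[i][j]` is the membership of gene i in cluster j.
    """
    n_clusters = len(partition[0])
    cores = {j: [] for j in range(n_clusters)}
    for gene, row in zip(gene_ids, partition):
        for j, mu in enumerate(row):
            if mu >= min_membership:
                cores[j].append(gene)
    return cores


# Hypothetical partition matrix: 4 genes (rows) x 2 clusters (columns).
partition = [
    [0.95, 0.05],
    [0.80, 0.20],
    [0.55, 0.45],
    [0.10, 0.90],
]
gene_ids = ["g1", "g2", "g3", "g4"]

cores = cluster_cores(partition, gene_ids)
print(cores)
```

Here the gene "g3" belongs to no core because neither of its membership values reaches the threshold. Raising the threshold yields smaller, more tightly co-expressed cores.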

Figure 1

A) Examples of the visualization of clustering results produced by Mfuzz. Graphs show the temporal expression pattern during the yeast cell cycle (top and lower panels) and the global clustering structure (central panels). Membership values are color-encoded, with red shades denoting high membership values and green shades denoting low membership values of genes. Using this color scheme, clusters with a large core of tightly co-regulated genes (e.g. cluster 7) can be easily distinguished from weak or noisy clusters (e.g. cluster 16). The central panel shows the principal components of the clusters. Lines between clusters indicate their overlap, i.e. how many genes they share. B) Graphical user interface implemented in the Mfuzzgui package. Its outline follows the standard steps of cluster analyses of microarray data: data loading and pre-processing, clustering, examination and visualization of results.

Note that Mfuzz is not restricted to microarray data analysis, but has recently also been successfully applied to examine protein phosphorylation time series. [8]

Caveat & Future development

Mfuzz and Mfuzzgui are R packages. R is a statistical programming language and is freely available open-source software. [9] Both packages follow the conventions of the Bioconductor platform. [6] The graphical user interface implemented in Mfuzzgui requires an existing installation of Tcl/Tk. For convenience, we supply scripts for automatic installation of the software packages. Additionally, scripts are provided to start the packages directly, enhancing their stand-alone character. Future versions will include extended export options such as automatically generated HTML pages reporting the results of the clustering analysis.

4 in total

1. Noise-robust soft clustering of gene expression time-course data.

Authors: Matthias E Futschik; Bronwyn Carlisle
Journal: J Bioinform Comput Biol Date: 2005-08 Impact factor: 1.122

2. The transcriptional program of sporulation in budding yeast.

Authors: S Chu; J DeRisi; M Eisen; J Mulholland; D Botstein; P O Brown; I Herskowitz
Journal: Science Date: 1998-10-23 Impact factor: 47.728

3. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks.

Authors: Jesper V Olsen; Blagoy Blagoev; Florian Gnad; Boris Macek; Chanchal Kumar; Peter Mortensen; Matthias Mann
Journal: Cell Date: 2006-11-03 Impact factor: 41.582

4. A genome-wide transcriptional analysis of the mitotic cell cycle.

Authors: R J Cho; M J Campbell; E A Winzeler; L Steinmetz; A Conway; L Wodicka; T G Wolfsberg; A E Gabrielian; D Landsman; D J Lockhart; R W Davis
Journal: Mol Cell Date: 1998-07 Impact factor: 17.970
