Discovery of error-tolerant biclusters from noisy gene expression data.

Rohit Gupta, Navneet Rao, Vipin Kumar.

Abstract

BACKGROUND: An important analysis performed on microarray gene-expression data is the discovery of biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations: an inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; and the inability of some approaches to find overlapping biclusters, which is crucial because many genes participate in multiple biological processes. Association pattern mining algorithms also produce biclusters as their result and can naturally address some of these limitations. However, traditional association mining finds only exact biclusters, which limits its applicability to real-life data sets in which biclusters may be fragmented by random noise/errors. Moreover, because these algorithms work only with binary or boolean attributes, applying them to gene-expression data requires transforming real-valued attributes into binary attributes, which often results in loss of information. Many past approaches have tried to address noise and the handling of real-valued attributes independently, but no systematic approach addresses both issues together.
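The fragmentation problem described above can be seen on a toy example. The following is a minimal sketch, not the paper's ET-bicluster model or algorithm: the matrix values, the binarization cutoff `THRESHOLD`, the per-gene error rate, and all function names are assumptions made up for illustration. It binarizes a small real-valued matrix, shows that a single noisy value breaks the exact (all-ones) bicluster, and shows that a simple per-gene error tolerance recovers the full block.

```python
# Toy expression matrix: rows are genes, columns are conditions.
# Genes 0-2 are planted as coherently high in conditions 0-3;
# the value 0.4 at (gene 1, condition 2) plays the role of noise.
expr = [
    [2.1, 1.9, 2.0, 2.2, 0.1],
    [1.8, 2.0, 0.4, 2.1, 0.0],
    [2.0, 2.2, 1.9, 1.8, 0.2],
    [0.1, 0.0, 0.2, 0.1, 1.9],
]

THRESHOLD = 1.0  # hypothetical binarization cutoff (an assumption)


def binarize(matrix, threshold):
    """Map each real value to 1 if it reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in matrix]


def is_exact_bicluster(binary, genes, conditions):
    """True only if every (gene, condition) cell in the block is 1."""
    return all(binary[g][c] == 1 for g in genes for c in conditions)


def is_tolerant_bicluster(binary, genes, conditions, max_error_rate):
    """True if each gene misses at most max_error_rate of the conditions."""
    for g in genes:
        misses = sum(1 for c in conditions if binary[g][c] == 0)
        if misses / len(conditions) > max_error_rate:
            return False
    return True


binary = binarize(expr, THRESHOLD)
genes, conditions = [0, 1, 2], [0, 1, 2, 3]

# The full planted block is not an exact bicluster because of one noisy cell.
exact_full = is_exact_bicluster(binary, genes, conditions)
# Dropping the noisy condition leaves only a fragment of the planted block.
exact_fragment = is_exact_bicluster(binary, genes, [0, 1, 3])
# Allowing one miss in four conditions per gene recovers the full block.
tolerant_full = is_tolerant_bicluster(binary, genes, conditions, 0.25)

print(exact_full, exact_fragment, tolerant_full)  # False True True
```

Binarization also discards the magnitude of every value (2.2 and 1.8 both become 1), which is the loss of information the abstract refers to.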
RESULTS: In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of the proposed approach is illustrated by comparing it with a recent approach, RAP, in the context of two biological problems: discovery of functional modules and discovery of biomarkers. For the first problem, two real-valued S. cerevisiae microarray gene-expression data sets are used to demonstrate that the biclusters obtained from the ET-bicluster approach not only recover a larger set of genes than those obtained from the RAP approach but also have higher functional coherence, as evaluated using GO-based functional enrichment analysis. Two randomization tests, used to estimate the statistical significance of the discovered error-tolerant biclusters, indicate that they are indeed biologically meaningful and statistically significant. For the second problem, biomarker discovery, we use four real-valued breast cancer microarray gene-expression data sets and evaluate the biomarkers obtained using MSigDB gene sets.
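Functional enrichment of a gene group is commonly scored with an upper-tail hypergeometric probability. The sketch below illustrates that general calculation only; the abstract does not state the exact statistic the authors used, and the function name `enrichment_p_value` and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, drawn, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of genes in the background set
    annotated:  background genes carrying the functional annotation
    drawn:      genes in the bicluster
    overlap:    bicluster genes carrying the annotation
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only for illustration:
# 20 background genes, 5 annotated, a 4-gene bicluster with 3 annotated.
p = enrichment_p_value(population=20, annotated=5, drawn=4, overlap=3)
print(p)  # 155/4845, about 0.032
```

A smaller value means the annotation is less likely to appear in the bicluster that often by chance. In practice such values are usually corrected for testing many annotation terms at once.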
CONCLUSIONS: The results obtained for both problems, functional module discovery and biomarker discovery, clearly demonstrate the usefulness of the proposed ET-bicluster approach and illustrate the importance of explicitly accounting for noise/errors when discovering coherent groups of genes from gene-expression data.

Year:  2011        PMID: 22168285      PMCID: PMC3247082          DOI: 10.1186/1471-2105-12-S12-S1

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


References (14 in total, 10 listed)

1.  Functional discovery via a compendium of expression profiles.

Authors:  T R Hughes; M J Marton; A R Jones; C J Roberts; R Stoughton; C D Armour; H A Bennett; E Coffey; H Dai; Y D He; M J Kidd; A M King; M R Meyer; D Slade; P Y Lum; S B Stepaniants; D D Shoemaker; D Gachotte; K Chakraburtty; J Simon; M Bard; S H Friend
Journal:  Cell       Date:  2000-07-07       Impact factor: 41.582

2.  Biclustering of expression data.

Authors:  Y Cheng; G M Church
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  2000

3.  Discovering local structure in gene expression data: the order-preserving submatrix problem.

Authors:  Amir Ben-Dor; Benny Chor; Richard Karp; Zohar Yakhini
Journal:  J Comput Biol       Date:  2003       Impact factor: 1.479

4.  Mining gene expression databases for association rules.

Authors:  Chad Creighton; Samir Hanash
Journal:  Bioinformatics       Date:  2003-01       Impact factor: 6.937

5.  Mining Approximate Order Preserving Clusters in the Presence of Noise.

Authors:  Mengsheng Zhang; Wei Wang; Jinze Liu
Journal:  Proc Int Conf Data Eng       Date:  2008-04-07

6.  Biclustering algorithms for biological data analysis: a survey.

Authors:  Sara C Madeira; Arlindo L Oliveira
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2004 Jan-Mar       Impact factor: 3.710

7.  High confidence rule mining for microarray analysis.

Authors:  Tara McIntosh; Sanjay Chawla
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2007 Oct-Dec       Impact factor: 3.710

8.  Iterative signature algorithm for the analysis of large-scale gene expression data.

Authors:  Sven Bergmann; Jan Ihmels; Naama Barkai
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2003-03-11

9.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

10.  Immunohistochemical expression of integrins and extracellular matrix proteins in non-small cell lung cancer: correlation with lymph node metastasis.

Authors:  Ji Youn Han; Hong Sug Kim; Sug Hyung Lee; Won Sang Park; Jung Young Lee; Nam Jin Yoo
Journal:  Lung Cancer       Date:  2003-07       Impact factor: 5.705

Cited by (5 in total)

1.  Graph-based unsupervised feature selection and multiview clustering for microarray data.

Authors:  Tripti Swarnkar; Pabitra Mitra
Journal:  J Biosci       Date:  2015-10       Impact factor: 1.826

2.  Complex biomarker discovery in neuroimaging data: Finding a needle in a haystack. (Review)

Authors:  Gowtham Atluri; Kanchana Padmanabhan; Gang Fang; Michael Steinbach; Jeffrey R Petrella; Kelvin Lim; Angus Macdonald; Nagiza F Samatova; P Murali Doraiswamy; Vipin Kumar
Journal:  Neuroimage Clin       Date:  2013-08-07       Impact factor: 4.881

3.  Evaluation of Plaid Models in Biclustering of Gene Expression Data.

Authors:  Hamid Alavi Majd; Soodeh Shahsavari; Ahmad Reza Baghestani; Seyyed Mohammad Tabatabaei; Naghme Khadem Bashi; Mostafa Rezaei Tavirani; Mohsen Hamidpour
Journal:  Scientifica (Cairo)       Date:  2016-03-09

4.  A probabilistic coevolutionary biclustering algorithm for discovering coherent patterns in gene expression dataset.

Authors:  Je-Gun Joung; Soo-Jin Kim; Soo-Yong Shin; Byoung-Tak Zhang
Journal:  BMC Bioinformatics       Date:  2012-12-13       Impact factor: 3.169

5.  BicPAM: Pattern-based biclustering for biomedical data analysis.

Authors:  Rui Henriques; Sara C Madeira
Journal:  Algorithms Mol Biol       Date:  2014-12-16       Impact factor: 1.405
