Proximity measures for clustering gene expression microarray data: a validation methodology and a comparative analysis.

Pablo A Jaskowiak, Ricardo J G B Campello, Ivan G Costa.

Abstract

Cluster analysis is usually the first step taken to extract information from gene expression microarray data. Besides selecting a clustering algorithm, choosing an appropriate proximity measure (similarity or distance) is of great importance for obtaining satisfactory clustering results. Nevertheless, to date, there are no comprehensive guidelines on how to choose proximity measures for clustering microarray data. Pearson correlation is the most widely used proximity measure, whereas the characteristics of other measures remain unexplored. In this paper, we investigate the choice of proximity measures for clustering microarray data by evaluating the performance of 16 proximity measures on 52 data sets from time-course and cancer experiments. Our results support the conclusion that measures rarely employed in the gene expression literature can provide better results than commonly employed ones, such as Pearson, Spearman, and Euclidean distance. Given that different measures stood out in the time-course and cancer data evaluations, the choice of measure should be specific to each scenario. To evaluate measures on time-course data, we preprocessed and compiled 17 data sets from the microarray literature into a benchmark, along with a new methodology called Intrinsic Biological Separation Ability (IBSA). Both can be employed in future research to assess the effectiveness of new measures for gene time-course data.
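The abstract names Pearson, Spearman, and Euclidean distance as the commonly employed measures. As a rough illustration of why the choice matters (this is not the authors' code, nor their evaluation protocol or IBSA), the sketch below implements these three measures in plain Python and applies them to made-up toy expression profiles. The helper names `pearson`, `ranks`, `spearman`, `euclidean`, and `correlation_distance` are hypothetical, and turning a correlation r into a dissimilarity as 1 - r is a common convention assumed here, not something stated in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(v):
    """1-based ranks of v, giving tied values the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def euclidean(x, y):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(corr, x, y):
    """Assumed convention: convert a correlation into a dissimilarity as 1 - r."""
    return 1.0 - corr(x, y)


# Toy profiles (made up): b has the same shape as a at 10x the scale,
# c is a reversed.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = [4.0, 3.0, 2.0, 1.0]

print("Pearson distance a-b:", correlation_distance(pearson, a, b))
print("Spearman distance a-c:", correlation_distance(spearman, a, c))
print("Euclidean distance a-b:", euclidean(a, b))
```

On these toy profiles, Pearson treats `a` and `b` as perfectly correlated, so their correlation distance is close to 0, whereas Euclidean distance places them about 49.3 units apart because it is sensitive to scale. Two genes can therefore be "close" under one measure and "far" under another, which is the kind of difference the comparison in the abstract is concerned with.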

Year:  2013        PMID: 24334380     DOI: 10.1109/TCBB.2013.9

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  8 in total

Review 1.  An overview of bioinformatics methods for modeling biological pathways in yeast.

Authors:  Jie Hou; Lipi Acharya; Dongxiao Zhu; Jianlin Cheng
Journal:  Brief Funct Genomics       Date:  2015-10-17       Impact factor: 4.241

Review 2.  Heterogeneous data integration methods for patient similarity networks.

Authors:  Jessica Gliozzo; Marco Mesiti; Marco Notaro; Alessandro Petrini; Alex Patak; Antonio Puertas-Gallardo; Alberto Paccanaro; Giorgio Valentini; Elena Casiraghi
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

3.  On the selection of appropriate distances for gene expression data clustering.

Authors:  Pablo A Jaskowiak; Ricardo J G B Campello; Ivan G Costa
Journal:  BMC Bioinformatics       Date:  2014-01-24       Impact factor: 3.169

4.  A systematic comparative evaluation of biclustering techniques.

Authors:  Victor A Padilha; Ricardo J G B Campello
Journal:  BMC Bioinformatics       Date:  2017-01-23       Impact factor: 3.169

5.  The miRNome of canine invasive urothelial carcinoma.

Authors:  Mara S Varvil; Taylor Bailey; Deepika Dhawan; Deborah W Knapp; José A Ramos-Vara; Andrea P Dos Santos
Journal:  Front Vet Sci       Date:  2022-08-22

6.  Pathobiochemical signatures of cholestatic liver disease in bile duct ligated mice.

Authors:  Kerstin Abshagen; Matthias König; Andreas Hoppe; Isabell Müller; Matthias Ebert; Honglei Weng; Herrmann-Georg Holzhütter; Ulrich M Zanger; Johannes Bode; Brigitte Vollmar; Maria Thomas; Steven Dooley
Journal:  BMC Syst Biol       Date:  2015-11-20

7.  Pairwise gene GO-based measures for biclustering of high-dimensional expression data.

Authors:  Juan A Nepomuceno; Alicia Troncoso; Isabel A Nepomuceno-Chamorro; Jesús S Aguilar-Ruiz
Journal:  BioData Min       Date:  2018-03-27       Impact factor: 2.522

8.  Identifying gene-specific subgroups: an alternative to biclustering.

Authors:  Vincent Branders; Pierre Schaus; Pierre Dupont
Journal:  BMC Bioinformatics       Date:  2019-12-03       Impact factor: 3.169
