Instance-based concept learning from multiclass DNA microarray data.

Daniel Berrar, Ian Bradbury, Werner Dubitzky.

Abstract

BACKGROUND: Various statistical and machine learning methods have been successfully applied to the classification of DNA microarray data. Simple instance-based classifiers such as nearest neighbor (NN) approaches perform remarkably well in comparison to more complex models, and are currently experiencing a renaissance in the analysis of data sets from biology and biotechnology. While binary classification of microarray data has been extensively investigated, studies involving multiclass data are rare. The question remains open whether there exists a significant difference in performance between NN approaches and more complex multiclass methods. Comparative studies in this field commonly assess different models based on their classification accuracy only; however, this approach lacks the rigor needed to draw reliable conclusions and is inadequate for testing the null hypothesis of equal performance. Comparing novel classification models to existing approaches requires focusing on the significance of differences in performance.
RESULTS: We investigated the performance of instance-based classifiers, including an NN classifier able to assign a degree of class membership to each sample. This model alleviates a major problem of conventional instance-based learners, namely the lack of confidence values for predictions. The model translates the distances to the nearest neighbors into 'confidence scores'; the higher the confidence score, the closer the considered instance is to a pre-defined class. We applied the models to three real gene expression data sets and compared them with state-of-the-art methods for classifying microarray data of multiple classes, assessing performance using a statistical significance test that took into account the data resampling strategy. Simple NN classifiers performed as well as, or significantly better than, their more intricate competitors.
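As an illustration of how neighbor distances can be turned into confidence scores, here is a minimal sketch. The abstract does not give the exact formula, so the weighting used here (each of the k nearest neighbors votes with weight 1 / (distance + eps), normalized so the scores sum to 1) is an assumption, and the function name `knn_confidence` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_confidence(train_X, train_y, query, k=3, eps=1e-9):
    """Return (predicted_label, {label: confidence}) for one query sample.

    Assumption (not taken from the paper): inverse-distance voting among
    the k nearest neighbors, normalized to sum to 1.
    """
    # Distance from the query to every training sample, nearest first
    dists = sorted(
        ((euclidean(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    # Closer neighbors contribute larger votes to their class
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)
    total = sum(votes.values())
    scores = {label: w / total for label, w in votes.items()}
    predicted = max(scores, key=scores.get)
    return predicted, scores


# Toy three-class data (made up for illustration only)
train_X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0]]
train_y = ["A", "A", "B", "B", "C"]
label, scores = knn_confidence(train_X, train_y, [0.2, 0.1], k=3)
print(label, scores)
```

Classes with no representative among the k nearest neighbors simply receive no score in this sketch; a real implementation might report them with a score of zero.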
CONCLUSION: Given its highly intuitive underlying principles (simplicity, ease of use, and robustness), the k-NN classifier complemented by a suitable distance-weighting regime constitutes an excellent alternative to more complex models for multiclass microarray data sets. Instance-based classifiers using weighted distances are not limited to microarray data sets, but are likely to perform competitively in classifications of high-dimensional biological data sets such as those generated by high-throughput mass spectrometry.
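The abstract mentions a significance test that accounts for the data resampling strategy but does not name it. One commonly used option of that kind is the variance-corrected resampled paired t-test (attributed to Nadeau and Bengio), sketched below as an assumption rather than as the authors' procedure. The function name and the accuracy values are hypothetical.

```python
import math
import statistics


def corrected_resampled_t(acc_a, acc_b, n_train, n_test):
    """Variance-corrected resampled paired t statistic.

    acc_a, acc_b: per-run accuracies of two classifiers, evaluated on the
    same J random train/test splits. Returns (t, degrees_of_freedom).
    The term n_test / n_train inflates the variance to compensate for the
    overlap between training sets across resampling runs.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    j = len(diffs)
    if j < 2:
        raise ValueError("need at least two resampling runs")
    mean_d = statistics.mean(diffs)
    var_d = statistics.variance(diffs)  # sample variance (J - 1 denominator)
    if var_d == 0:
        raise ValueError("zero variance: t statistic is undefined")
    t = mean_d / math.sqrt((1.0 / j + n_test / n_train) * var_d)
    return t, j - 1


# Made-up accuracies from five resampling runs (illustration only)
acc_knn = [0.80, 0.85, 0.78, 0.82, 0.84]
acc_other = [0.75, 0.80, 0.77, 0.79, 0.78]
t_stat, df = corrected_resampled_t(acc_knn, acc_other, n_train=90, n_test=10)
print(t_stat, df)
```

The statistic would then be compared against a Student t distribution with `df` degrees of freedom to test the null hypothesis of equal performance.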

Year:  2006        PMID: 16483361      PMCID: PMC1402330          DOI: 10.1186/1471-2105-7-73

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


