Systematic benchmarking of microarray data classification: assessing the role of non-linearity and dimensionality reduction.

Nathalie Pochet, Frank De Smet, Johan A K Suykens, Bart L R De Moor.

Abstract

MOTIVATION: Microarrays are capable of determining the expression levels of thousands of genes simultaneously. In combination with classification methods, this technology can be useful to support clinical management decisions for individual patients, e.g. in oncology. The aim of this paper is to systematically benchmark the role of non-linear versus linear techniques and dimensionality reduction methods.
RESULTS: A systematic benchmarking study is performed by comparing linear versions of standard classification and dimensionality reduction techniques with their non-linear counterparts, which use a radial basis function (RBF) kernel. A total of 9 binary cancer classification problems, derived from 7 publicly available microarray datasets, and 20 randomizations of each problem are examined.
CONCLUSIONS: Three main conclusions can be formulated based on the performances on independent test sets. (1) When performing classification with least squares support vector machines (LS-SVMs) without dimensionality reduction, RBF kernels can be used without risking too much overfitting. In terms of test set receiver operating characteristic and test set accuracy performances, the results obtained with well-tuned RBF kernels are never worse, and are sometimes statistically significantly better, than those obtained with a linear kernel. (2) Even for classification with linear classifiers such as an LS-SVM with a linear kernel, using regularization is very important. (3) When performing kernel principal component analysis (kernel PCA) before classification, using an RBF kernel for kernel PCA tends to result in overfitting, especially when using supervised feature selection. It has been observed that an optimal selection of a large number of features is often an indication of overfitting. Kernel PCA with a linear kernel gives better results.
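
As an illustration of the classifier compared in conclusions (1) and (2), the following is a minimal sketch of an LS-SVM classifier in its standard dual form, fitted once with a linear kernel and once with an RBF kernel. It is not the authors' code. The data are synthetic, and the function names, the RBF parametrization exp(-||x - z||^2 / sigma2), the choice sigma2 = number of features, and the regularization value gamma = 1.0 are all assumptions made for the example; the paper tunes such hyperparameters rather than fixing them.

```python
import numpy as np

def linear_kernel(A, B):
    """Linear kernel matrix: K[i, j] = <A[i], B[j]>."""
    return A @ B.T

def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix: K[i, j] = exp(-||A[i] - B[j]||^2 / sigma2) (assumed form)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def lssvm_fit(K, y, gamma):
    """Solve the LS-SVM classifier linear system for (alpha, b).

    [ 0   y^T             ] [ b     ]   [ 0 ]
    [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega[i, j] = y[i] * y[j] * K[i, j]. A smaller gamma means
    stronger regularization.
    """
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]

def lssvm_decision(K_new_train, y_train, alpha, b):
    """Latent output f(x) = sum_i alpha_i * y_i * K(x, x_i) + b; the class is sign(f)."""
    return K_new_train @ (alpha * y_train) + b

# Synthetic stand-in for a microarray problem: few samples, many features.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
y = np.where(np.arange(n_samples) < n_samples // 2, 1.0, -1.0)
X = rng.normal(size=(n_samples, n_features))
X[:, :5] += 0.8 * y[:, None]  # only the first 5 features carry class signal

gamma = 1.0                    # regularization constant (assumed, untuned)
sigma2 = float(n_features)     # RBF width (assumed heuristic, untuned)

K_lin = linear_kernel(X, X)
K_rbf = rbf_kernel(X, X, sigma2)

alpha_lin, b_lin = lssvm_fit(K_lin, y, gamma)
alpha_rbf, b_rbf = lssvm_fit(K_rbf, y, gamma)

for name, K, alpha, b in [("linear", K_lin, alpha_lin, b_lin),
                          ("rbf", K_rbf, alpha_rbf, b_rbf)]:
    pred = np.sign(lssvm_decision(K, y, alpha, b))
    print(name, "training accuracy:", float(np.mean(pred == y)))
```

The printed figures are training accuracies on synthetic data and say nothing about generalization; the abstract's conclusions rest on independent test sets, so a faithful comparison would hold out test samples and tune gamma and sigma2 on the training part only.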

Year:  2004        PMID: 15231531     DOI: 10.1093/bioinformatics/bth383

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  35 in total

Review 1.  Associating phenotypes with molecular events: recent statistical advances and challenges underpinning microarray experiments.

Authors:  Yulan Liang; Arpad Kelemen
Journal:  Funct Integr Genomics       Date:  2005-11-15       Impact factor: 3.410

2.  Differential and trajectory methods for time course gene expression data.

Authors:  Yulan Liang; Bamidele Tayo; Xueya Cai; Arpad Kelemen
Journal:  Bioinformatics       Date:  2005-05-10       Impact factor: 6.937

Review 3.  Classification algorithms for phenotype prediction in genomics and proteomics.

Authors:  Habtom W Ressom; Rency S Varghese; Zhen Zhang; Jianhua Xuan; Robert Clarke
Journal:  Front Biosci       Date:  2008-01-01

4.  Use of near-infrared spectroscopy and least-squares support vector machine to determine quality change of tomato juice.

Authors:  Li-juan Xie; Yi-bin Ying
Journal:  J Zhejiang Univ Sci B       Date:  2009-06       Impact factor: 3.066

5.  Feature Import Vector Machine: A General Classifier with Flexible Feature Selection.

Authors:  Samiran Ghosh; Yazhen Wang
Journal:  Stat Anal Data Min       Date:  2015-01-26       Impact factor: 1.051

6.  Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning.

Authors:  Debasis Chakraborty; Ujjwal Maulik
Journal:  IEEE J Transl Eng Health Med       Date:  2014-12-02       Impact factor: 3.316

7.  Effect of training data size and noise level on support vector machines virtual screening of genotoxic compounds from large compound libraries.

Authors:  Pankaj Kumar; Xiaohua Ma; Xianghui Liu; Jia Jia; Han Bucong; Ying Xue; Ze Rong Li; Sheng Yong Yang; Yu Quan Wei; Yu Zong Chen
Journal:  J Comput Aided Mol Des       Date:  2011-05-10       Impact factor: 3.686

8.  Improved microarray-based decision support with graph encoded interactome data.

Authors:  Anneleen Daemen; Marco Signoretto; Olivier Gevaert; Johan A K Suykens; Bart De Moor
Journal:  PLoS One       Date:  2010-04-19       Impact factor: 3.240

9.  Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data.

Authors:  Christoph Bartenhagen; Hans-Ulrich Klein; Christian Ruckert; Xiaoyi Jiang; Martin Dugas
Journal:  BMC Bioinformatics       Date:  2010-11-18       Impact factor: 3.169

10.  Modeling Laterality of the Globus Pallidus Internus in Patients With Parkinson's Disease.

Authors:  Justin Sharim; Daniel Yazdi; Amy Baohan; Eric Behnke; Nader Pouratian
Journal:  Neuromodulation       Date:  2016-07-28