
Impact of missing value imputation on classification for DNA microarray gene expression data--a model-based study.

Youting Sun, Ulisses Braga-Neto, Edward R Dougherty.

Abstract

Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.
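The suggestion to treat poor-quality data points as missing and estimate them from the high-quality points can be sketched as follows. This is a minimal illustration only: the abstract does not name the six imputation algorithms studied, so a simple nearest-neighbour scheme is swapped in here, and the function names (`mask_low_quality`, `knn_impute`), the quality-score matrix, and the threshold are all hypothetical.

```python
import numpy as np

def mask_low_quality(expr, quality, threshold):
    """Return a copy of expr (genes x samples) with low-quality entries set to NaN.

    `quality` and `threshold` are assumed inputs; the abstract does not
    specify which quality metric would be used.
    """
    masked = np.asarray(expr, dtype=float).copy()
    masked[np.asarray(quality) < threshold] = np.nan
    return masked

def knn_impute(expr, k=2):
    """Fill NaNs using the k most similar fully observed genes (illustrative only)."""
    filled = expr.copy()
    n_genes = expr.shape[0]
    for g in range(n_genes):
        missing = np.isnan(expr[g])
        observed = ~missing
        if not missing.any() or not observed.any():
            continue  # nothing to impute, or nothing to compare against
        # Distance to every complete gene, measured on the observed samples only.
        dists = []
        for h in range(n_genes):
            if h == g or np.isnan(expr[h]).any():
                continue
            d = np.sqrt(np.mean((expr[g, observed] - expr[h, observed]) ** 2))
            dists.append((d, h))
        dists.sort()
        neighbours = [h for _, h in dists[:k]]
        if not neighbours:
            # No complete gene available: fall back to the gene's own mean.
            filled[g, missing] = np.nanmean(expr[g])
            continue
        # Average the neighbours' values at the missing positions.
        filled[g, missing] = expr[neighbours][:, missing].mean(axis=0)
    return filled

# Toy data: the last value of the fourth gene is flagged as poor quality.
expr = np.array([[1.0, 2.0, 3.0],
                 [1.1, 2.1, 3.1],
                 [5.0, 6.0, 7.0],
                 [1.0, 2.0, 100.0]])
quality = np.ones_like(expr)
quality[3, 2] = 0.1
imputed = knn_impute(mask_low_quality(expr, quality, threshold=0.5), k=2)
```

In this toy case the flagged value is replaced by the average of the two genes closest to the fourth gene on its remaining high-quality samples. Whether such a step helps classification depends, per the abstract, on the noise level, variance, gene-cluster correlation, and MV rate.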

Year:  2010        PMID: 20224634      PMCID: PMC3171429          DOI: 10.1155/2009/504069

Source DB:  PubMed          Journal:  EURASIP J Bioinform Syst Biol        ISSN: 1687-4145


