Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes.

Guy N Brock, John R Shaffer, Richard E Blakesley, Meredith J Lotz, George C Tseng.

Abstract

BACKGROUND: Gene expression data frequently contain missing values; however, most downstream analyses for microarray experiments require complete data. Many methods have been proposed in the literature to estimate missing values using the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions under which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set.
RESULTS: We found that the top-performing imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior across all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty of mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. The underlying simulation technique has been used previously for microarray data imputation, but for different purposes. The STS scheme selects the optimal or near-optimal method with high accuracy, but at an increased computational cost.
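As a rough illustration of how the complexity of an expression matrix might be quantified, the sketch below computes a normalized Shannon entropy over the squared singular values of a complete, column-centered matrix. This is an assumption made for illustration: the abstract does not give the formula, so the paper's actual entropy measure may differ, and the function name `spectral_entropy` is hypothetical. Values near 0 suggest the matrix is well described by a few dimensions; values near 1 suggest the variance is spread across many dimensions.

```python
import numpy as np


def spectral_entropy(matrix):
    """Normalized Shannon entropy of the squared singular values.

    Assumed stand-in for the paper's entropy measure, not its exact
    definition. Expects a complete matrix (no missing values).
    """
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)    # center each column
    s = np.linalg.svd(x, compute_uv=False)   # singular values only
    k = len(s)
    energy = s ** 2
    total = energy.sum()
    if k < 2 or total == 0:
        return 0.0
    p = energy / total                       # share of variance per component
    p = p[p > 0]                             # skip exact zeros to avoid log(0)
    return float(-(p * np.log(p)).sum() / np.log(k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Rank-one matrix: nearly all variance lies in a single component.
    low_complexity = np.outer(rng.normal(size=200), rng.normal(size=10))
    # Independent noise: variance is spread over all components.
    high_complexity = rng.normal(size=(200, 10))
    print(spectral_entropy(low_complexity), spectral_entropy(high_complexity))
```

Under this sketch, an entropy-based rule would compare the score against a threshold to choose between a global and a neighbour-based method; any such threshold would have to be calibrated on real data.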
CONCLUSION: Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity, while neighbour-based methods (KNN, OLS, LSA, LLS) performed better on data with higher complexity. We also found that the EBS and STS schemes serve as complementary and effective tools for selecting the optimal imputation algorithm.
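The self-training idea can be sketched as follows, under stated assumptions: take the rows with no missing values, hide entries at the data set's observed missing rate, impute them with each candidate method, and keep the method with the lowest root-mean-square error against the hidden truth. The abstract does not describe the procedure in detail, so this is only one plausible reading. The two imputers here (`row_mean_impute` and a basic `knn_impute`) are simple stand-ins, not the LSA, LLS, or BPCA algorithms evaluated in the study, and all function names are hypothetical.

```python
import numpy as np


def row_mean_impute(x):
    """Fill each missing entry with the mean of its row's observed values."""
    observed = ~np.isnan(x)
    sums = np.where(observed, x, 0.0).sum(axis=1)
    counts = np.maximum(observed.sum(axis=1), 1)   # fully missing rows get 0
    filled = x.copy()
    rows, cols = np.where(~observed)
    filled[rows, cols] = (sums / counts)[rows]
    return filled


def knn_impute(x, k=5):
    """Fill each missing entry with the mean of the k nearest rows that
    observe that column; distance uses only columns observed in both rows."""
    observed = ~np.isnan(x)
    zeroed = np.where(observed, x, 0.0)
    filled = row_mean_impute(x)                    # fallback if no donor exists
    for i in np.where(~observed.all(axis=1))[0]:
        shared = observed & observed[i]
        n_shared = shared.sum(axis=1)
        diff = np.where(shared, zeroed - zeroed[i], 0.0)
        dist = np.sqrt((diff ** 2).sum(axis=1) / np.maximum(n_shared, 1))
        dist[n_shared == 0] = np.inf
        dist[i] = np.inf                           # a row is not its own neighbour
        order = np.argsort(dist)
        for j in np.where(~observed[i])[0]:
            donors = [r for r in order
                      if observed[r, j] and np.isfinite(dist[r])][:k]
            if donors:
                filled[i, j] = x[donors, j].mean()
    return filled


def self_training_select(data, methods, n_rounds=5, seed=0):
    """Return (best method name, mean RMSE per method) from masking simulations."""
    rng = np.random.default_rng(seed)
    missing_rate = np.isnan(data).mean()
    complete = data[~np.isnan(data).any(axis=1)]   # rows without missing values
    if missing_rate == 0 or len(complete) == 0:
        raise ValueError("need some missing values and some complete rows")
    scores = {name: [] for name in methods}
    for _ in range(n_rounds):
        mask = rng.random(complete.shape) < missing_rate
        if not mask.any():                         # always hide at least one entry
            mask.flat[rng.integers(mask.size)] = True
        masked = complete.copy()
        masked[mask] = np.nan
        for name, impute in methods.items():
            error = impute(masked)[mask] - complete[mask]
            scores[name].append(np.sqrt(np.mean(error ** 2)))
    mean_rmse = {name: float(np.mean(v)) for name, v in scores.items()}
    return min(mean_rmse, key=mean_rmse.get), mean_rmse
```

A call such as `self_training_select(data, {"row_mean": row_mean_impute, "knn": knn_impute})` returns the name of the best-scoring method together with the per-method errors. Repeating the masking over several rounds is what makes this approach more expensive than a single entropy computation.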

Year: 2008    PMID: 18186917    PMCID: PMC2253514    DOI: 10.1186/1471-2105-9-12

Source DB: PubMed    Journal: BMC Bioinformatics    ISSN: 1471-2105    Impact factor: 3.169


