LSimpute: accurate estimation of missing values in microarray data with least squares methods.

Trond Hellem Bø, Bjarte Dysvik, Inge Jonassen.

Abstract

Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. A small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling them as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and the estimation errors of the EMimpute methods are compared with those produced by our novel methods.
The results indicate that on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
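The abstract names the least squares principle and the knock-out evaluation but gives no formulas, so the following Python sketch only illustrates one plausible gene-based variant under stated assumptions. It assumes that every other gene observed in the same array contributes a simple linear-regression prediction, that the k = 5 predictions with the largest absolute correlation are kept, and that they are averaged with weights (r^2 / (1 - r^2 + eps))^2. The function names, k, eps, this weighting, and the row-mean baseline are illustrative choices, not details taken from this record; the paper itself compares against KNNimpute and EMimpute.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def row_mean_estimate(data, g, j):
    """Baseline: mean of the observed values of gene g (None marks a missing value)."""
    observed = [v for v in data[g] if v is not None]
    return sum(observed) / len(observed)


def ls_gene_estimate(data, g, j, k=5, eps=1e-6):
    """Hypothetical gene-based least-squares estimate of the missing value data[g][j].

    Every other gene observed in array j is regressed against gene g
    (y = a + b*x) on the arrays where both are observed; the k predictions
    with the largest |r| are averaged with weights (r^2 / (1 - r^2 + eps))^2,
    so strongly correlated genes dominate the estimate.
    """
    target = data[g]
    preds = []  # (|r|, prediction) for each usable neighbour gene
    for h, row in enumerate(data):
        if h == g or row[j] is None:
            continue
        # Arrays other than j where both genes have observed values.
        idx = [c for c in range(len(target))
               if c != j and target[c] is not None and row[c] is not None]
        if len(idx) < 3:
            continue
        x = [row[c] for c in idx]
        y = [target[c] for c in idx]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxx = sum((a - mx) ** 2 for a in x)
        if sxx == 0:
            continue
        slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
        # Least-squares line evaluated at the neighbour's value in array j.
        preds.append((abs(pearson(x, y)), my + slope * (row[j] - mx)))
    preds.sort(key=lambda t: t[0], reverse=True)
    top = preds[:k]
    weights = [(r * r / (1 - r * r + eps)) ** 2 for r, _ in top]
    if sum(weights) == 0:
        return row_mean_estimate(data, g, j)  # no informative neighbours
    return sum(w * p for w, (_, p) in zip(weights, top)) / sum(weights)


def knockout_rmse(complete, n_missing, estimator, seed=0):
    """Hide n_missing random entries of a complete matrix, re-estimate them, return RMSE."""
    rng = random.Random(seed)
    cells = [(g, j) for g in range(len(complete)) for j in range(len(complete[0]))]
    knocked = rng.sample(cells, n_missing)
    data = [row[:] for row in complete]
    for g, j in knocked:
        data[g][j] = None
    errors = [(estimator(data, g, j) - complete[g][j]) ** 2 for g, j in knocked]
    return math.sqrt(sum(errors) / len(errors))
```

On a small synthetic matrix in which every gene is an exact linear function of one shared profile, and with enough genes relative to the number of knocked-out entries, `knockout_rmse(complete, 3, ls_gene_estimate)` should be close to zero while the row-mean baseline is not. Such idealized data says nothing about the accuracies reported for LSimpute, KNNimpute, or EMimpute on real expression data, which are noisy and only partly correlated.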

Year:  2004        PMID: 14978222      PMCID: PMC374359          DOI: 10.1093/nar/gnh026

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


References (14 in total):

1.  Missing value estimation methods for DNA microarrays.

Authors:  O Troyanskaya; M Cantor; G Sherlock; P Brown; T Hastie; R Tibshirani; D Botstein; R B Altman
Journal:  Bioinformatics       Date:  2001-06       Impact factor: 6.937

2.  Singular value decomposition for genome-wide expression data processing and modeling.

Authors:  O Alter; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  2000-08-29       Impact factor: 11.205

3.  Gene expression data analysis. (Review)

Authors:  A Brazma; J Vilo
Journal:  FEBS Lett       Date:  2000-08-25       Impact factor: 4.124

4.  Judging the quality of gene expression-based clustering methods using gene annotation.

Authors:  Francis D Gibbons; Frederick P Roth
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

5.  Genomic expression programs in the response of yeast cells to environmental changes.

Authors:  A P Gasch; P T Spellman; C M Kao; O Carmel-Harel; M B Eisen; G Storz; D Botstein; P O Brown
Journal:  Mol Biol Cell       Date:  2000-12       Impact factor: 4.138

6.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.

Authors:  U Alon; N Barkai; D A Notterman; K Gish; S Ybarra; D Mack; A J Levine
Journal:  Proc Natl Acad Sci U S A       Date:  1999-06-08       Impact factor: 11.205

7.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling.

Authors:  A A Alizadeh; M B Eisen; R E Davis; C Ma; I S Lossos; A Rosenwald; J C Boldrick; H Sabet; T Tran; X Yu; J I Powell; L Yang; G E Marti; T Moore; J Hudson; L Lu; D B Lewis; R Tibshirani; G Sherlock; W C Chan; T C Greiner; D D Weisenburger; J O Armitage; R Warnke; R Levy; W Wilson; M R Grever; J C Byrd; D Botstein; P O Brown; L M Staudt
Journal:  Nature       Date:  2000-02-03       Impact factor: 49.962

8.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Authors:  T R Golub; D K Slonim; P Tamayo; C Huard; M Gaasenbeek; J P Mesirov; H Coller; M L Loh; J R Downing; M A Caligiuri; C D Bloomfield; E S Lander
Journal:  Science       Date:  1999-10-15       Impact factor: 47.728

9.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

10.  A gene-expression program reflecting the innate immune response of cultured intestinal epithelial cells to infection by Listeria monocytogenes.

Authors:  David N Baldwin; Veena Vanchinathan; Patrick O Brown; Julie A Theriot
Journal:  Genome Biol       Date:  2002-12-23       Impact factor: 13.583

Cited by (85 in total):

1.  Overcoming key technological challenges in using mass spectrometry for mapping cell surfaces in tissues. (Review)

Authors:  Noelle M Griffin; Jan E Schnitzer
Journal:  Mol Cell Proteomics       Date:  2010-06-14       Impact factor: 5.911

2.  Biological impact of missing-value imputation on downstream analyses of gene expression profiles.

Authors:  Sunghee Oh; Dongwan D Kang; Guy N Brock; George C Tseng
Journal:  Bioinformatics       Date:  2010-11-02       Impact factor: 6.937

3.  Gene expression profiling of Atlantic cod (Gadus morhua) embryogenesis using microarray.

Authors:  Øyvind Drivenes; Geir Lasse Taranger; Rolf B Edvardsen
Journal:  Mar Biotechnol (NY)       Date:  2011-07-21       Impact factor: 3.619

4.  Platelet factor 4 is a biomarker for lymphatic-promoted disorders.

Authors:  Wanshu Ma; Hyea Jin Gil; Noelia Escobedo; Alberto Benito-Martín; Pilar Ximénez-Embún; Javier Muñoz; Héctor Peinado; Stanley G Rockson; Guillermo Oliver
Journal:  JCI Insight       Date:  2020-07-09

5.  Associating phenotypes with molecular events: recent statistical advances and challenges underpinning microarray experiments. (Review)

Authors:  Yulan Liang; Arpad Kelemen
Journal:  Funct Integr Genomics       Date:  2005-11-15       Impact factor: 3.410

6.  Gene expression patterns related to vascular invasion and aggressive features in endometrial cancer.

Authors:  Monica Mannelqvist; Ingunn M Stefansson; Geir Bredholt; Trond Hellem Bø; Anne M Oyan; Inge Jonassen; Karl-Henning Kalland; Helga B Salvesen; Lars A Akslen
Journal:  Am J Pathol       Date:  2011-02       Impact factor: 4.307

7.  How to improve postgenomic knowledge discovery using imputation.

Authors:  Muhammad Shoaib B Sehgal; Iqbal Gondal; Laurence S Dooley; Ross Coppel
Journal:  EURASIP J Bioinform Syst Biol       Date:  2009-02-08

8.  A computational strategy to analyze label-free temporal bottom-up proteomics data.

Authors:  Xiuxia Du; Stephen J Callister; Nathan P Manes; Joshua N Adkins; Roxana A Alexandridis; Xiaohua Zeng; Jung Hyeob Roh; William E Smith; Timothy J Donohue; Samuel Kaplan; Richard D Smith; Mary S Lipton
Journal:  J Proteome Res       Date:  2008-04-29       Impact factor: 4.466

9.  Impact of missing value imputation on classification for DNA microarray gene expression data--a model-based study.

Authors:  Youting Sun; Ulisses Braga-Neto; Edward R Dougherty
Journal:  EURASIP J Bioinform Syst Biol       Date:  2010-03-02

10.  Gene set enrichment analysis: performance evaluation and usage guidelines. (Review)

Authors:  Jui-Hung Hung; Tun-Hsiang Yang; Zhenjun Hu; Zhiping Weng; Charles DeLisi
Journal:  Brief Bioinform       Date:  2011-09-07       Impact factor: 11.622
