Missing value estimation methods for DNA microarrays.

O Troyanskaya, M Cantor, G Sherlock, P Brown, T Hastie, R Tibshirani, D Botstein, R B Altman.

Abstract

MOTIVATION: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.
RESULTS: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1-20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
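
As an illustration of two of the approaches named above, the following is a minimal sketch of row-average imputation and a weighted K-nearest-neighbors imputation. It is not the authors' KNNimpute implementation: the abstract does not specify the distance metric or the weighting scheme, so the root-mean-square distance over shared columns, the inverse-distance weights, the fallback to the row average, and all function names here are assumptions made for the example.

```python
import math
from typing import List, Optional

# Rows are genes, columns are arrays; None marks a missing value.
Matrix = List[List[Optional[float]]]


def row_average_impute(data: Matrix) -> List[List[float]]:
    """Fill each missing entry with the mean of the observed values in its row."""
    filled = []
    for row in data:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def _distance(a, b) -> Optional[float]:
    """RMS difference over columns observed in both rows (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(data: Matrix, k: int = 2) -> List[List[float]]:
    """Fill each missing entry from the k nearest rows that observe that column."""
    fallback = row_average_impute(data)
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows with column j observed.
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            neighbors = candidates[:k]
            if not neighbors:
                # No usable neighbor: fall back to the row average (assumption).
                result[i][j] = fallback[i][j]
                continue
            # Inverse-distance weights; the epsilon avoids division by zero.
            weights = [1.0 / (d + 1e-9) for d, _ in neighbors]
            total = sum(weights)
            result[i][j] = sum(w * v for w, (_, v) in zip(weights, neighbors)) / total
    return result
```

For a toy matrix such as `[[1.0, 2.0, None], [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]`, the row-average fill uses only the first row's own values, while the neighbor-based fill borrows the missing column from the most similar row, which is the intuition behind the comparison reported in the abstract.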

Year: 2001; PMID: 11395428; DOI: 10.1093/bioinformatics/17.6.520

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

