An ensemble approach to microarray data-based gene prioritization after missing value imputation.

Dong Hua, Yinglei Lai.

Abstract

MOTIVATION: Microarrays have been widely used to discover novel disease-related genes. Some types of microarray, such as cDNA arrays, usually contain a considerable portion of missing values. When missing value imputation and gene prioritization are conducted sequentially, it is necessary to consider the distribution space of prioritization scores that arises from the missing values. We propose an ensemble approach to address this issue. A bootstrap procedure enables us to generate a resampled multivariate distribution of the prioritization scores and then to obtain the expected prioritization scores.
RESULTS: We used a published microarray two-sample data set to illustrate our approach. We focused on the following issues after missing value imputation: (i) concordance of gene prioritization and (ii) control of true and false positives. We compared our approach with the traditional non-ensemble approach to missing value imputation. We also evaluated the performance of the non-imputation approach when the theoretical test distribution was available. The results showed that the ensemble imputation approach provided clearly improved performance in the concordance of gene prioritization and the control of true/false positives, especially when sample sizes were about 5-10 per group and missing rates were about 10-20%, which was a common situation for cDNA microarray studies.
AVAILABILITY: The Matlab codes are freely available at http://home.gwu.edu/~ylai/research/Missing.
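
The ensemble idea described in the abstract (resample, impute, score, then average the scores) might be sketched roughly as follows. This is only an illustrative Python sketch, not the authors' Matlab code: the abstract does not name the imputation method, the prioritization score, or the resampling scheme, so row-mean imputation, an absolute Welch t-statistic, and within-group resampling of samples are all assumptions made here, and the function names are hypothetical.

```python
import numpy as np

def row_mean_impute(x):
    """Fill NaNs in each gene (row) with that row's observed mean.

    Assumption: row-mean imputation is a simple stand-in; the abstract
    does not say which imputation method the authors used.
    """
    observed = ~np.isnan(x)
    counts = observed.sum(axis=1)
    sums = np.nansum(x, axis=1)
    # Fall back to the overall observed mean if a row has no observed values.
    global_mean = np.nansum(x) / max(observed.sum(), 1)
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    filled = x.copy()
    rows, _ = np.where(~observed)
    filled[~observed] = row_means[rows]
    return filled

def welch_t(x, labels):
    """Per-gene two-sample Welch t-statistic (assumed prioritization score)."""
    a, b = x[:, labels == 0], x[:, labels == 1]
    va = a.var(axis=1, ddof=1) / a.shape[1]
    vb = b.var(axis=1, ddof=1) / b.shape[1]
    # Small constant guards against division by zero in degenerate resamples.
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(va + vb + 1e-12)

def ensemble_scores(x, labels, n_boot=200, seed=0):
    """Average |t| over bootstrap resamples, imputing within each resample."""
    rng = np.random.default_rng(seed)
    idx0 = np.flatnonzero(labels == 0)
    idx1 = np.flatnonzero(labels == 1)
    total = np.zeros(x.shape[0])
    for _ in range(n_boot):
        # Resample columns (samples) with replacement within each group.
        cols = np.concatenate([
            rng.choice(idx0, size=idx0.size, replace=True),
            rng.choice(idx1, size=idx1.size, replace=True),
        ])
        boot = row_mean_impute(x[:, cols])
        total += np.abs(welch_t(boot, labels[cols]))
    return total / n_boot
```

Genes would then be ranked by the averaged score, for example with `np.argsort(-ensemble_scores(x, labels))`, where `x` is a genes-by-samples array with `NaN` for missing values and `labels` is a 0/1 array of group membership. The traditional non-ensemble approach mentioned in the abstract would correspond to imputing once and scoring once, without the resampling loop.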

Year:  2007        PMID: 17267438     DOI: 10.1093/bioinformatics/btm010

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  2 in total

1.  Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments.

Authors:  Magalie Celton; Alain Malpertuy; Gaëlle Lelandais; Alexandre G de Brevern
Journal:  BMC Genomics       Date:  2010-01-07       Impact factor: 3.969

2.  ProtQuant: a tool for the label-free quantification of MudPIT proteomics data.

Authors:  Susan M Bridges; G Bryce Magee; Nan Wang; W Paul Williams; Shane C Burgess; Bindu Nanduri
Journal:  BMC Bioinformatics       Date:  2007-11-01       Impact factor: 3.169
