An efficient ensemble method for missing value imputation in microarray gene expression data.

Xinshan Zhu, Jiayu Wang, Biao Sun, Chao Ren, Ting Yang, Jie Ding.

Abstract

BACKGROUND: Genomic data analysis is widely used to study disease genes and drug targets. However, missing values in genomic datasets pose a significant problem that severely hinders the use of these data. Current imputation methods based on a single learner often exploit only part of the information in the known genomic data, which degrades imputation performance.
RESULTS: In this study, multiple single imputation methods are combined into one imputation method by ensemble learning. In the ensemble method, bootstrap sampling is applied to obtain predictions of the missing values from each component method, and these predictions are weighted and summed to produce the final prediction. The optimal weights are learned from known gene data by minimizing a cost function of the imputation error, and their expression is derived in closed form. Additionally, the performance of the ensemble method is investigated analytically in terms of the sum of squared regression errors. The proposed method is evaluated on several typical genomic datasets and compared with state-of-the-art imputation methods at different noise levels, sample sizes and missing-data rates. Experimental results show that the proposed method achieves improved imputation performance in terms of accuracy, robustness and generalization.
CONCLUSION: The ensemble method achieves superior imputation performance because it uses the known data more efficiently for missing data imputation, integrating diverse imputation methods and learning the integration weights in a data-driven way.
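
ILLUSTRATION: The weighted-sum idea described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the two component imputers (row mean and column mean), the function names, and the parameters `n_boot` and `hide_frac` are hypothetical, and the cost function is assumed to be an ordinary sum of squared errors, for which the closed-form weights are the least-squares solution w = (P^T P)^(-1) P^T y.

```python
import numpy as np

def _safe_means(X, axis):
    # Mean over observed entries along `axis`; falls back to the global
    # mean when a whole row/column is missing.
    count = (~np.isnan(X)).sum(axis=axis)
    total = np.nansum(X, axis=axis)
    return np.where(count > 0, total / np.maximum(count, 1), np.nanmean(X))

def row_mean_impute(X):
    # Hypothetical component 1: fill each gap with its row (gene) mean.
    out = X.copy()
    r, _ = np.where(np.isnan(X))
    out[np.isnan(X)] = _safe_means(X, axis=1)[r]
    return out

def col_mean_impute(X):
    # Hypothetical component 2: fill each gap with its column (sample) mean.
    out = X.copy()
    _, c = np.where(np.isnan(X))
    out[np.isnan(X)] = _safe_means(X, axis=0)[c]
    return out

def learn_weights(X, imputers, n_boot=20, hide_frac=0.1, seed=0):
    # Assumed scheme: repeatedly draw known entries with replacement,
    # hide them, let every component predict them, then fit the weights
    # by least squares on the pooled (prediction, truth) pairs.
    rng = np.random.default_rng(seed)
    known = np.argwhere(~np.isnan(X))
    n_hide = max(1, int(hide_frac * len(known)))
    preds, truth = [], []
    for _ in range(n_boot):
        idx = known[rng.choice(len(known), size=n_hide, replace=True)]
        r, c = idx[:, 0], idx[:, 1]
        masked = X.copy()
        masked[r, c] = np.nan
        preds.append(np.column_stack([f(masked)[r, c] for f in imputers]))
        truth.append(X[r, c])
    P, y = np.vstack(preds), np.concatenate(truth)
    # lstsq returns the minimizer of ||P w - y||^2, i.e. the closed-form
    # solution (P^T P)^(-1) P^T y when P has full column rank.
    w, *_ = np.linalg.lstsq(P, y, rcond=None)
    return w

def ensemble_impute(X, imputers, w):
    # Final prediction: weighted sum of the component predictions,
    # written only into the originally missing positions.
    out = X.copy()
    miss = np.isnan(X)
    out[miss] = np.stack([f(X)[miss] for f in imputers], axis=1) @ w
    return out
```

Calling `learn_weights(X, [row_mean_impute, col_mean_impute])` on an expression matrix `X` with `NaN` gaps, then passing the result to `ensemble_impute`, fills the gaps and leaves observed entries unchanged. Any stronger component imputer with the same matrix-in, matrix-out signature could be swapped in.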

Keywords:  Bootstrap sampling; Ensemble learning; Gene expression data; Generalization; Imputation

Year:  2021        PMID: 33849444     DOI: 10.1186/s12859-021-04109-4

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169

