
Robust data imputation.

Karlien Vanden Branden, Sabine Verboven.

Abstract

Single imputation methods have been widely discussed among researchers in bioinformatics. A major shortcoming of the methods proposed so far is that they do not take robustness into account. Like all data, gene expression data can contain outlying values. These outliers can distort the values imputed for the missing entries, and any subsequent statistical analysis of the completed data may then lead to incorrect conclusions. It is therefore important to consider the possibility of outliers in the data set and to evaluate how imputation techniques handle them. In this paper, a simulation study is performed to test existing data imputation techniques when outlying values are present. To overcome some shortcomings of these techniques, a new robust imputation method that can deal with outliers in the data is introduced. In addition, the robust imputation procedure cleans the data for further statistical analysis. Moreover, the method can easily be extended to a multiple imputation approach, which emphasises the uncertainty of the imputed values. Finally, a classification example illustrates the lack of robustness of some existing imputation methods and shows the advantage of the multiple imputation version of the new robust technique.
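To make the abstract's point about outliers distorting imputed values concrete, the following minimal Python sketch imputes one missing entry in a hypothetical gene-expression vector, once with the mean of the observed values and once with their median. It is an illustration only and is not the robust imputation method introduced in the paper; the data values and the `impute_column` helper are hypothetical.

```python
import statistics

def impute_column(values, center):
    """Hypothetical helper: replace None entries with center(observed values)."""
    observed = [v for v in values if v is not None]
    fill = center(observed)
    return [fill if v is None else v for v in values]

# Hypothetical expression values for one gene across six samples:
# index 3 is missing and 50.0 is an outlier.
gene = [2.1, 1.9, 2.0, None, 2.2, 50.0]

# Mean imputation: the outlier pulls the fill value far from the bulk of the data.
mean_filled = impute_column(gene, statistics.mean)

# Median imputation: a simple univariate robust alternative.
median_filled = impute_column(gene, statistics.median)

print(mean_filled[3], median_filled[3])
```

Here the single outlier drags the mean-based fill value to about 11.64, far from the other observations near 2, whereas the median-based fill value stays at 2.1. Univariate median imputation ignores correlations between genes, so it only demonstrates the sensitivity problem; it is not a substitute for a multivariate robust procedure.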


Year:  2008        PMID: 18771957     DOI: 10.1016/j.compbiolchem.2008.07.019

Source DB:  PubMed          Journal:  Comput Biol Chem        ISSN: 1476-9271            Impact factor:   2.877


  3 in total

1.  ImShot: An Open-Source Software for Probabilistic Identification of Proteins In Situ and Visualization of Proteomics Data.

Authors:  Wasim Aftab; Shibojyoti Lahiri; Axel Imhof
Journal:  Mol Cell Proteomics       Date:  2022-05-13       Impact factor: 7.381

2.  Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018).

Authors:  Ahmad R Alsaber; Jiazhu Pan; Adeeba Al-Hurban
Journal:  Int J Environ Res Public Health       Date:  2021-02-02       Impact factor: 3.390

3.  Using Machine Learning to Identify Biomarkers Affecting Fat Deposition in Pigs by Integrating Multisource Transcriptome Information.

Authors:  Huatao Liu; Kai Xing; Yifan Jiang; Yibing Liu; Chuduan Wang; Xiangdong Ding
Journal:  J Agric Food Chem       Date:  2022-08-11       Impact factor: 5.895

