
A classifier ensemble approach for the missing feature problem.

Loris Nanni, Alessandra Lumini, Sheryl Brahnam.

Abstract

OBJECTIVES: Many classification problems must deal with data that contain missing values. In such cases, data imputation is critical. This paper evaluates the performance of several statistical and machine learning imputation methods, including our novel multiple imputation ensemble approach, using different datasets.
MATERIALS AND METHODS: Several state-of-the-art approaches are compared using different datasets. Some state-of-the-art classifiers (including support vector machines and input decimated ensembles) are tested with several imputation methods. The novel approach proposed in this work is a multiple imputation method based on random subspaces, where each missing value is calculated considering a different cluster of the data. Clustering is performed with a fuzzy clustering approach.
RESULTS: Our experiments have shown that the proposed multiple imputation approach based on clustering and a random subspace classifier outperforms several other state-of-the-art approaches. Using the Wilcoxon signed-rank test (reject the null hypothesis, level of significance 0.05), we have shown that the proposed best approach is outperformed by the classifier trained using the original data (i.e., without missing values) only when more than 20% of the data are missing. Moreover, we have shown that coupling an imputation method with our cluster-based imputation outperforms the base method alone (level of significance ∼0.05).
CONCLUSION: Starting from the assumptions that the feature set must be partially redundant and that the redundancy is distributed randomly over the feature set, we have proposed a method that works quite well even when a large percentage of the features is missing (≥30%). Our best approach is available (MATLAB code) at bias.csr.unibo.it/nanni/MI.rar.
Copyright © 2011 Elsevier B.V. All rights reserved.
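
The abstract describes the imputation scheme only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' MATLAB implementation. It assumes that each random subspace is a random subset of features, that fuzzy c-means is fitted on the rows that are complete within that subspace, and that a missing value is filled with the membership-weighted mean of the cluster centroids. The function names, the farthest-point initialization, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random

def memberships(row, centers, dims, m=2.0):
    """Fuzzy c-means memberships of `row`, using only the features in `dims`."""
    d = [math.sqrt(sum((row[j] - ctr[j]) ** 2 for j in dims)) for ctr in centers]
    if any(x == 0.0 for x in d):
        # Row coincides with one or more centroids: share membership among them.
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        return [h / sum(hits) for h in hits]
    inv = [x ** (-2.0 / (m - 1.0)) for x in d]
    return [v / sum(inv) for v in inv]

def fuzzy_c_means(rows, c, m=2.0, iters=30):
    """Fit c centroids on complete rows (farthest-point init is an assumption)."""
    centers = [list(rows[0])]
    while len(centers) < c:
        far = max(rows, key=lambda r: min(math.dist(r, ctr) for ctr in centers))
        centers.append(list(far))
    dims = range(len(rows[0]))
    for _ in range(iters):
        u = [memberships(r, centers, dims, m) for r in rows]
        for k in range(c):
            w = [ui[k] ** m for ui in u]
            total = sum(w)
            if total > 0:
                centers[k] = [sum(wi * r[j] for wi, r in zip(w, rows)) / total
                              for j in dims]
    return centers

def impute_in_subspace(data, features, c=2):
    """Return `data` restricted to `features`, with None entries imputed."""
    sub = [[row[j] for j in features] for row in data]
    complete = [r for r in sub if None not in r]
    centers = fuzzy_c_means(complete, c)
    out = []
    for r in sub:
        observed = [j for j, v in enumerate(r) if v is not None]
        u = memberships(r, centers, observed)
        out.append([v if v is not None
                    else sum(uk * ctr[j] for uk, ctr in zip(u, centers))
                    for j, v in enumerate(r)])
    return out

def multiple_imputation(data, n_subspaces=5, subspace_size=2, c=2, seed=0):
    """One imputed copy of the data per random feature subspace."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_subspaces):
        features = sorted(rng.sample(range(len(data[0])), subspace_size))
        result.append((features, impute_in_subspace(data, features, c)))
    return result

# Toy data (hypothetical): two well-separated groups, None marks a missing value.
data = [
    [0.0, 0.1, 0.2], [0.2, 0.0, 0.1], [0.1, 0.2, 0.0],
    [10.0, 10.1, 10.2], [10.2, 10.0, 10.1], [10.1, 10.2, 10.0],
    [0.1, None, 0.1], [None, 10.1, 10.0],
]
result = multiple_imputation(data)
for features, imputed in result:
    print(features, [round(v, 2) for v in imputed[6]])
```

In an ensemble setting, each `(features, imputed)` pair would feed one base classifier trained on that feature subset, with predictions combined across subspaces. The sketch requires at least `c` complete rows in every sampled subspace.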

Year:  2011        PMID: 22188722     DOI: 10.1016/j.artmed.2011.11.006

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


  8 in total

1.  Data Imputation in Epistatic MAPs by Network-Guided Matrix Completion.

Authors:  Marinka Žitnik; Blaž Zupan
Journal:  J Comput Biol       Date:  2015-02-06       Impact factor: 1.479

2.  Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

Authors:  Shah Atiqur Rahman; Yuxiao Huang; Jan Claassen; Nathaniel Heintzman; Samantha Kleinberg
Journal:  J Biomed Inform       Date:  2015-10-21       Impact factor: 6.317

3.  Improving Outcome Predictions for Patients Receiving Mechanical Circulatory Support by Optimizing Imputation of Missing Values.

Authors:  Byron C Jaeger; Ryan Cantor; Venkata Sthanam; Rongbing Xie; James K Kirklin; Ramaraju Rudraraju
Journal:  Circ Cardiovasc Qual Outcomes       Date:  2021-09-14

4.  Zheng classification with missing feature values using local-validity approach.

Authors:  Yan Wang; Lizhuang Ma
Journal:  Evid Based Complement Alternat Med       Date:  2013-12-23       Impact factor: 2.629

5.  mvp - an open-source preprocessor for cleaning duplicate records and missing values in mass spectrometry data.

Authors:  Geunho Lee; Hyun Beom Lee; Byung Hwa Jung; Hojung Nam
Journal:  FEBS Open Bio       Date:  2017-06-19       Impact factor: 2.693

6.  Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values.

Authors:  Talayeh Razzaghi; Oleg Roderick; Ilya Safro; Nicholas Marko
Journal:  PLoS One       Date:  2016-05-19       Impact factor: 3.240

7.  A Long Short-Term Memory Ensemble Approach for Improving the Outcome Prediction in Intensive Care Unit.

Authors:  Jing Xia; Su Pan; Min Zhu; Guolong Cai; Molei Yan; Qun Su; Jing Yan; Gangmin Ning
Journal:  Comput Math Methods Med       Date:  2019-11-03       Impact factor: 2.238

8.  Maximizing the reusability of gene expression data by predicting missing metadata.

Authors:  Pei-Yau Lung; Dongrui Zhong; Xiaodong Pang; Yan Li; Jinfeng Zhang
Journal:  PLoS Comput Biol       Date:  2020-11-06       Impact factor: 4.475
