
Selection-Fusion Approach for Classification of Datasets with Missing Values.

Mostafa Ghannad-Rezaie, Hamid Soltanian-Zadeh, Hao Ying, Ming Dong.

Abstract

This paper proposes a new approach to classifying incomplete data, based on missing value pattern discovery. The approach is designed specifically for datasets with a small number of samples and a high percentage of missing values, where existing missing value treatment approaches usually do not work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for predicting the overall performance. The proposed method is applied to eight datasets: seven from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan. Experimental results show that the classification accuracy of the proposed method is superior to that of the widely used multiple imputation method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values.
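
The selection-fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several simplifying assumptions: subsets are formed by grouping samples with identical missing-value patterns (the paper instead casts subset selection as a clustering problem with a numerical criterion), each subset uses a nearest-centroid classifier, and fusion is a plain majority vote. The function names `pattern`, `train_centroids`, `fit`, and `predict` are hypothetical, and missing values are encoded as `None`.

```python
from collections import Counter, defaultdict


def pattern(row):
    """Indices of the features that are observed (not None) in a sample."""
    return frozenset(i for i, v in enumerate(row) if v is not None)


def train_centroids(X, y, feats):
    """Nearest-centroid model restricted to the features in `feats`.

    Only samples in which every feature of `feats` is observed are used.
    (Assumption: the abstract does not specify the base classifier.)
    """
    sums = defaultdict(lambda: [0.0] * len(feats))
    counts = Counter()
    for row, label in zip(X, y):
        if all(row[i] is not None for i in feats):
            for k, i in enumerate(feats):
                sums[label][k] += row[i]
            counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in counts}


def fit(X, y):
    """Selection step: one classifier per distinct missing-value pattern."""
    models = {}
    for p in {pattern(r) for r in X}:
        feats = tuple(sorted(p))
        if feats:  # skip samples with no observed feature at all
            models[feats] = train_centroids(X, y, feats)
    return models


def predict(models, row):
    """Fusion step: majority vote over every classifier whose features
    are all observed in `row`. Returns None if no classifier applies."""
    votes = Counter()
    for feats, centroids in models.items():
        if all(row[i] is not None for i in feats):
            def dist(c):
                return sum((row[i] - m) ** 2
                           for i, m in zip(feats, centroids[c]))
            votes[min(centroids, key=dist)] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    # Toy data (made up for illustration): class "a" near 1, class "b" near 5.
    X = [[1.0, 1.0, None], [5.0, 5.0, None],
         [1.2, None, 0.8], [4.8, None, 5.1],
         [None, 0.9, 1.1], [None, 5.2, 4.9]]
    y = ["a", "b", "a", "b", "a", "b"]
    models = fit(X, y)
    print(predict(models, [1.1, None, 0.9]))
    print(predict(models, [4.9, 5.0, 5.0]))
```

In this sketch the number of classifiers equals the number of distinct patterns, which is the quantity the abstract trades off against accuracy. Merging similar patterns into fewer subsets would reduce the computational cost at the price of discarding some available features.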


Year:  2010        PMID: 20212921      PMCID: PMC2832761          DOI: 10.1016/j.patcog.2009.12.003

Source DB:  PubMed          Journal:  Pattern Recognit        ISSN: 0031-3203            Impact factor:   7.740


  3 in total

1.  The methods for handling missing data in clinical trials influence sample size requirements.

Authors:  Guy-Robert Auleley; Bruno Giraudeau; Gabriel Baron; Jean-Francis Maillefert; Maxime Dougados; Philippe Ravaud
Journal:  J Clin Epidemiol       Date:  2004-05       Impact factor: 6.437

2.  Impact of missing data in evaluating artificial neural networks trained on complete data.

Authors:  Mia K Markey; Georgia D Tourassi; Michael Margolis; David M DeLong
Journal:  Comput Biol Med       Date:  2006-05       Impact factor: 4.589

3.  Content-based image database system for epilepsy.

Authors:  Mohammad-Reza Siadat; Hamid Soltanian-Zadeh; Farshad Fotouhi; Kost Elisevich
Journal:  Comput Methods Programs Biomed       Date:  2005-09       Impact factor: 5.428

  4 in total

1.  Neurodegenerative disease diagnosis using incomplete multi-modality data via matrix shrinkage and completion.

Authors:  Kim-Han Thung; Chong-Yaw Wee; Pew-Thian Yap; Dinggang Shen
Journal:  Neuroimage       Date:  2014-01-27       Impact factor: 6.556

2.  Using multiparametric data with missing features for learning patterns of pathology.

Authors:  Madhura Ingalhalikar; William A Parker; Luke Bloy; Timothy P L Roberts; Ragini Verma
Journal:  Med Image Comput Comput Assist Interv       Date:  2012

3.  Zheng classification with missing feature values using local-validity approach.

Authors:  Yan Wang; Lizhuang Ma
Journal:  Evid Based Complement Alternat Med       Date:  2013-12-23       Impact factor: 2.629

4.  Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values.

Authors:  Talayeh Razzaghi; Oleg Roderick; Ilya Safro; Nicholas Marko
Journal:  PLoS One       Date:  2016-05-19       Impact factor: 3.240

