Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

José M Jerez, Ignacio Molina, Pedro J García-Laencina, Emilio Alba, Nuria Ribelles, Miguel Martín, Leonardo Franco.

Abstract

OBJECTIVES: Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set.
MATERIALS AND METHODS: Imputation methods based on statistical techniques (mean, hot-deck and multiple imputation) and on machine learning techniques (multi-layer perceptron (MLP), self-organising maps (SOM) and k-nearest neighbour (KNN)) were applied to data collected through the "El Álamo-I" project, and the results were compared with those obtained using listwise deletion (LD), which discards incomplete records rather than imputing them. The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). Prediction accuracy for early cancer relapse was measured using artificial neural networks (ANNs), with different ANNs estimated on the data sets with imputed missing values.
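To make the contrast between these strategies concrete, the following is a minimal Python sketch of listwise deletion, mean imputation and a simple KNN imputation on a toy table. It is not the authors' implementation: the function names, the toy records, the choice of k and the distance measure (root mean squared difference over the features observed in both records) are all assumptions made for illustration.

```python
import math

def listwise_deletion(rows):
    """Drop every record that has at least one missing value (None)."""
    return [r for r in rows if all(v is not None for v in r)]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the k nearest
    donor records. Distance is computed only on features observed in both
    records (an assumed, simplified distance)."""
    imputed = []
    for i, r in enumerate(rows):
        new_row = list(r)
        for j, v in enumerate(r):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have the feature we need
                shared = [(a, b) for a, b in zip(r, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            if nearest:  # leave the gap if no donor exists
                new_row[j] = sum(nearest) / len(nearest)
        imputed.append(new_row)
    return imputed

# Hypothetical toy records (not data from the study); None marks a missing value.
records = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [5.2, 6.1, None],
    [0.9, 2.1, 2.9],
]

complete_only = listwise_deletion(records)
mean_filled = mean_impute(records)
knn_filled = knn_impute(records, k=2)
print(len(complete_only), mean_filled[1][1], knn_filled[1][1])
```

In this toy case listwise deletion loses two of the five records, mean imputation fills the gap in the second record with the column average, and KNN imputation fills it from the two most similar records.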
RESULTS: The imputation methods based on machine learning algorithms outperformed the statistical imputation methods in the prediction of patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model.
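For readers unfamiliar with Friedman's test, the sketch below computes the statistic and its chi-square p-value for AUC values arranged as blocks (rows) by methods (columns). The AUC numbers are invented for illustration and are not the study's results. Treating cross-validation folds as the blocks is an assumption, as the abstract does not say how the comparison was blocked. The sketch also assumes no ties within a block, so no tie correction is applied.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from erfc and add the half-integer series terms.
    term = math.sqrt(half) / math.gamma(1.5)
    tail = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= half / (i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def friedman_test(blocks):
    """Friedman statistic and p-value; assumes no ties within any block."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank  # rank 1 = lowest AUC in this block
    stat = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)

# Hypothetical AUCs (made up): columns are LD, mean, KNN, MLP; rows are folds.
auc_by_fold = [
    [0.70, 0.72, 0.75, 0.76],
    [0.68, 0.71, 0.74, 0.73],
    [0.71, 0.70, 0.76, 0.77],
    [0.69, 0.72, 0.74, 0.75],
    [0.70, 0.71, 0.73, 0.75],
]

statistic, p_value = friedman_test(auc_by_fold)
print(f"Friedman statistic = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value indicates only that at least one method ranks differently from the others; identifying which methods differ requires a follow-up pairwise comparison, as reported in the results.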
CONCLUSION: The methods based on machine learning techniques were the most suited for the imputation of missing values and led to a significant enhancement of prognosis accuracy compared to imputation methods based on statistical procedures.
Copyright © 2010 Elsevier B.V. All rights reserved.

Year: 2010    PMID: 20638252    DOI: 10.1016/j.artmed.2010.05.002

Source DB: PubMed    Journal: Artif Intell Med    ISSN: 0933-3657    Impact factor: 5.326


  41 in total

Review 1.  Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in SCT.

Authors:  R Shouval; O Bondi; H Mishan; A Shimoni; R Unger; A Nagler
Journal:  Bone Marrow Transplant       Date:  2013-10-07       Impact factor: 5.483

Review 2.  Role of Soft Computing Approaches in HealthCare Domain: A Mini Review.

Authors:  Shalini Gambhir; Sanjay Kumar Malik; Yugal Kumar
Journal:  J Med Syst       Date:  2016-10-29       Impact factor: 4.460

3.  Preprocessing structured clinical data for predictive modeling and decision support. A roadmap to tackle the challenges.

Authors:  José Carlos Ferrão; Mónica Duarte Oliveira; Filipe Janela; Henrique M G Martins
Journal:  Appl Clin Inform       Date:  2016-12-07       Impact factor: 2.342

4.  Visual grids for managing data completeness in clinical research datasets.

Authors:  Robert R Kelley; William A Mattingly; Timothy L Wiemken; Mohammad Khan; Daniel Coats; Daniel Curran; Julia H Chariker; Julio Ramirez
Journal:  J Biomed Inform       Date:  2014-12-30       Impact factor: 6.317

5.  Strategies for handling missing clinical data for automated surgical site infection detection from the electronic health record.

Authors:  Zhen Hu; Genevieve B Melton; Elliot G Arsoniadis; Yan Wang; Mary R Kwaan; Gyorgy J Simon
Journal:  J Biomed Inform       Date:  2017-03-16       Impact factor: 6.317

6.  Missing value imputation in high-dimensional phenomic data: imputable or not, and how?

Authors:  Serena G Liao; Yan Lin; Dongwan D Kang; Divay Chandra; Jessica Bon; Naftali Kaminski; Frank C Sciurba; George C Tseng
Journal:  BMC Bioinformatics       Date:  2014-11-05       Impact factor: 3.169

7.  Comparison Between Statistical Model and Machine Learning Methods for Predicting the Risk of Renal Function Decline Using Routine Clinical Data in Health Screening.

Authors:  Xia Cao; Yanhui Lin; Binfang Yang; Ying Li; Jiansong Zhou
Journal:  Risk Manag Healthc Policy       Date:  2022-04-26

8.  Imputing Longitudinal Growth Data in International Pediatric Studies: Does CDC Reference Suffice?

Authors:  Zhiguo Li; Jorma Toppari; Markus Lundgren; Brigitte I Frohnert; Peter Achenbach; Riitta Veijola; Vibha Anand
Journal:  AMIA Annu Symp Proc       Date:  2022-02-21

9.  Imputation techniques on missing values in breast cancer treatment and fertility data.

Authors:  Xuetong Wu; Hadi Akbarzadeh Khorshidi; Uwe Aickelin; Zobaida Edib; Michelle Peate
Journal:  Health Inf Sci Syst       Date:  2019-10-03

10.  Ascertaining Design Requirements for Postoperative Care Transition Interventions.

Authors:  Joanna Abraham; Christopher R King; Alicia Meng
Journal:  Appl Clin Inform       Date:  2021-02-24       Impact factor: 2.342
