Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values.

Pedro J García-Laencina, Pedro Henriques Abreu, Miguel Henriques Abreu, Noémia Afonoso.

Abstract

Breast cancer is the most frequently diagnosed cancer in women. Using historical patient information stored in clinical datasets, data mining and machine learning approaches can be applied to predict the survival of breast cancer patients. A common drawback is the absence of information, i.e., missing data, in certain clinical trials. Most standard prediction methods cannot handle incomplete samples, so missing data imputation is widely applied to work around this limitation. Because each breast cancer dataset has its own characteristics, a detailed analysis is required to determine the most appropriate imputation and prediction methods for each clinical environment. This research work analyzes a real breast cancer dataset from the Portuguese Institute of Oncology of Porto with a high percentage of unknown categorical information (most of the patients' clinical data are incomplete), which makes the problem challenging. Four scenarios are evaluated: (I) 5-year survival prediction without imputation, and 5-year survival prediction from the cleaned dataset with (II) Mode imputation, (III) Expectation-Maximization imputation and (IV) K-Nearest Neighbors imputation. Prediction models for breast cancer survivability are constructed using four different methods: K-Nearest Neighbors, Classification Trees, Logistic Regression and Support Vector Machines. Experiments are performed in a nested ten-fold cross-validation procedure. According to the obtained results, the best performance is provided by the K-Nearest Neighbors algorithm: an accuracy above 81% and an area under the Receiver Operating Characteristic curve above 0.78, which are very good results in this complex scenario.
Copyright © 2015 Elsevier Ltd. All rights reserved.
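
To make the difference between two of the imputation scenarios named in the abstract concrete, the following is a minimal sketch of Mode imputation and K-Nearest Neighbors imputation for categorical records. It is not the authors' implementation: the toy records, the `MISSING` marker, the function names, the Hamming-style distance over jointly observed features, and the choice `k=2` are all assumptions made for illustration.

```python
from collections import Counter

MISSING = None  # assumed marker for an unknown categorical value


def mode_impute(rows):
    """Fill each missing value with the most frequent observed value of its column."""
    modes = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        modes.append(Counter(observed).most_common(1)[0][0] if observed else MISSING)
    return [[modes[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]


def hamming(a, b):
    """Fraction of mismatches over features observed in both rows (inf if none are shared)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return float("inf")
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(rows, k=2):
    """Fill each missing value by majority vote among the k nearest rows that observe that column."""
    fallback = mode_impute(rows)  # used only when no comparable donor row exists
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            donors = [
                (hamming(row, other), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not MISSING
            ]
            nearest = sorted(d for d in donors if d[0] != float("inf"))[:k]
            if nearest:
                votes = Counter(rows[idx][j] for _, idx in nearest)
                out[i][j] = votes.most_common(1)[0][0]
            else:
                out[i][j] = fallback[i][j]
    return out


# Hypothetical toy records: (receptor status, stage, treatment flag).
rows = [
    ["pos", "II", "yes"],
    ["pos", "II", MISSING],
    ["neg", "III", "no"],
    ["neg", MISSING, "no"],
    ["neg", "III", "no"],
    ["pos", "II", "yes"],
]

mode_filled = mode_impute(rows)
knn_filled = knn_impute(rows, k=2)
print(mode_filled[1], mode_filled[3])
print(knn_filled[1], knn_filled[3])
```

On this toy data the two strategies disagree: Mode imputation fills both gaps with the globally most common category, while the neighbor-based variant borrows values from the records that most resemble the incomplete one.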

Keywords:  5-year survival prediction; Breast cancer; Discrete data; Imputation; Missing data

Year:  2015        PMID: 25725446     DOI: 10.1016/j.compbiomed.2015.02.006

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   4.589


  19 in total

1.  Transforming growth factor beta induced (TGFBI) is a potential signature gene for mesenchymal subtype high-grade glioma.

Authors:  Yuan-Bo Pan; Chi-Hao Zhang; Si-Qi Wang; Peng-Hui Ai; Kui Chen; Liang Zhu; Zhao-Liang Sun; Dong-Fu Feng
Journal:  J Neurooncol       Date:  2018-01-02       Impact factor: 4.130

2.  The effect of imputing missing clinical attribute values on training lung cancer survival prediction model performance.

Authors:  Mohamed S Barakat; Matthew Field; Aditya Ghose; David Stirling; Lois Holloway; Shalini Vinod; Andre Dekker; David Thwaites
Journal:  Health Inf Sci Syst       Date:  2017-12-06

3.  Artificial intelligence and machine learning in precision and genomic medicine. (Review)

Authors:  Sameer Quazi
Journal:  Med Oncol       Date:  2022-06-15       Impact factor: 3.738

4.  Application of unsupervised analysis techniques to lung cancer patient data.

Authors:  Chip M Lynch; Victor H van Berkel; Hermann B Frieboes
Journal:  PLoS One       Date:  2017-09-14       Impact factor: 3.240

5.  Orthodontic Treatment Planning based on Artificial Neural Networks.

Authors:  Peilin Li; Deyu Kong; Tian Tang; Di Su; Pu Yang; Huixia Wang; Zhihe Zhao; Yang Liu
Journal:  Sci Rep       Date:  2019-02-14       Impact factor: 4.379

6.  Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services.

Authors:  Zeinab Sajjadnia; Raof Khayami; Mohammad Reza Moosavi
Journal:  Cancer Inform       Date:  2020-05-27

7.  Prediction of Breast Cancer Survival by Machine Learning Methods: An Application of Multiple Imputation.

Authors:  Hadi Lotfnezhad Afshar; Nasrollah Jabbari; Hamid Reza Khalkhali; Omid Esnaashari
Journal:  Iran J Public Health       Date:  2021-03       Impact factor: 1.429

8.  Prediction of in-hospital mortality in patients with post traumatic brain injury using National Trauma Registry and Machine Learning Approach.

Authors:  Ahmad Abujaber; Adam Fadlalla; Diala Gammoh; Husham Abdelrahman; Monira Mollazehi; Ayman El-Menyar
Journal:  Scand J Trauma Resusc Emerg Med       Date:  2020-05-27       Impact factor: 2.953

9.  Transcriptome analyses reveal molecular mechanisms underlying phenotypic differences among transcriptional subtypes of glioblastoma.

Authors:  Yuan-Bo Pan; Siqi Wang; Biao Yang; Zhenqi Jiang; Cameron Lenahan; Jianhua Wang; Jianmin Zhang; Anwen Shao
Journal:  J Cell Mol Med       Date:  2020-02-24       Impact factor: 5.310

10.  Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine.

Authors:  Zeeshan Ahmed; Khalid Mohamed; Saman Zeeshan; XinQi Dong
Journal:  Database (Oxford)       Date:  2020-01-01       Impact factor: 3.451
