Nagesh Shukla (1), Markus Hagenbuchner (2), Khin Than Win (2), Jack Yang (3)
1. School of Systems, Management and Leadership, Faculty of Engineering and Information Technology, University of Technology Sydney, NSW 2007, Australia. Electronic address: nagesh.shukla@uts.edu.au.
2. School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2500, Australia.
3. SMART Infrastructure Facility, Faculty of Engineering and Information Sciences, University of Wollongong, Wollongong, NSW 2500, Australia.
Abstract
BACKGROUND: Breast cancer is the most common cancer affecting females worldwide. Predicting breast cancer survivability is a challenging and complex research task. Existing approaches use statistical methods or supervised machine learning to assess or predict the survival prospects of patients.
OBJECTIVE: The main objective of this paper is to develop a robust data analytical model that can assist in (i) better understanding breast cancer survivability in the presence of missing data, (ii) providing better insights into factors associated with patient survivability, and (iii) establishing cohorts of patients that share similar properties.
METHODS: Unsupervised data mining methods, namely the self-organising map (SOM) and density-based spatial clustering of applications with noise (DBSCAN), were used to create patient cohort clusters. These clusters, with their associated patterns, were used to train a multilayer perceptron (MLP) model for improved patient survivability analysis. A large dataset available from the SEER program was used to identify patterns associated with the survivability of breast cancer patients. Information gain was computed for variable selection. All of these methods are data-driven and require little (if any) input from users or experts.
RESULTS: The SOM consolidated patients into cohorts with similar properties. From these, DBSCAN identified and extracted nine cohorts (clusters). Patients in each of the nine clusters were found to have different survival times. Separating patients into clusters improved the overall survival prediction accuracy of the MLP and revealed intricate conditions that affect the accuracy of a prediction.
CONCLUSIONS: A new, entirely data-driven approach based on unsupervised learning methods improves understanding and helps identify patterns associated with patient survivability. The results of the analysis can be used to segment historical patient data into clusters or subsets that share common variable values and survivability. The survivability prediction accuracy of an MLP is improved by using the identified patient cohorts rather than raw historical data. Analysis of the variable values in each cohort provides better insight into the survivability of a particular subgroup of breast cancer patients.
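The abstract states that information gain was computed for variable selection but gives no formula or implementation. As a rough illustration only, the sketch below computes information gain for categorical variables as the reduction in Shannon entropy of the outcome after splitting on a variable. The function names, the toy records, and the variable names (`stage`, `laterality`, `survived`) are hypothetical and are not taken from the paper or from SEER; how the authors handled missing values or continuous variables is not specified and is not modelled here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * log2(n / total) for n in counts)


def information_gain(values, labels):
    """Entropy of the labels minus their entropy after grouping by `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the outcome within each variable value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records (illustrative only, not SEER data).
survived = ["yes", "yes", "no", "no", "yes", "no"]
candidates = {
    "stage": ["I", "I", "III", "III", "I", "III"],
    "laterality": ["L", "R", "L", "R", "L", "R"],
}

# Rank candidate variables from most to least informative.
gains = {name: information_gain(col, survived) for name, col in candidates.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)
```

In this toy example `stage` separates the outcome perfectly, so it ranks above `laterality`; in practice a threshold on the gain, or a fixed number of top-ranked variables, would be chosen before clustering or training.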