PURPOSE: Classic statistical and machine learning models such as support vector machines (SVMs) can be used to predict cancer outcome, but they often perform well only if all the input variables are known, which is unlikely in the medical domain. Bayesian network (BN) models have a natural ability to reason under uncertainty and might handle missing data better. In this study, the authors hypothesize that a BN model can predict two-year survival in non-small cell lung cancer (NSCLC) patients as accurately as an SVM, but will predict survival more accurately when data are missing.

METHODS: A BN model and an SVM model were trained on 322 inoperable NSCLC patients treated with radiotherapy from Maastricht and validated in three independent data sets of 35, 47, and 33 patients from Ghent, Leuven, and Toronto. Missing variables occurred in the data set, with only 37, 28, and 24 patients having a complete data set.

RESULTS: The BN model structure and parameter learning identified gross tumor volume (GTV) size, performance status, and number of positive lymph nodes on PET as prognostic factors for two-year survival. When validated in the full validation sets of Ghent, Leuven, and Toronto, the BN model had an AUC of 0.77, 0.72, and 0.70, respectively. An SVM model based on the same variables performed worse overall (AUC 0.71, 0.68, and 0.69), especially in the Ghent set, which had the highest percentage of patients missing the important GTV size data. When only patients with complete data sets were considered, the BN and SVM models performed more similarly.

CONCLUSIONS: Within the limitations of this study, the results support the hypothesis that BN models are better at handling missing data than SVM models and are therefore more suitable for the medical domain. Future work should focus on improving BN performance by including more patients, more variables, and more diversity.
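The way a BN can produce a prediction when an input is unknown can be illustrated with a minimal sketch. It assumes a naive Bayes structure (the simplest BN, with the outcome as the only parent of each variable), which is not necessarily the structure learned in the study. The function name `posterior_survival`, the discretized variable states, and every probability below are hypothetical values chosen for illustration, not results from the paper.

```python
from typing import Dict, Optional

# Hypothetical prior over two-year survival (illustrative, not from the study).
PRIOR: Dict[str, float] = {"alive": 0.4, "dead": 0.6}

# Hypothetical conditional probability tables P(variable = state | outcome).
CPT: Dict[str, Dict[str, Dict[str, float]]] = {
    "gtv_size": {
        "alive": {"small": 0.7, "large": 0.3},
        "dead": {"small": 0.3, "large": 0.7},
    },
    "performance_status": {
        "alive": {"good": 0.8, "poor": 0.2},
        "dead": {"good": 0.5, "poor": 0.5},
    },
    "positive_nodes": {
        "alive": {"few": 0.6, "many": 0.4},
        "dead": {"few": 0.4, "many": 0.6},
    },
}


def posterior_survival(evidence: Dict[str, Optional[str]]) -> float:
    """Return P(alive | observed variables) under the assumed naive Bayes BN.

    A variable whose value is None (or absent) is marginalized out. In this
    structure, summing its CPT row over all states gives 1, so the variable
    simply contributes no factor to the product.
    """
    scores: Dict[str, float] = {}
    for outcome, prior in PRIOR.items():
        p = prior
        for variable, state in evidence.items():
            if state is None:
                continue  # missing input: marginalized, no imputation needed
            p *= CPT[variable][outcome][state]
        scores[outcome] = p
    return scores["alive"] / sum(scores.values())


# Patient with all three variables observed.
complete = posterior_survival(
    {"gtv_size": "small", "performance_status": "good", "positive_nodes": "few"}
)
# Same patient, but GTV size is missing.
missing_gtv = posterior_survival(
    {"gtv_size": None, "performance_status": "good", "positive_nodes": "few"}
)
print(f"complete data: {complete:.3f}, GTV missing: {missing_gtv:.3f}")
```

In this sketch the prediction degrades gracefully toward the prior as inputs go missing, whereas an SVM needs a value for every input and therefore depends on an explicit imputation step before it can score the same patient.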