Shiju Yan (1), Wei Qian (2), Yubao Guan (3), Bin Zheng (4)
1. School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China and School of Electrical and Computer Engineering, University of Oklahoma, Norman, Oklahoma 73019.
2. Department of Electrical and Computer Engineering, University of Texas, El Paso, Texas 79968 and Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang 110819, China.
3. Department of Radiology, Guangzhou Medical University, Guangzhou 510182, China.
4. School of Electrical and Computer Engineering, University of Oklahoma, Norman, Oklahoma 73019.
Abstract
PURPOSE: This study investigates whether lung cancer recurrence risk prediction for stage I non-small cell lung cancer (NSCLC) patients can be improved by integrating oversampling, feature selection, and score fusion techniques, and aims to develop an optimal prediction model.
METHODS: A dataset of 94 early-stage lung cancer patients was retrospectively assembled, including CT images, nine clinical and biological (CB) markers, and the 3-year disease-free survival (DFS) outcome after surgery. Among the 94 patients, 74 remained disease-free and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built, one on the QI features and one on the CB markers. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst-based feature selection method to optimize the classifiers, and also tested fusion methods to combine the QI-based and CB-based prediction results.
RESULTS: Using a leave-one-case-out cross-validation method (K-fold cross-validation with K equal to the number of cases), the computed areas under the receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061 for the QI-based and CB-based classifiers, respectively. By fusing the scores generated by the two classifiers, the AUC significantly increased to 0.859 ± 0.052 (p < 0.05), with an overall prediction accuracy of 89.4%.
CONCLUSIONS: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled the RBFN-based classifier to yield improved prediction accuracy.
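The two ideas at the core of the abstract, oversampling the minority (recurrence) class and fusing the scores of two classifiers, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the function names (`smote`, `fuse`, `auc`), the neighbour count `k`, the weighted-average fusion rule, and all toy numbers are assumptions made for illustration, since the abstract does not state the oversampling ratio or which fusion rule performed best. With 74 disease-free and 20 recurrence cases, fully balancing the classes would take 74 − 20 = 54 synthetic recurrence samples. In a leave-one-case-out setting, oversampling would normally be applied to the training cases only, so that no synthetic sample is derived from the held-out case.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (needs at least 2 minority cases).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority case and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority cases by squared Euclidean distance to base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def fuse(scores_a, scores_b, weight=0.5):
    """Weighted-average score fusion (an assumed rule, one of several possible)."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy minority-class feature vectors (hypothetical values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
synthetic = smote(minority, n_new=4)

# Toy per-case scores from two classifiers (hypothetical values);
# label 1 = recurrence, label 0 = disease-free.
labels = [1, 1, 1, 0, 0, 0]
qi_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.3]
cb_scores = [0.5, 0.8, 0.6, 0.3, 0.6, 0.1]
fused_scores = fuse(qi_scores, cb_scores)

print("QI AUC:   ", round(auc(qi_scores, labels), 3))
print("CB AUC:   ", round(auc(cb_scores, labels), 3))
print("Fused AUC:", round(auc(fused_scores, labels), 3))
```

In this toy data each classifier misranks a different case, so averaging the two scores separates the classes better than either score alone. That is the intuition behind score fusion, although the gain on real data depends on how complementary the two classifiers' errors are.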