Yueying Wang (1), Shuai Liu (2), Zhao Wang (2), Yusi Fan (2), Jingxuan Huang (2), Lan Huang (2), Zhijun Li (1), Xinwei Li (1), Mengdi Jin (1), Qiong Yu (1), Fengfeng Zhou (2).
1. Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun 130012, China.
2. College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.
Abstract
BACKGROUND AND OBJECTIVE: Primary lung cancer is a lethal and rapidly developing cancer type and is one of the leading causes of cancer deaths. MATERIALS AND METHODS: Statistical methods such as Cox regression are usually used to detect the prognostic factors of a disease. This study investigated survival prediction using machine learning algorithms. The clinical data of 28,458 patients with primary lung cancer were collected from the Surveillance, Epidemiology, and End Results (SEER) database. RESULTS: This study indicated that the survival rate of women with primary lung cancer was often higher than that of men (p < 0.001). Seven popular machine learning algorithms were used to evaluate one-year, three-year, and five-year survival prediction. Two classifiers, extreme gradient boosting (XGB) and logistic regression (LR), achieved the best prediction accuracies. The variable importance of the trained XGB models suggested that surgical removal (feature "Surgery") made the largest contribution to the one-year survival prediction models, while the metastatic status of the regional lymph nodes (feature "N" stage) was the most important contributor to three-year and five-year survival prediction. The female patients' three-year prognosis model achieved a prediction accuracy of 0.8297 on the independent future samples, while the male model achieved an accuracy of only 0.7329. CONCLUSIONS: These data suggest that prognosis in male patients with lung cancer may depend on more complicated factors than in female patients, and that gender-specific diagnosis and prognosis models need to be developed.
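The abstract frames one-year, three-year, and five-year survival as classification tasks scored by accuracy, but it does not say how the class labels were derived from follow-up data. The following is a minimal sketch of one common way to do this, not the authors' procedure. It assumes each record carries a survival time in months and a flag for death during follow-up, and that patients whose follow-up ended alive before the horizon are excluded because their outcome is unknown. The names `survival_label`, `accuracy`, and `records` are hypothetical, and the records are made-up illustrations rather than SEER data.

```python
from typing import List, Optional, Sequence, Tuple


def survival_label(months: float, died: bool, horizon_years: int) -> Optional[int]:
    """Binary label for n-year survival (assumed labelling rule, not from the paper).

    Returns 1 if the patient was followed for at least the horizon,
    0 if the patient died before the horizon,
    and None if follow-up ended alive before the horizon (outcome unknown).
    """
    horizon_months = 12 * horizon_years
    if months >= horizon_months:
        return 1
    if died:
        return 0
    return None


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    pairs = list(zip(y_true, y_pred))
    return sum(t == p for t, p in pairs) / len(pairs)


# Hypothetical records: (survival time in months, died during follow-up)
records: List[Tuple[float, bool]] = [
    (5, True), (14, True), (40, False), (70, False), (8, False),
]

# One label list per prediction horizon; None marks an unknown outcome.
labels_1y = [survival_label(m, d, 1) for m, d in records]
labels_3y = [survival_label(m, d, 3) for m, d in records]
labels_5y = [survival_label(m, d, 5) for m, d in records]

# Keep only patients with a known one-year outcome before scoring a classifier.
known_1y = [y for y in labels_1y if y is not None]
example_predictions = [0, 1, 0, 1]  # placeholder classifier output
score_1y = accuracy(known_1y, example_predictions)
```

Under this rule the same patient can be usable at one horizon and excluded at another, so the one-year, three-year, and five-year models would generally be trained and tested on different subsets of the cohort.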