Prediction of lung cancer patient survival via supervised machine learning classification techniques.

Chip M Lynch, Behnaz Abdollahi, Joshua D Fuqua, Alexandra R de Carlo, James A Bartholomai, Rayeanne N Balgemann, Victor H van Berkel, Hermann B Frieboes.

Abstract

Outcomes for cancer patients have previously been estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. For lung cancer in particular, it is not well understood which types of techniques yield more predictive information, and which data attributes should be used to obtain it. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal of enabling comparison of predictive power between the various methods. The prediction is treated as a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble, with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as they produced too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM, with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique.
We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time with the ultimate goal to inform patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods.
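The abstract compares models by RMSE and reports that an ensemble scored slightly better than its best member. The following is a minimal sketch of how such a comparison could be computed. The weighted-average combination, the weights, the model names as dictionary keys, and all numbers are illustrative assumptions; the abstract does not describe how the custom ensemble was actually built, and none of the values come from SEER.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def weighted_ensemble(predictions_by_model, weights):
    """Combine per-model predictions with a weighted average.

    Assumption: a weighted average is only one possible ensemble scheme,
    used here for illustration; it is not the scheme from the study.
    """
    total = sum(weights.values())
    n = len(next(iter(predictions_by_model.values())))
    return [
        sum(weights[name] * preds[i]
            for name, preds in predictions_by_model.items()) / total
        for i in range(n)
    ]


# Toy survival times (made-up values, arbitrary units).
actual = [6.0, 12.0, 24.0, 3.0]
predictions = {
    "gbm": [8.0, 10.0, 20.0, 5.0],
    "svm": [4.0, 14.0, 30.0, 1.0],
}
weights = {"gbm": 0.75, "svm": 0.25}  # hypothetical weights

ensemble = weighted_ensemble(predictions, weights)
for name, preds in predictions.items():
    print(name, round(rmse(actual, preds), 3))
print("ensemble", round(rmse(actual, ensemble), 3))
```

In this toy case the two models err in opposite directions, so averaging cancels part of the error and the ensemble RMSE falls below both members. That cancellation is a property of the made-up numbers, not a general guarantee.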
Copyright © 2017 Elsevier B.V. All rights reserved.

Keywords:  Biomedical big data; Data classification; Lung cancer; Machine learning; SEER database; Supervised classification

Year: 2017; PMID: 29132615; PMCID: PMC5726571; DOI: 10.1016/j.ijmedinf.2017.09.013

Source DB: PubMed; Journal: Int J Med Inform; ISSN: 1386-5056; Impact factor: 4.046


