
Comparison of Artificial Neural Networks and Logistic Regression for 30-days Survival Prediction of Cancer Patients.

Funda Secik Arkin1, Gulfidan Aras1, Elif Dogu2.   

Abstract

INTRODUCTION: A machine learning technique that imitates the neural system and the brain may predict survival better than traditional methods such as logistic regression, and may yield an algorithm by identifying influential factors. AIM: To determine the factors that influence the survival time of palliative care cancer patients and to compare two statistical methods for survival prediction.
METHODS: One year of data (2017-2018) was gathered from the patients followed in the palliative care clinic of our hospital (n = 189). All data were retrospectively evaluated. After descriptive statistics, we used Pearson and Spearman correlations for parametric and non-parametric variables. The Artificial Neural Networks (ANN) and logistic regression models were applied to the parameters that had a significant correlation with short survival.
RESULTS: The variables significantly correlated with short survival were the Palliative Performance Scale (PPS), Edmonton Symptom Assessment System (ESAS), Karnofsky Performance Scale (KPS), brain, liver, and distant metastasis, hemogram parameters, C-reactive protein (CRP), and albumin (ALB). The ANN model showed 89.3% prediction accuracy, while the logistic regression model showed 73.0%. The ANN model achieved a higher AUC (0.86) than the logistic regression model (0.76). DISCUSSION: Several prognostic evaluation tools and parameters, such as PPS, KPS, CRP, albumin, leukocytes, and neutrophils, have also been reported in several studies as survival-related parameters in logistic regression models. Many studies compare ANN with logistic regression. When we evaluated these parameters together, we observed the same relations with survival, so we used the same parameters in the ANN model. The effectiveness of survival prediction models can be improved with the use of ANN.
CONCLUSION: ANN provides a more accurate estimation than logistic regression. The ANN model is an important statistical method for survival prediction of cancer patients.
© 2020 Funda Secik Arkin, Gulfidan Aras, Elif Dogu.

Keywords:  artificial neural networks; classification; logistic regression; palliative care; prognostic estimates; survival prediction

Year:  2020        PMID: 32742062      PMCID: PMC7382770          DOI: 10.5455/aim.2020.28.108-113

Source DB:  PubMed          Journal:  Acta Inform Med        ISSN: 0353-8109


INTRODUCTION

In recent years, the growing number of cancer patients and long-term oncological treatments have increased emergency admission rates in our daily practice, owing both to the side effects of the drugs and to the clinical symptoms caused by the cancer itself. In addition, these patients are concentrated in in-hospital services. Current evaluation criteria are not sufficient for deciding when to move from curative cancer treatment to palliative treatment. Accurate 30-days survival predictions, by directing patients to palliative care at the right time, will spare both patients and caregivers from suffering and ensure appropriate treatment. Survival analyses should be performed as accurately as possible. The effect of patient data such as clinical findings, laboratory results, KPS, and PPS on survival estimation needs to be carefully analyzed. Prediction of survival is one of the trending research areas for data mining techniques (1, 2). Many studies of patient survival have used traditional survival models such as the Cox proportional hazards model. Logistic regression is widely used to examine the relationship between exposure and disease (3). It is a machine learning technique with a statistical background that can be used for binary decisions in medicine, such as classification into two classes. ANN (Artificial Neural Networks) is also a machine learning technique, which imitates the neural system and the design of the brain with neurons and synapses. ANN has been used in many outcome and survival studies. Rughani et al. demonstrated the usefulness of ANN in the survival prediction of patients after brain injury (4). Gohari et al. compared the survival prediction performance of the Cox model and ANN in patients with colorectal cancer (5). Parsaeian et al. obtained better results using ANN compared with logistic regression (6). Faradmal et al. compared the performance of ANN and log-logistic regression in predicting breast cancer relapse (7).

AIM

The purpose of this study is twofold; the first is to determine the predictors of 30-days survival of palliative care cancer patients via ANN and logistic regression models and the second is to predict 30-days survival using the patient data. Performances of two prediction models are compared with the area under the receiver operating characteristic (ROC) curve (AUC).

METHODS

3.1 Data

This investigation was based on one year of data (2017-2018) from patients (n=189) followed in the palliative care clinic of our hospital. All data were used retrospectively. The scientific study committee of our hospital reviewed and approved the use of the database for this study. At the beginning of hospitalization, all patients are informed that their data may be tracked, and their written consent is obtained. Our palliative care unit (PCU) is in a chest diseases referral center; patients with advanced COPD (Chronic Obstructive Pulmonary Disease) and interstitial lung diseases were excluded from the study, and only cancer cases were included. Most were patients with lung cancer who had stopped their advanced oncologic treatment. All patients received supportive treatment according to their symptoms. Patient data include gender, age, the symptoms and signs that led to admission to the PCU, metastasis status, biochemical analysis (ALB, CRP, CRP/ALB ratio), and hemogram parameters (white blood cells, red blood cells, hemoglobin, hematocrit, neutrophils (NE), monocytes, lymphocytes, eosinophils, mean corpuscular volume (MCV), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular hemoglobin (MCH), mean platelet volume (MPV), platelets (PLT), and neutrophil/lymphocyte ratio). Prognostic and symptom measurements are also available; the tools used are PPS and ESAS. The PPS was modified from the Karnofsky Performance Scale and assesses ambulation, activity level and evidence of disease, self-care, oral intake, and level of consciousness (8-10). The ESAS is a validated self-reported tool that measures nine prevalent symptoms of cancer (11). All patient data were recorded in the first 1-3 days of hospitalization. Some of the patients died during hospitalization. Mortality data for the discharged patients were obtained from the national death notification system after one year (2019). We confirmed that all of our discharged patients had died, at the exact time recorded by the national death notification system. All patients were classified according to whether their survival time (hospital admission to death) was less or more than thirty days.
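The grouping rule in the last sentence can be sketched in a few lines. This is only an illustration: the dates are made-up examples rather than study data, the function names are hypothetical, and counting exactly 30 days as "within 30 days" is an assumption, since the text does not say how the boundary case was handled.

```python
from datetime import date

THRESHOLD_DAYS = 30  # cut-off named in the text


def survival_days(admission: date, death: date) -> int:
    """Days from hospital admission to death."""
    return (death - admission).days


def died_within_30_days(admission: date, death: date) -> bool:
    # Assumption: a survival time of exactly 30 days counts as short survival.
    return survival_days(admission, death) <= THRESHOLD_DAYS


# Hypothetical (admission, death) pairs, not taken from the study.
records = [
    (date(2017, 11, 2), date(2017, 11, 20)),
    (date(2018, 1, 15), date(2018, 4, 3)),
]
labels = [died_within_30_days(a, d) for a, d in records]
print(labels)
```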

3.2 Data Analysis

SPSS 23.0 for Windows was used for statistical analysis. Descriptive statistics are reported as number and percentage for categorical variables and as mean, standard deviation, minimum, and maximum for scale variables. The Shapiro-Wilk test was used to test the normality assumption. The two independent patient groups were compared with Student’s t-test when scale variables were normally distributed and with the Mann-Whitney U test when they were not. Proportions in independent groups were compared with the Chi-square test. Results with p<0.05 were considered statistically significant. To examine the relationships, Pearson’s and Spearman’s correlations were used for parametric and non-parametric variables, respectively. All variables with significant correlations were taken as input variables. For 30-days survival prediction between admission and death, using the significantly correlated factors as input variables, we compared the performance of two classification techniques: logistic regression and ANN. Logistic regression is a linear machine learning method used in binary classification problems. It models the log-odds of the target variable as a linear function of the input variables (independent factors) (3, 12). ANN is another machine learning algorithm, one that imitates the synaptic design of the brain. This computational tool transfers the knowledge or rules concealed in the data to the network structure by processing experimental data. An ANN is constructed from three kinds of layers: the input layer, the hidden layers, and the output layer. The values of the input layer nodes are passed to the output layer, layer by layer. The input to a layer is either the output of the previous layer or, for the first layer, the raw data. The main task of the hidden layers is to extract the information needed for classification from the data. The output layer gives the final output of the network. The output of each node in a layer is a weighted linear combination of its inputs transformed by a nonlinear function. This nonlinear function allows the neural network to capture complex relations between the independent variables and enhances the performance of this data-driven machine learning technique (12-14). The total dataset was split into a training set, a cross-validation set, and a test set. The training set was used for deriving the parameters related to survivability prediction, the cross-validation set for preventing overfitting, and the test set for evaluating the derived predictors. All significantly correlated variables were used as input data for both the ANN and logistic regression classification models. The parameters that affect survival were evaluated by the logistic regression and ANN models on a small dataset of patients who lived less or more than thirty days. To compare the ANN and logistic regression models in our setting, we used the estimated AUC of the ROC curve.
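The two model forms described above can be illustrated with a minimal sketch. It assumes a single hidden layer with a tanh activation and a sigmoid output; the source does not state the activation function, the network size, or how the ANN was built, and the inputs and weights below are made-up values chosen only to show the calculation.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, inputs, bias):
    """Weighted linear combination of the inputs plus a bias term."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


def logistic_regression(inputs, weights, bias):
    """Logistic regression: the log-odds are linear in the inputs."""
    return sigmoid(linear(weights, inputs, bias))


def ann_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """One-hidden-layer network: each hidden node applies a nonlinear
    function (tanh, an assumption) to a weighted linear combination."""
    hidden = [math.tanh(linear(w, inputs, b)) for w, b in zip(hidden_w, hidden_b)]
    return sigmoid(linear(out_w, hidden, out_b))


# Hypothetical standardized inputs (think KPS, ESAS, ALB) and weights.
x = [-1.2, 0.8, -0.5]
lr_w, lr_b = [-0.9, 0.4, -0.3], 0.6
p_lr = logistic_regression(x, lr_w, lr_b)

hidden_w = [[-0.7, 0.2, -0.1], [0.3, 0.5, -0.4]]
hidden_b = [0.1, -0.2]
p_ann = ann_forward(x, hidden_w, hidden_b, out_w=[1.1, 0.6], out_b=0.2)

print(f"logistic regression: {p_lr:.3f}, ANN: {p_ann:.3f}")
```

Taking the logit of `p_lr` recovers the linear predictor exactly, which is the sense in which logistic regression is linear. The hidden layer breaks that property for the network, so it can represent relations that a single linear predictor cannot.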

RESULTS

The symptom distributions and palliative prognostic measurements of the 189 patients, most of whom were male and had lung cancer, are given in Table 1.
Table 1. Characteristics of documented data of patients in PCU (n=189)

Variable                              Mean±SD        Min-Max
Age                                   64.53±11.60    26-91
Karnofsky Performance Status          32.86±17.63    10-90
Edmonton Symptom Assessment System    55.51±15.78    2-90
Palliative Performance Scale          33.81±17.81    10-90

Variable     Category                       n      %
Gender       Female                         36     19
             Male                           153    81
Symptom      Pain                           57     30.2
             Dyspnea                        153    81
             Cough                          56     29.6
             Hemoptysis                     11     5.8
             Tiredness                      53     28
             Lack of appetite               68     36
             Constipation                   8      4.2
             Insomnia                       13     6.9
             Nausea                         11     5.8
             Lack of well-being             14     7.4
Diagnosis    Lung Cancer                    156    82.5
             Brain Tumor                    1      0.5
             Colon Cancer                   3      1.6
             Laryngeal Cancer               2      1.1
             Malignant Melanoma             1      0.5
             Breast Cancer                  9      4.8
             Bladder Cancer                 2      1.1
             Malign Pleural Mesothelioma    6      3.2
             Gastric Cancer                 1      0.5
             Ovarian Cancer                 1      0.5
             Esophageal Cancer              3      1.6
             Pancreatic Cancer              2      1.1
             Prostate Cancer                1      0.5
             Renal Cancer                   1      0.5

The mean age was 64.53 ± 11.60 years. We identified the parameters that differed significantly between the two patient groups, classified by survival of less or more than thirty days. Distant metastasis, brain metastasis, liver metastasis, and symptom burden (ESAS) were higher in patients with short survival than in the others, whereas the albumin level and the prognostic indices (KPS and PPS) were lower in patients with short survival, as shown in Table 2.
Table 2. Differences in patients who died within 30-days and the others

Variable              Death within 30-days (n=125)    Others (n=64)    P
CRP                   152.49±104.12                   118.21±90.57     0.027
ALB                   2.8±0.48                        3.08±0.59        0.001
Leucocytes (AN)       14.18±8.39                      12.24±6.10       0.102
MCHC                  31.70±1.49                      32.23±1.23       0.016
NE (AN)               11.88±8.05                      10.02±5.94       0.102
Distant Metastasis    35 (28%)                        7 (10.9%)        0.009
Brain Metastasis      18 (14.4%)                      3 (4.7%)         0.051
Liver Metastasis      19 (15.2%)                      3 (4.7%)         0.033
KPS                   26.88±13.16                     44.53±19.43      <0.001
ESAS                  59.18±13.77                     48.34±17.05      <0.001
PPS                   27.92±13.81                     45.31±19.19      <0.001

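As a rough check on how a row of Table 2 can arise, the sketch below computes the equal-variance Student's t statistic from the published CRP summary values. Two things here are assumptions: that the t-test (rather than the Mann-Whitney U test) was the test applied to CRP, which the text does not state, and the use of a normal approximation for the p-value in place of the exact t distribution.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal-variance form) from group summaries."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# CRP row of Table 2: death within 30 days (n=125) vs. others (n=64).
t_crp, df = pooled_t(152.49, 104.12, 125, 118.21, 90.57, 64)

# Two-sided p-value from the normal approximation to the t distribution,
# a simplification that is reasonable at 187 degrees of freedom.
p_approx = math.erfc(abs(t_crp) / math.sqrt(2))

print(f"t = {t_crp:.2f}, df = {df}, p ~ {p_approx:.3f}")
```

Under these assumptions the statistic is about 2.24 on 187 degrees of freedom, with an approximate p-value near 0.025, in the same range as the 0.027 listed for CRP.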
The variables significantly correlated with short survival were PPS, ESAS, KPS, brain, liver, and distant metastasis, leucocytes, neutrophils, MCHC, CRP, and ALB. The presence of brain, liver, and distant metastasis, the levels of CRP, leucocytes, and neutrophils, and the symptom burden (ESAS) were positively correlated with short survival. In contrast, the levels of albumin and MCHC and the prognostic measurements KPS and PPS were negatively correlated, as shown in Table 3.
Table 3. The Correlations

Death within 30-days    Correlation    p
CRP                     0.151          0.038
ALB                     -0.223         0.002
Leucocytes (AN)         0.146          0.046
MCHC                    -0.183         0.012
NE (AN)                 0.153          0.036
Distant Metastasis      0.194          0.007
Brain Metastasis        0.146          0.045
Liver Metastasis        0.155          0.033
KPS                     -0.460         <0.001
ESAS                    0.317          <0.001
PPS                     -0.452         <0.001

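For the three metastasis rows, both variables are binary, so the Pearson correlation reduces to the phi coefficient of a 2x2 table. The sketch below computes it from the counts in Table 2 (with the absent counts derived from the group sizes of 125 and 64); the function name and argument layout are illustrative choices, not something taken from the source.

```python
import math


def phi_coefficient(a, b, c, d):
    """Pearson correlation of two binary variables from a 2x2 table.

    a: metastasis present, died within 30 days
    b: metastasis absent,  died within 30 days
    c: metastasis present, others
    d: metastasis absent,  others
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


N_DEATH, N_OTHER = 125, 64  # group sizes from Table 2

# (present in death group, present in other group), from Table 2.
present = {"distant": (35, 7), "brain": (18, 3), "liver": (19, 3)}

phi = {
    site: phi_coefficient(p_death, N_DEATH - p_death, p_other, N_OTHER - p_other)
    for site, (p_death, p_other) in present.items()
}
for site, value in phi.items():
    print(f"{site:8s} phi = {value:.3f}")
```

The three values round to 0.194, 0.146, and 0.155, matching the distant, brain, and liver metastasis correlations in Table 3.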
The logistic regression model and ANN model are both trained using correlated parameters as the input variables for survival prediction and for investigating their influence. The results of the classification are given in Table 4.
Table 4. True and Predicted Classes of Logistic Regression and ANN

                                                                            True Class
Model                         Data set            Predicted Class           Death within 30-days    Other
Artificial Neural Networks    Training            Death within 30-days      77                      25
                                                  Other                     8                       23
                              Cross-Validation    Death within 30-days      18                      4
                                                  Other                     2                       4
                              Test                Death within 30-days      20                      3
                                                  Other                     0                       5
                              Total               Death within 30-days      115                     32
                                                  Other                     10                      32
Logistic Regression           Total               Death within 30-days      105                     31
                                                  Other                     20                      33

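The counts in Table 4 also show how the 189 patients were divided among the three subsets and how the ANN performed on each. The sketch below tallies them; the tuple layout is an illustrative choice, and nothing here goes beyond arithmetic on the published counts.

```python
# ANN counts from Table 4, each tuple ordered as
# (predicted death & true death, predicted death & true other,
#  predicted other & true death, predicted other & true other).
ann_counts = {
    "training": (77, 25, 8, 23),
    "cross-validation": (18, 4, 2, 4),
    "test": (20, 3, 0, 5),
}
ann_total = (115, 32, 10, 32)

# The three subsets should add up to the "Total" rows of the table.
summed = tuple(sum(c[i] for c in ann_counts.values()) for i in range(4))

n_patients = sum(ann_total)
sizes = {name: sum(c) for name, c in ann_counts.items()}
share = {name: size / n_patients for name, size in sizes.items()}


def accuracy(counts):
    """Share of cases on the diagonal of the 2x2 table."""
    pred_death_true_death, _, _, pred_other_true_other = counts
    return (pred_death_true_death + pred_other_true_other) / sum(counts)


for name, counts in ann_counts.items():
    print(f"{name:17s} n={sizes[name]:3d} ({share[name]:.1%})  accuracy={accuracy(counts):.1%}")
print(f"{'total':17s} n={n_patients:3d}           accuracy={accuracy(ann_total):.1%}")
```

The subsets come to 133, 28, and 28 patients, roughly a 70/15/15 split. On these counts the ANN accuracy is 89.3% on the test subset (25 of 28) and 77.8% over all 189 patients.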
The logistic regression model had an AUC value of 0.76, as shown in Figure 1. The ANN model achieved a higher AUC value of 0.86, as shown in Figure 2 for all three parts of the dataset. Additionally, the ANN model showed 89.3% prediction accuracy, while the logistic regression model showed 73.0%, as summarized in Table 5.
Figure 1. ROC Curve of Logistic Regression Classifier

Figure 2. ROC Curve of ANN Classifier for Training, Cross-Validation and Testing Data Sets
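The AUC summarized by these curves can be read as the probability that a randomly chosen patient from the positive class receives a higher predicted score than a randomly chosen patient from the negative class. A minimal sketch of that pairwise definition follows; the labels and scores are made-up values, not model output from the study.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = death within 30 days, score = predicted probability.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(round(auc(labels, scores), 3))  # 0.833 (5 of 6 pairs ranked correctly)
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which is the scale on which the reported 0.76 and 0.86 should be read.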

Table 5. Accuracy, Sensitivity, Specificity and AUC of LR and ANN results

Method                        Accuracy    Sensitivity    Specificity    AUC
Artificial Neural Networks    89.3%       38%            100%           0.86
Logistic Regression           73.0%       48%            84%            0.76
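Sensitivity and specificity depend on which class is treated as positive, and the text does not state the convention used. The sketch below computes both measures from the Table 4 counts under each convention, using the ANN test subset and the logistic regression totals; the function names are hypothetical.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def with_other_class_positive(tp, fp, fn, tn):
    """Re-express the same 2x2 table with the opposite class as positive."""
    return tn, fn, fp, tp


# Table 4 counts as (tp, fp, fn, tn) with "death within 30 days" positive:
# tp = predicted death & true death, fp = predicted death & true other,
# fn = predicted other & true death, tn = predicted other & true other.
tables = {
    "ANN (test subset)": (20, 3, 0, 5),
    "Logistic regression (total)": (105, 31, 20, 33),
}

results = {}
for name, counts in tables.items():
    results[name] = {
        "death positive": sensitivity_specificity(*counts),
        "other positive": sensitivity_specificity(*with_other_class_positive(*counts)),
    }
    for convention, (sens, spec) in results[name].items():
        print(f"{name:28s} {convention}: sensitivity={sens:.1%} specificity={spec:.1%}")
```

With survival beyond 30 days as the positive class, the computed specificities (100% and 84%) agree with Table 5, while the computed sensitivities (62.5% and 51.6%) are the complements of the 38% and 48% listed there. The source does not say how its sensitivity column was defined, so this reading is only a guess.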

DISCUSSION

Accurate prediction of cancer patients’ survival is valuable for managing expectations and for planning care and the end of life. It is critical in the triangle of patients, caregivers/relatives, and clinicians (15, 16). A meta-analysis by White et al. reported that clinicians’ predictions are frequently inaccurate in palliative care (17). There are several prognostic evaluation tools, such as PPS and KPS, in the palliative care setting. Many studies compare ANN with other statistical models such as logistic regression. A study comparing logistic regression and ANN in predicting hospital mortality after hepatocellular carcinoma surgery showed that the ANN model provides a more accurate estimation (23). Additionally, the ANN model was reported to be more accurate than multiple logistic regression in predicting 5-year mortality in breast cancer patients after surgery (24). In the study of Faradmal et al., the ANN model predicted breast cancer relapse better than the logistic regression model (7); however, the predictive performance of both methods declined over time. Similarly, in this study, which used palliative care patient data, the ANN model achieved a better AUC value (0.86) than logistic regression in predicting survival. However, there are some opposing studies, which reported that the ANN model was not superior to traditional models for prediction, a finding that may be related to the data structure (21). If the relationship between factors and response in the data sample is unknown, the ANN model is better than traditional models at predicting this relationship (1). The ANN model can rapidly recognize non-linear patterns, linear patterns, and even probable effects (25). The aim of our study was to compare two prediction models and to predict short survival in patients hospitalized in the palliative care service. For this reason, we determined the parameters correlated with survival from patient data. 
When we used the parameters significantly correlated with survival as the ANN input parameters, the model determined the actual survival with high accuracy (89.3%). Each of the parameters used in our algorithm has been found to be related to survival individually. Additionally, the ANN model showed a fairly good prediction ability. In our study, the ANN classification model was used to test and confirm the clinical factors and their relationships with survival. The algorithm contains laboratory parameters such as CRP, ALB, leucocytes, neutrophils, and MCHC, and performance measurements such as KPS and PPS. The patient’s symptom load (ESAS) and metastasis status were also included. According to the results of other studies, these components are already associated with shorter survival. In our study, they were used together as a collective algorithm composed of parameters determined by traditional statistical methods. MCHC, a component of our algorithm, has a role in predicting short survival. MCHC is the average hemoglobin concentration in the erythrocyte. A few studies have shown the relevance of MCHC to survival. MCHC was negatively correlated with short survival in our study. Increased MCHC level in hospitalized patients with acute myocardial infarction is associated with decreased hospitalization time and one-year mortality. A continuing inflammatory response may lead to iron deficiency, which decreases MCHC (26-29). Several studies have shown that a high level of CRP correlates with poorer prognosis in patients with resectable lung cancer. In addition, the CRP level was associated with reduced serum albumin, resulting in progressive weight loss, poor performance, and a higher mortality rate in cancer patients (30-32). The albumin level can reveal the nutritional status of patients and correlates with short survival in advanced cancer. 
Cancer is an inflammatory state accompanied by a low level of albumin. The systemic inflammatory response has been found to be related to poor survival in cancer patients (33, 34). In our study, CRP and albumin were selected as algorithm components in the ANN model because of their relationship with poor survival. Neutrophil and leukocyte counts were also included in our algorithm. Neutrophilia and leukocytosis were important factors for survival prediction in this study. Another study also reported that both neutropenia and leukocytosis (mainly due to neutrophils) were related to a poor course in cancer patients (35). Tumor-related leukocytosis and neutrophilia may result from granulocyte colony-stimulating factors produced by the solid tumor (36, 37). A relation between neutrophilia and a lower performance score has been shown (38). Neutrophils facilitate the invasion-metastasis cascade by repressing natural killer cell activity and enhancing the extravasation of tumor cells (39). We considered the patient’s symptom load crucial and selected it as another parameter for the ANN and logistic regression models. Symptom burden (ESAS) was positively correlated with short survival in our study, similar to another study: McGee et al. demonstrated that ESAS was a prognostic tool and could complement ECOG (Eastern Cooperative Oncology Group) performance status in the estimation of survival in advanced lung cancer (40). The association of survival with performance status measurements in terminally ill cancer patients has been studied many times. KPS and ECOG status were the two most frequently used measurements (41). Even though performance status is accepted as a significant prognostic factor for survival, possible acute influences should be considered (35). KPS and PPS, which are used as palliative performance tools, were correlated with short survival in this study. 
Additionally, the presence of metastasis has been identified in several studies as a prognostic factor in advanced cancer patients (42). This study showed that the presence of distant metastasis, especially brain and liver involvement, is correlated with short survival. Because the correlation of the above-mentioned factors with short survival had been found in several studies and was also found in our study, we built ANN and logistic regression models to create a predictive algorithm for survival from these factors. Thus, we tested their accuracy for survival prediction by applying both ANN and logistic regression methods. We found that ANN provides a more accurate estimation than logistic regression. A limitation of this study is that most of the cases are lung cancer patients, because our center is in a respiratory referral hospital. The effectiveness of the artificial neural network model can be improved by including more cases and covering all cancer types.

CONCLUSION

This study showed that ANN recognizes linear and non-linear relationships, and even influence effects, and can be used as a 30-days survival prediction algorithm for cancer patients. The usefulness of this algorithm can be determined by applying it to a large number of palliative patients in future prospective studies. As suggested by the literature, our study also contributes to the validation of ANN for future prognostication (43). We think that the ANN model is an important statistical method for survival prediction that can include a considerable number of parameters and offers the opportunity for use in very large patient populations.
  38 in total

1.  The Impact of Baseline Edmonton Symptom Assessment Scale Scores on Treatment and Survival in Patients With Advanced Non-small-cell Lung Cancer.

Authors:  Sharon F McGee; Tinghua Zhang; Hannah Jonker; Scott A Laurie; Glen Goss; Garth Nicholas; Khalid Albaimani; Paul Wheatley-Price
Journal:  Clin Lung Cancer       Date:  2017-06-08       Impact factor: 4.785

Review 2.  Diagnosis of iron-deficient states.

Authors:  Natasha M Archer; Carlo Brugnara
Journal:  Crit Rev Clin Lab Sci       Date:  2015-08-14       Impact factor: 6.250

3.  Comparison between artificial neural network and Cox regression model in predicting the survival rate of gastric cancer patients.

Authors:  Lucheng Zhu; Wenhua Luo; Meng Su; Hangping Wei; Juan Wei; Xuebang Zhang; Changlin Zou
Journal:  Biomed Rep       Date:  2013-07-18

Review 4.  Role of systemic inflammatory response in predicting survival in patients with primary operable cancer.

Authors:  Campbell S D Roxburgh; Donald C McMillan
Journal:  Future Oncol       Date:  2010-01       Impact factor: 3.404

5.  Use of an artificial neural network to predict head injury outcome.

Authors:  Anand I Rughani; Travis M Dumont; Zhenyu Lu; Josh Bongard; Michael A Horgan; Paul L Penar; Bruce I Tranmer
Journal:  J Neurosurg       Date:  2010-09       Impact factor: 5.115

6.  The Edmonton Symptom Assessment System (ESAS): a simple method for the assessment of palliative care patients.

Authors:  E Bruera; N Kuehn; M J Miller; P Selmser; K Macmillan
Journal:  J Palliat Care       Date:  1991       Impact factor: 2.250

7.  Preparing for the end of life: preferences of patients, families, physicians, and other care providers.

Authors:  K E Steinhauser; N A Christakis; E C Clipp; M McNeilly; S Grambow; J Parker; J A Tulsky
Journal:  J Pain Symptom Manage       Date:  2001-09       Impact factor: 3.612

Review 8.  Prognostic factors in patients with recently diagnosed incurable cancer: a systematic review.

Authors:  Catherine A Hauser; Martin R Stockler; Martin H N Tattersall
Journal:  Support Care Cancer       Date:  2006-05-18       Impact factor: 3.603

9.  Preoperative C-reactive protein levels are associated with tumor size and lymphovascular invasion in resected non-small cell lung cancer.

Authors:  Jin Gu Lee; Byoung Chul Cho; Mi Kyung Bae; Chang Young Lee; In Kyu Park; Dae Joon Kim; Song Vogue Ahn; Kyung Young Chung
Journal:  Lung Cancer       Date:  2008-06-02       Impact factor: 5.705

10.  Comparison of logistic regression and artificial neural network in low back pain prediction: second national health survey.

Authors:  M Parsaeian; K Mohammad; M Mahmoudi; H Zeraati
Journal:  Iran J Public Health       Date:  2012-06-30       Impact factor: 1.429
