
Design of an artificial neural network to predict mortality among COVID-19 patients.

Mostafa Shanbehzadeh1, Raoof Nopour2, Hadi Kazemi-Arpanahi3,4.   

Abstract

Introduction: The rapid spread of the coronavirus disease 2019 (COVID-19) pandemic has challenged clinicians with many uncertainties and ambiguities regarding disease outcomes and complications. To deal with these uncertainties, our study aimed to develop and evaluate several artificial neural networks (ANNs) to predict the mortality risk in hospitalized COVID-19 patients. Material and methods: The data of 1710 hospitalized COVID-19 patients were used in this retrospective and developmental study. First, a Chi-square test (P < 0.05), Eta coefficient (η > 0.4), and binary logistic regression (BLR) analysis were performed to determine the factors affecting COVID-19 mortality. Then, using the selected variables, two types of feed-forward (FF) models, back-propagation (BP) and distributed time delay (DTD), were trained. The models' performance was assessed using mean squared error (MSE), error histogram (EH), and area under the ROC curve (AUC-ROC) metrics.
Results: After applying the univariate and multivariate analysis, 13 variables were selected as important features in predicting COVID-19 mortality at P < 0.05. A comparison of the two ANN architectures using the MSE showed that the BP-ANN (validation error: 0.067, most of the classified samples having 0.049 and 0.05 error rates, and AUC-ROC: 0.888) was the best model. Conclusions: Our findings show the acceptable performance of ANN for predicting the risk of mortality in hospitalized COVID-19 patients. Application of the developed ANN-based CDSS in a real clinical environment will improve patient safety and reduce disease severity and mortality.
© 2022 Published by Elsevier Ltd.


Keywords:  Artificial intelligence; COVID-19; Machine learning; Neural networks

Year:  2022        PMID: 35664686      PMCID: PMC9148440          DOI: 10.1016/j.imu.2022.100983

Source DB:  PubMed          Journal:  Inform Med Unlocked        ISSN: 2352-9148


Coronavirus disease 2019; Artificial intelligence; Machine learning; Clinical decision support systems; Artificial neural network; Binary logistic regression; Back-propagation; Distributed time delay; Feed-forward; Mean squared error; Root mean square error; Area under the ROC curve

Introduction

Since March 11, 2020, the coronavirus disease 2019 (COVID-19) pandemic has remained a worldwide public health concern [1]. With the widespread outbreak of COVID-19, the healthcare systems of many countries have failed to meet the growing needs of patients for diagnosis, treatment, and care [2,3]. Because many healthcare systems struggled with the overwhelming demands of the pandemic, the need for advanced intelligent and computing technologies has increased [4,5]. In addition, due to the lack of a definitive and approved treatment and the increasing number of infected cases and deaths, artificial intelligence (AI) techniques have become essential for identifying and triaging patients and for predicting disease severity and outcomes [[6], [7], [8]]. Using AI-based prediction models for the early prognosis of the illness and forecasting patients' clinical deterioration can diminish the adverse outcomes of the COVID-19 pandemic [9,10], maintain treatment efficiency, and improve resource utilization [11]. Machine learning (ML) is a branch of AI that can play an essential role in the prognosis, diagnosis, and treatment of various diseases, especially chronic and complex conditions [12,13]. ML techniques extract applied knowledge to support decision-making by exploring cumulative datasets [14,15]. The ML process involves several phases, e.g., data gathering, visualization, and extracting applicable and informative patterns from massive raw datasets. It combines computational, statistical, and database sciences [16]. By training valid, high-quality predictive models, ML techniques are critical to effective triaging and improvement of treatment outcomes [17]. Artificial neural networks (ANNs), a subclass of ML, are adaptive, trainable computational models that mimic the structure and behavior of neurons in the human brain [18,19]. 
This method can be trained to discriminate and classify intricate patterns of diseases through an iterative learning process. Once proper training is executed, ANNs can predict with higher accuracy than traditional statistical models. Due to their ability to detect multifaceted nonlinear relations among predictors and outcomes, ANNs have been effectively applied in clinical decision support systems (CDSS) [[20], [21], [22], [23], [24]]. Thus, the current study first selected the most influential variables on COVID-19 mortality at the time of admission, then compared different ANN architectures, and finally established a CDSS based on the best-selected ones to predict COVID-19 mortality.

Materials and methods

Study design and setting

This was a retrospective and developmental study conducted in 2022. The records of 1980 COVID-19 patients were analyzed. The patients had been referred to the Ayatollah Taleghani Hospital (COVID-19 hub center), southwest of Khuzestan Province, Iran, from August 2021 to January 2022. Of these, 1221 and 759 cases were female and male, respectively. The methodology of this study is summarized in Fig. 1. It comprised dataset preparation, feature selection, model development, and evaluation.
Fig. 1

The study's roadmap.


Predictor and outcome variables

A total of 58 features were selected and classified into four main categories: demographic, clinical manifestations, epidemiological, and hospitalization indicators (see Table 1 ). The output variable was life status characterized by two values: surviving (code 0) and deceased (code 1).
Table 1

Characteristics associated with COVID-19 mortality.

No. | Variable category | Variable name
1 | Demographic factors | Age (year), Height (cm)¹, Weight (kg)², Blood type (AB+, AB−, O+, O−, A+, A−, B+, B−), Sex (male, female)
2 | Hospitalization factors | Length of hospitalization (Day), ICU hospitalization (Yes, No)
3 | Clinical manifestations, including symptoms and signs | Contusion (Has, Does not have), Headache (Has, Does not have), Body temperature, Fever (Has, Does not have), Dyspnea (Has, Does not have), Loss of taste (Has, Does not have), Rhinorrhea (Has, Does not have), Muscular pain (Has, Does not have), Cardiac disease (Has, Does not have), Loss of smell (Has, Does not have), Lung consolidation (Has, Does not have), Cough (Has, Does not have), Gastrointestinal manifestation (GI) (Has, Does not have), Chill sensation (Has, Does not have), Other underlying diseases (Has, Does not have), Pneumonia (Has, Does not have), Nausea (Has, Does not have), Vomiting (Has, Does not have), Blood pressure (Has, Does not have), Diabetes (Has, Does not have), Sore throat (Has, Does not have)
4 | Therapy | Oxygen therapy (Has, Does not have)
5 | Laboratory data | Hypersensitive troponin (ng/L)³, White cell count (Cells/mL)⁴, Erythrocyte sedimentation rate (mm/hr)⁵, C-reactive protein (mg/L)⁶, Alkaline phosphatase (Units/L)⁷, Prothrombin time (s)⁸, Activated partial thromboplastin time (s)⁸, Lactate dehydrogenase (Units/L)⁷, Blood glucose (mg/dL)⁹, Serum albumin (g/dL)¹⁰, Alanine aminotransferase (Units/L)⁷, Aspartate aminotransferase (Units/L)⁷, Total bilirubin (mg/dL)⁹, Blood urea nitrogen (mg/dL)⁹, Blood potassium (mEq/L)¹¹, Blood phosphor (mg/dL)⁹, Blood magnesium (mEq/L)¹¹, Blood sodium (mEq/L)¹¹, Blood calcium (mg/dL)⁶, Absolute neutrophil count (10³ cells/μL)¹², Absolute lymphocyte count (10³ cells/μL)¹², Platelet count (Cells/μL)¹², Hemoglobin rate (g/dL)¹⁰, Hematocrit (L/L)¹³, Red cell count (mc/mL)¹³, Blood creatinine (mg/dL)⁹
6 | Epidemiological factors | Smoking (Yes, No), Alcohol addiction (Has, Does not have)

1- Centimeters, 2- Kilograms, 3- Nanograms per liter, 4- Cell per microliter, 5-Millimeters per hour, 6- Milligrams per liter, 7- Units per liter, 8- Seconds, 9- Milligrams per deciliter, 10- Grams per deciliter, 11- Milliequivalents per liter, 12- Number of cells per microliter, 13- Million cells per microliter.


Dataset preprocessing

The dataset was normalized before ANN implementation to maximize performance and simplify the implementation. Preprocessing was carried out in three phases:

1. Scrutinizing the database for outliers, duplicates, or a high percentage of missing values. Two Health Information Management experts (M-SH and H-KA), in consultation with two Infectious Diseases specialists, screened all the data samples. Outlier values were deleted from the dataset by the authors, and case records with more than 60% missing values were excluded from the analysis.
2. Replacing the missing values in case records with less than 60% missing data. The simple K-means algorithm, using Euclidean distance with K = 1, 3, and 5, was used to impute the missing values. In this method, a missing value is filled with the value of the same feature in the nearest case, that is, the case most similar to the incomplete record across all attribute values. Imputation quality was evaluated via the root mean square error (RMSE) over successive algorithm iterations.
3. Choosing the most important factors affecting COVID-19 mortality through feature selection (FS). FS reduces the number of dataset features during data preprocessing [25]. It was undertaken to (1) reduce the dataset dimension for a better understanding of the data, (2) enhance the data mining algorithm's performance, (3) prevent algorithm overfitting, (4) accelerate algorithm development, and (5) simplify data visualization [[26], [27], [28], [29]]. This study used the Chi-square test and the Eta coefficient to determine the factors most strongly associated with mortality in COVID-19 patients. P < 0.05 was taken to indicate a significant relationship between a determinant factor and mortality, and an Eta coefficient greater than 0.4 was taken to indicate an important factor. We then applied multivariate binary logistic regression (BLR) with the forward LR method to identify the factors correlated with the dependent variable. Variables entering the model at P < 0.05 were considered the highly correlated predictors of COVID-19 mortality and were used to build the ANN model.
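The nearest-case imputation and its RMSE check can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names (`distance`, `impute_nearest`, `rmse`), the toy records, and the choice to average over the K nearest donor cases when K > 1 are all assumptions made for the example. Features are assumed to be on comparable scales already.

```python
import math

def distance(a, b):
    """Euclidean distance over the features observed (not None) in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

def impute_nearest(records, k=1):
    """Fill each missing value with the mean of that feature over the k nearest
    records that have it observed (k=1 copies the value of the nearest case)."""
    filled = [list(r) for r in records]
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            if value is not None:
                continue
            donors = [r for idx, r in enumerate(records)
                      if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(rec, r))
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled

def rmse(actual, predicted):
    """Root mean square error between held-out true values and imputed values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical toy records (three features each); None marks a missing value.
records = [
    [60.0, 7.0, 5.0],
    [62.0, 8.0, 6.0],
    [30.0, 2.0, 1.0],
    [61.0, None, 5.0],
]
filled = impute_nearest(records, k=1)
print(filled[3][1])                 # value copied from the closest complete case
print(rmse([7.5], [filled[3][1]]))  # error against a held-out "true" value of 7.5
```

Evaluating RMSE in this way requires values that were deliberately hidden before imputation, so that the imputed and true values can be compared.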

Implementing the artificial neural network

An ANN is an abstraction of the structure of the human brain and attempts to mimic its behavior [30]. It consists of three layers: the input layer, the calculation or hidden layer, and the output layer [31,32] (see Fig. 2). Each layer contains neurons and performs a different task [33]. The input layer receives elements such as data, images, or signals from the environment and turns them into normalized values suitable for the network's mathematical calculations. The calculation takes place in the hidden layer(s), which have the highest number of neurons and perform the computation through the connections between neurons. In the output layer, the neurons receive the results of the hidden-layer calculations and present them to the user [[34], [35], [36]]. The connections between neurons carry weights that govern how information is transferred between nodes during computation [37]. Using the weights adjusted during ANN training, the results computed at the nodes of one layer are combined at each node of the next layer, which continues the computation and passes the result on [38,39]. Another critical element of the ANN is the activation function, which maps each neuron's weighted input to a bounded output and introduces nonlinearity into the connections between neurons. This nonlinearity enables the ANN to learn sophisticated computational relationships [40]. In this study, we used a feed-forward (FF) ANN in MATLAB R2013a to train and test our algorithm on the dataset of COVID-19 patients. FF, also known as the multi-layered perceptron (MLP), is the most understandable and popular ANN configuration, in which signals pass forward through nonlinear connections between neurons [41]. To model this nonlinearity, we used the tansig activation function because of its rapid performance during training [42]. 
The Levenberg–Marquardt (LM) algorithm, chosen for its high running speed, was applied in MATLAB to train the ANN and adjust the weights of the connections between neurons [43]. We also set the number of training iterations to 1000 and left the training time unlimited because of the high speed of the LM algorithm.
Fig. 2

The overall schema of ANN configuration.
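To make the layer structure concrete, the following Python sketch runs one forward pass through a 13-10-1 feed-forward network with a tansig hidden layer. It is an illustration under stated assumptions, not the MATLAB model used in the study: the weights are random rather than trained with Levenberg–Marquardt, the logistic output unit is an assumption (the text does not state the output activation), and all names (`tansig`, `forward`, `w_hidden`, and so on) are hypothetical.

```python
import math
import random

def tansig(x):
    """MATLAB's tansig, 2 / (1 + exp(-2x)) - 1, which equals tanh(x)."""
    return math.tanh(x)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> tansig hidden layer -> single output unit."""
    hidden = [
        tansig(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    # Assumption: a logistic output, so the score can be read as a risk in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
n_inputs, n_hidden = 13, 10  # the 13-10-1 layout: 13 predictors, 10 hidden neurons

# Untrained, randomly initialised weights, for illustration only.
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = random.uniform(-1, 1)

patient = [0.5] * n_inputs  # hypothetical normalised predictor values
risk = forward(patient, w_hidden, b_hidden, w_out, b_out)
print(risk)
```

Training would replace the random weights with values fitted to the data; the forward computation itself stays the same.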


Evaluating the artificial neural network

This study used two FF types, back-propagation (BP) and distributed time delay (DTD), to implement the model predicting COVID-19 mortality. Performance was assessed in two steps. First, we implemented each FF type of the ANN separately. To compare and evaluate each ANN configuration, confusion matrix metrics (Table 2) such as accuracy (Equation (1)) and F-score (Equation (2)) were assessed. True positives (TP) and true negatives (TN) are deceased and surviving cases, respectively, that the model classified correctly; false negatives (FN) and false positives (FP) are deceased and surviving cases, respectively, that it classified incorrectly. By default, 70% of the COVID-19 sample was used to train and 30% to test the ANN algorithms. To investigate the effect of dataset splitting on the predictive model, we also split the dataset into 50% train and 50% test, 60% train and 40% test, and 80% train and 20% test samples. We set the number of neurons in the input and output layers to the number of input and output research variables. To determine the number of nodes in the hidden layer, we started from one neuron, added one neuron at a time, and compared the resulting ANNs to obtain the best configuration. In the second step, after obtaining the best structure for each FF type, we compared the selected designs to assess the validation behavior of the two FF types during fitting and to identify the best model for predicting COVID-19 mortality. In this step, we used the mean squared error (MSE) and the area under the ROC curve (AUC-ROC) to compare the FF types. We also investigated the classification capability of the selected model using the confusion matrix and the error histogram diagram.
Table 2

Confusion matrix.

Real \ Model | + | −
+ | True Positive (TP) | False Negative (FN)
− | False Positive (FP) | True Negative (TN)
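Equations (1) and (2) are referenced in the text but not reproduced here. The sketch below computes accuracy and F-score, along with the rates that follow from the same four cells, using their standard definitions, which are assumed to be the ones intended. The function name `confusion_metrics` and the example counts are hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard metrics derived from the four cells of a binary confusion matrix.
    Assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    tpr = tp / (tp + fn)  # true-positive rate (sensitivity, recall)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "tpr": tpr,
        "fnr": fn / (tp + fn),  # false-negative rate
        "fpr": fp / (fp + tn),  # false-positive rate
        "tnr": tn / (fp + tn),  # true-negative rate (specificity)
        "precision": precision,
        "f_score": 2 * precision * tpr / (precision + tpr),
    }

# Hypothetical counts, not taken from the study.
metrics = confusion_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here the positive class is the deceased outcome (code 1), so TPR measures how many deaths the model flags and FPR how many survivors it flags incorrectly.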

Results

After applying the exclusion criteria (non-hospitalized COVID-19 cases, patients younger than 18 years, incomplete case records with more than 60% missing values, and admission before January 9, 2020, or after January 20, 2021), 270 patient records were excluded. Of the 1710 eligible records, 1121 (65.6%) and 589 (34.4%) belonged to surviving and deceased cases, respectively, with a mean age of 61.62 years. The performance of the simple K-means algorithm in imputing the missing values over successive iterations (up to 15 epochs) for K = 1, K = 3, and K = 5 is shown in Fig. 3, Fig. 4, Fig. 5.
Fig. 3

The RMSE of simple K-means in K = 1 for 15 epochs.

Fig. 4

The RMSE of simple K-means in K = 3 for 15 epochs.

Fig. 5

The RMSE of simple K-means in K = 5 for 15 epochs.

Based on Fig. 3, Fig. 4, for K = 1 and K = 3, the simple K-means yielded RMSE values between 1 and 3. For K = 5, the range widened to 0.5–3 over the 15 iterations. Clustering the cases and filling the missing values with the simple K-means showed no significant difference between the actual values and those predicted by the algorithm (RMSE 0.5–3), which indicated the desirable performance of the algorithm. The variables that had a significant relationship with COVID-19 mortality at P < 0.05 or Eta > 0.4 are presented in Table 3.
Table 3

The essential variables at P < 0.05 or Eta>0.4

Variable name | Variable type | Variable feature with code | Variable frequency or mean (SD) | P-value or Eta coefficient
Cough | Nominal | No (0), Yes (1) | No (779), Yes (931) | <0.001
Contusion | Nominal | No (0), Yes (1) | No (729), Yes (981) | <0.001
Nausea | Nominal | No (0), Yes (1) | No (983), Yes (798) | 0.001
Vomiting | Nominal | No (0), Yes (1) | No (839), Yes (871) | 0.001
Oxygen therapy | Nominal | No (0), Yes (1) | No (926), Yes (784) | <0.001
Dyspnea | Nominal | No (0), Yes (1) | No (830), Yes (880) | 0.001
Loss of taste | Nominal | No (0), Yes (1) | No (758), Yes (952) | <0.001
Loss of smell | Nominal | No (0), Yes (1) | No (930), Yes (780) | <0.001
Rhinorrhea | Nominal | No (0), Yes (1) | No (789), Yes (921) | <0.001
Sore throat | Nominal | No (0), Yes (1) | No (739), Yes (971) | 0.001
Other underlying diseases | Nominal | No (0), Yes (1) | No (660), Yes (1050) | <0.001
Cardiac disease | Nominal | No (0), Yes (1) | No (829), Yes (881) | 0.001
Blood pressure | Nominal | No (0), Yes (1) | No (850), Yes (860) | 0.001
White cell count | Numeric | | 9223.52 (6223) | 0.9
Platelet count | Numeric | | 212318.59 (658.2) | 0.9
Absolute lymphocyte count | Numeric | | 21.54 (8.432) | 0.6
Absolute neutrophil count | Numeric | | 75.22 (4.3) | 0.6
Blood urea nitrogen | Numeric | | 53.52 (6.663) | 0.6
Aspartate aminotransferase | Numeric | | 55.5 (12.3) | 0.6
Alanine aminotransferase | Numeric | | 48.32 (5.2) | 0.7
Blood glucose | Numeric | | 135.40 (41.2) | 0.7
Lactate dehydrogenase | Numeric | | 604.22 (41.6) | 0.9
Activated partial thromboplastin time | Numeric | | 28.6 (6.7) | 0.9
Alkaline phosphatase | Numeric | | 255 (150.9) | 0.7
Erythrocyte sedimentation rate | Numeric | | 33.24 (19.3) | 0.7
Hypersensitive troponin | Nominal | Negative (0), Positive (1) | Negative (1224), Positive (486) | 0.001
Lung consolidation | Nominal | No (0), Yes (1) | No (437), Yes (1273) | <0.001
Pleural fluid | Nominal | No (0), Yes (1) | No (410), Yes (1300) | <0.001
ICU hospitalization | Nominal | No (0), Yes (1) | No (875), Yes (935) | <0.001
Length of hospitalization | Numeric | | 4.83 (3.2) | 0.6
Age | Numeric | | 58.8 (7.6) | 0.6
Based on the information in Table 3, 31 variables had a significant relationship with COVID-19 mortality at P < 0.05 or η > 0.4 and were therefore considered important factors affecting mortality. The remaining 27 variables, with P > 0.05 or η < 0.4, were excluded from the study: blood calcium (η = 0.25), blood phosphor (η = 0.12), blood magnesium (η = 0.01), blood sodium (η = 0.17), blood potassium (η = 0.12), total bilirubin (η = 0.11), blood albumin (η = 0.26), prothrombin time (η = 0.25), C-reactive protein (P = 0.201), height (η = 0.16), weight (η = 0.14), blood type (P = 0.155), sex (P = 0.123), headache (P = 0.244), gastrointestinal manifestation (P = 0.10), muscle pain (P = 0.36), chill sensation (P = 0.55), fever (P = 0.48), body temperature (η = 0.163), pneumonia (P = 0.115), diabetes (P = 0.12), smoking (P = 0.06), alcohol consumption (P = 0.11), red cell count (η = 0.12), hematocrit (η = 0.113), hemoglobin rate (η = 0.153), and serum albumin (η = 0.121). After entering the significant variables into the BLR model, we obtained the results shown in Table 4.
Table 4

The results of entering the variables into the BLR.

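Model if Term Removed
Step | Variable | Model Log-Likelihood | Change in −2 Log-Likelihood | df | Sig. of the Change
Step 1 | ICU hospitalization | −216.347 | 199.209 | 1 | .000
Step 2 | Pleural fluid | −116.742 | 131.975 | 1 | .000
Step 2 | ICU hospitalization | −123.828 | 146.145 | 1 | .000
Step 3 | Absolute neutrophil count | −50.755 | 19.260 | 1 | .000
Step 3 | Pleural fluid | −109.997 | 137.745 | 1 | .000
Step 3 | ICU hospitalization | −93.544 | 104.838 | 1 | .000
Step 4 | Vomiting | −41.125 | 22.932 | 1 | .000
Step 4 | Absolute neutrophil count | −42.350 | 25.382 | 1 | .000
Step 4 | Pleural fluid | −105.400 | 151.482 | 1 | .000
Step 4 | ICU hospitalization | −83.648 | 107.979 | 1 | .000
Step 5 | Vomiting | −136.164 | 22.659 | 1 | .000
Step 5 | Absolute neutrophil count | −38.497 | 27.325 | 1 | .000
Step 5 | Pleural fluid | −99.083 | 148.496 | 1 | .000
Step 5 | ICU hospitalization | −79.791 | 109.913 | 1 | .000
Step 5 | Length of hospitalization | −29.659 | 9.649 | 1 | .000
Step 6 | Vomiting | −31.591 | 21.789 | 1 | .000
Step 6 | Loss of taste | −24.835 | 8.276 | 1 | .000
Step 6 | Absolute neutrophil count | −33.224 | 25.055 | 1 | .000
Step 6 | Pleural fluid | −89.802 | 138.210 | 1 | .000
Step 6 | ICU hospitalization | −73.150 | 104.906 | 1 | .000
Step 6 | Length of hospitalization | −25.903 | 10.413 | 1 | .000
Step 7 | Vomiting | −27.255 | 21.951 | 1 | .000
Step 7 | Loss of taste | −20.757 | 8.955 | 1 | .000
Step 7 | Loss of smell | −20.697 | 8.835 | 1 | .000
Step 7 | Absolute neutrophil count | −28.461 | 24.364 | 1 | .000
Step 7 | Pleural fluid | −81.758 | 130.959 | 1 | .000
Step 7 | ICU hospitalization | −65.344 | 98.131 | 1 | .000
Step 7 | Length of hospitalization | −21.648 | 10.737 | 1 | .000
Step 8 | Vomiting | −19.608 | 16.955 | 1 | .000
Step 8 | Loss of taste | −16.899 | 11.535 | 1 | .000
Step 8 | Loss of smell | −17.389 | 12.515 | 1 | .000
Step 8 | Rhinorrhea | −16.279 | 10.296 | 1 | .000
Step 8 | Absolute neutrophil count | −23.352 | 24.441 | 1 | .000
Step 8 | Pleural fluid | −76.071 | 129.880 | 1 | .000
Step 8 | ICU hospitalization | −55.328 | 88.394 | 1 | .000
Step 8 | Length of hospitalization | −16.641 | 11.021 | 1 | .000
Step 9 | Vomiting | −15.966 | 18.051 | 1 | .000
Step 9 | Loss of taste | −12.084 | 10.288 | 1 | .000
Step 9 | Loss of smell | −13.482 | 13.083 | 1 | .000
Step 9 | Rhinorrhea | −12.862 | 11.843 | 1 | .000
Step 9 | Absolute neutrophil count | −17.749 | 21.618 | 1 | .000
Step 9 | Pleural fluid | −71.503 | 129.125 | 1 | .000
Step 9 | ICU hospitalization | −47.948 | 82.016 | 1 | .000
Step 9 | Length of hospitalization | −12.410 | 10.940 | 1 | .000
Step 9 | Age | −11.131 | 8.382 | 1 | .000
Step 10 | Vomiting | −10.197 | 17.564 | 1 | .000
Step 10 | Oxygen therapy | −6.940 | 11.049 | 1 | .000
Step 10 | Loss of taste | −7.432 | 12.032 | 1 | .000
Step 10 | Loss of smell | −8.983 | 15.135 | 1 | .000
Step 10 | Rhinorrhea | −8.524 | 14.218 | 1 | .000
Step 10 | Absolute neutrophil count | −11.581 | 20.330 | 1 | .000
Step 10 | Pleural fluid | −65.568 | 128.304 | 1 | .000
Step 10 | ICU hospitalization | −40.117 | 77.403 | 1 | .000
Step 10 | Length of hospitalization | −6.783 | 10.734 | 1 | .000
Step 10 | Age | −6.475 | 10.119 | 1 | .000
Step 11 | Vomiting | −7.137 | 17.168 | 1 | .000
Step 11 | Oxygen therapy | −4.804 | 12.502 | 1 | .000
Step 11 | Loss of taste | −4.415 | 11.725 | 1 | .000
Step 11 | Loss of smell | −6.159 | 15.213 | 1 | .000
Step 11 | Rhinorrhea | −5.221 | 13.337 | 1 | .000
Step 11 | Absolute neutrophil count | −8.051 | 18.997 | 1 | .000
Step 11 | Erythrocyte sedimentation rate | −1.416 | 5.726 | 1 | .000
Step 11 | Pleural fluid | −60.513 | 123.920 | 1 | .000
Step 11 | ICU hospitalization | −38.927 | 80.749 | 1 | .000
Step 11 | Length of hospitalization | −3.528 | 9.951 | 1 | .000
Step 11 | Age | −4.725 | 12.344 | 1 | .000
Step 12 | Vomiting | −3.057 | 15.997 | 1 | .000
Step 12 | Oxygen therapy | −1.216 | 12.314 | 1 | .000
Step 12 | Loss of taste | −1.902 | 13.687 | 1 | .000
Step 12 | Loss of smell | −3.445 | 16.772 | 1 | .000
Step 12 | Rhinorrhea | −1.574 | 13.031 | 1 | .000
Step 12 | Platelet count | −8.552 | 6.988 | 1 | .000
Step 12 | Absolute neutrophil count | −5.947 | 21.776 | 1 | .000
Step 12 | Erythrocyte sedimentation rate | −9.360 | 8.604 | 1 | .000
Step 12 | Pleural fluid | −59.973 | 129.829 | 1 | .000
Step 12 | ICU hospitalization | −35.363 | 80.609 | 1 | .000
Step 12 | Length of hospitalization | −0.095 | 10.073 | 1 | .002
Step 12 | Age | −1.038 | 11.959 | 1 | .001
Step 13 | Vomiting | −2.386 | 16.588 | 1 | .000
Step 13 | Oxygen therapy | −1.159 | 12.135 | 1 | .000
Step 13 | Loss of taste | −1.695 | 13.205 | 1 | .000
Step 13 | Loss of smell | −2.814 | 17.444 | 1 | .000
Step 13 | Rhinorrhea | −1.766 | 13.348 | 1 | .000
Step 13 | White-cell count | −5.059 | 1.933 | 1 | .001
Step 13 | Platelet count | −7.992 | 7.801 | 1 | .005
Step 13 | Absolute neutrophil count | −1.093 | 20.003 | 1 | .000
Step 13 | Erythrocyte sedimentation rate | −9.572 | 8.960 | 1 | .003
Step 13 | Pleural fluid | −1.511 | 128.839 | 1 | .000
Step 13 | ICU hospitalization | −1.126 | 80.067 | 1 | .000
Step 13 | Length of hospitalization | −9.026 | 9.868 | 1 | .002
Step 13 | Age | −1.257 | 12.331 | 1 | .000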
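The "Change in −2 Log-Likelihood" column of Table 4 can be read as a likelihood-ratio statistic comparing the model with and without a term, referred to a chi-square distribution with df = 1. The Python sketch below shows that calculation. It is an interpretation offered for illustration, not a reproduction of the authors' analysis: the function name `lr_test_df1` is hypothetical, and pairing the Step 1 and Step 2 rows as reduced and full models is an assumption about how the table was produced.

```python
import math

def lr_test_df1(ll_reduced, ll_full):
    """Likelihood-ratio statistic for dropping one term, with its chi-square
    (df = 1) p-value. For df = 1, P(X > x) = erfc(sqrt(x / 2))."""
    stat = max(-2.0 * (ll_reduced - ll_full), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Values read from Table 4 (the pairing of rows is an assumption):
# Step 1 row: log-likelihood with ICU hospitalization removed from the model.
ll_without_icu = -216.347
# Step 2 row for pleural fluid: removing it leaves the ICU-only model of Step 1.
ll_with_icu = -116.742

stat, p_value = lr_test_df1(ll_without_icu, ll_with_icu)
print(round(stat, 3), p_value < 0.001)
```

Under this reading, the statistic recomputed from the two log-likelihoods agrees with the tabulated change for ICU hospitalization at Step 1 (199.209) up to rounding.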
Based on the information given in Table 4, 13 variables were entered into the forward LR of the BLR in 13 steps at P < 0.05: vomiting (P < 0.001), oxygen therapy (P < 0.001), loss of taste (P < 0.001), loss of smell (P < 0.001), rhinorrhea (P < 0.001), white cell count (P = 0.014), platelet count (P = 0.005), absolute neutrophil count (P < 0.001), erythrocyte sedimentation rate (P = 0.003), pleural fluid (P < 0.001), ICU hospitalization (P < 0.001), length of hospitalization (P = 0.002), and age (P < 0.001). Comparing the log-likelihood (LL) of the 1st and 13th steps showed that the magnitude of the LL fell from 216.347 in the 1st step to 3.57 in the 13th step as these variables entered. This reduction indicates that the BLR predicted COVID-19 mortality markedly better in the 13th step than in the 1st step. Therefore, we used these 13 highly correlated variables to develop the ANN predicting COVID-19 mortality. To compare different ANN configurations of BP and DTD, we report the performance obtained when increasing the hidden layer from 1 to 10 neurons, using the accuracy, F-score, and AUC-ROC. The results for the different dataset partitions with up to 10 nodes in the hidden layer are given in Table 5.
Table 5

Comparing different configurations of the ANN.

50% train, 50% test
Configuration | BP TP | BP FN | BP FP | BP TN | DTD TP | DTD FN | DTD FP | DTD TN
13-1-1 | 450 | 139 | 297 | 824 | 428 | 161 | 320 | 801
13-2-1 | 477 | 112 | 267 | 854 | 455 | 134 | 311 | 810
13-3-1 | 483 | 106 | 233 | 888 | 466 | 123 | 295 | 826
13-4-1 | 502 | 87 | 219 | 902 | 473 | 116 | 273 | 848
13-5-1 | 527 | 62 | 167 | 954 | 489 | 100 | 261 | 860
13-6-1 | 533 | 56 | 144 | 977 | 501 | 88 | 225 | 896
13-7-1 | 539 | 50 | 112 | 1009 | 513 | 76 | 208 | 913
13-8-1 | 550 | 39 | 98 | 1023 | 534 | 55 | 146 | 975
13-9-1 | 548 | 41 | 79 | 1042 | 542 | 47 | 119 | 1002
13-10-1 | 567 | 22 | 61 | 1060 | 553 | 36 | 95 | 1026

60% train, 40% test
Configuration | BP TP | BP FN | BP FP | BP TN | DTD TP | DTD FN | DTD FP | DTD TN
13-1-1 | 455 | 134 | 282 | 839 | 436 | 156 | 309 | 812
13-2-1 | 481 | 108 | 261 | 860 | 461 | 128 | 300 | 821
13-3-1 | 486 | 103 | 225 | 896 | 473 | 116 | 281 | 840
13-4-1 | 506 | 83 | 209 | 912 | 480 | 109 | 265 | 856
13-5-1 | 529 | 60 | 151 | 970 | 491 | 98 | 249 | 872
13-6-1 | 543 | 46 | 126 | 995 | 506 | 83 | 216 | 905
13-7-1 | 550 | 39 | 94 | 1027 | 516 | 73 | 197 | 924
13-8-1 | 561 | 28 | 80 | 1041 | 540 | 49 | 137 | 984
13-9-1 | 568 | 21 | 59 | 1062 | 546 | 43 | 115 | 1006
13-10-1 | 577 | 12 | 51 | 1070 | 563 | 26 | 85 | 1036

70% train, 30% test
Configuration | BP TP | BP FN | BP FP | BP TN | DTD TP | DTD FN | DTD FP | DTD TN
13-1-1 | 465 | 124 | 282 | 839 | 450 | 142 | 300 | 821
13-2-1 | 490 | 99 | 261 | 860 | 465 | 124 | 288 | 833
13-3-1 | 476 | 93 | 225 | 896 | 480 | 109 | 271 | 850
13-4-1 | 515 | 74 | 209 | 912 | 490 | 99 | 255 | 866
13-5-1 | 539 | 50 | 151 | 970 | 501 | 88 | 241 | 880
13-6-1 | 523 | 35 | 126 | 995 | 515 | 74 | 211 | 910
13-7-1 | 555 | 34 | 94 | 1027 | 526 | 63 | 191 | 930
13-8-1 | 570 | 19 | 80 | 1041 | 550 | 39 | 131 | 990
13-9-1 | 573 | 16 | 59 | 1062 | 556 | 33 | 95 | 1026
13-10-1 | 581 | 8 | 51 | 1070 | 570 | 19 | 81 | 1040

80% train, 20% test
Configuration | BP TP | BP FN | BP FP | BP TN | DTD TP | DTD FN | DTD FP | DTD TN
13-1-1 | 471 | 118 | 259 | 862 | 459 | 133 | 289 | 832
13-2-1 | 495 | 94 | 271 | 870 | 475 | 114 | 278 | 843
13-3-1 | 502 | 87 | 205 | 916 | 485 | 104 | 259 | 862
13-4-1 | 521 | 68 | 189 | 932 | 496 | 93 | 252 | 869
13-5-1 | 541 | 48 | 140 | 981 | 510 | 79 | 238 | 883
13-6-1 | 545 | 31 | 116 | 1005 | 517 | 72 | 206 | 915
13-7-1 | 560 | 29 | 80 | 1041 | 536 | 53 | 184 | 937
13-8-1 | 575 | 14 | 61 | 1060 | 562 | 27 | 120 | 1001
13-9-1 | 578 | 11 | 53 | 1068 | 546 | 23 | 86 | 1035
13-10-1 | 586 | 3 | 25 | 1096 | 578 | 11 | 71 | 1050

Comparing the different architectures of the two selected ANN configurations using the confusion matrix showed that the BP-ANN with the 13-10-1 structure (TP = 586, FN = 3, FP = 25, and TN = 1096) and the DTD with the 13-10-1 design (TP = 578, FN = 11, FP = 71, and TN = 1050) achieved the best performance among all configurations when 80% of the data were used for training and 20% for testing. The TPR, FPR, TNR, FNR, and precision of the two selected ANN configurations for the various dataset splits are depicted in Fig. 6, Fig. 7, Fig. 8, Fig. 9.

Fig. 6

Different performance criteria of two selected ANN modes (1–10 neurons from left to right).

Fig. 7

Different performance criteria of two selected ANN modes (1–10 neurons from left to right).

Fig. 8

Different performance criteria of two selected ANN modes (1–10 neurons from left to right).

Fig. 9

Different performance criteria of two selected ANN modes (1–10 neurons from left to right).

Based on the results shown in Fig. 6, Fig. 7, Fig. 8, Fig. 9, using 80% of the data to train the ANN and 20% to test it gave better performance than the other data splits. With this split, the BP-ANN, with TPR = 0.994, FPR = 0.022, TNR = 0.977, FNR = 0.005, and precision = 0.994, had the best performance for predicting the mortality of COVID-19 patients. Fig. 10 displays the selected configuration of each FF type with the best overall performance.
Fig. 10

The best configuration of BP (above) and DTD (below).

The two selected FF architectures were compared during training, testing, and validation using the MSE and the error histogram diagram; the results are given in Fig. 11, Fig. 12.
Fig. 11

Comparing the two selected configurations using the MSE.

Fig. 12

Comparing the two selected configurations using the error histogram.

Evaluating the validation process of the BP-ANN (left side of Fig. 11) showed that the validation error fell below 10⁻¹ during the training and fitting of the ANN. The best validation performance for BP was obtained at the 9th iteration of ANN fitting (validation = 0.055). For the DTD type of the ANN, the validation error likewise dropped to about 10⁻¹ during fitting, with a value of 0.089 at the 4th iteration. Therefore, comparing the two selected architectures demonstrated that the BP-ANN achieved a lower validation error than the DTD in the MSE diagram. The error histogram with 20 bins showed that the BP-ANN (left side) placed 250 training samples and fewer than 100 validation and test samples in the bin at −0.049, and approximately 30 samples in the bin at 0.05 (the bins nearest zero error). The DTD placed about 110 training, 25 validation, and 20 test samples in the bin at −0.053, and about 50 training, fewer than 10 validation, and 10 test samples in the bin at 0.036. The remaining cases fell in other bins during the DTD fitting. Comparing the two FF modes using the error histogram showed that the BP-ANN had a lower error rate than the DTD during fitting. Fig. 13 compares the two selected ANN configurations, with 80% of the data used for training, 10% for testing, and 10% for validation, using the ROC curve in the training, validation, and test modes. The vertical and horizontal axes present the true-positive rate (TPR) and false-positive rate (FPR), respectively.
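The two diagnostics used in this comparison, the MSE and a 20-bin error histogram, can be sketched as follows. This is a minimal Python illustration with hypothetical targets and outputs, not the MATLAB plots from the study; errors are taken as target minus output, and the function names are assumptions.

```python
def mean_squared_error(targets, outputs):
    """Average squared difference between targets and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def error_histogram(targets, outputs, bins=20):
    """Count errors (target - output) in equal-width bins.
    Returns the bin centres and the number of samples in each bin."""
    errors = [t - o for t, o in zip(targets, outputs)]
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all errors are equal
    counts = [0] * bins
    for e in errors:
        idx = min(int((e - lo) / width), bins - 1)  # keep the maximum in the last bin
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, counts

# Hypothetical labels (1 = deceased, 0 = surviving) and network outputs.
targets = [1, 0, 1, 0]
outputs = [0.9, 0.2, 0.6, 0.1]

mse = mean_squared_error(targets, outputs)
centers, counts = error_histogram(targets, outputs, bins=20)
print(mse, sum(counts))
```

A model whose errors concentrate in the bins nearest zero, as reported for the BP-ANN, will also tend to show a lower MSE.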
Fig. 13

All ROC modes of the two FF of the ANN.

Based on Fig. 13, in the training mode, the BP algorithm with AUC-train = 0.926 performed slightly better than the DTD with AUC-train = 0.896. In the validation mode, the BP with AUC-validation = 0.873 outperformed the DTD with AUC-validation = 0.791, but in the test mode, the DTD with AUC-test = 0.881 performed slightly better than the BP with AUC-test = 0.853. Overall, the BP with AUC = 0.901 was better at discriminating deceased from surviving COVID-19 cases than the DTD with AUC = 0.866. Based on the comparison of the two selected ANN architectures during the fitting process using the MSE, error histogram, and ROC curve, we concluded that the BP architecture of the FF performed better in classifying the training, validation, and test samples. Based on the BP of the FF, we designed the CDSS user interface for COVID-19 mortality (Fig. 14). Users, such as physicians, enter the 13 essential factors, and the CDSS then reports whether the COVID-19 patient is at high or low risk of mortality.
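The AUC values compared above have a simple interpretation: the AUC is the probability that a randomly chosen deceased case receives a higher score than a randomly chosen surviving case, with ties counted as one half. The Python sketch below computes it in that pairwise form. The function name `auc_roc` and the example labels and scores are hypothetical, and this is an illustration rather than the routine used to produce Fig. 13.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half. Labels: 1 = deceased, 0 = surviving."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(auc_roc(labels, scores))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a dataset of this size; a rank-based computation gives the same value more efficiently.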
Fig. 14

The CDSS to predict COVID-19 mortality based on ANN.


Discussion

The adoption of ML-based CDSSs to support clinical decisions about COVID-19 is on the rise [10]. Such technologies can improve treatment outcomes for patients with COVID-19 [10,11], and proactive prediction of COVID-19 mortality using ML models can help improve the survival chances of hospitalized patients [11,44]. This study aimed to predict mortality among hospitalized COVID-19 patients using the best configuration of the ANNs. To this end, we used the data of 1710 hospitalized COVID-19 patients to identify the most important factors affecting COVID-19 mortality, applying the chi-square test and Eta coefficient as univariate analyses and BLR as the multivariate analysis. The univariate analysis selected 31 features as the most influential factors on COVID-19 mortality at P < 0.05 and η > 0.4. After the BLR, 13 variables were determined to be the most critical predictors of COVID-19 mortality.

To develop the model, we used two FF ANN architectures, the BP and the DTD. The best configuration in each architecture was a 13-10-1 design (10 neurons in the hidden layer): for the BP, it yielded TP = 586, FN = 3, FP = 25, and TN = 1096, and for the DTD, it yielded TP = 578, FN = 11, FP = 71, and TN = 1050. Thus, the BP-ANN, with an AUC of 0.901, was considered the best ANN architecture to predict COVID-19 mortality.

In previous studies, different ML methods were trained to predict COVID-19 outcomes such as disease progression and deterioration [45,46], ICU hospitalization [46–50], and mortality [47,48,51–56]. The most important of these algorithms are ANNs [57–64], ensemble models (boosting algorithms) [65–69], decision trees, in particular random forests (RF) [6,58,61,70,71], support vector machines (SVM) [58,61], and Naive Bayes (NB) [72]. According to the literature, the ANN model [57–64] has the greatest performance in predicting COVID-19 mortality. Other reviewed studies also showed that ensemble ML (hybrid) models [65–69] and RF algorithms [58,61,70,71] are among the most widely used and effective models for predicting COVID-19 mortality.

So far, most efforts have targeted the application of ANNs and their comparison with other techniques for mortality prediction in patients with COVID-19. Gao et al. (2020) conducted a retrospective study on the data of 2520 hospitalized COVID-19 patients, in which the ANN model, with an AUC of 0.9760, was the most successful algorithm for mortality prediction [51]. Vaid et al. (2020) analyzed the data of 4029 COVID-19-positive patients and found that the MLP-ANN classifier performed best in predicting COVID-19-related mortality [73]. In a study by Zhao et al. (2020) on the data of 313 COVID-19 patients, the ANN achieved the best performance in predicting mortality, with an AUC of 0.75 [74]. Asteris et al. (2022) trained four ML techniques on the data of 10,237 patients and finally implemented and evaluated an ANN model that predicted mortality in COVID-19 patients with an accuracy of 89.47% [75]. The ANN model developed by Lin et al. (2021) predicted the mortality risk of COVID-19 patients with an AUC of 0.96 [76]. Adib et al. (2021) compared the performance of three ML models for mortality analysis of pregnant women with COVID-19, and the ANN technique yielded significantly higher prediction performance (precision of 100% and accuracy of 95%) [77].

Naseem et al. (2021) applied several ML methods for predicting mortality in confirmed COVID-19 patients and found that the deep neural network (DNN), with an accuracy of 99.53% and an AUC of 88.5%, gained the highest performance [78]. Similarly, Xiaoran et al. (2020) proposed a DNN model based on the clinical data of 5766 individuals to predict the likelihood of ICU admission and mortality among hospitalized COVID-19 patients, with an AUC of 0.780 [79]. Furthermore, Hoon et al. (2020) implemented a DNN-based prediction model using the data of 361 COVID-19 patients to predict mortality. The developed model provided 100% sensitivity, 91% specificity, and 92% accuracy [80]. Alsuwaiket et al. (2020) proposed an ANN-based method to predict COVID-19 mortality, which attained an MAE of 0.053 and an MSE of 0.032 [81]. In addition, Sankaranarayanan et al. (2021) compared the performance of four ANN models on the data of 1025 patients to predict the mortality risk among hospitalized COVID-19 patients. The best performance was obtained with the recurrent neural network (RNN), with an AUC-ROC of 0.938 [82]. Schiaffino et al. (2021) also assessed the performance of four ANN models using a dataset (n = 1541) to predict COVID-19 mortality. The MLP-ANN yielded the best performance in predicting mortality in COVID-19 cases (AUC = 0.844) [83]. Finally, Karthikeyan et al. (2021) designed an ANN-based model for predicting COVID-19-related mortality with an AUC of 0.90 [11].

In the present study, given the superiority of the ANN model, we attempted to identify the most effective configuration to predict COVID-19 mortality. Hence, the current retrospective study aimed to develop and validate two ANN models based on 13 variables to predict mortality among hospitalized COVID-19 patients. Based on the findings, the BP-ANN obtained the best performance, with an AUC-ROC of 0.901. Implementing a CDSS interface based on the best ANN configuration adds further value.

Feature selection (FS) is an effective prerequisite for improving the performance of ML models, and the selected variables were used as inputs to the ML models. In the reviewed studies [11,17,44,74,83–87], the most important clinical factors for COVID-19 mortality prediction were old age, chronic underlying diseases, oxygen saturation, loss of taste/smell, pleural fluid, ICU hospitalization, length of stay (LOS), lymphocyte count, CRP rate, and D-dimer level. In the current study, after feature selection, the variables of age, vomiting, oxygen therapy, loss of taste, loss of smell, rhinorrhea, white-cell count, platelet count, absolute neutrophil count, erythrocyte sedimentation rate, pleural fluid, ICU hospitalization, and length of hospitalization were introduced as the top predictors. We used these selected features as inputs to train different ANN models for the mortality prediction of COVID-19 patients. In both the previous studies and ours, the ANN algorithm performed well on most indicators.
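The confusion-matrix counts reported above for the two 13-10-1 configurations can be turned into the usual summary rates. The sketch below uses only the TP, FN, FP, and TN values given in the Discussion. The derived sensitivity, specificity, and accuracy are computed here for illustration and are not figures quoted from the study, and the helper name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive standard rates from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "n": total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "accuracy": (tp + tn) / total,
    }


# Counts as reported in the Discussion for each architecture.
REPORTED = {
    "BP": dict(tp=586, fn=3, fp=25, tn=1096),
    "DTD": dict(tp=578, fn=11, fp=71, tn=1050),
}

results = {name: confusion_metrics(**counts) for name, counts in REPORTED.items()}

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Both sets of counts sum to 1710, so they describe the full dataset (training, validation, and test cases together) and not a held-out subset alone.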

Limitations and implications

The CDSS designed herein seems to predict the mortality risk of hospitalized COVID-19 patients with acceptable accuracy. However, the proposed system has several limitations that must be addressed. The most significant limitation of the present study was the retrospective and single-center nature of the selected dataset. The dataset contained inconsistent, erroneous, and abnormal data fields, which affected the quality of modeling and limited the comprehensiveness and generalizability of the results. We attempted to minimize this problem by consulting the responsible physician. In addition, to handle incomplete fields, cases with more than 60% missing data were excluded from the analysis. In the remaining cases, empty values were replaced with values predicted by the simple K-means algorithm with specific values of K. The selected dataset also lacks some essential variables, such as imaging data. However, as our study aimed to predict mortality at the time of admission, the available clinical and administrative data were sufficient. Finally, we only used two ANN algorithms in different configurations. In the future, the performance of our proposed model may be enhanced if more ML techniques are tested on larger, prospective, and multicenter datasets. Future studies should also concentrate on external validation to improve the modeling quality and reduce bias.
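The exclusion rule for incomplete cases can be sketched as follows. This is a minimal illustration that assumes each case is a dictionary of fields and that a missing value is stored as `None`. The function name, the record layout, and the toy field names are hypothetical. The imputation step is not sketched because its details (for example, the values of K) are not given here.

```python
def drop_sparse_cases(records, max_missing_fraction=0.60):
    """Keep cases whose share of missing (None) fields is at most the limit.

    Cases with MORE than max_missing_fraction missing fields are excluded.
    """
    kept = []
    for record in records:
        if not record:
            continue  # an empty record carries no information
        missing = sum(1 for value in record.values() if value is None)
        if missing / len(record) <= max_missing_fraction:
            kept.append(record)
    return kept


# Toy cases (hypothetical fields and values).
toy_cases = [
    {"age": 71, "wbc": None, "plt": None, "esr": None, "los": 9},        # 60% missing
    {"age": None, "wbc": None, "plt": None, "esr": None, "los": 4},      # 80% missing
    {"age": 55, "wbc": 6.1, "plt": 210, "esr": 30, "los": 5},            # complete
]
toy_kept = drop_sparse_cases(toy_cases)
```

With this threshold, a case with exactly 60% of its fields missing is retained, because the rule excludes only cases with more than 60% missing data.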

Conclusions

Timely and accurate prediction of COVID-19 patients' outcomes, especially determining their mortality risk, is critical to optimal use of limited hospital resources and supporting clinical decisions. Using the ANN-based CDSS developed in our study in real clinical environments will promote patient safety and reduce COVID-19 severity and mortality. Still, further external validation studies are required to validate our findings.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  44 in total

1.  Prognostic Assessment of COVID-19 in the Intensive Care Unit by Machine Learning Methods: Model Development and Validation.

Authors:  Pan Pan; Yichao Li; Yongjiu Xiao; Bingchao Han; Longxiang Su; Mingliang Su; Yansheng Li; Siqi Zhang; Dapeng Jiang; Xia Chen; Fuquan Zhou; Ling Ma; Pengtao Bao; Lixin Xie
Journal:  J Med Internet Res       Date:  2020-11-11       Impact factor: 5.428

2.  Prediction of the confirmed cases and deaths of global COVID-19 using artificial intelligence.

Authors:  Qingchun Guo; Zhenfang He
Journal:  Environ Sci Pollut Res Int       Date:  2021-01-07       Impact factor: 4.223

3.  An Artificial Intelligence Model to Predict the Mortality of COVID-19 Patients at Hospital Admission Time Using Routine Blood Samples: Development and Validation of an Ensemble Model.

Authors:  Hoon Ko; Heewon Chung; Wu Seong Kang; Chul Park; Do Wan Kim; Seong Eun Kim; Chi Ryang Chung; Ryoung Eun Ko; Hooseok Lee; Jae Ho Seo; Tae-Young Choi; Rafael Jaimes; Kyung Won Kim; Jinseok Lee
Journal:  J Med Internet Res       Date:  2020-12-23       Impact factor: 5.428

4.  Machine learning and network medicine approaches for drug repositioning for COVID-19.

Authors:  Suzana de Siqueira Santos; Mateo Torres; Diego Galeano; María Del Mar Sánchez; Luca Cernuzzi; Alberto Paccanaro
Journal:  Patterns (N Y)       Date:  2021-11-09

5.  An artificial neural network model to predict the mortality of COVID-19 patients using routine blood samples at the time of hospital admission: Development and validation study.

Authors:  Ju-Kuo Lin; Tsair-Wei Chien; Lin-Yen Wang; Willy Chou
Journal:  Medicine (Baltimore)       Date:  2021-07-16       Impact factor: 1.889

6.  Artificial Neural Network Modeling of Novel Coronavirus (COVID-19) Incidence Rates across the Continental United States.

Authors:  Abolfazl Mollalo; Kiara M Rivera; Behzad Vahedi
Journal:  Int J Environ Res Public Health       Date:  2020-06-12       Impact factor: 3.390

7.  Prediction model and risk scores of ICU admission and mortality in COVID-19.

Authors:  Zirun Zhao; Anne Chen; Wei Hou; James M Graham; Haifang Li; Paul S Richman; Henry C Thode; Adam J Singer; Tim Q Duong
Journal:  PLoS One       Date:  2020-07-30       Impact factor: 3.240

8.  Data Mining-Based Analysis of Chinese Medicinal Herb Formulae in Chronic Kidney Disease Treatment.

Authors:  Ping Xia; Kun Gao; Jiadong Xie; Wei Sun; Ming Shi; Wei Li; Jing Zhao; Jin Yan; Qiong Liu; Min Zheng; Xin Wang; Qijing Wu; Enchao Zhou; Jihong Chen; Lingdong Xv; Weiming He
Journal:  Evid Based Complement Alternat Med       Date:  2020-01-24       Impact factor: 2.629

9.  Development of a machine learning algorithm to predict intubation among hospitalized patients with COVID-19.

Authors:  Varun Arvind; Jun S Kim; Brian H Cho; Eric Geng; Samuel K Cho
Journal:  J Crit Care       Date:  2020-11-16       Impact factor: 4.298

10.  Genetic prediction of ICU hospitalization and mortality in COVID-19 patients using artificial neural networks.

Authors:  Panagiotis G Asteris; Eleni Gavriilaki; Tasoula Touloumenidou; Evaggelia-Evdoxia Koravou; Maria Koutra; Penelope Georgia Papayanni; Alexandros Pouleres; Vassiliki Karali; Minas E Lemonis; Anna Mamou; Athanasia D Skentou; Apostolia Papalexandri; Christos Varelas; Fani Chatzopoulou; Maria Chatzidimitriou; Dimitrios Chatzidimitriou; Anastasia Veleni; Evdoxia Rapti; Ioannis Kioumis; Evaggelos Kaimakamis; Milly Bitzani; Dimitrios Boumpas; Argyris Tsantes; Damianos Sotiropoulos; Anastasia Papadopoulou; Ioannis G Kalantzis; Lydia A Vallianatou; Danial J Armaghani; Liborio Cavaleri; Amir H Gandomi; Mohsen Hajihassani; Mahdi Hasanipanah; Mohammadreza Koopialipoor; Paulo B Lourenço; Pijush Samui; Jian Zhou; Ioanna Sakellari; Serena Valsami; Marianna Politou; Styliani Kokoris; Achilles Anagnostopoulos
Journal:  J Cell Mol Med       Date:  2022-01-22       Impact factor: 5.310
