
Characterizing Risk of In-Hospital Mortality Following Subarachnoid Hemorrhage Using Machine Learning: A Retrospective Study.

Abstract

Background: Subarachnoid hemorrhage has a high rate of disability and mortality, and existing disease severity scores have limited ability to estimate the risk of adverse outcomes. This study used patient information collected during hospitalization, including biochemical data, to develop more accurate risk prediction models with logistic regression (LR) and machine learning (ML) methods.
Methods: Patient-level data were extracted from the MIMIC-IV database. The primary outcome was in-hospital mortality. The data set, which included age and key past medical history, was split 70:30 for training and testing the models. The recursive feature elimination (RFE) algorithm was used to screen the candidate variables; ML algorithms were then used to build the prediction models, and the validation set was used to further verify model performance.
Results: Of the 1,787 patients included from the MIMIC-IV database, 349 died during hospitalization. RFE selected 20 variables. After simplification, we retained 10 features: the Glasgow coma score (GCS), glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and sepsis-related organ failure assessment (SOFA) score. In the validation set, the simplified RF model achieved an AUC of 0.949, which the DeLong test showed was not significantly different from that of the best model. Furthermore, in decision curve analysis (DCA), the simplified GBM model showed relatively higher net benefit. In the subgroup analysis of non-traumatic subarachnoid hemorrhage, the simplified GBM model had an AUC of 0.955 and relatively higher net benefit.
Conclusions: ML approaches significantly enhance predictive discrimination for mortality following subarachnoid hemorrhage compared to existing illness severity scores and LR. The discriminative ability of these ML models requires validation in external cohorts to establish generalizability.

Entities: Chemical

Keywords: SAH; machine learning; prediction model; recursive feature abstraction; subarachnoid hemorrhage

Year: 2022 PMID: 36034376 PMCID: PMC9407038 DOI: 10.3389/fsurg.2022.891984

Source DB: PubMed Journal: Front Surg ISSN: 2296-875X

Introduction

Subarachnoid hemorrhage (SAH) is a type of hemorrhagic stroke that accounts for 3% of all strokes. With advances in medicine, the global case fatality rate has decreased from 50% to 17%, but mortality after SAH remains high (1–3). In addition, survivors are often left with permanent disability, cognitive deficits (particularly in executive function and short-term memory), and mental health symptoms (depression, anxiety), leading to significant reductions in health-related quality of life. Machine learning (ML), a branch of artificial intelligence, learns from data through computational modeling and, in data-rich environments, can fit higher-order relationships between covariates and outcomes (4–6). The purpose of this study was to determine whether ML algorithms using demographics, comorbidities, laboratory tests, and other variables can accurately predict the prognosis of SAH and to identify the factors that contribute to predictive ability.

Patient Selection and Methods

Data Source

This was a retrospective study based on the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 1.0) database (7). Access to the database requires completing the Collaborative Institutional Training Initiative examination (certification number 43357625 for author Deng).

Participant Selection

Inclusion criteria were as follows: (1) subarachnoid hemorrhage confirmed by ICD-9 or ICD-10 codes; (2) age older than 16 years; and (3) admission to the ICU with a recorded Glasgow coma score (GCS). For patients with more than one ICU admission, only data from the first ICU admission of the first hospitalization were included in the analysis.

Predictors

The data extracted from MIMIC-IV included age, gender, race, language, GCS, sepsis-related organ failure assessment (SOFA) score, and history of trauma. We also extracted vital signs, laboratory findings, and treatment history of heparin and antibiotics during hospitalization. In addition, we collected the components of the Charlson comorbidity index (CCI): myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, rheumatic disease, peptic ulcer disease, diabetes, paraplegia, renal disease, malignant cancer, severe liver disease, metastatic solid tumor, and acquired immunodeficiency syndrome (AIDS).

Outcomes

The outcome was in-hospital mortality, defined as death during hospitalization in patients diagnosed with subarachnoid hemorrhage.

Statistical Analysis

Categorical variables were presented as numbers and percentages and compared using the χ2 test or the Fisher exact test, while continuous variables were expressed as mean ± SD or median with interquartile range (IQR) and compared using an independent t-test or the Mann–Whitney U test. Each feature has an importance or coefficient attribute in a fitted model, and this value determines how much the feature contributes to the model. Recursive feature elimination (RFE) obtains the importance of each feature from the learner (8, 9) and removes the least important feature from the current feature set. This step is repeated recursively until the required number of features is reached. Features were then considered in groups of 5–60, ordered by the ranking obtained from the feature selection method. To find the best hyperparameters, 10-fold cross-validation was used as the resampling method. In each iteration, nine folds were used as the training subset, and the remaining fold was used to tune the hyperparameters. In this way, every sample takes part in both training and testing, so that the data are used to the greatest extent. In this study, we split the data set 70:30, trained the models on the larger part, and validated them on the remainder. We calculated the median and 95% confidence interval of the area under the curve (AUC), where an AUC of 1.0 indicates perfect discrimination and 0.5 indicates no discrimination. Finally, the accuracy, sensitivity, specificity, negative predictive value, and positive predictive value were calculated on the validation data. Additionally, we conducted decision curve analysis (DCA) to determine the clinical usefulness of the models by quantifying the net benefit at different threshold probabilities. All analyses were performed with the statistical software package R version 4.1.3 (http://www.R-project.org, The R Foundation), using the “caret” R package. P values less than 0.05 (two-sided test) were considered statistically significant.
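The resampling scheme described above can be sketched in a few lines. The following Python sketch is illustrative only: the study itself used R with the caret package, and the function names, the random seed, and the rounding of the 70:30 split are assumptions made for the example, not details taken from the paper.

```python
import random


def split_70_30(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and test parts."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(indices, k=10):
    """Yield (training, held_out) index lists; each index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out


# 1,787 is the cohort size reported in the study; everything else is illustrative.
train_idx, test_idx = split_70_30(1787)
cv_splits = list(k_fold_splits(train_idx, k=10))
```

In such a scheme, hyperparameters would be chosen by averaging a performance measure over the ten held-out folds, and the 30% test part would be used only once, for the final evaluation.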

Results

Baseline Characteristics

Variable values of the SAH patients in MIMIC-IV were analyzed. A total of 1,787 cases were included in the study, of which 349 died during hospitalization. Table 1 shows that patients who died had significantly higher infection indexes, along with abnormal coagulation function, thrombocytopenia, and electrolyte disorders. Their temperature and oxygen saturation also fluctuated more widely, and they were more likely to have coexisting diseases (Table 1 and Figure 1).

Table 1

Baseline characteristics of MIMIC-IV.

	Survival (n = 1,438)	Dead in hospital (n = 349)	P-value
Baseline characteristics
Age (year)	63 (51, 76)	70 (59, 82)	<0.001
Sex (female)	720 (50.07%)	169 (48.42%)	0.581
Race
Black	91 (6.33%)	17 (4.87%)	<0.001
White	932 (64.81%)	163 (46.70%)
Hispanic	65 (4.52%)	10 (2.87%)
Asian	46 (3.20%)	13 (2.87%)
Others	304 (21.14%)	136 (38.97%)
Language
English	1,283 (89.22%)	155 (44.41%)	0.714
Unknown	309 (21.49%)	40 (11.46%)
Marital
Single	368 (25.59%)	48 (13.75%)	<0.001
Married	629 (43.74%)	120 (34.38%)
Divorced	98 (6.82%)	18 (5.16%)
Widowed	159 (11.06%)	43 (12.32%)
Unknown	184 (12.80%)	120 (34.38%)
Weight	75.50 (64.00, 88.00)	73.00 (61.70, 87.30)	0.033
Trauma	697 (48.47%)	109 (31.23%)	<0.001
Coexisting disorders
Myocardial infarction	109 (7.58%)	42 (12.03%)	0.007
Congestive heart failure	148 (10.29%)	53 (15.19%)	0.009
Peripheral vascular disease	87 (6.05%)	26 (7.45%)	0.335
Cerebrovascular disease	815 (56.68%)	255 (73.07%)	<0.001
Dementia	62 (4.31%)	13 (3.72%)	0.624
Chronic pulmonary disease	187 (13.00%)	55 (15.76%)	0.177
Rheumatic disease	26 (1.81%)	6 (1.72%)	0.911
Peptic ulcer disease	8 (0.56%)	3 (0.86%)	0.516
Diabetes	250 (17.39%)	75 (21.49%)	0.075
Paraplegia	150 (10.43%)	49 (14.04%)	0.055
Renal disease	118 (8.21%)	51 (14.04%)	<0.001
Malignant cancer	51 (3.55%)	20 (5.73%)	0.061
Severe liver disease	19 (1.32%)	12 (3.44%)	0.007
Metastatic solid tumor	18 (1.25%)	9 (2.58%)	0.068
AIDS	3 (0.21%)	2 (0.57%)	0.248
Vital signs (1st 24 h)
Heart rate (min)	77.49 (69.64, 87.88)	80.89 (73.26, 93.58)	<0.001
Temperature (°C)	37.00 (36.78, 37.24)	37.00 (36.51, 37.45)	0.001
SBP (mmHg)	124 (115, 134)	123 (111, 134)	<0.001
DBP (mmHg)	65 (58, 72)	62 (55, 70)	<0.001
MBP (mmHg)	82 (75, 88)	80 (73, 88)	<0.001
Respiratory rate (min)	18 (16, 20)	19 (17, 22)	<0.001
SPO₂	97.28 (96.00, 98.72)	97.88 (95.55, 99.25)	<0.001
Laboratory
WBC	9.78 (7.91, 11.78)	12.59 (10.27, 15.70)	<0.001
Hematocrit	33.86 (29.80, 37.70)	32.93 (28.55, 36.90)	0.017
Hemoglobin	11.23 (9.86, 12.69)	10.85 (9.21, 12.33)	0.001
Mch	30.58 (29.40, 31.81)	30.55 (29.30, 31.86)	0.492
Mchc	33.31 (32.47, 34.15)	33.07 (32.05, 33.87)	<0.001
Mcv	91.00 (88.00, 95.00)	92.00 (88.00, 96.00)	0.039
RBC	3.70 (3.25, 4.15)	3.52 (3.10, 4.10)	0.008
Rdw	13.72 (13.03, 14.79)	14.48 (13.50, 15.95)	<0.001
Platelet	228.68 (183.00, 287.00)	194.86 (140.00, 249.33)	<0.001
Neutrophils	77.67 (73.30, 83.95)	78.50 (77.58, 86.00)	0.008
Lymphocytes	13.94 (9.20, 16.80)	12.50 (6.70, 14.20)	<0.001
Monocytes	5.87 (4.20, 7.00)	5.87 (4.33, 6.23)	0.482
Eosinophils	0.90 (0.30, 1.30)	0.75 (0.20, 1.13)	0.002
Basophils	0.35 (0.20, 0.45)	0.30 (0.20, 0.36)	<0.001
Bicarbonate	24.48 (22.90, 26.00)	23.25 (20.00, 24.62)	<0.001
Bun	14.50 (11.00, 19.50)	20.00 (14.67, 30.25)	<0.001
Calcium	8.64 (8.35, 8.95)	8.45 (8.08, 8.66)	<0.001
Chloride	103.90 (101.60, 106.00)	105.67 (103.00, 111.20)	<0.001
Creatinine	0.75 (0.61, 0.93)	0.95 (0.70, 1.40)	<0.001
Glucose	119.23 (107.00, 136.00)	150.65 (131.38, 178.10)	<0.001
Sodium	139.35 (137.44, 141.50)	141.00 (139.00, 145.19)	<0.001
Potassium	3.93 (3.75, 4.18)	4.00 (3.80, 4.32)	<0.001
INR	1.12 (1.05, 1.22)	1.20 (1.10, 1.40)	<0.001
PT	12.38 (11.57, 13.50)	13.34 (12.10, 15.10)	<0.001
APTT	28.28 (25.95, 30.93)	29.00 (26.20, 34.31)	<0.001
NLR	5.57 (4.42, 9.06)	6.40 (5.46, 12.60)	<0.001
Therapy
Heparin	1,154 (80.25%)	157 (44.99%)	<0.001
Antibiotic	823 (57.23%)	226 (64.76%)	0.010
Scoring system
SOFA	3 (2, 5)	6 (4, 8)	<0.001
GCS	13 (10, 14)	9 (3, 15)	<0.001

MIMIC-IV, Medical Information Mart for Intensive Care-IV; AIDS, acquired immunodeficiency syndrome; SBP, systolic blood pressure; DBP, diastolic blood pressure; MBP, mean blood pressure; Mch, mean corpuscular hemoglobin; Mchc, mean corpuscular hemoglobin concentration; Mcv, mean corpuscular volume; RBC, red blood cell; Rdw, red blood cell volume distribution width; SPO₂, oxygen saturation.
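As a worked check of how the categorical P-values in Table 1 arise, the sketch below applies a Pearson χ2 test to the Sex row. It assumes no continuity correction, which the study does not state; under that assumption the result is close to the reported 0.581. The function name is illustrative.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]; returns (statistic, p_value)."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Sex row of Table 1: 720 of 1,438 survivors and 169 of 349 non-survivors were female.
statistic, p_value = pearson_chi_square_2x2(720, 1438 - 720, 169, 349 - 169)
print(round(statistic, 3), round(p_value, 3))
```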

Figure 1

Overview of the methods used for data extraction, training, and testing. MIMIC-IV, Medical Information Mart for Intensive Care-IV; SAH, subarachnoid hemorrhage.


Variable Importance

Feature screening with the RFE algorithm showed that accuracy was highest when 20 features were included (Figure 2). To simplify the model further, we also evaluated a smaller feature set whose accuracy was similar to that of the best feature number. We therefore built prediction models with 10 and 20 features. Model1 includes GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score, while Model2 includes GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, BUN, platelet, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease. These variables were used in all subsequent analyses for all models in both the training and testing sets.
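The elimination loop behind this kind of selection can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's pipeline: the study used the caret package in R and does not state which learner supplied the importances, so a small logistic regression with absolute coefficients as importance stands in here, and the data and feature names are synthetic.

```python
import math
import random


def fit_logistic(X, y, epochs=300, learning_rate=0.5):
    """Fit logistic regression by batch gradient descent and return the feature weights."""
    n, p = len(X), len(X[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            for j in range(p):
                grad_w[j] += error * row[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights


def recursive_feature_elimination(X, y, names, n_keep):
    """Refit, drop the feature with the smallest |weight|, repeat until n_keep remain."""
    remaining = list(range(len(names)))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        weights = fit_logistic(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(weights[k]))
        eliminated.append(names[remaining.pop(weakest)])
    return [names[j] for j in remaining], eliminated


# Synthetic data (assumption): two informative features and two pure-noise features,
# all drawn on the same scale so that weights are comparable as importances.
rng = random.Random(0)
names = ["signal_a", "signal_b", "noise_a", "noise_b"]
X = [[rng.gauss(0, 1) for _ in names] for _ in range(300)]
y = [1 if row[0] + row[1] > 0 else 0 for row in X]

kept, eliminated = recursive_feature_elimination(X, y, names, n_keep=2)
print(kept, eliminated)
```

Refitting after every removal is what distinguishes RFE from a one-off ranking: the importance of the remaining features is re-estimated each time the feature set shrinks.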

Figure 2

Correlation diagram between different feature numbers and accuracy in the RFE algorithm. RFE, recursive feature elimination.

Prediction performance of different models

We used the 10-feature and 20-feature sets to build the traditional regression and ML models. In the simplified Model1, the logistic regression (LR), random forest (RF), gradient boosting machine (GBM), artificial neural network (NNET), support vector machine (SVM), eXtreme gradient boosting (XGB), adaptive boosting (Ada), and naïve Bayes (NB) models obtained AUCs of 0.883, 0.949, 0.945, 0.888, 0.926, 0.925, 0.936, and 0.920, respectively (Table 2 and Figure 3). In Model2, the LR, RF, GBM, NNET, SVM, XGB, Ada, and NB models obtained AUCs of 0.921, 0.958, 0.959, 0.801, 0.942, 0.950, 0.951, and 0.927, respectively (Table 3 and Figure 4). The DeLong test showed that the AUCs of Model1 and Model2 differed significantly for the LR, NNET, and XGB algorithms (Table 4). Among the simplified models, RF-Model1 had the highest AUC. Decision curve analysis compares the net benefit of the best model with that of alternative approaches to clinical decision-making. For both feature sets, the net benefit of the GBM model was higher than that of the other models, indicating that it performs better in predicting in-hospital mortality of SAH (Figures 5, 6).
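Net benefit, the quantity plotted on a decision curve, weighs true positives against false positives at a chosen threshold probability. The sketch below uses the standard formula NB = TP/n - FP/n × pt/(1 - pt); the toy labels and probabilities are invented for illustration, and only the prevalence (349 deaths among 1,787 patients) comes from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk is at or above the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of the reference strategy that flags every patient as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example (invented values): 3 deaths among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
model_nb = net_benefit(y_true, y_prob, threshold=0.5)

# Reference strategy at the study's in-hospital mortality (349 of 1,787 patients).
treat_all_nb = net_benefit_treat_all(349 / 1787, threshold=0.2)
```

A model is useful at a given threshold only if its net benefit exceeds both the treat-all value and zero, which is the net benefit of flagging no one.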

Table 2

Prediction performance of Model1 in the testing set.

Model	Accuracy	Sensitivity	Specificity	PPV	NPV	AUC	95% CI
LR-Model1	0.902	0.904	0.889	0.982	0.577	0.883	0.874–0.925
RF-Model1	0.920	0.928	0.875	0.976	0.694	0.949	0.894–0.941
GBM-Model1	0.918	0.928	0.865	0.973	0.693	0.945	0.892–0.939
NNET-Model1	0.889	0.918	0.753	0.946	0.658	0.888	0.860–0.914
SVM-Model1	0.902	0.902	0.900	0.984	0.568	0.926	0.874–0.925
XGB-Model1	0.909	0.936	0.788	0.951	0.739	0.925	0.882–0.931
Ada-Model1	0.907	0.928	0.804	0.958	0.703	0.936	0.880–0.930
NB-Model1	0.895	0.926	0.755	0.944	0.694	0.920	0.866–0.919

PPV, positive predictive values; NPV, negative predictive values; AUC, area under the curve; CI, confidence interval; LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; Model1 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score.
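The columns of Table 2 follow from a confusion matrix and from pairwise ranking of predicted scores. The sketch below shows the standard definitions on invented toy data; which outcome the study coded as the positive class is not stated, so the labelling here is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(y_true, y_score):
    """AUC as the probability that a random positive case scores higher than a
    random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 3 positive cases among 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 cut-off is an assumption

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, y_score)
```

Unlike the other columns, the AUC does not depend on the cut-off, which is why two models can rank differently by accuracy and by AUC.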

Figure 3

Area under the receiver operating characteristic curve by different Model1 algorithms in the validation cohort. LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; Model1 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score.

Table 3

Prediction performance of Model2 in the testing set.

Model	Accuracy	Sensitivity	Specificity	PPV	NPV	AUC	95% CI
LR-Model2	0.893	0.910	0.800	0.962	0.613	0.921	0.864–0.917
RF-Model2	0.930	0.938	0.891	0.978	0.739	0.958	0.906–0.950
GBM-Model2	0.916	0.935	0.826	0.962	0.730	0.959	0.904–0.948
NNET-Model2	0.850	0.906	0.622	0.906	0.622	0.801	0.818–0.879
SVM-Model2	0.891	0.896	0.857	0.978	0.540	0.942	0.862–0.916
XGB-Model2	0.920	0.941	0.823	0.960	0.757	0.950	0.894–0.941
Ada-Model2	0.923	0.941	0.840	0.964	0.759	0.951	0.898–0.944
NB-Model2	0.891	0.924	0.745	0.942	0.685	0.927	0.862–0.916

PPV, positive predictive values; NPV, negative predictive values; AUC, area under the curve; CI, confidence interval; LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; Model2 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, BUN, platelet, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease.

Figure 4

Area under the receiver operating characteristic curve by different Model2 algorithms in the validation cohort. LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; Model2 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, BUN, platelet, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease.

Table 4

DeLong test of models.

Model	Model	P	Model	Model	P
LR-Model1	LR-Model2	0.002	SVM-Model1	SVM-Model2	0.222
RF-Model1	RF-Model2	0.124	XGB-Model1	XGB-Model2	0.013
GBM-Model1	GBM-Model2	0.076	Ada-Model1	Ada-Model2	0.130
NNET-Model1	NNET-Model2	<0.001	NB-Model1	NB-Model2	0.377

LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; Model1 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score; Model2 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, BUN, platelet, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease.

Figure 5

Decision curve analysis of Model1. LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; Model1 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score.

Figure 6

Decision curve analysis of Model2. LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; Model2 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, BUN, platelet, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease.

In the importance ranking from the ML algorithm, the 10 most important features were the same in the two RF models (Figure 7), and GCS had the highest importance.

Figure 7

Variable importance in RF models. RF, random forest; Model1 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score. Model2 was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, BUN, platelet, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease.

Performance of Models in Subgroup (Non-Traumatic Subarachnoid Hemorrhage) Analysis

To verify the predictive ability of the model in non-traumatic subarachnoid hemorrhage, we treated the cases without definite trauma as a separate subgroup (Table 5) and split them into a training set and a test set (ratio 70:30). Models were built on the training set with the simplified feature set, and their predictive ability was then verified on the test set. The LR, RF, GBM, NNET, SVM, XGB, Ada, and NB models obtained test-set AUCs of 0.909, 0.951, 0.955, 0.891, 0.929, 0.956, 0.947, and 0.921, respectively (Table 6 and Figure 8). Among the eight models, GBM had the highest accuracy (0.908) and an AUC of 0.955, close to the highest value (0.956, XGB), whereas NNET had the lowest AUC (0.891). As shown in Figure 9, the net benefit of the GBM model exceeded that of the other ML models and the LR model, indicating that it performs better in predicting outcomes in this cohort.

Table 5

Baseline characteristics of patients without traumatic subarachnoid hemorrhage.

	Survival (n = 741)	Dead in hospital (n = 240)	P-value
Baseline characteristics
Age (year)	59 (50, 70)	69 (58, 80)	<0.001
Sex (female)	421 (57%)	131 (55%)	0.595
Race
Black	1 (0%)	0 (0%)	<0.001
White	22 (3%)	12 (5%)
Hispanic	63 (9%)	13 (5%)
Asian	42 (6%)	8 (3%)
Others	482 (65%)	106 (44%)
Language
English	662 (89%)	205 (85%)	0.126
Unknown	79 (11%)	35 (15%)
Marital
Single	193 (26%)	33 (14%)	<0.001
Married	365 (49%)	87 (36%)
Divorced	66 (9%)	13 (5%)
Widowed	49 (7%)	22 (9%)
Unknown	68 (9%)	85 (35%)
Weight	76.87 (65.5, 90.9)	73.2 (61, 90)	0.034
Coexisting disorders
Myocardial infarction	52 (7%)	26 (11%)	0.066
Congestive heart failure	56 (8%)	31 (13%)	0.014
Peripheral vascular disease	54 (7%)	18 (8%)	0.913
Cerebrovascular disease	741 (100%)	240 (100%)	1.000
Dementia	14 (2%)	3 (1%)	0.494
Chronic pulmonary disease	102 (14%)	41 (17%)	0.212
Rheumatic disease	13 (2%)	4 (2%)	0.928
Peptic ulcer disease	4 (1%)	3 (1%)	0.285
Diabetes	98 (13%)	44 (18%)	0.060
Paraplegia	110 (15%)	40 (17%)	0.499
Renal disease	40 (5%)	34 (14%)	<0.001
Malignant cancer	26 (4%)	14 (6%)	0.128
Severe liver disease	7 (1%)	9 (4%)	0.006
Metastatic solid tumor	10 (1%)	8 (3%)	0.062
AIDS	2 (0%)	2 (1%)	0.270
Vital signs (1st 24 h)
Heart rate (min)	75.64 (69.04, 85.4)	80.3 (72.98, 92.05)	<0.001
Temperature (°C)	36.96 (36.77, 37.23)	36.95 (36.51, 37.44)	0.265
SBP (mmHg)	124 (115, 134)	123 (111, 134)	<0.001
DBP (mmHg)	65 (58, 72)	62 (55, 70)	<0.001
MBP (mmHg)	82 (75, 88)	80 (73, 88)	<0.001
Respiratory rate (min)	17 (16, 18)	19 (17, 21)	<0.001
SPO₂	97.31 (96.03, 98.77)	97.92 (95.59, 99.29)	0.104
Laboratory
WBC	10.1 (8.37, 12.03)	13.05 (10.35, 16.2)	<0.001
Hematocrit	34.43 (30.83, 38.3)	33.55 (29.34, 37.53)	0.062
Hemoglobin	11.53 (10.2, 12.9)	11.1 (9.41, 12.7)	0.007
Mch	30.47 (29.43, 31.66)	30.5 (29.1, 31.72)	0.775
Mchc	33.4 (32.58, 34.28)	33.1 (32.06, 33.91)	<0.001
Mcv	91 (87, 94)	92 (88, 95)	0.018
RBC	3.81 ± 0.61	3.72 ± 0.75	0.088
Rdw	13.57 (12.99, 14.54)	14.35 (13.47, 16.08)	<0.001
Platelet	243.09 (199.5, 306.2)	205.56 (147.31, 261.82)	<0.001
Neutrophils	78.2 (73.2, 84.6)	79.88 (77.59, 86.4)	0.003
Lymphocytes	13.93 (9.4, 17.3)	12.45 (6.53, 14.2)	<0.001
Monocytes	5.4 (3.6, 6.3)	5.86 (4, 6)	0.169
Eosinophils	0.8 (0.3, 1.3)	0.7 (0.2, 1.1)	<0.001
Basophils	0.35 (0.2, 0.48)	0.33 (0.2, 0.36)	<0.001
Bicarbonate	24.15 (22.76, 25.86)	22.64 (19.8, 24.63)	<0.001
Bun	13.33 (10.5, 18)	19.86 (14.63, 30.21)	<0.001
Calcium	8.67 (8.42, 8.96)	8.45 (8.09, 8.67)	<0.001
Chloride	104 (102, 106.32)	106 (103.27, 111.58)	<0.001
Creatinine	0.71 (0.58, 0.87)	0.95 (0.69, 1.42)	<0.001
Glucose	119.21 (108.71, 134.28)	153.32 (131.89, 181.5)	<0.001
Sodium	139.3 (137.44, 141.33)	141.17 (139.08, 145.62)	<0.001
Potassium	3.89 (3.74, 4.1)	3.99 (3.76, 4.28)	0.003
INR	1.1 (1.04, 1.19)	1.17 (1.1, 1.35)	<0.001
PT	12.28 (11.5, 13.2)	13.04 (11.9, 14.74)	<0.001
APTT	28.48 (26.29, 31.48)	29.04 (26.29, 35.11)	0.033
NLR	7.39 (4.28, 9.05)	10.29 (5.46, 13.21)	<0.001
Therapy
Heparin	639 (86%)	104 (43%)	<0.001
Antibiotic	435 (59%)	156 (65%)	0.098
Scoring system
SOFA	3 (2, 5)	6 (4, 8.25)	<0.001
GCS	13 (9, 14)	8 (3, 15)	0.002

AIDS, acquired immunodeficiency syndrome; SBP, systolic blood pressure; DBP, diastolic blood pressure; MBP, mean blood pressure; Mch, mean corpuscular hemoglobin; Mchc, mean corpuscular hemoglobin concentration; Mcv, mean corpuscular volume; RBC, red blood cell; Rdw, red blood cell volume distribution width; SPO₂, oxygen saturation.

Table 6

The prediction performance of the non-traumatic subarachnoid hemorrhage prediction model in the test set.

Model	Accuracy	Sensitivity	Specificity	PPV	NPV	AUC	95% CI
LR-model	0.905	0.926	0.817	0.956	0.720	0.909	0.867–0.935
RF-model	0.902	0.943	0.760	0.931	0.794	0.951	0.864–0.932
GBM-model	0.908	0.933	0.809	0.952	0.750	0.955	0.871–0.938
NNET-model	0.873	0.930	0.689	0.907	0.750	0.891	0.832–0.910
SVM-model	0.892	0.921	0.774	0.943	0.706	0.929	0.853–0.924
XGB-model	0.899	0.943	0.750	0.927	0.794	0.956	0.860–0.930
Ada-model	0.892	0.938	0.736	0.923	0.779	0.947	0.853–0.924
NB-model	0.877	0.934	0.693	0.907	0.764	0.921	0.835–0.911

PPV, positive predictive values; NPV, negative predictive values; AUC, area under the curve; CI, confidence interval; LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; the model was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score.

Figure 8

Area under the receiver operating characteristic curve of different models of non-traumatic subarachnoid hemorrhage in the validation cohort. LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; the model was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score.

Figure 9

DCA curves of different models of non-traumatic subarachnoid hemorrhage. LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes; the model was adjusted for GCS, glucose, sodium, chloride, SPO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score.

Discussion

Subarachnoid hemorrhage (SAH) has a high mortality and disability rate, and many complications may occur after the onset, while most of the current studies have used a single feature for prognosis research, ignoring the adverse outcomes caused by other factors. Recently, a large number of studies have reported that peripheral blood, biochemical, and other conventional indicators are associated with the prognosis of subarachnoid hemorrhage, so we used the indicators commonly found in the mimic database for model building. In this study, we use RFE to screen important features. After simplifying the model, we use the traditional logistic regression and ML algorithm for modeling. There is basically no significant difference in the prediction ability between these simplified models and the best models. At the same time, the simplified models can reduce the phenomenon of overfitting and are more suitable for clinical use to reduce unnecessary workloads. In subgroup analysis, the model established with the same characteristics has higher AUCs, which also proves that the model has a better ability to predict the prognosis of patients with non-traumatic subarachnoid hemorrhage. From the study, we found a larger association of mortality with patients’ electrolyte levels, glucose levels, and whether they used heparin in addition to the traditional GCS. In addition, the SOFA score also history a significant mortality factor, and this score mainly describes indicators of impairment in multiple organ functions (10) (respiratory, neurological, cardiovascular, hepatic, coagulation, and renal). The underlying mechanism may be caused by the patient’s past medical history leading to organ failure or by coagulopathy due to bleeding. Impaired consciousness occurs in some patients after SAH. 
GCS is assessed by the ability to eye opening, best verbal response, and best motor response, can easily and rapidly assess the state of consciousness of a patient, and to identify development of complications and the potential degree of ultimate recovery (11). Meanwhile, in our study, glucose level served as an important factor in the prediction of death. Pppacena et al. found that higher blood glucose was associated with higher mortality after SAH (12). Meanwhile, a higher rate of glycemic variability was also associated with prognosis after SAH (13). Recently, the neutrophil to lymphocyte ratio (NLR) was reported by most literature studies to have a correlation with the prognosis of SAH (14), so we also calculated NLR as a feature. In univariate analysis, there was a clear difference between the two groups, and after filtering by ML algorithms, NLR failed to be included in the model as a better feature, perhaps because of inconsistent outcomes across studies. The higher importance of leukocytes at the same time is consistent with the finding by Srinivasan et al. and Chamling et al. that early elevation of peripheral leukocytes is associated with the occurrence of DCI and poor functional outcomes (15). Sodium and chloride are important components of electrolytes in humans, and 36% of SAH patients present with hyponatremia after the onset, mainly as a result of cerebral salt-wasting syndrome (CSWS) and syndrome of inappropriate antidiuretic hormone secretion (SIADH). Vrsajkov et al. and Saramma et al. found better outcomes in patients who did not develop hyponatremia during ICU treatment (16, 17). Hyponatremia has also been reported to be associated with an increased risk of vasospasm. This may be the main reason for the poor prognosis of patients (18). Low bicarbonate concentrations occur in patients with severe acute illness. 
Although the mechanism is currently unknown, increased systemic vascular resistance can occur after SAH, leading to transient lactic acidosis and neurogenic pulmonary edema and resulting in poor patient outcomes, as reported in a case study (19). Satoh et al. found that patients presenting with neurogenic pulmonary edema had lower bicarbonate concentrations (20). In addition, Stephan et al. found that one in five patients had abnormally low bicarbonate levels on admission and a poor prognosis (21). In our study, heparin use in SAH patients was associated with better outcomes, which is consistent with the finding of Post et al. (22) that heparin use reduced mortality after SAH. The concomitant use of low-dose heparin may reduce the risk of thrombosis and the poor prognosis resulting from thrombus shedding (23, 24). In summary, the features screened by RFE in our study have all been investigated in SAH, and all were correlated with prognosis. The strength of this study is that ML was used to combine the relevant factors to predict mortality in SAH, while the features are simple to obtain and available even in smaller hospitals. Patients with SAH are often severely ill, and early and accurate prediction of mortality gives clinicians more time to adjust treatment; at the same time, related conditions should receive further treatment in clinical practice. In addition, a validation set was used in this study to verify the reliability of the model. Finally, most of the data in this study come from publicly available databases, and these data have good reliability. Our study has limitations, which are similar to those of most studies based on public databases. First, the MIMIC database does not provide imaging examinations for the cases. 
Therefore, we could not calculate a modified Fisher (M-Fisher) score for patients to use in model building, nor could we evaluate whether patients had obvious trauma or determine the nature of their aneurysms. Second, because MIMIC is a public database, data errors may arise from mistakes made by researchers during extraction or from the database itself. In addition, SAH may have been misclassified. To reduce the bias caused by inaccurate coding, we used the widely adopted ICD-9 and ICD-10 codes. Third, as with all retrospective studies, unmeasured confounding factors and selection bias may be present. Finally, although our study explored the mortality of SAH in the intensive care unit, other outcomes, such as long-term prognosis and complications, also need further study.
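
The comparison of discrimination between the best and simplified models discussed above rests on the AUC measured on a held-out validation set. As a minimal sketch only, and not the code used in this study, the AUC can be computed as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The function name `auc_mann_whitney` and all labels and scores below are hypothetical and invented for illustration.

```python
from typing import Sequence


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy validation set (invented for illustration, NOT study data):
# 1 = died in hospital, 0 = survived.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted risks from a full model and a simplified model.
full_model_scores = [0.91, 0.80, 0.45, 0.50, 0.30, 0.20, 0.15, 0.05]
simplified_model_scores = [0.88, 0.75, 0.40, 0.55, 0.35, 0.10, 0.20, 0.05]

auc_full = auc_mann_whitney(toy_labels, full_model_scores)
auc_simplified = auc_mann_whitney(toy_labels, simplified_model_scores)
print(f"full: {auc_full:.3f}, simplified: {auc_simplified:.3f}")
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of this size. Testing whether two AUCs differ significantly, as with the DeLong test, requires an additional variance estimate that this sketch does not include.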

Conclusion

This study suggests that several important features may be related to prognosis after SAH. The ML model handles a large number of variables and identifies patients at risk of dying in hospital, which may support timely and effective treatment. Further verification of its clinical application value will be necessary.

24 in total

1. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals.

Authors: A L Goldberger; L A Amaral; L Glass; J M Hausdorff; P C Ivanov; R G Mark; J E Mietus; G B Moody; C K Peng; H E Stanley
Journal: Circulation Date: 2000-06-13 Impact factor: 29.690

2. Low-Dose versus Therapeutic Range Intravenous Unfractionated Heparin Prophylaxis in the Treatment of Patients with Severe Aneurysmal Subarachnoid Hemorrhage After Aneurysm Occlusion.

Authors: Mathias Kunz; Sebastian Siller; Carolina Nell; Roman Schniepp; Franziska Dorn; Volker Huge; Joerg-Christian Tonn; Hans-Walter Pfister; Christian Schichor
Journal: World Neurosurg Date: 2018-06-27 Impact factor: 2.104

3. High-Dose Nadroparin Following Endovascular Aneurysm Treatment Benefits Outcome After Aneurysmal Subarachnoid Hemorrhage.

Authors: Rene Post; IJsbrand A J Zijlstra; Rene van den Berg; Bert A Coert; Dagmar Verbaan; W Peter Vandertop
Journal: Neurosurgery Date: 2018-08-01 Impact factor: 4.654

4. Big Data and Machine Learning in Health Care.

Authors: Andrew L Beam; Isaac S Kohane
Journal: JAMA Date: 2018-04-03 Impact factor: 56.272

5. Impact of Early Leukocytosis and Elevated High-Sensitivity C-Reactive Protein on Delayed Cerebral Ischemia and Neurologic Outcome After Subarachnoid Hemorrhage.

Authors: Anirudh Srinivasan; Ashish Aggarwal; Sachin Gaudihalli; Manju Mohanty; Manju Dhandapani; Harminder Singh; Kanchan K Mukherjee; Sivashanmugam Dhandapani
Journal: World Neurosurg Date: 2016-02-18 Impact factor: 2.104

6. Cardiac injury associated with neurogenic pulmonary edema following subarachnoid hemorrhage.

Authors: S A Mayer; M E Fink; S Homma; D Sherman; G LiMandri; L Lennihan; R A Solomon; L M Klebanoff; A Beckford; E C Raps
Journal: Neurology Date: 1994-05 Impact factor: 9.910

7. Clinical and predictive significance of hyponatremia after aneurysmal subarachnoid hemorrhage.

Authors: Vladimir Vrsajkov; Gordana Javanović; Snežana Stanisavljević; Arsen Uvelin; Jelena Panti Vrsajkov
Journal: Balkan Med J Date: 2012-09-01 Impact factor: 2.021

8. Hyponatremia after aneurysmal subarachnoid hemorrhage: Implications and outcomes.

Authors: Pp Saramma; R Girish Menon; Adesh Srivastava; P Sankara Sarma
Journal: J Neurosci Rural Pract Date: 2013-01

9. Pituitary hormone level changes and hypxonatremia in aneurysmal subarachnoid hemorrhage.

Authors: Aisha Maimaitili; Mijiti Maimaitili; Aikeremu Rexidan; Junyi Lu; Kuerban Ajimu; Xiaojiang Cheng; Kun Luo; Duishanbai Sailike; Yuan Liu; Kadeer Kaheerman; Changjiu Tang; Tingrong Zhang
Journal: Exp Ther Med Date: 2013-04-18 Impact factor: 2.447

10. Early prophylactic anticoagulation after subarachnoid hemorrhage decreases systemic ischemia and improves outcome.

Authors: Annika Hantsche; Florian Wilhelmy; Johannes Kasper; Tim Wende; Gordian Hamerla; Stefan Rasche; Jürgen Meixensberger; Dirk Lindner
Journal: Clin Neurol Neurosurg Date: 2021-07-10 Impact factor: 1.876