João Fonseca1, Xiuyun Liu2, Hélder P Oliveira1,3, Tania Pereira1.
Abstract
Background: Traumatic Brain Injury (TBI) is one of the leading causes of injury-related mortality in the world, with severe cases reaching mortality rates of 30-40%. It is highly heterogeneous in both causes and consequences, complicating medical interpretation and prognosis. Gathering the clinical, demographic, and laboratory data needed for a prognosis requires time and skill across several clinical specialties. Machine learning (ML) methods can take advantage of these data and guide physicians toward a better prognosis and, consequently, better healthcare. The objective of this study was to develop and test a wide range of machine learning models, evaluate their capability to predict TBI mortality at hospital discharge, and assess the agreement between the predictive value of the data and their clinical significance.
Keywords: Traumatic Brain Injury; clinical significance; feature importance; feature selection; intensive care unit; machine learning; mortality prediction
Year: 2022 PMID: 35756926 PMCID: PMC9226580 DOI: 10.3389/fneur.2022.859068
Source DB: PubMed Journal: Front Neurol ISSN: 1664-2295 Impact factor: 4.086
Figure 1. Overview of the pipeline describing the developed approach: pre-processing of the data, followed by a data split, feature engineering and sampling, and subsequent training and testing of several learning models.
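The pipeline stages described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions: the synthetic data stands in for the HPTBI dataset (which is not reproduced here), and the specific imputer, scaler, estimator, and metric are illustrative choices, not the authors' exact code.

```python
# Minimal sketch of the Figure 1 pipeline: pre-processing -> split -> train/test.
# Synthetic, class-imbalanced data stands in for the HPTBI dataset (assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Imbalanced binary labels, loosely mirroring a mortality-prediction task.
X, y = make_classification(n_samples=300, n_features=20, weights=[0.85],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # pre-processing
    ("scale", StandardScaler()),                    # feature engineering step
    ("model", RandomForestClassifier(random_state=0)),
])
pipe.fit(X_train, y_train)
score = f1_score(y_test, pipe.predict(X_test))
```

The stratified split preserves the class-label proportions shown in Figure 2A across the train and test sets.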
Figure 2. Characterization of the dataset. (A) Proportion of class labels. (B) Frequency plot of the number of patients per age. (C) Frequency plot of the number of patients vs. the number of days from injury to death.
Summary of the different types of data and parameters in the Hackathon Pediatric Traumatic Brain Injury (HPTBI) dataset.
| Imaging (CT) findings | Clinical parameters | Demographics |
|---|---|---|
| CT positive for cerebral edema or brain swelling? | Catheter type, quantity, and length of stay | Age |
| CT positive for compression or effacement of the basilar cisterns? | ICP | Where did the patient go when they left the ED? |
| CT positive for epidural hematoma? | Did the patient have a cardiac arrest? | Sex |
| CT positive for intraparenchymal hemorrhage? | Did the patient receive a decompressive craniectomy? | Days from injury to admission |
| CT positive for intraventricular hemorrhage? | Did the patient receive enteral nutrition? | |
| CT positive for midline shift? | Did the patient have an epidural hematoma evacuated? | |
| CT positive for skull fracture? | Cardiac arrest | |
| CT positive for subarachnoid hemorrhage? | GCS | |
| CT positive for subdural hematoma? | GCS ED | |
| | Pharmaceuticals ordered | |
CT, Computed Tomography; ICP, Intracranial Pressure; GCS, Glasgow Coma Scale; ICU, Intensive Care Unit; ED, Emergency Department.
Table summarizing the tuned hyperparameters for each model and the corresponding best and most frequent values.
| Model | Hyperparameter | Search space | Best | Most frequent | Best | Most frequent | Best | Most frequent | Best | Most frequent |
|---|---|---|---|---|---|---|---|---|---|---|
| KNN | Number of neighbors | 1 : 1 : 10 | 6 | 9 | 9 | 6 | 7 | 9 | 9 | 8 |
| | Weights | Uniform, distance | Distance | Distance | Distance | Distance | Distance | Distance | Distance | Distance |
| | Distance metric | Manhattan, Euclidean | Manhattan | Manhattan | Euclidean | Manhattan | Manhattan | Manhattan | Euclidean | Manhattan |
| RF | Number of estimators | 20 : 1 : 50 | 25 | 37 | 37 | 25 | 25 | 37 | 42 | 37 |
| | Max depth of the tree | 10, 30, 50, 85, 100, None | 30 | 30 | 85 | 30 | 30 | 30 | None | 10 |
| | Max features to split | Square root, log2 | Square root | Square root | Square root | Square root | log2 | Square root | Square root | Square root |
| | Minimum samples per leaf | 1, 2, 5, 8, 10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | Minimum samples to split | 1, 2, 5, 8, 10 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| ANN | Solver | LBFGS, stochastic gradient descent, Adam | LBFGS | LBFGS | LBFGS | LBFGS | LBFGS | LBFGS | LBFGS | LBFGS |
| | Activation function | Identity, logistic, tanh, ReLU | tanh | tanh | tanh | tanh | tanh | tanh | tanh | tanh |
| | Alpha | 0.0001, 0.001, 0.01, 0.05, 0.1 | 0.01 | 0.001 | 0.01 | 0.05 | 0.01 | 0.001 | 0.001 | 0.01 |
| | Learning rate | Constant, adaptive | Adaptive | Constant | Adaptive | Constant | Adaptive | Constant | Adaptive | Adaptive |
| XGBoost | Number of estimators | 50, 100, 1000 | 100 | 1000 | 100 | 1000 | 1000 | 1000 | 1000 | 1000 |
| | Max depth | 1, 3, 7, 10 | 7 | 7 | 10 | 7 | 7 | 7 | 7 | 7 |
| | Subsample | 0.3 : 1.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | Alpha | 0.0001, 0.001, 0.01, 0.05, 0.1 | 0.001 | 0.001 | 0.0001 | 0.001 | 0.0001 | 0.0001 | 0.0001 | 0.001 |
| | Colsample by tree | 0.3 : 1.0 | 0.5 | 0.3 | 0.5 | 0.3 | 0.5 | 0.5 | 0.5 | 0.3 |
| | Learning rate | 0.001, 0.01, 0.05, 0.1, 1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.05 |
The best value is the hyperparameter value of the best estimator among all 50 trials; the most frequent value is the one that appears most often across the 50 estimators obtained over those trials.
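One way such a hyperparameter search could look is sketched below, using the KNN search space from the table. This is an illustrative scikit-learn `GridSearchCV` sketch on synthetic data; the paper's exact search procedure across its 50 trials is not reproduced here, and the cross-validation and scoring settings are assumptions.

```python
# Sketch of a hyperparameter search over the KNN grid listed in the table.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

param_grid = {
    "n_neighbors": list(range(1, 11)),       # 1 : 1 : 10
    "weights": ["uniform", "distance"],
    "metric": ["manhattan", "euclidean"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="f1")
search.fit(X, y)
best = search.best_params_  # e.g. the "best value" column for one trial
```

Repeating such a search over 50 resampled trials and tallying the winning values per hyperparameter would yield the "most frequent value" columns.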
Koehrsen's Feature Selector; PCA, Principal Component Analysis; ICA, Independent Component Analysis; FS, Feature Selection; KNN, k-Nearest Neighbors; RF, Random Forest; ANN, Artificial Neural Networks; XGBoost, eXtreme Gradient Boosting.
Figure 3. Comparison of model scores per feature selection method: (A) k-Nearest Neighbors (KNN), (B) Artificial Neural Networks (ANN), (C) Random Forest (RF), (D) eXtreme Gradient Boosting (XGBoost).
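The feature-engineering variants compared in Figure 3 include PCA and ICA projections, which could be produced as follows. This is a minimal scikit-learn sketch on synthetic data; the number of components is an assumption, not a value reported by the study.

```python
# Sketch of the PCA and ICA feature-engineering variants compared in Figure 3.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA

X, _ = make_classification(n_samples=200, n_features=20, random_state=0)

# Project the original 20 features onto 5 components (illustrative choice).
X_pca = PCA(n_components=5, random_state=0).fit_transform(X)
X_ica = FastICA(n_components=5, random_state=0, max_iter=1000).fit_transform(X)
```

Each projected matrix would then replace the raw feature matrix before model training, giving one curve per method in each panel of Figure 3.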
Figure 4. Ten most important features according to the normalized feature importance computed by Koehrsen's feature selector tool, which uses a gradient boosting model.
Figure 5. Ten most important features according to the Gini feature importance computed by random forest.
Figure 6. Ten most important features according to the Gini feature importance computed by XGBoost.
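The Gini (impurity-based) feature importances behind Figures 5 and 6 can be obtained directly from a fitted tree ensemble. A minimal random forest sketch on synthetic data, assuming scikit-learn's `feature_importances_` attribute; the ranking step mirrors the "ten most important features" plots:

```python
# Sketch of extracting Gini feature importances from a fitted ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=15, n_informative=5,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

imp = rf.feature_importances_        # Gini (impurity-based) importances, sum to 1
norm = imp / imp.max()               # normalized to [0, 1], as in the figures
top10 = np.argsort(imp)[::-1][:10]   # indices of the ten most important features
```

The same pattern applies to an XGBoost model, whose default importance can likewise be read from its fitted booster.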
Figure 7. Frequency of Glasgow Coma Scale (GCS) scores (A) evaluated in the intensive care unit (ICU) and (B) evaluated in the emergency department (ED). The scores range from 3 (completely unresponsive) to 15 (completely responsive).