Amin Naemi, Thomas Schmidt, Marjan Mansourvar, Mohammad Naghavi-Behzad, Ali Ebrahimi, Uffe Kock Wiil.
Abstract
OBJECTIVES: This systematic review aimed to assess the performance and clinical feasibility of machine learning (ML) algorithms in the prediction of in-hospital mortality for medical patients using vital signs at emergency departments (EDs).
Keywords: health informatics; information management; information technology
Year: 2021 PMID: 34728454 PMCID: PMC8565537 DOI: 10.1136/bmjopen-2021-052663
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Research questions
| Q1 | What ML techniques have been used to predict in-hospital mortality in EDs? |
| Q2 | What are the common vital signs variables used in studies? |
| Q3 | How did researchers prepare data for ML techniques? |
| Q4 | What are the approaches to solve the problem (eg, binary classification or time series prediction)? |
| Q5 | What are the challenges and open issues in this domain? |
EDs, emergency departments; ML, machine learning.
Search keywords in different groups
| Group 1—ML keywords | Artificial intelligence, machine learning, deep learning, learning algorithms, supervised machine learning, unsupervised machine learning |
| Group 2—Medical keywords | Clinical deterioration, mortality, in-hospital mortality, death, vital sign, emergency departments |
| Group 3—Document type | Journal |
| Group 4—Publication year | 1 January 2010 to 1 August 2021 |
| Group 5—Language | English |
| Final result | (Group 1) AND (Group 2) AND (Group 3) AND (Group 4) AND (Group 5) |
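The boolean combination above can be sketched as a small helper that ORs the terms within each keyword group and ANDs the groups together. This is only an illustration of how such a search string is assembled (the `or_group` helper is hypothetical, not the authors' actual tool), and the document-type, year and language filters are usually applied as database facets rather than query terms:

```python
# Illustrative sketch: building the boolean search string from the keyword
# groups in the table above. The or_group helper is hypothetical.
ml_terms = ["artificial intelligence", "machine learning", "deep learning",
            "learning algorithms", "supervised machine learning",
            "unsupervised machine learning"]
medical_terms = ["clinical deterioration", "mortality", "in-hospital mortality",
                 "death", "vital sign", "emergency departments"]

def or_group(terms):
    """Join the terms of one group with OR, quoting multi-word phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# (Group 1) AND (Group 2); groups 3-5 are applied as database filters.
query = " AND ".join([or_group(ml_terms), or_group(medical_terms)])
print(query)
```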
ML, machine learning.
Inclusion and exclusion criteria
| Inclusion criteria | Exclusion criteria |
| Prediction of in-hospital mortality should be the main aim of the study. | Studies that are conference articles, posters, abstracts, books or book chapters, or review articles. |
| The study should be done at an ED. | Studies done in the paediatrics field. |
| The study should be done on adult patients. | Non-English studies. |
| ML algorithms should be used for the prediction task. | |
| Vital signs variables should be among the predictors used to build ML models. | |
ED, emergency department; ML, machine learning.
Figure 1Flow diagram of study selection (PRISMA chart). PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
General characteristics of included studies
| ID | Authors | Year | Country | Study type | Population | Outcome portion (%) | Patients type | Mean age (y) | Male (%) | Vital signs |
| A1 | Levin | 2017 | USA | Retrospective | 172 726 | 0.40 | All admissions | 46 | 44.7 | TP, HR, RR, SBP, SpO2 |
| A2 | Klug | 2020 | Israel | Retrospective | 799 522 | 1.55 | All admissions | 55 | 51.5 | TP, HR, SBP, DBP, SpO2 |
| A3 | Faisal | 2020 | UK | Retrospective | 24 696 | 5.33 | All admissions | 63 | 46.9 | RR, SpO2, SBP, PR, GCS, TP |
| A4 | Kwon | 2020 | South Korea | Retrospective | 23 587 | 3.98 | Infectious patients | 63 | 46.1 | SBP, RR, TP, HR |
| A5 | Raita | 2019 | USA | Retrospective | 135 470 | 2.10 | All admissions | 46 | 56.8 | TP, PR, SBP, DBP, RR, SpO2 |
| A6 | Liu | 2011 | Singapore | Retrospective | 100 | 40.00 | All admissions | 65 | 63 | RR, PR, SBP, DBP, SpO2, GCS |
| A7 | Karlsson | 2021 | Sweden | Retrospective | 445 | 14.1 | Infectious patients | 73 | 52.6 | HR, SBP, DBP, RR, TP, SpO2 |
| A8 | Chen | 2021 | Taiwan | Retrospective | 52 626 | 9.4 | Infectious patients | 72 | 58.3 | HR, SBP, DBP, TP, RR, SpO2, GCS |
| A9 | Joseph | 2020 | USA | Retrospective | 445 925 | <13 | All admissions | 53 | 45.9 | HR, SBP, DBP, RR, TP, SpO2 |
| A10 | Rodriguez | 2021 | Colombia | Retrospective | 2510 | 11.5 | Infectious patients | 62 | 49.8 | SBP, DBP, TP, RR, SpO2, GCS |
| A11 | Soffer | 2020 | Israel | Retrospective | 118 262 | 5.3 | All admissions | 73 | 52.6 | HR, SBP, DBP, TP, RR, SpO2 |
| A12 | van Doorn | 2021 | Netherlands | Retrospective | 1344 | 13 | Infectious patients | 71 | 54.4 | HR, SBP, DBP, TP, RR, SpO2, GCS |
| A13 | Taylor | 2015 | USA | Retrospective | 5278 | 4.92 | Infectious patients | 65 | 55 | HR, TP, SBP, DBP, RR, SpO2 |
| A14 | Perng | 2019 | Taiwan | Retrospective | 42 220 | 4.71 | Infectious patients | 64 | 56.5 | SBP, GCS, TP, HR |
| A15 | Shamout | 2020 | UK | Retrospective | 37 284 | 0.80 | All admissions | 68 | 48.8 | HR, SBP, RR, TP, SpO2 |
DBP, diastolic blood pressure; GCS, Glasgow coma scale; HR, heart rate; PR, pulse rate; RR, respiratory rate; SBP, systolic blood pressure; SpO2, oxygen saturation; TP, body temperature.
Selected articles list and their ML-related characteristics
| ID | ML algorithms | Evaluation metrics | Time horizon | Personalised | Handling missing values | Hyperparameter optimisation | Approach | Validation |
| A1 | RF | AUC | – | No | No | No | Binary classification | Internal |
| A2 | GB | AUC, sensitivity, specificity, Brier Score | – | No | No | No | Binary classification | Internal |
| A3 | LR, RF, SVM, NN | AUC, Brier Score | – | No | No | Yes | Binary classification | External |
| A4 | GB, RF | AUC | 72 hours | No | No | Yes | Binary classification | External |
| A5 | LR, RF, GB, DT, DNN | AUC, sensitivity, specificity | – | No | No | Yes | Binary classification | Internal |
| A6 | NN, SVM | Accuracy, sensitivity, specificity | 9–12 min | No | No | No | Binary classification | Internal |
| A7 | RF | AUC, sensitivity, specificity, PPV, NPV, PLR, NLR | 7 days, 30 days | No | No | No | Binary classification | Internal |
| A8 | SVM, GB, NN | AUC, accuracy, sensitivity, specificity, PPV, NPV | – | No | Yes | No | Binary classification | Internal |
| A9 | LR, DNN, GB | AUC, accuracy, sensitivity, specificity | – | No | No | Yes | Binary classification | Internal |
| A10 | DT, RF, NN, SVM | AUC, accuracy | – | No | No | No | Binary classification | Internal |
| A11 | GB | AUC, sensitivity, specificity, NPV, PPV, FPR | – | No | Yes | No | Binary classification | Internal |
| A12 | LR, NN, RF, GB | AUC, sensitivity, specificity | 31 days | No | No | No | Binary classification | Internal |
| A13 | RF, DT, LR | AUC | – | No | Yes | No | Binary classification | Internal |
| A14 | KNN, SVM, RF, DNN | AUC, accuracy | 72 hours, 28 days | No | Yes | No | Binary classification | Internal |
| A15 | DNN | AUC | 2 hours, 24 hours | Yes | Yes | No | Time series regression | External |
AUC, area under the curve; DNN, deep neural networks; DT, decision tree; FPR, false positive rate; GB, gradient boosting; KNN, K-nearest neighbours; LR, logistic regression; ML, machine learning; NLR, negative likelihood ratio; NN, neural networks; NPV, negative predictive value; PLR, positive likelihood ratio; PPV, positive predictive value; RF, random forest; SVM, support vector machine.
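Nearly all of the included studies frame the task as binary classification over vital-sign predictors and report AUC. A minimal sketch of that framing is shown below, using entirely synthetic vital-sign data (the feature distributions and risk coefficients are illustrative assumptions, not values from any included study) and scikit-learn's logistic regression:

```python
# Hedged sketch: in-hospital mortality as binary classification over vital
# signs, evaluated by AUC. All data below are synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
# Synthetic vital signs: HR, SBP, RR, TP, SpO2 (hypothetical distributions).
X = np.column_stack([
    rng.normal(85, 15, n),     # heart rate (beats/min)
    rng.normal(125, 20, n),    # systolic blood pressure (mm Hg)
    rng.normal(18, 4, n),      # respiratory rate (breaths/min)
    rng.normal(37.0, 0.6, n),  # body temperature (deg C)
    rng.normal(96, 3, n),      # oxygen saturation (%)
])
# Synthetic outcome: risk rises with tachycardia, hypotension and hypoxia.
logit = 0.04 * (X[:, 0] - 85) - 0.03 * (X[:, 1] - 125) - 0.2 * (X[:, 4] - 96) - 4
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.3f}")
```

Note the deliberately rare outcome: as in the studies above (outcome portions mostly under 10%), class imbalance is why AUC, sensitivity and specificity are preferred over plain accuracy.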
Summary of machine learning algorithms’ description, advantages and disadvantages
| Algorithm | Description | Advantages | Disadvantages |
| LR | LR is a supervised ML algorithm adapted from linear regression. It can be used for classification problems and for estimating the probability of an event occurring. | Fast training, good for small data sets, easy to understand. | Not very accurate, not suitable for non-linear problems, high chance of overfitting, not flexible enough to fit complex data sets. |
| DT | DT is a supervised ML algorithm that solves a problem by transforming the data into a tree representation, where each internal node represents an attribute and each leaf denotes a class label. | Easy to understand and interpret, robust to outliers, no standardisation or normalisation required, useful for regression and classification. | High chance of overfitting, not suitable for large data sets, adding new samples leads to regeneration of the whole tree. |
| KNN | KNN is a supervised, instance-based ML algorithm. It predicts the label of a new sample based on similar samples with known labels, using a similarity or distance measure such as Euclidean distance. | Simple and easy to understand, easy to implement, no training required, useful for regression and classification. | Memory intensive, costly, slow at prediction time, all training data might be involved in decision-making. |
| SVM | SVM is an instance-based, supervised ML technique that generates a boundary between classes known as a hyperplane. Maximising the margin between classes is the main goal of this technique. | Efficient in high-dimensional spaces, effective when the number of dimensions is greater than the number of samples, regularisation capabilities that prevent overfitting, handles non-linear data, useful for regression and classification. | Long training time, not suitable for large data sets, not suitable for noisy data sets. |
| GB | GB is a supervised ML algorithm, which produces a model in the form of an ensemble of weak prediction models, usually DT. GB is an iterative gradient technique that minimises a loss function by iteratively selecting a function that points towards the negative gradient. | High accuracy, high flexibility, fast execution, useful for regression and classification, robust to missing values and overfitting. | Sensitive to outliers, not suitable for small data sets, many parameters to optimise. |
| RF | RF is an ensemble, supervised ML algorithm based on the bagging technique: many subsets of the data are randomly selected with replacement, and a model such as a DT is trained on each subset. The output is the average of the predictions of the single models. | High accuracy, fast execution, useful for regression and classification, robust to missing values and overfitting. | Not suitable for limited data sets, may change considerably with a small change in the data. |
| NN | NN is a family of supervised ML algorithms inspired by the biological neural networks of the human brain. An NN consists of input, hidden and output layers, and multiple neurons (nodes) carry data from the input layer to the output layer. | Accurate; suitable for complex, non-linear classification and regression problems. | Very slow to train and test, requires a large amount of data, computationally expensive, prone to overfitting. |
| DNN | DNN is a family of supervised ML algorithms based on NN, where the adjective ‘deep’ refers to the use of multiple layers in the network; a network with two or more hidden layers usually counts as a DNN. Specific training algorithms and architectures such as LSTM, GAN and CNN exist for DNNs. DNNs make it possible to solve complex problems when the data are very diverse, unstructured and interconnected. | High accuracy, features are automatically deduced and optimally tuned, robust to noise, flexible architecture. | Needs a very large amount of data, computationally expensive, not easy to understand, no standard theory for selecting the right settings, difficult for less skilled researchers. |
CNN, convolutional neural networks; DNN, deep neural networks; DT, decision tree; GAN, generative adversarial networks; GB, gradient boosting; KNN, K-nearest neighbours; LR, logistic regression; LSTM, long short-term memory networks; ML, machine learning; NN, neural networks; RF, random forest; SVM, support vector machine.
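Several of the included studies (eg, A3, A5, A10, A12) benchmark a handful of these algorithm families against one another by AUC on the same cohort. A minimal sketch of that comparison pattern, on a synthetic imbalanced data set rather than any study's cohort, might look like this:

```python
# Hedged sketch: comparing algorithm families from the table above (LR, DT,
# RF, GB, NN) by AUC on one synthetic, imbalanced binary task. The data and
# hyperparameters are illustrative, not taken from any included study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Rare positive class (~5%) mimics the low outcome portions seen above.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(max_depth=5, random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
    "NN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}
aucs = {name: roc_auc_score(y_te, m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
        for name, m in models.items()}
for name, auc in sorted(aucs.items(), key=lambda kv: -kv[1]):
    print(f"{name}: AUC = {auc:.3f}")
```

This train/test split corresponds to the internal validation used by most studies in the table; external validation (A3, A4, A15) would instead evaluate on a cohort from a different site or period.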
Risk of bias (ROB) assessment of included studies according to PROBAST checklist
| ID | Participants | Predictors | Outcomes | Sample size and missing data | Statistical analysis |
| A1 | + | + | + | – | – |
| A2 | + | ? | + | – | – |
| A3 | + | + | + | – | – |
| A4 | + | ? | + | – | – |
| A5 | + | + | + | – | – |
| A6 | – | ? | + | – | – |
| A7 | – | + | + | – | – |
| A8 | + | + | + | + | – |
| A9 | + | + | + | ? | + |
| A10 | ? | + | + | – | ? |
| A11 | + | + | + | ? | ? |
| A12 | ? | + | + | – | ? |
| A13 | + | + | ? | + | ? |
| A14 | + | ? | ? | – | ? |
| A15 | + | + | + | + | ? |
+, low risk of bias; –, high risk of bias; ?, unclear risk of bias.
PROBAST, Prediction model Risk Of Bias ASsessment Tool.