Wenjuan Wang, Martin Kiik, Niels Peek, Vasa Curcin, Iain J Marshall, Anthony G Rudd, Yanzhong Wang, Abdel Douiri, Charles D Wolfe, Benjamin Bray.
Abstract
Year: 2020 PMID: 32530947 PMCID: PMC7292406 DOI: 10.1371/journal.pone.0234722
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Explanations of special machine learning terms.
| Term | Explanation |
|---|---|
| Supervised learning | A subgroup of ML models that requires both predictors and outcomes (labels) |
| Unsupervised learning | A subgroup of ML models meant to find previously unknown patterns in data without pre-existing labels |
| Feature | Predictor or variable in a ML model |
| Feature selection | Variable selection or attribute selection |
| Generalisation ability | The ability of a model to generalise the learned pattern to new data |
| Over-fitting | A model corresponds too closely or exactly to a particular set of data, and may fail to fit new data |
| Missing data mechanism | Three missing-data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) |
| Imputation | The process of replacing missing data with substituted values |
| Training | The learning process of the data pattern by a model |
| Testing | Evaluating a trained model on a held-out dataset not used for training |
| LASSO | Least Absolute Shrinkage and Selection Operator: a regression technique that performs both variable selection and regularization |
| Support Vector Machine (SVM) | A supervised classifier that seeks to find the best hyperplane to separate the data |
| Naïve Bayes (NB) | A family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naïve) independence assumptions between the features |
| Bayesian Network (BN) | A type of probabilistic graphical model that uses Bayesian inference for probability computations |
| k-nearest neighbours (kNN) | A type of instance-based learning, where the prediction is approximated locally from the k nearest neighbours |
| Artificial Neural Network (ANN) | A computational model based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain |
| Decision Tree | A model that applies a hierarchy of decision rules, following branches from the root node to a leaf that gives the final prediction |
| Random Forest (RF) | An ensemble learning method that uses a multitude of decision trees |
| Super learner | A stacking algorithm using cross-validated predictions of other models and assigning weights to these predictions to optimise the final prediction |
| Adaptive network based fuzzy inference system (ANFIS) | A fuzzy Sugeno model put in the framework of adaptive systems to facilitate learning and adaptation |
| XGBoost | A decision-tree-based ensemble ML algorithm that uses a gradient boosting framework |
| Adaptive Boosting (AdaBoost) | An algorithm used in combination with others to convert a set of weak classifiers into a strong one |
| Parameters | Coefficients of a model that need to be learned from the data |
| Hyperparameters | Configurations of a model which are often selected and set before training the model |
| Validation | The process of evaluating a trained model on a testing dataset |
| Discrimination | The ability of a model to separate observations into different classes |
| Calibration | Adjusting the model's predicted probabilities to more closely match the probabilities observed in the test set |
| Cross-validation (CV) | A model validation technique for assessing how the results of a statistical analysis (model) will generalize to an independent data set |
| Leave One Out CV | A performance measurement approach that uses one observation as the validation set and the remaining observations as the training set |
| Leave One Centre Out CV | A performance measurement approach that uses observations from one centre as the validation set and the remaining observations as the training set |
| Bootstrapping | Resampling multiple new datasets with replacement from the original data set |
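Several of the resampling concepts in the table (k-fold cross-validation, leave-one-out CV, bootstrapping) can be sketched in a few lines. This is an illustrative NumPy sketch of how the resampling indices are constructed; it is not code from any of the reviewed studies, and the function names are our own.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement from 0..n-1 (one bootstrap resample)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n, size=n)

# 5-fold CV on 20 observations: each fold serves once as the test set.
folds = kfold_indices(20, 5)
# Leave-one-out CV is the special case k == n.
loo = kfold_indices(10, 10)
# A bootstrap resample typically repeats some indices and omits others.
boot = bootstrap_sample(20)
```

In k-fold CV every observation appears in exactly one test fold; in bootstrapping, by contrast, roughly a third of the observations are left out of each resample and can serve as an out-of-bag test set.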
Fig 1. PRISMA flowchart.
Fig 2. Number of papers published per year, by algorithm used (top) and outcome predicted (bottom).
A brief summary of the included studies.
| Reference | Sample (Feature) size | Outcomes | Predictors/variables/features | Missing values handled | Hyperparameter selection | Validation | Calibration | Best Algorithm | Compared algorithms |
|---|---|---|---|---|---|---|---|---|---|
| Al Taleb et al. 2017 | 358 (15) | Length of Stay | At admission | Single imputation | Not reported | 10-fold CV | No | Bayesian Network | DT (C4.5) |
| Asadi et al. 2014 | 107 (8) | 90-day binary and 7-level mRS | At admission | Not reported | No | Training, test, validation split for ANN; nested CV for SVM | No | SVM | ANN, Linear Regression |
| Liang et al. 2019 | 435 (4) | 90-day binary mRS | Admission, laboratory data | Not reported | Not reported | Training and test split | No | ANN | LR |
| Heo et al. 2019 | 2604 (38) | 90-day binary mRS | Admission | Complete case analysis | No | Training and test split | No | DNN | RF, LR |
| Konig et al. 2007 | 3184 (43) | 100-day Barthel Index | First 72 h after admission | Complete case analysis | Yes, grid search | Temporal and external validation | Yes, for LR | - | RF, SVM, LR |
| Celik et al. 2014 | 570 (22) | 10-day mortality | At admission | - | Yes, grid search | 5-fold CV | No | LR | ANN |
| Ho et al. 2014 | 190 (26) | Discharge mortality | Admission and interventions | Complete case analysis | Not reported | 10-fold CV | No | SVM | Naïve Bayes, DT, RF, PCA+SVM, LR |
| Cox et al. 2016 | 2580 (72) | Post stroke spasticity | Not clear | Not reported | Not reported | Training, test and validation split | No | RF | DT (CART), Adaboost |
| Kruppa et al. 2014 | 3184 (43) | 100-day Barthel Index | First 72 h after admission data | Complete case analysis | Yes, for kNN, bNN and RF | Temporal and external validation | Yes, Brier score | SVM and LR | kNN, bNN, RF |
| Easton et al. 2014 | 933 (-) | Short/very short mortality | Not clear | Not reported | Yes, DT is pruned | Training and test split | No | - | Naïve Bayes, DT, LR |
| Mogensen and Gerds 2013 | 516 (12) | 3-month/1-year/3-year/5-year mortality | Admission data | Complete case analysis | No, set manually | Bootstrap CV | Yes, Brier score | - | Pseudo RF, Cox regression, and random survival forest |
| Van Os et al. 2018 | 1383 (83) | Good reperfusion score, 3-month binary mRS | Admission, laboratory and treatment data | Multiple imputation by chained equations | Yes, nested CV with random grid search | Nested CV | No | - | RF, SVM, ANN, super learner, LR |
| Peng et al. 2010 | 423 (10) | 30-day mortality | Admission, laboratory, radiographic data | No missing values | Yes, empirically | 4-fold CV | No | RF | ANN, SVM, LR |
| Tokmakci et al. 2008 | 70 (6) | Quality of life | Admission data | Not reported | Not reported | Training and test split | No | ANFIS | |
| Monteiro et al. 2018 | 425 (152) | 3-month binary mRS | Admission/2 hours/24 hours/7 days data | single imputation | Yes, Grid search | 10-fold CV | No | RF and Xgboost | DT, SVM, RF, LR (LASSO) |
| Tjortjis et al. 2007 | 671 (37) | 2-month mortality | Admission data | Cases discarded with missing outcomes | Yes, pruned | Training and test split | No | DT (T3) | DT (C4.5) |
| Lin et al. 2018 | 382 (5) | Neurologic deterioration | Admission and laboratory data | Not reported | Yes, CV | Training and test split | No | - | SVM |
| Tanioka et al. 2019 | 95 (20) | Delayed cerebral ischemia after SAH | Admission/1-3 days variables | Complete case analysis | Yes, grid search | Leave one out CV | No | - | RF |
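Two of the assessment terms that appear across the studies above, discrimination and calibration, can be illustrated numerically. The sketch below is ours: the outcomes and predicted probabilities are hypothetical, and the functions (`auc`, `calibration_gap`) are minimal illustrations, not implementations used in any reviewed study.

```python
import numpy as np

def auc(y_true, y_score):
    """Discrimination: probability that a randomly chosen positive case
    is scored above a randomly chosen negative case (area under the ROC curve)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def calibration_gap(y_true, y_prob):
    """Crudest calibration check: mean predicted probability minus observed
    event rate; values near 0 indicate good calibration-in-the-large."""
    return float(np.mean(y_prob) - np.mean(y_true))

# Hypothetical outcomes and predicted probabilities, for illustration only.
y = np.array([0, 0, 1, 1])
p = np.array([0.1, 0.4, 0.35, 0.8])
print(auc(y, p))  # → 0.75
```

A model can discriminate well (high AUC) yet be poorly calibrated, which is why the review tabulates the two properties separately.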
Fig 3. Number of terms reported in each study (top) and number of studies reporting each assessment term (bottom). * indicates criteria adjusted for ML models.
Fig 4. Boxplots showing the distribution of sample size and feature size according to algorithms used (top) and outcomes predicted (bottom).