| Literature DB >> 33948244 |
Jaron Arbet, Cole Brokamp, Jareen Meinzen-Derr, Katy E Trinkley, Heidi M Spratt.
Abstract
Machine learning (ML) provides the ability to examine massive datasets and uncover patterns within data without relying on a priori assumptions such as specific variable associations, linearity in relationships, or prespecified statistical interactions. However, the application of ML to healthcare data has been met with mixed results, especially when using administrative datasets such as the electronic health record. The black box nature of many ML algorithms contributes to an erroneous assumption that these algorithms can overcome major data issues inherent in large administrative healthcare data. As with other research endeavors, good data and analytic design are crucial to ML-based studies. In this paper, we will provide an overview of common misconceptions for ML, the corresponding truths, and suggestions for incorporating these methods into healthcare research while maintaining a sound study design. © The Association for Clinical and Translational Science 2020.
Keywords: Machine learning; electronic health record; healthcare research; research methodology; translational research
Year: 2020 PMID: 33948244 PMCID: PMC8057454 DOI: 10.1017/cts.2020.513
Source DB: PubMed Journal: J Clin Transl Sci ISSN: 2059-8661
Fig. 1. Illustration of the iterative machine learning process.
Metrics for evaluating prediction accuracy for various types of outcomes
| Outcome type | Prediction accuracy metric |
|---|---|
| Continuous | mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), proportion of outcome variance explained (R2) |
| Binary | AUC, C-statistic, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), precision, recall |
| Categorical with three or more categories | “one-versus-all” versions of the binary-outcome metrics listed above |
| Time-to-event | C-statistic, Integrated Brier Score |
Most of the above performance metrics are defined in Kuhn [79]. For machine learning with time-to-event outcomes, Ishwaran [111] used the C-statistic [36], while Bou-Hamad [112] used the Integrated Brier Score [113].
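The continuous-outcome metrics in the table above are straightforward to compute from observed and predicted values. As an illustration (not code from the paper), a minimal Python sketch:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute common prediction-accuracy metrics for a continuous outcome."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r ** 2 for r in residuals) / n            # mean squared error
    rmse = math.sqrt(mse)                               # root mean squared error
    mae = sum(abs(r) for r in residuals) / n            # mean absolute error
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)     # total sum of squares
    r2 = 1 - sum(r ** 2 for r in residuals) / ss_tot    # proportion of variance explained
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```

In practice these would typically be computed by an ML framework (e.g., the caret or yardstick R packages), but the definitions are exactly as sketched.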
For a categorical outcome with three or more categories, “one-versus-all” versions of metrics for a binary outcome can be used which assess prediction accuracy for one category versus all other categories combined.
General properties of different machine learning models (adapted from Kuhn [12] and Hastie et al. [2]): ✓ = good, ○ = fair, × = poor
| Property | Classical regression | CART | Random forests | Boosted trees | Support vector machines | MARS | Neural networks | Penalized regression |
|---|---|---|---|---|---|---|---|---|
| Allows P > N | × | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Automatic nonlinear and interaction effects | × | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × |
| Can handle many irrelevant predictors | × | ✓ | ✓ | ✓ | × | ✓ | × | ✓ |
| # Tuning parameters | 0 | 1 | 0–1 | 3 | 1–3 | 1–2 | ≥2 | 1–2 |
| Robust to outliers/noise in predictors | × | ✓ | ✓ | ✓ | × | × | × | × |
| Handles missing values in predictors | × | ✓ | ✓ | ✓ | × | × | × | × |
| Necessary pre-processing | Corr | – | – | – | CS | – | CS, Corr | CS |
| Computation time | ✓ | ✓ | × | × | × | ✓ | × | ✓ |
| Interpretability | ✓ | ✓ | × | × | × | ✓ | × | ✓ |
| R software packages | lm(), glm() | rpart, partykit | partykit, randomForestSRC, ranger, randomForest | gbm, xgboost | e1071, kernlab | earth | nnet, tensorflow, keras | glmnet, ncvreg |
Tree-based models can naturally handle missing values in predictors using a method called “surrogate splits” [61]. Although not all software implementations support this, examples of software that do allow missing predictor values in trees are the rpart [115] and partykit [116] R packages. Other acronyms: “P > N”: the total number of predictors “P” is much larger than the total number of samples “N”; CART: classification and regression tree; MARS: multivariate adaptive regression splines; LASSO: least absolute shrinkage and selection operator.
In theory, MARS can handle missing values [11]; however, we are not aware of any free software that supports this.
Corr: remove highly correlated predictors; CS: center and scale predictors to be on the same scale (i.e., these models cannot naturally handle a mix of categorical and numeric predictors on their original measurement scales).
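To illustrate the two pre-processing steps abbreviated above, here is a minimal Python sketch (an assumption for exposition; in R this is typically done with, e.g., caret's preProcess): CS standardizes each numeric predictor, and Corr uses pairwise Pearson correlations to identify highly correlated predictors for removal.

```python
import math
import statistics

def center_and_scale(column):
    """CS pre-processing: transform a numeric predictor to mean 0, SD 1,
    so predictors on different measurement scales contribute comparably."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation
    return [(x - mean) / sd for x in column]

def pearson_corr(x, y):
    """Pearson correlation, used to flag highly correlated predictor pairs
    (Corr pre-processing removes one predictor from each such pair)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den
```

A pair of predictors with, say, |correlation| > 0.9 would be a typical candidate for Corr removal, though the threshold is a modeling choice.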