Yining Lu, Kyle Kunze, Matthew R Cohn, Ophelie Lavoie-Gagne, Evan Polce, Benedict U Nwachukwu, Brian Forsythe.
Abstract
PURPOSE: To develop and internally validate a machine-learning algorithm to reliably predict cost after anterior cruciate ligament reconstruction (ACLR).
Year: 2021 PMID: 34977663 PMCID: PMC8689347 DOI: 10.1016/j.asmr.2021.10.013
Source DB: PubMed Journal: Arthrosc Sports Med Rehabil ISSN: 2666-061X
Baseline Characteristics of the Study Population, n = 7,311
| Variable | n (%) or Median (IQR) |
|---|---|
| Demographics and clinical history | |
| Age, y | 31 (24-41) |
| Sex | |
| Female | 2,808 (38.4) |
| Male | 4,503 (61.6) |
| Race | |
| White | 4,368 (59.7) |
| Black | 604 (8.3) |
| Hispanic | 616 (8.4) |
| Asian or Pacific Islander | 317 (4.3) |
| Native American | 34 (0.5) |
| Other | 1,372 (18.8) |
| Ethnicity | |
| Not Hispanic | 6,691 (91.5) |
| Hispanic, White | 186 (2.5) |
| Hispanic, Black | 41 (0.6) |
| Hispanic, other race | 393 (5.4) |
| Insurance status | |
| Medicare | 46 (0.6) |
| Medicaid | 954 (13.0) |
| Private insurance | 5,094 (69.7) |
| Self-pay | 424 (5.8) |
| No charge | 4 (0.1) |
| Other | 789 (10.8) |
| Discharge quarter | |
| 1 | 1,896 (25.9) |
| 2 | 1,948 (26.6) |
| 3 | 1,668 (22.8) |
| 4 | 1,799 (24.6) |
| Number of chronic conditions | 1 (0-1) |
| Operative characteristics | |
| Anesthesia | |
| MAC/IV sedation | 642 (8.8) |
| Local anesthesia | 335 (4.6) |
| General anesthesia | 4,695 (64.2) |
| Regional anesthesia | 1,367 (18.7) |
| OR time, min | 118 (89-150) |
| Concomitant procedures | |
| Meniscal repair | 299 (4.1) |
| Meniscectomy | 1,349 (18.5) |
| Microfracture | 90 (1.2) |
| Synovectomy | 71 (1.0) |
| Graft from distance | 15 (0.2) |
| Plica excision | 50 (0.7) |
| Community characteristics | |
| Median household income state quartile for patient ZIP code | |
| 1 | 1,257 (17.2) |
| 2 | 1,613 (22.1) |
| 3 | 1,908 (26.1) |
| 4 | 2,533 (34.6) |
| Patient location: CBSA | |
| Non-CBSA | 148 (2.0) |
| Micropolitan statistical area | 354 (4.8) |
| Metropolitan statistical area | 6,809 (93.1) |
| Total charges | |
| High | 845 (11.6) |
| Average | 6,278 (85.9) |
| Low | 188 (2.6) |
CBSA, core-based statistical area; IQR, interquartile range; IV, intravenous; MAC, monitored anesthesia care; OR, operating room.
Fig 1. Simplified graphic demonstrating the basic decision tree: from the root node (an example patient), the algorithm passes the case through several branching points based on the feature space until a leaf node is reached, where the patient falls into a cohort that cannot be split further, and the predicted probability and label are provided accordingly. (OR, operating room.)
Fig 2. (A) Variable importance plot of the random forest for patients with predicted charges <$1,660.57 and (B) for those with predicted charges >$16,707.9. The variable importance plot gives a global ranking of the variables that contributed most to improved model performance; importance is relative and reported as a dimensionless quantity. (IV, intravenous; MAC, monitored anesthesia care; OR, operating room.)
Model Performances on Internal Validation via 0.632 Bootstrap
| Model | Accuracy | AUROC | Multinomial Brier Score |
|---|---|---|---|
| Elastic net | 0.8614 (0.8613-0.8616) | 0.799 (0.798-0.801) | 0.224 (0.223-0.225) |
| Random forest | 0.8783 (0.8782-0.8784) | 0.848 (0.847-0.849) | 0.208 (0.207-0.209) |
| XGBoost | 0.8742 (0.8741-0.8742) | 0.849 (0.847-0.850) | 0.208 (0.207-0.209) |
| SVM | 0.8688 (0.8687-0.8689) | 0.783 (0.782-0.783) | 0.231 (0.230-0.232) |
AUROC, area under receiver operating characteristic curve; SVM, support vector machines; XGBoost, extreme gradient boosted machine.
Null Brier: 1.27
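The study's training pipeline is not published as code; the sketch below is a minimal approximation of the 0.632 estimator on synthetic data, using scikit-learn (an assumption, since the authors' tooling is not stated), blending apparent accuracy with out-of-bag accuracy across bootstrap partitions.

```python
# Minimal 0.632 bootstrap sketch (synthetic data; the study's pipeline also
# nested 10-fold cross-validation, repeated 3 times, within each training fit).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

n = len(y)
oob_scores = []
for _ in range(100):  # the study used 1,000 partitions; 100 keeps this fast
    boot = rng.integers(0, n, n)             # sample row indices with replacement
    oob = np.setdiff1d(np.arange(n), boot)   # held-out (out-of-bag) rows
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[boot], y[boot])
    oob_scores.append(model.score(X[oob], y[oob]))

# 0.632 rule: blend apparent (resubstitution) accuracy with out-of-bag accuracy
apparent = RandomForestClassifier(n_estimators=100,
                                  random_state=0).fit(X, y).score(X, y)
acc_632 = 0.368 * apparent + 0.632 * float(np.mean(oob_scores))
print(f"0.632 bootstrap accuracy: {acc_632:.3f}")
```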
Fig 3. (A) Calibration and (B) discrimination, as illustrated by the area under the receiver operating characteristic curve (AUROC) between low-cost and high-cost patients, for the random forest algorithm. The ideal calibration curve has an intercept of 0 and a slope of 1.
Fig 4. Decision curve analysis comparing the complete model with model predictions using only OR time. The downsloping line marked “all” plots the net benefit of the default strategy of changing management for all patients, while the horizontal line marked “none” represents the strategy of changing management for no patients (net benefit is zero at all thresholds). The “all” line slopes down because at a threshold of zero, false positives are given no weight relative to true positives; as the threshold increases, false positives gain weight relative to true positives and the net benefit of changing management for all patients decreases. (OR, operating room.)
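The net benefit underlying a curve like this can be computed directly. A minimal sketch, assuming binary high-cost labels and model probabilities (toy values, not study data):

```python
# Net benefit at a threshold pt: (TP - w * FP) / N, with w = pt / (1 - pt).
import numpy as np

def net_benefit(y, p, threshold):
    pred = p >= threshold                     # "change management" decisions
    tp = np.sum(pred & (y == 1))              # true positives
    fp = np.sum(pred & (y == 0))              # false positives
    w = threshold / (1 - threshold)           # weight on false positives
    return (tp - w * fp) / len(y)

y = np.array([0, 0, 1, 1, 0, 1])              # toy high-cost labels
p = np.array([0.1, 0.4, 0.8, 0.6, 0.3, 0.9])  # toy model probabilities
thresholds = np.linspace(0.05, 0.95, 19)
model_curve = [net_benefit(y, p, t) for t in thresholds]
treat_all = [net_benefit(y, np.ones_like(p, dtype=float), t) for t in thresholds]
# "Treat none" is zero at every threshold, matching the horizontal line above.
```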
Fig 5. Example of an individual patient-level explanation for the random forest algorithm’s predictions. This patient had a predicted probability of 96.7% of incurring a cost within one standard deviation of the median cost; features supporting this prediction included the use of regional anesthesia, no chronic conditions, and OR time <106 minutes. (OR, operating room.)
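The paper does not name the tool behind this per-patient breakdown; one common way to produce such local explanations is SHAP values. A minimal sketch, assuming the shap library and synthetic data:

```python
# Local (per-patient) explanation sketch using the shap library (an assumption;
# the authors' explanation method is not stated), on synthetic data.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(rf)
contributions = explainer.shap_values(X[:1])  # one patient's feature contributions
# Each value shows how a feature pushed this patient's predicted probability
# above or below the baseline rate, analogous to the anesthesia and OR time
# features highlighted in Fig 5.
```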
Definition of Machine Learning Concepts and Methods Used
| Term | Definition |
|---|---|
| Multiple imputation | A popular method for handling missing data, which is often a source of bias and error in model output. In this approach, each missing value in the dataset is replaced with an imputed value based on a statistical estimation; the process is repeated with random variation, producing multiple “completed” datasets, each consisting of observed and imputed values. These are combined using a simple formula known as Rubin’s rules to give final estimates of the target variables (a sketch follows this table). |
| Recursive feature elimination | A feature selection algorithm that searches for an optimal subset of features by fitting a given machine learning algorithm (random forest and naïve Bayes in our case) to the predicted outcome, ranking the features by importance, and removing the least important features. This is repeated, in a “recursive” manner, until a specified number of features remains or a threshold value of a designated performance metric is reached. The selected features can then be entered as inputs into the candidate models for prediction of the desired outcome (a sketch follows this table). |
| 0.632 Bootstrapping | The method for training an algorithm on the input features selected by recursive feature elimination. Briefly, model evaluation consists of repeated partitions of the complete dataset into train and test sets. For each partition, the model is trained on the train set using 10-fold cross-validation repeated 3 times; its performance is then evaluated on the corresponding test set, with no data points from the training set included in the test set. This sequence of steps is then repeated for 999 more data partitions. |
| Extreme gradient boosting | Algorithm of choice among stochastic gradient boosting machines, a family in which multiple weak classifiers (classifiers that predict only marginally better than random) are combined, in a process known as boosting, to produce an ensemble classifier with a superior generalized misclassification error rate. |
| Random forest | Algorithm of choice among tree-based algorithms: an ensemble of independent trees, each built from a sample of the training data and generating its own prediction for a new case, with the trees’ predictions averaged to give the forest’s prediction. The ensembling process is distinct in principle from gradient boosting. |
| Neural network | A nonlinear regression technique based on one or more hidden layers consisting of linear combinations of some or all predictor variables, through which the outcome is modeled; these hidden layers are not estimated in a hierarchical fashion. The structure of the network mimics neurons in a brain. |
| Elastic-net penalized logistic regression | A penalized linear regression based on a function that minimizes the squared errors of the outputs; it belongs to the family of penalized linear models that includes ridge regression and the lasso (a sketch follows this table). |
| Support vector machines | A supervised learning algorithm that performs classification by representing each data point as a point in an abstract space and defining a plane, known as a hyperplane, that separates the points into distinct binary classes with maximal margin. Hyperplanes can be linear or nonlinear; the present analysis used a nonlinear, circular kernel. |
| Area under the receiver operating characteristic curve | A common metric of model performance based on the receiver operating characteristic (ROC) curve, which plots sensitivity against specificity across the range of class-probability cutoffs (rather than a fixed 50:50 cutoff). The area under the ROC curve classically ranges from 0.5 to 1, where 0.5 indicates a model no better than random and 1 a model that assigns class labels perfectly. |
| Calibration | The ability of a model to output probability estimates that reflect the true event rate under repeated sampling from the population. An ideal calibration curve is a straight line with an intercept of 0 and a slope of 1 (i.e., perfect concordance of model predictions with observed frequencies in the data). |
| Brier score | The mean squared difference between the model’s predicted probabilities and the observed outcomes in the testing data. The Brier score generally ranges from 0 for a perfect model to 0.25 for a noninformative model (a sketch of the Brier score and calibration follows this table). |
| Decision curve analysis | A measure of clinical utility whereby a clinical “net benefit” for one or more prediction models or diagnostic tests is calculated in comparison to default strategies of treating all or no patients. This value is calculated based on a set threshold, defined as the minimum probability of disease at which further intervention would be warranted. The decision curve is constructed by plotting the ranges of threshold values against the net benefit yielded by the model at each value; as such, a model curve that is farther from the bottom left corner yields more net benefit than one that is closer. |
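A minimal multiple-imputation sketch: scikit-learn's IterativeImputer with sample_posterior=True draws one stochastic completion per run. This approximates the approach described above rather than reproducing the authors' code, and the naive averaging at the end only stands in for the pooling of downstream estimates that Rubin's rules perform.

```python
# Multiple "completed" datasets via stochastic imputation (scikit-learn).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [np.nan, 8.0]])
completed = [
    IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
    for s in range(5)                         # five completed datasets
]
# In full multiple imputation, a model is fitted to each completed dataset and
# the resulting estimates are pooled with Rubin's rules; averaging the imputed
# matrices here only illustrates the repeated-completion idea.
pooled = np.mean(completed, axis=0)
```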
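A minimal recursive-feature-elimination sketch with scikit-learn's RFE, using a random forest ranker as in the definition above; the data and feature counts are synthetic and illustrative.

```python
# Recursive feature elimination: rank features, drop the weakest, repeat.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=5, step=1)
selector.fit(X, y)
print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # 1 = kept; larger values were eliminated earlier
```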
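Elastic-net penalized regression mixes the ridge (L2) and lasso (L1) penalties; a minimal scikit-learn sketch with illustrative hyperparameters:

```python
# Elastic-net logistic regression: l1_ratio blends the lasso and ridge penalties.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X, y)
print(clf.coef_)           # some coefficients shrink toward or to zero
```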
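The Brier score and a calibration check from the definitions above, sketched with scikit-learn on toy probabilities (not study data); the multinomial Brier score in the results table generalizes this to the mean squared distance from one-hot outcomes.

```python
# Brier score (0 = perfect, 0.25 = noninformative binary model) and calibration.
import numpy as np
from sklearn.metrics import brier_score_loss
from sklearn.calibration import calibration_curve

y = np.array([0, 1, 1, 0, 1, 0, 1, 1])                    # toy observed outcomes
p = np.array([0.2, 0.9, 0.7, 0.3, 0.6, 0.1, 0.8, 0.55])   # toy probabilities
print(brier_score_loss(y, p))

frac_pos, mean_pred = calibration_curve(y, p, n_bins=2)
# A well-calibrated model gives frac_pos ≈ mean_pred (intercept 0, slope 1).
```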