Katrin B Johannesdottir¹, Henrik Kehlet², Pelle B Petersen², Eske K Aasvang²,³, Helge B D Sørensen¹, Christoffer C Jørgensen².
Abstract
Background and purpose: Predicting postoperative outcomes and length of hospital stay (LOS) is vital for the allocation of healthcare resources. We investigated the performance of prediction models based on machine-learning algorithms, compared with a previous risk stratification model using traditional multiple logistic regression, for predicting the risk of a LOS of > 2 days after fast-track total hip and knee replacement.
Patients and methods: Three different machine-learning classifiers were trained on data from the Lundbeck Centre for Fast-track Hip and Knee Replacement Database (LCDB), collected from 9,512 patients between 2016 and 2017. The chosen classifiers were a random forest classifier (RF), a support vector machine classifier with a polynomial kernel (SVM), and a multinomial Naïve-Bayes classifier (NB).
Year: 2022 PMID: 34984485 PMCID: PMC8815306 DOI: 10.2340/17453674.2021.843
Source DB: PubMed Journal: Acta Orthop ISSN: 1745-3674 Impact factor: 3.717
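The abstract names a multinomial Naïve-Bayes classifier among the three models. The sketch below is a minimal pure-Python version of that model type, assuming non-negative count or 0/1 indicator features and additive (Laplace) smoothing with alpha = 1 (the same default as scikit-learn's `MultinomialNB`); the authors' actual implementation and hyperparameters are not stated in this excerpt, and the toy data and the meaning of its three flags are hypothetical.

```python
import math


def fit_multinomial_nb(X, y, alpha=1.0):
    """Fit a multinomial Naive-Bayes model with additive (Laplace) smoothing.

    X: list of equal-length feature vectors with non-negative values.
    y: list of 0/1 labels (here 1 = LOS > 2 days).
    Returns (classes, log_prior, log_likelihood).
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    log_prior, log_likelihood = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        # Smoothed total of each feature within class c.
        totals = [sum(r[j] for r in rows) + alpha for j in range(n_features)]
        denom = sum(totals)
        log_likelihood[c] = [math.log(t / denom) for t in totals]
    return classes, log_prior, log_likelihood


def predict_proba(model, x):
    """Posterior class probabilities for one feature vector x."""
    classes, log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(xj * lj for xj, lj in zip(x, log_likelihood[c]))
        for c in classes
    }
    # Log-sum-exp normalisation avoids underflow when many features are present.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(scores[c] - top) / z for c in classes}


# Hypothetical toy data: three 0/1 risk flags per patient.
X = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0],
     [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
model = fit_multinomial_nb(X, y)
proba = predict_proba(model, [1, 0, 0])
print(round(proba[1], 3))  # 0.545
```

A multinomial model only accumulates evidence from features that are non-zero for a patient, so an absent flag contributes nothing; the Bernoulli variant of Naïve-Bayes, by contrast, explicitly scores absences, which is why the two can rank the same binary data differently.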
Figure 1. The study population. DNPR: Danish National Patient Registry; THA: total hip arthroplasty; TKA: total knee arthroplasty; LCDB: Lundbeck Foundation Centre for Fast-track Hip and Knee Replacement Database; LOS: length of hospital stay.
The 22 patient characteristics used as the input vector for binary classification, and the clinical outcome of length of stay > 2 days. Values are count (%) unless otherwise specified.
| Factor | Value | Missing, n (%) |
|---|---|---|
| Mean age, years [SD] | 68 [10] | 0 (0) |
| Mean BMI, kg/m² [SD] | 28 [5] | 100 (1) |
| Female sex | 6,213 (59) | 0 (0) |
| Walking aid | 2,391 (23) | 195 (2) |
| Living alone | 3,632 (34) | 80 (1) |
| In institution | 75 (1) | |
| Smoking | 1,347 (13) | 89 (1) |
| Alcohol > 24 g/day | 790 (8) | 91 (1) |
| Total knee arthroplasty | 4,448 (42) | 0 (0) |
| Psychiatric disease | 1,520 (14) | 0 (0) |
| Cardiac disease | 1,439 (14) | 106 (1) |
| Pulmonary disease | 945 (9) | 64 (1) |
| Hypertension | 5,911 (56) | 0 (0) |
| Non-insulin-dependent diabetes mellitus | 925 (9) | 61 (1) |
| Insulin-dependent diabetes mellitus | 199 (2) | |
| Anticoagulants | 821 (28) | 0 (0) |
| Preoperative anemia | 2,552 (24) | 155 (2) |
| Hypercholesterolemia | 3,102 (29) | 76 (1) |
| Previous cerebral attack | 582 (6) | 158 (2) |
| Previous thromboembolism | 737 (7) | 140 (1) |
| Cancer | 328 (3) | 98 (1) |
| Kidney disease | 176 (2) | 244 (2) |
| LOS > 2 days | 1,863 (18) | 0 (0) |
For the binary classification, each patient's characteristics are encoded as a single feature vector.
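The encoding of one patient into a feature vector can be sketched as follows. The field names, the use of 1.0/0.0 for binary characteristics, and the per-feature fill values for missing entries are all hypothetical; this excerpt does not describe how the authors encoded the variables or handled the missing values counted in the table.

```python
# Hypothetical field names, in the order of the characteristics table (22 inputs).
FEATURES = [
    "age", "bmi", "female_sex", "walking_aid", "living_alone", "in_institution",
    "smoking", "alcohol_over_24g_day", "tka", "psychiatric_disease",
    "cardiac_disease", "pulmonary_disease", "hypertension", "niddm", "iddm",
    "anticoagulants", "preoperative_anemia", "hypercholesterolemia",
    "previous_cerebral_attack", "previous_thromboembolism", "cancer",
    "kidney_disease",
]


def encode_patient(record, fill_values=None):
    """Map a dict of characteristics to a fixed-order numeric vector.

    Booleans become 1.0/0.0. Missing entries (absent or None) take the value
    from fill_values, defaulting to 0.0; this imputation rule is an assumption.
    """
    fill_values = fill_values or {}
    vector = []
    for name in FEATURES:
        value = record.get(name)
        if value is None:
            value = fill_values.get(name, 0.0)
        vector.append(float(value))  # float(True) == 1.0
    return vector


def encode_label(length_of_stay_days):
    """Binary outcome used in the study: 1 if LOS > 2 days, else 0."""
    return 1 if length_of_stay_days > 2 else 0


patient = {"age": 71, "bmi": None, "female_sex": True, "walking_aid": False, "tka": True}
x = encode_patient(patient, fill_values={"bmi": 28.0})  # hypothetical: mean BMI as fill
print(len(x), x[:4])  # 22 [71.0, 28.0, 1.0, 0.0]
print(encode_label(2), encode_label(3))  # 0 1
```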
Performance of a random forest classifier, a support vector machine classifier, and a multinomial Naïve-Bayes classifier evaluated with 10-fold cross-validation, compared with the traditional risk calculation method of multiple logistic regression.
| Type | Accuracy | AUC (95% CI) | AUPRC | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|---|
| Random forest classifier | 0.75 | 0.71 (0.70–0.73) | 0.33 | 0.44 | 0.82 | 0.61 |
| Support vector machine classifier | 0.73 | 0.71 (0.69–0.72) | 0.34 | 0.52 | 0.78 | 0.62 |
| Multinomial Naïve-Bayes classifier | 0.64 | 0.66 (0.65–0.68) | 0.23 | 0.60 | 0.64 | 0.56 |
| Multiple logistic regression | 0.83 | 0.70 (0.69–0.72) | N/A | 0.36 | 0.87 | 0.36 |
Values for multiple logistic regression were calculated from previously published results ().
AUC: Area under the receiver operating characteristic curve.
AUPRC: Area under the precision-recall curve.
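Accuracy, sensitivity, specificity and F1 all derive from the same confusion matrix, and for a fixed sensitivity and specificity the accuracy depends on how common the outcome is. The sketch below assumes the 18% prevalence of LOS > 2 days from the characteristics table holds within each fold and rebuilds expected confusion-matrix cells from a row's sensitivity and specificity. It reports F1 both for the positive class alone and as the macro average over both classes; the excerpt does not say which convention the F1 column uses, and the helper names are hypothetical.

```python
def rates_to_confusion(sensitivity, specificity, prevalence, n=1.0):
    """Expected confusion-matrix cells (tp, fp, fn, tn) for a cohort of size n."""
    pos = prevalence * n
    neg = (1.0 - prevalence) * n
    tp = sensitivity * pos
    fn = pos - tp
    tn = specificity * neg
    fp = neg - tn
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard binary metrics; class 1 (LOS > 2 days) is the positive class."""
    total = tp + fp + fn + tn
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    # F1 with the roles reversed, treating LOS <= 2 days as the positive class.
    f1_neg = 2 * tn / (2 * tn + fn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1_positive": f1_pos,
        "f1_macro": (f1_pos + f1_neg) / 2,
    }


rf = metrics(*rates_to_confusion(0.44, 0.82, 0.18))  # random forest row
print(round(rf["accuracy"], 2))     # 0.75
print(round(rf["f1_positive"], 2))  # 0.39
print(round(rf["f1_macro"], 2))     # 0.62
```

With these inputs the expected accuracy agrees with the reported 0.75, while the positive-class F1 (about 0.39) is far from the reported 0.61 and the macro average (about 0.62) is close to it. This is only an arithmetic consistency check under the stated prevalence assumption, not a statement of how the column was computed.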
Figure 2. Receiver operating characteristic (ROC) curves and precision-recall curves for the three classification models: the random forest classifier (RF), the support vector machine (SVM) classifier with a polynomial kernel, and the multinomial Naïve-Bayes (NB) classifier. The lines represent the mean ROC curve and the mean precision-recall curve.
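Curves like those in Figure 2 are traced by sweeping a decision threshold over a classifier's predicted probabilities. The pure-Python sketch below follows the standard definitions: it builds ROC points and integrates them with the trapezoidal rule, cross-checks the result against the rank (Mann-Whitney) form of the AUC, and computes average precision, one common step-wise estimate of the area under the precision-recall curve. The toy labels and scores are hypothetical, and the excerpt does not state which AUPRC estimator was used or how the per-fold curves were averaged.

```python
def threshold_sweep(labels, scores):
    """Cumulative (tp, fp) after each distinct score, from highest to lowest.

    Tied scores are grouped so they move the curve in a single step.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    i = 0
    steps = []
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        steps.append((tp, fp))
    return steps


def roc_auc_trapezoid(labels, scores):
    """Area under the ROC curve (FPR on x, TPR on y) by the trapezoidal rule."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)] + [(fp / n_neg, tp / n_pos)
                             for tp, fp in threshold_sweep(labels, scores)]
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def roc_auc_rank(labels, scores):
    """AUC as the probability a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC: sum of (recall increase) x (precision) over thresholds."""
    n_pos = sum(labels)
    ap, prev_recall = 0.0, 0.0
    for tp, fp in threshold_sweep(labels, scores):
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(round(roc_auc_trapezoid(labels, scores), 4))  # 0.7333
print(round(roc_auc_rank(labels, scores), 4))       # 0.7333
print(round(average_precision(labels, scores), 4))  # 0.7222
```

As a reference point, a classifier that ranks patients at random has an expected AUC of 0.5 and an expected AUPRC equal to the outcome prevalence, about 0.18 for LOS > 2 days in this cohort; that is the baseline against which the reported AUPRC values of 0.23 to 0.34 are best read.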