Belal Alsinglawi, Osama Alshari, Mohammed Alorjani, Omar Mubin, Fady Alnajjar, Mauricio Novoa, Omar Darwish.
Abstract
This work introduces a predictive length of stay (LOS) framework for lung cancer patients using machine learning (ML) models. The framework is designed to handle imbalanced datasets in classification-based approaches that use electronic healthcare records (EHR). We used supervised ML methods to predict the LOS of lung cancer inpatients during ICU hospitalization using the MIMIC-III dataset. The Random Forest (RF) model outperformed the other models across the three framework phases. With clinical significance (CS) feature selection, the over-sampling methods (SMOTE and ADASYN) achieved the highest AUC results (98%, 95% CI: 95.3–100%, and 100%, respectively). The combinations of over-sampling and under-sampling achieved the second-highest AUC results (98%, 95% CI: 95.3–100%, for SMOTE-Tomek and 97%, 95% CI: 93.7–100%, for SMOTE-ENN). The under-sampling methods reported the lowest AUC results (50%, 95% CI: 40.2–59.8%) for both ENN and Tomek Links. Using the explainable ML technique SHAP, we explained the outcome of the RF predictive model with the SMOTE class-balancing technique to identify the clinical features that contributed most to predicting lung cancer LOS. This promising framework allows ML techniques to be employed in hospital clinical information systems to predict lung cancer admissions into the ICU.
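The over-sampling idea behind SMOTE can be illustrated with a minimal, standard-library sketch: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and for illustration only; this is not the authors' pipeline, which is not shown in this record.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature tuples from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy data (hypothetical, not from MIMIC-III): 8 majority rows, 3 minority rows.
majority = [(float(x), float(x % 3)) for x in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Over-sample the minority class until both classes have the same size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is an interpolation, it stays inside the region already spanned by the minority class. Under-sampling methods such as ENN and Tomek Links work in the opposite direction by removing samples instead of adding them.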
Year: 2022 PMID: 35022512 PMCID: PMC8755804 DOI: 10.1038/s41598-021-04608-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 3. Lung cancer LOS predictive framework in ICU settings.
Figure 1. Confusion matrix for class-balancing techniques for lung cancer LOS with CS.
Comparison of class-balancing methods using clinical significance (CS) feature selection and the random forest model (*values shown with 95% confidence interval, CI).
| Method | Sensitivity* | Specificity | AUC* | IBA* | G.mean* |
|---|---|---|---|---|---|
| SMOTE-CS | 98% [95.3–100] | 98% [95.3–100] | 98% [95.3–100] | 96% [92.2–99.8] | 98% [95.3–100] |
| ADASYN-CS | 100% | 100% | 100% | 100% | 100% |
| ENN-CS | 89% [82.9–95.1] | 11% [4.9–17.1] | 50% [40.2–59.8] | 0% | 0% |
| TomekLinks-CS | 96% [92.2–99.8] | 4% [0.2–7.8] | 50% [40.2–59.8] | 0% | 0% |
| SMOTETomek-CS | 98% [95.3–100] | 98% [95.3–100] | 98% [95.3–100] | 96% [95.3–100] | 98% [95.3–100] |
| SMOTE-ENN-CS | 97% [93.7–100] | 98% [95.3–100] | 97% [93.7–100] | 94% [89.3–98.7] | 97% [93.7–100] |
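The record does not state how the bracketed 95% confidence intervals were computed. As a hedged illustration, the sketch below uses the normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1 − p)/n), clipped to [0, 1]. The sample size `N_ASSUMED = 100` and the helper name `wald_ci` are assumptions introduced here, not values given in the source; with that assumed n, the formula yields intervals matching several table entries, for example 50% → 40.2–59.8 and 97% → 93.7–100.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation CI for a proportion p over n cases, clipped to [0, 1]."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


N_ASSUMED = 100  # assumption: the sample size is not stated in this record

examples = [("AUC 98%", 0.98), ("AUC 97%", 0.97), ("AUC 50%", 0.50), ("Specificity 4%", 0.04)]
for label, p in examples:
    low, high = wald_ci(p, N_ASSUMED)
    print(f"{label}: [{100 * low:.1f}, {100 * high:.1f}]")
```

A Wald interval collapses to zero width at p = 0 or p = 1, which would be consistent with the 0% and 100% entries appearing without a bracketed range.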
Figure 2. SHAP mean values: the impact of each model feature on the model output magnitude for selected class-balancing methods with RF.