Zheng Wang1, Shandian Zhe1, Joshua Zimmerman2, Candice Morrisey2, Joseph E Tonna3, Vikas Sharma3, Ryan A Metcalf4,5.
Abstract
Accurately predicting red blood cell (RBC) transfusion requirements in cardiothoracic (CT) surgery could improve blood inventory management and be used as a surrogate marker for assessing hemorrhage risk preoperatively. We developed a machine learning (ML) method to predict intraoperative RBC transfusions in CT surgery. A detailed database containing time-stamped clinical variables for all CT surgeries from 5/2014-6/2019 at a single center (n = 2410) was used for model development. After random forest feature selection, surviving features were inputs for ML algorithms using five-fold cross-validation. The dataset was updated with 437 additional cases from 8/2019-8/2020 for validation. We developed and validated a hybrid ML method given the skewed nature of the dataset. Our Gaussian Process (GP) regression ML algorithm accurately predicted RBC transfusion amounts of 0 and 1–3 units (root mean square error, RMSE 0.117 and 1.705, respectively) and our GP classification ML algorithm accurately predicted 4+ RBC units transfused (area under the curve, AUC = 0.826). The final prediction is the regression result if classification predicted < 4 units transfused, or the classification result if 4+ units were predicted. We developed and validated an ML method to accurately predict intraoperative RBC transfusions in CT surgery using local data.
Year: 2022 PMID: 35079127 PMCID: PMC8789772 DOI: 10.1038/s41598-022-05445-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Feature selection and hybrid machine learning (ML) model. The raw cardiothoracic (CT) surgery dataset contained numerous features. To reduce the features to a manageable number, a random forest feature selection procedure was performed. The selected features were then used as inputs into the Gaussian Process (GP) regression and classification ML algorithms.
Combined characteristics of cases in the training (development) and test (validation) datasets.
| | All patients (test + training datasets) | 0 units transfused | 1–3 units transfused | 4+ units transfused |
|---|---|---|---|---|
| Number of cases | 2847 (100%) | 1962 (69%) | 712 (25%) | 173 (6%) |
| Age (years) | 58.9 | 58.4 | 60.0 | 58.9 |
| Gender (% female) | 31 | 27 | 39 | 42 |
| Ethnicity (%) | | | | |
| Not Hispanic/Latino | 89 | 89 | 88 | 86 |
| Hispanic/Latino | 7 | 7 | 7 | 6 |
| Unknown | 4 | 4 | 5 | 8 |
| Race (%) | | | | |
| White/Caucasian | 83 | 85 | 78 | 79 |
| Other | 6 | 6 | 7 | 6 |
| Black or African American | 3 | 3 | 5 | 4 |
| American Indian and Alaska Native | 3 | 2 | 4 | 2 |
| Unknown | 3 | 2 | 2 | 5 |
| Asian | 1 | 1 | 2 | 2 |
| Native Hawaiian and Other Pacific Islander | 1 | 1 | 2 | 2 |
| RBC transfusions (mean) | 1.26 | 0 | 1.62 | 6.34 |
| Plasma transfusions (mean) | 1.36 | 0.73 | 1.83 | 6.58 |
| Platelet transfusions (mean) | 0.66 | 0.34 | 1.17 | 2.18 |
| Hemoglobin before surgery | 12.15 | 13.08 | 10.45 | 10.25 |
| Hemoglobin after surgery | 9.61 | 9.91 | 8.96 | 8.87 |
| Platelet count before surgery | 204 | 211 | 193 | 175 |
| Creatinine before surgery | 1.41 | 1.28 | 1.63 | 1.69 |
| Cell salvage used (ml) | 443 | 433 | 381 | 817 |
| Pre-existing end stage renal disease | 145/2847 (5%) | 53/1962 (3%) | 71/712 (10%) | 21/173 (12%) |
| Pre-existing hypertension | 1647/2847 (58%) | 1094/1962 (56%) | 426/712 (60%) | 127/173 (73%) |
| Pre-existing peripheral artery disease | 96/2847 (3%) | 53/1962 (3%) | 36/712 (5%) | 7/173 (4%) |
| Pre-existing cerebrovascular disease | 132/2847 (5%) | 68/1962 (3%) | 50/712 (7%) | 14/173 (8%) |
| Pre-existing diabetes mellitus | 705/2847 (25%) | 444/1962 (23%) | 215/712 (30%) | 46/173 (27%) |
| Case urgency (%) | | | | |
| Elective | 73 | 80 | 61 | 46 |
| Urgent | 17 | 15 | 24 | 22 |
| Emergent | 10 | 5 | 15 | 32 |
| Most common procedures (%) | CABG (26) | CABG (31) | CABG (15) | Ascending aortic dissection (13) |
| | Placement LVAD (7) | Placement LVAD (7) | Mediastinal exploration (10) | Transplant heart (10) |
| | Transplant heart (5) | Replacement aortic valve (5) | Placement LVAD (8) | Placement LVAD (8) |
| | Mediastinal exploration (4) | CABG with aortic valve replacement (3) | Transplant heart (8) | Aortic valve and mitral valve replacement (5) |
| | Replacement aortic valve (4) | Transplant heart (3) | Replacement aortic valve (3) | Mediastinal exploration (4) |
| | CABG with aortic valve replacement (3) | Mitral valve replacement (3) | Ascending aortic aneurysm (3) | CABG (8) |
| | Mitral valve replacement (3) | Minimally invasive aortic valve replacement (3) | CABG with aortic valve replacement (3) | Sternal exploration (4) |
| | Ascending aortic dissection (3) | Ascending aortic aneurysm repair (3) | Placement ECMO (3) | Thoracoabdominal aortic aneurysm repair (4) |
| | Ascending aortic aneurysm repair (2) | Mediastinal exploration (3) | Transplant double lung with bypass (3) | Placement ECMO (3) |
CABG coronary artery bypass graft, LVAD left ventricular assist device, ECMO extracorporeal membrane oxygenation.
Top 10 features after random forest feature selection.
| Rank | Feature | Feature description^a | Importance | Number of cases where feature was present/total cases used for feature selection |
|---|---|---|---|---|
| 1 | CPT code | Veno-arterial extracorporeal membrane oxygenation (ECMO) initiation | 0.020874491 | 114/2410 |
| 2 | ICD-10 procedure code | ECMO continuous | 0.015373679 | 56/2410 |
| 3 | CPT code | ECMO cannulation | 0.013020617 | 94/2410 |
| 4 | CPT code | Thoracoabdominal aortic aneurysm repair | 0.010975536 | 10/2410 |
| 5 | Laboratory result | Blood gas analysis, barometric pressure | 0.010827222 | 863/2410 |
| 6 | Laboratory result | Blood gas analysis, potassium | 0.009008439 | 843/2410 |
| 7 | Laboratory result | Ionized calcium | 0.00858092 | 945/2410 |
| 8 | Laboratory result | Hemoglobin | 0.007877384 | 2125/2410 |
| 9 | Laboratory result | Albumin | 0.00751122 | 1756/2410 |
| 10 | ICD-10 code | Respiratory ventilation > 96 h | 0.007166464 | 113/2410 |
CPT current procedural terminology, ICD international classification of diseases.
^a A feature was used for a given patient only if it was known to be available before the surgery start time. Features (e.g., laboratory values, billing codes) that were not available before the surgery start time were excluded from the model for that patient.
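The availability rule in the footnote above can be illustrated with a minimal sketch. It assumes each observation carries a timestamp for when it became available; the record layout, feature names, and the function `features_available_before_surgery` are hypothetical and not taken from the study's code.

```python
from datetime import datetime


def features_available_before_surgery(observations, surgery_start):
    """Keep only observations time-stamped strictly before the surgery start.

    observations: dict mapping feature name -> (value, available_at).
    This layout is an assumption made for illustration.
    """
    return {
        name: value
        for name, (value, available_at) in observations.items()
        if available_at < surgery_start
    }


# Hypothetical patient record (values and times are made up).
surgery_start = datetime(2019, 3, 4, 8, 0)
observations = {
    "hemoglobin": (10.4, datetime(2019, 3, 3, 17, 30)),
    "albumin": (3.1, datetime(2019, 3, 2, 9, 15)),
    # Resulted after the surgery started, so it must not reach the model.
    "ionized_calcium": (1.12, datetime(2019, 3, 4, 9, 45)),
}

usable = features_available_before_surgery(observations, surgery_start)
print(sorted(usable))
```

Filtering per patient rather than per feature keeps a laboratory test in the model for patients who had it drawn preoperatively while dropping it for those who did not.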
Random Forest (RF) versus Gaussian process (GP) regression.
| Model | 0 units transfused | 1–3 units transfused | 4+ units transfused |
|---|---|---|---|
| Development phase (five-fold cross-validation) | | | |
| Random forest | 0.829 (0.043) | 1.191 (0.047) | 5.799 (0.612) |
| Gaussian process regression | 0.064 (0.102) | 1.758 (0.033) | 7.613 (0.635) |
| Gaussian process regression for less severe cases | 0.766 (0.016) | | |
| Validation phase | | | |
| Random forest | 7.007 | 1.624 | 56.568 |
| Gaussian process regression | 0.117 | 1.705 | 56.941 |
| Gaussian process regression for less severe cases | 0.985 | | |
This table compares our GP regression model with the RF regression model. GP regression performed best, as demonstrated by the low root mean square error (RMSE) in the 0 units transfused and 1–3 units transfused categories. In contrast, the performance of both models suffered when predicting 4+ RBC units transfused. Therefore, in the final model we restricted the GP regression prediction to cases where GP classification predicted < 4 RBC units transfused.
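The hybrid decision rule described here (regression output when fewer than 4 units are predicted by the classifier, classification output otherwise) can be sketched as below. The function name, the probability input, and the 0.5 cutoff are assumptions for illustration; the source does not state which decision threshold was used.

```python
def hybrid_prediction(p_four_plus, regression_units, threshold=0.5):
    """Combine a classifier output with a regression estimate.

    p_four_plus: classifier probability that 4+ RBC units are transfused.
    regression_units: regression estimate of RBC units transfused.
    threshold: decision cutoff (0.5 is an assumption, not from the paper).
    Returns a (source, prediction) pair.
    """
    if p_four_plus >= threshold:
        # Classifier flags a high-transfusion case: report the class itself.
        return ("classification", "4+")
    # Otherwise report the regression estimate for the lower-volume range.
    return ("regression", regression_units)


print(hybrid_prediction(0.80, 2.1))
print(hybrid_prediction(0.10, 1.4))
```

Routing high-volume cases to the classifier avoids relying on the regression estimate in the range where the table above shows the largest errors.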
4+ RBCs transfused classification.
| Model | AUC | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|
| Development phase (five-fold cross-validation), mean (SD) | | | | |
| Gaussian process | 0.826 (0.017) | 0.892 (0.027) | 0.678 (0.030) | 0.766 (0.015) |
| Random forest | 0.812 (0.009) | 0.812 (0.052) | 0.668 (0.052) | 0.726 (0.009) |
| Decision tree | 0.610 (0.021) | 0.272 (0.044) | 0.948 (0.007) | 0.413 (0.055) |
| XGBoost | 0.820 (0.015) | 0.805 (0.046) | 0.741 (0.051) | 0.757 (0.013) |
| Neural network | 0.723 (0.027) | 0.682 (0.055) | 0.742 (0.037) | 0.697 (0.025) |
| Validation phase | | | | |
| Gaussian process | 0.826 | 0.778 | 0.771 | 0.774 |
| Random forest | 0.803 | 0.944 | 0.642 | 0.764 |
| Decision tree | 0.572 | 0.278 | 0.866 | 0.421 |
| XGBoost | 0.697 | 0.694 | 0.659 | 0.676 |
| Neural network | 0.760 | 0.833 | 0.679 | 0.748 |
Multiple machine learning (ML) models were compared using several performance metrics. Our Gaussian Process (GP) classification model demonstrated the best performance in the development phase and in the subsequent validation phase. AUC, area under the receiver operating characteristic curve. F1 score, the harmonic mean of the sensitivity and positive predictive value. Note that standard deviations are listed only for the model development phase, which used the initial dataset with five-fold cross-validation. The model validation phase used the initial dataset for training (n = 2410) and the additional set of cases (n = 437) for testing.
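As a worked illustration of the metric definitions above, the sketch below derives sensitivity, specificity, positive predictive value, and F1 score from confusion-matrix counts. The counts are made up and the function name is hypothetical; it assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute metrics from confusion-matrix counts (all denominators > 0)."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    # F1 is the harmonic mean of sensitivity and PPV.
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
    }


# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=30, fp=20, tn=80, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because F1 ignores true negatives, it is less flattered than specificity by a dataset in which most cases fall in the negative (< 4 units) class.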
Figure 2. Area under the curve (AUC) for each machine learning (ML) classification algorithm for the development phase. Here, we used only the initial dataset (n = 2410) with five-fold cross-validation to reduce overfitting. Gaussian Process (GP) demonstrated the greatest AUC and overall performance.
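The five-fold cross-validation used in the development phase can be sketched as follows. This is a generic illustration, not the study's code: the shuffling seed, the helper names, and the use of the sample standard deviation are assumptions, and the per-fold scores are invented.

```python
import random
from statistics import mean, stdev


def five_fold_indices(n_cases, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each case is tested exactly once."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


def summarize(scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return mean(scores), stdev(scores)


splits = list(five_fold_indices(2410))
print([len(test) for _, test in splits])

# Hypothetical per-fold AUC values, for illustration only.
fold_mean, fold_sd = summarize([0.80, 0.82, 0.84])
print(round(fold_mean, 3), round(fold_sd, 3))
```

Reporting a mean with a standard deviation across folds shows how sensitive a metric is to which cases land in the held-out fold.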
Figure 3. Area under the curve (AUC) for each machine learning (ML) classification algorithm after the validation phase. Here, the initial dataset (n = 2410) was used as the training dataset and the additional cases included after updating the database (n = 437) were used as the test dataset. Gaussian Process (GP) again showed the highest AUC and best overall performance.
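For readers unfamiliar with the metric plotted in these figures, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are invented, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for positive cases (e.g. 4+ units), 0 for negative cases.
    scores: classifier scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Invented example: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.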