
Predicting Postoperative Length of Stay for Isolated Coronary Artery Bypass Graft Patients Using Machine Learning.

Fatima Alshakhs1, Hana Alharthi1, Nida Aslam2, Irfan Ullah Khan2, Mohamed Elasheri3.   

Abstract

PURPOSE: Predictive analytics (PA) is an emerging approach in healthcare that uses supervised machine learning algorithms to build prediction models from historical data. Isolated coronary artery bypass grafting (iCABG), an open-heart surgery, is commonly performed in the treatment of coronary heart disease. AIM: The aim of this study was to develop and evaluate a model to predict postoperative length of stay (PLoS) for iCABG patients using supervised machine learning techniques, and to identify the features with the highest contribution to the model.
METHODS: This is a retrospective study that uses historic data of adult patients who underwent isolated CABG (iCABG). After initial data pre-processing, data imputation using the kNN method was applied. Five prediction models were built using the Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, and k Nearest Neighbour algorithms. Data imbalance was managed using the following widely used methods: oversampling, undersampling, "Both", and random over-sampling examples (ROSE). Feature selection was conducted using the Boruta method. Two techniques were applied to examine the performance of the models: a 70%/30% train-test split and k-fold cross-validation. Models were evaluated by comparing their performance using AUC and other metrics.
RESULTS: In the final dataset, six distinct features and 621 instances were used to develop the models. A total of 20 models were developed using R statistical software. The model generated using Random Forest with "Both" resampling method and cross-validation technique was deemed the best fit (AUC=0.81; F1 score=0.82; and recall=0.82). Attributes found to be highly predictive of PLoS were pulmonary artery systolic, age, height, EuroScore II, intra-aortic balloon pump used, and complications during operation.
CONCLUSION: This study demonstrates the significance and effectiveness of building a model that predicts PLoS for iCABG patients using patient specifications and pre-/intra-operative measures.
© 2020 Alshakhs et al.


Keywords:  CABG; LoS; classifiers; predictive analytics

Year:  2020        PMID: 33061545      PMCID: PMC7537993          DOI: 10.2147/IJGM.S250334

Source DB:  PubMed          Journal:  Int J Gen Med        ISSN: 1178-7074


Introduction

Heart-related disorders are the leading cause of death worldwide. Ischemic heart disease has been the top cause of death for the last decade, both worldwide1 and in Saudi Arabia.2 Coronary artery bypass grafting (CABG) is a common procedure proven to be an effective treatment for coronary heart disease.3,4 CABG is a type of open-heart surgery; such procedures are invasive and require an extended postoperative length of stay (PLoS, the time between surgery and discharge) in the hospital. While patients stay in the hospital, they are exposed to significant health risks, such as nosocomial infections, psychological disorders, and mortality.4 Thus, predicting the PLoS of heart-related operations is critical for understanding the overall risks of a given procedure, including those associated with extended hospital stays. In addition, PLoS is a common measure used in studies assessing the quality and outcomes of heart-related operations, as it reflects factors such as quality of care and hospital reimbursement.5 Furthermore, UK national health systems reported that better communication of expected PLoS and recovery timespan would be a relief to patients and their families, prepare them psychosocially, and reduce the distress related to late discharge.6

Predictive analytics techniques use machine learning algorithms to analyse large volumes of historical data to reveal hidden patterns and/or distinctive relationships7 based on a classification mechanism.8 With today's unprecedented volume of patient data (amassed through the increased use of electronic medical records, among other resources), predictive analytics offers a promising new approach to advance clinical applications: predicting patient risk of heart attack, risk of post-surgery readmission, and even which cancer treatment will result in the best outcome for a given patient.9 Of particular relevance to this discussion, predictive analytics can be used to identify patients at high risk of postoperative complications, which in turn can improve patient management and efficient resource allocation.10

At the study setting, Saud Albabtain Cardiac Centre, Dammam, Saudi Arabia, CABG is the most frequently performed surgical procedure. Thus, we focused our efforts on using predictive analytics to reveal patterns among patients recovering from isolated CABG (iCABG; ie, when the CABG procedure is performed in the absence of any other simultaneous procedure). We set out with the objective of developing and testing a best-fit model to predict PLoS among iCABG patients at the study institution.

Materials and Methods

This is a retrospective study of historic data from adult patients who underwent isolated CABG, focusing on the metric of postoperative length of stay. Data were gathered from a single-site institution, Saud Al Babtain Cardiac Centre in Dammam, Saudi Arabia. In this study setting, postoperative length of stay (PLoS) is measured as the time elapsed between the day of surgery and the day of the patient's discharge. Some hospitals report a patient's "hospital stay", measuring the total length of stay from admission for the operation until discharge,11 but the PLoS in this study begins on the reported date of surgery. Classification is a supervised learning method in machine learning that builds a model trained on a sample dataset; the trained model can then be used to predict the class label of a new observation. Our predictive models were developed in three major steps: data pre-processing, model development, and model evaluation (Figure 1).12
Figure 1

Model development cycle for prediction of postoperative length of stay for isolated coronary artery bypass grafting (iCABG). Notes: Reproduced from Free machine learning diagram - free powerpoint templates; 2017. Available from: .12

This study used a classification method to develop the predictive models. The target variable (PLoS) was converted to a categorical binary class: Average or Below (AB) and Above Average (AA). According to the literature, the average PLoS for a CABG patient is 7 days.11,13-18 In our data, the mean, median, and mode were 8.87, 7, and 6 days, respectively. As shown in Figure 2, the PLoS histogram is highly skewed to the right with a long, flattened tail. This distribution is largely due to the presence of outliers, as evident in the boxplot (Figure 3); the maximum length of stay was 378 days. Because of this skewness, we used the median PLoS (7 days) as the cut-off point between the AA and AB groups. This value aligned with the average PLoS of CABG patients (7 days) reported in the literature. Thus, patients in the Above Average (AA) group had a PLoS of more than 7 days, and those in the Average or Below (AB) group had a PLoS of 7 days or fewer.
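As a minimal sketch of this dichotomization in R (the data frame and column names below are hypothetical, since the original field names are not reported), the conversion could look like:

    # Toy example: dichotomize PLoS at the 7-day median cut-off (AA = Above Average, AB = Average or Below)
    cabg <- data.frame(plos_days = c(5, 6, 7, 9, 14, 378))   # illustrative values only
    cabg$PLoS <- factor(ifelse(cabg$plos_days > 7, "AA", "AB"), levels = c("AB", "AA"))
    table(cabg$PLoS)   # in the study data this split was 405 AB vs 216 AA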
Figure 2

Histogram of postoperative length of stay including outliers.

Figure 3

Boxplot of postoperative length of stay with the outliers.

We used classifiers that work well for a categorical dichotomous target class: Naïve Bayes (NB), Decision Tree (DT), K Nearest Neighbour (kNN), Logistic Regression (LR), and Random Forest (RF).19 In addition, the output of these classifiers is easier to interpret and more readily used in clinical settings than that of other classifiers.20

Data Collection and Description

Saud Al-Babtain Cardiac Center, Dammam, Saudi Arabia is a 68-bed cardiac center that provides various services for patients with heart disease, including pharmacological, surgical, interventional, and electrical treatments. It serves the pediatric and adult population of the eastern region of Saudi Arabia and also accepts referrals from other Gulf regions.21 The study was conducted in accordance with institutional (Saud Al-Babtain Cardiac Center) guidelines and the Declaration of Helsinki, and was approved by the institutional review board (SBCC-IRB-MC-2019-01). A waiver of informed consent was obtained owing to the retrospective nature of the study. Every effort was taken to ensure that the privacy and confidentiality of patients was maintained.

Study investigators developed a case report form specifying attributes related to risk factors associated with the PLoS of iCABG patients. Clinical attributes were chosen based on a literature review. The form also includes fields for demographic data, patient history, comorbidities, preoperative measures, type of CABG procedure, intra- and post-operative measures, and the dates of admission, operation, and discharge. Data included in this study were from adult patients (>18 years) who underwent iCABG surgery; patients below 18 years of age and patients who underwent multiple grafting procedures were excluded. The dataset retrieved from the study setting was an Excel sheet comprising 50 fields (attributes) and 721 de-identified records (patients) of patients who underwent iCABG from November 2014 until the time of the study in January 2019. A description of the complete dataset is presented in .

Data Pre-Processing

Feature Cleaning

Data pre-processing started with feature cleaning: duplicate fields and variables with more than 80% missing data were removed. Certain features (fields) from the complete dataset were deemed irrelevant to this study because our model focused only on data collected up to the day of surgery; therefore, we omitted attributes such as discharge medication and total blood loss after 48 hours. For patients whose iCABG procedure was delayed for some number of days after admission (due to clinical and/or administrative reasons), the total delay was included as an additional feature, "duration from admission until operation" (DAO). PLoS and DAO (where applicable) were calculated from the date of admission, date of surgery, and date of discharge; once these durations were computed, the date fields were removed from the dataset (see the sketch below). As a result, the final list of attributes included in this study was reduced to 35, as shown in Table 1.
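A brief sketch of how these duration features could be derived in R (column names here are hypothetical; the study's actual field names are not reported):

    # Derive PLoS and DAO from the three date fields, then drop the raw dates
    cabg <- data.frame(
      admission_date = as.Date(c("2015-01-02", "2015-03-10")),   # illustrative dates only
      surgery_date   = as.Date(c("2015-01-05", "2015-03-11")),
      discharge_date = as.Date(c("2015-01-14", "2015-03-20"))
    )
    cabg$PLoS_days <- as.numeric(cabg$discharge_date - cabg$surgery_date)   # postoperative length of stay
    cabg$DAO_days  <- as.numeric(cabg$surgery_date  - cabg$admission_date)  # duration from admission until operation
    cabg <- cabg[, !(names(cabg) %in% c("admission_date", "surgery_date", "discharge_date"))]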
Table 1

iCABG Attributes Extracted to Build the Models

Demographic data: (1) Age; (2) Gender; (3) Ethnicity; (4) Marital status; (5) BMI; (6) Height; (7) Smoking history
Comorbidities: (8) Hypertension; (9) Hypercholesterolemia; (10) Renal disease; (11) Renal failure; (12) Diabetic; (13) Diabetes treatment; (14) Cerebrovascular disease; (15) Chronic lung disease
Patient history: (16) Premedication; (17) Family history of ischemic heart disease; (18) Heart failure; (19) Previous cardiac, vascular, or thoracic surgery; (20) Number of previous heart operations
Pre-op measures: (21) Angina; (22) Pulmonary artery systolic; (23) Dyspnoea; (24) Poor mobility; (25) Poor mobility due to any non-cardiac reason; (26) Operative urgency; (27) EuroScore II; (28) DAO; (29) Pre-op intra-aortic balloon pump used
Intra-op measures: (30) CABG procedure; (31) Number of arterial grafts; (32) Intra-aortic balloon pump used; (33) Complications during operation
Post-op measures: (34) Infective complication; (35) Blood loss at 24 hours

Data Pruning

The data pruning process included anomaly detection and handling of missing data. In scouring the dataset for anomalies, we identified and removed 31 instances of patients found to be deceased (the analysis of mortality outcomes is out of scope of the current study). Another seven records were removed because discharge dates were not recorded. After this step, the total number of usable records was 683. A missing value analysis conducted in SPSS on the remaining 683 records found 2% missing values. Little's MCAR test was then conducted, with a t-test calculated for numerical variables and a chi-square test for categorical variables; in all cases the P-values were non-significant (P>0.05), indicating that these data were missing completely at random and could be imputed.

K-nearest neighbours (kNN) is a classification algorithm that can be used to impute missing values. Because the kNN imputation method is susceptible to extreme values, outlier data points had to be treated first in order to reduce variability.22 Treatment of outliers is an important step in model generation, as the presence of outliers in numerical variables tends to distort the accuracy of the prediction model.23 To detect outliers, all numerical variables were explored using an SPSS boxplot chart, using the definitions of legitimate and extreme outliers described by Hoaglin and Iglewicz.24 In our study, the variables of age and height had legitimate outliers, whereas BMI, DAO, pulmonary artery systolic, EuroScore II, and blood loss at 24 hours exhibited extreme outliers.

To manage the extreme outliers, these attributes were converted to categorical variables. Categorical variables reduce the time necessary to build the model compared to continuous variables and produce better prediction results.25 In addition, categorical variables make it possible to overcome the problem of outliers without reducing the number of values/instances. Body mass index (BMI) was divided into categories, in which less than 17 is underweight, 18-24 is healthy weight, 25-29 is overweight, 30-39 is obese, and more than 40 is extreme obesity.26 For pulmonary artery systolic pressure, the normal range is 8-20 for a patient at rest, and values are considered high if they exceed 25 at rest or 30 during activity; because pulmonary artery systolic pressure at the study setting was measured at rest, the variable was transformed into three categories: Normal (<25), Moderate (26-30), and High (>30).27 After DAO was calculated, it was transformed into three categories, short (<3 days), medium (3-7 days), and long (>7 days), after reviewing the entries with the study setting. Records containing outliers in EuroScore II (7 outliers) and blood loss at 24 hours (55 outliers) were removed entirely because there was no clear cut-point for transforming these variables into categorical counterparts. The total sample size remaining after outlier processing was 621 instances, which were used to build our models. Finally, the missing values were filled using the K Nearest Neighbour (kNN) imputation method in RStudio.
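The binning described above could be reproduced with R's cut() function. This is only a sketch with hypothetical column names, and the breakpoints approximate the bins reported by the authors:

    # Convert outlier-prone numeric variables into the categorical bins described above
    cabg <- data.frame(BMI = c(16, 22, 31, 45),               # illustrative values only
                       pulm_art_systolic = c(18, 24, 28, 40),
                       DAO_days = c(1, 2, 5, 10))
    cabg$BMI_cat <- cut(cabg$BMI, breaks = c(-Inf, 17, 24, 29, 39, Inf),
                        labels = c("Underweight", "Healthy weight", "Overweight", "Obese", "Extreme obesity"))
    cabg$PAS_cat <- cut(cabg$pulm_art_systolic, breaks = c(-Inf, 25, 30, Inf),
                        labels = c("Normal", "Moderate", "High"))
    cabg$DAO_cat <- cut(cabg$DAO_days, breaks = c(-Inf, 3, 7, Inf),
                        labels = c("Short", "Medium", "Long"))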
kNN is applicable to all data types, which makes it a valid imputation method in our case.28 It is an effective method28 that produces highly accurate results,29 and imputation of missing values using a machine learning algorithm has been shown to outperform statistical methods and to be well suited to medical domains due to the nature of the data.30 To fill a missing datum, kNN searches for similar records according to the other variables, thereby identifying a specified number of that record's "neighbours"; the algorithm then fills the missing cell by approximating a value from this variable's values in those neighbours. We used the default kNN imputation in RStudio with one change: rather than the default k=5, we used k=11 (odd numbers tend to provide better results and avoid tie situations).31 The final dataset used in model development contained 621 instances and 35 attributes in addition to the target variable (PLoS), which comprises the AA and AB classes.
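The paper does not name the imputation package; one commonly used implementation is kNN() from the VIM package, sketched here on toy data:

    library(VIM)   # provides kNN() imputation
    set.seed(1)
    toy <- data.frame(age = round(rnorm(30, 60, 8)), height = round(rnorm(30, 165, 9)))
    toy$age[c(3, 17)] <- NA          # introduce a few missing values
    toy$height[c(5, 21)] <- NA
    toy_imputed <- kNN(toy, k = 11, imp_var = FALSE)   # k = 11 as in the study; imp_var = FALSE drops indicator columns
    colSums(is.na(toy_imputed))                        # all zeros after imputation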

Feature Selection

Reducing the number of attributes tends to produce higher-accuracy models.32 Furthermore, prediction is faster when there are fewer attributes to consider. Using fewer attributes without compromising model accuracy is the current gold standard.33 As a first step, we applied a feature selection process in RStudio to eliminate variables that had no effect on the model's predictions. We used the Boruta algorithm, which takes a random forest approach to evaluating attributes. It first creates a duplicate, shuffled version of the dataset's attribute values and then compares the Z-scores ("importance") of the attributes between the original and shuffled sets; attributes that exhibit a higher Z-score in the original dataset are deemed important.34 The Boruta algorithm is considered unbiased and relatively stable in separating important from unimportant attributes, it conducts several Random Forest iterations rather than a single one (as in other feature selection methods), and it takes into consideration the interactions between the attributes themselves.35 Using this algorithm with 600 iterations, we found the following attributes to be most important (Figure 4): pulmonary artery systolic, age, height, EuroScore II, intra-aortic balloon pump used, and complications during operation. Figure 4 shows the important (highly predictive) attributes shaded green and the unimportant ones shaded red; the boxplots represent the Z-scores of the important and unimportant variables.35
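A hedged sketch of a Boruta run in R is shown below on a built-in dataset; in the study the formula would regress the PLoS class on the 35 candidate attributes, with 600 iterations:

    library(Boruta)
    set.seed(1)
    boruta_fit <- Boruta(Species ~ ., data = iris, maxRuns = 600)   # maxRuns = 600 mirrors the 600 iterations used
    print(boruta_fit)
    getSelectedAttributes(boruta_fit)   # attributes confirmed as important
    plot(boruta_fit)                    # important attributes in green, unimportant in red, as in Figure 4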
Figure 4

Result of Boruta feature selection method.


Model Development and Evaluation

Data Sampling and Balancing the Dataset

Data imbalance, or skewed class distribution, is one of the challenges in machine learning. Imbalanced datasets, ie, those in which the number of events in the target class greatly exceeds the number of non-events, or vice versa, are a common issue when using real-life data. In our data the target class is binary, with 216 (35%) positive events (Above Average PLoS) and 405 (65%) negative events (Average or Below), which is considered imbalanced. Four data processing methods were used to manage the imbalanced target class: oversampling, undersampling, both (over and under), and random over-sampling examples (ROSE). In the oversampling method, random records are added to the minority class, in our case Above Average (AA), to yield target classes with equal numbers of instances. Undersampling, on the other hand, randomly reduces the number of instances in the majority class to equal the number of instances in the minority class.36 The Both method combines oversampling and undersampling by randomly adding instances to the minority class and removing instances from the majority class until the two classes are more balanced. Finally, the ROSE method develops synthetic (artificial) records and adds them to the minority class; it uses a bootstrapping approach to create artificial instances from neighbouring data points in the minority class. This technique is favoured over plain oversampling and tends to reduce the overfitting that can result from oversampling with replacement.36,37 Table 2 shows the results of the resampling methods used to balance the target class, and a sketch of these four methods in R follows the table.
Table 2

Results of Resampling Methods to Balance the Target Class

Oversampling method: records of target class AA randomly increased; total records = 810.
Undersampling method: records of target class AB randomly decreased to match the size of the AA class; total records = 432.
Both method: records of AA increased and records of AB decreased (209 AB and 232 AA records); total records = 441.
ROSE: records of AA increased using synthetic records (338 AB and 362 AA records); total records = 700.
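The four balancing strategies in Table 2 can be reproduced with the ROSE package; the sketch below uses a toy data frame with the same 65%/35% class mix (the predictor columns are hypothetical):

    library(ROSE)
    set.seed(1)
    toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                      PLoS = factor(c(rep("AB", 65), rep("AA", 35))))   # 65% AB vs 35% AA, as in the study
    over  <- ovun.sample(PLoS ~ ., data = toy, method = "over",  p = 0.5, seed = 1)$data   # replicate minority (AA) records
    under <- ovun.sample(PLoS ~ ., data = toy, method = "under", p = 0.5, seed = 1)$data   # drop majority (AB) records
    both  <- ovun.sample(PLoS ~ ., data = toy, method = "both",  N = nrow(toy), p = 0.5, seed = 1)$data
    rose  <- ROSE(PLoS ~ ., data = toy, seed = 1)$data                                     # synthetic minority records
    sapply(list(over = over, under = under, both = both, rose = rose), nrow)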
Several techniques can be used to evaluate model accuracy, such as a standard split (70% training set, 30% testing set), K-fold cross-validation, leave-one-out, holdout, and bootstrap. To evaluate the models developed in this study, we first performed data sampling for training and testing: seventy percent of the data points were reserved for training and the remaining 30% were used to test the model. We also ran K-fold cross-validation to check the consistency of our proposed model results. The significance of the K-fold cross-validation mechanism has been validated in other studies,38-41 and it reduces the problems of bias and variance. In our study, K-fold cross-validation was used to divide the data into training and testing sets with a K value of 10. In order to perform equal validation for each sub-group, the data resampling techniques were applied before splitting the data (70%:30%) or running the K-fold cross-validation.

Developing the Learning Models

To develop the predictive models, five classifiers were used: Naïve Bayes, Decision Tree, Logistic Regression, K Nearest Neighbor, and Random Forest. For each classifier, we used four different datasets, ie, the resulting dataset from each of the four resampling methods described above. Therefore, a total of 20 models were developed.
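The paper does not state which R packages were used to fit the classifiers; a plausible sketch with the caret package is shown below, training one model per caret method code on a toy stand-in for one of the resampled datasets (in the study this would be repeated for each of the four resampled datasets):

    library(caret)
    set.seed(1)
    toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200),
                      PLoS = factor(sample(c("AB", "AA"), 200, replace = TRUE)))
    idx <- createDataPartition(toy$PLoS, p = 0.70, list = FALSE)   # 70% training / 30% testing split
    train_set <- toy[idx, ]
    test_set  <- toy[-idx, ]
    ctrl <- trainControl(method = "cv", number = 10,               # 10-fold cross-validation
                         classProbs = TRUE, summaryFunction = twoClassSummary)
    # caret method codes: rf = Random Forest, nb = Naive Bayes, rpart = Decision Tree,
    # glm = Logistic Regression, knn = k Nearest Neighbour (each pulls in its own backing package)
    methods <- c("rf", "nb", "rpart", "glm", "knn")
    models <- lapply(methods, function(m)
      train(PLoS ~ ., data = train_set, method = m, metric = "ROC", trControl = ctrl))
    names(models) <- methods
    confusionMatrix(predict(models$rf, test_set), test_set$PLoS, positive = "AA")   # held-out evaluation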

Model Evaluation

Model evaluation is an essential step in model development that demonstrates how well a given model performs. Through model evaluation we compared performance and accuracy across all models. Performance metrics included the AUC (area under the ROC (receiver operating characteristic) curve), which plots the true positive rate against the false positive rate;42 recall (sensitivity), the probability of correctly identifying patients with Above Average (AA) PLoS; precision, the proportion of correctly identified patients among those predicted to have Above Average (AA) PLoS; F1, the harmonic mean of recall and precision; and accuracy, the proportion of correct predictions among all predictions.19,43
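Expressed in terms of confusion-matrix counts (TP, FP, TN, FN, with Above Average as the positive class), these metrics correspond to the standard definitions:

    \text{recall} = \frac{TP}{TP + FN}, \quad
    \text{precision} = \frac{TP}{TP + FP}, \quad
    F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \quad
    \text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}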

Results

A total of 20 models were developed, and we evaluated the performance of each model using the following metrics: AUC, recall, precision, F1 score, and accuracy. In terms of AUC, the Naïve Bayes and Random Forest models outperformed the other classifiers (kNN, Decision Tree, Logistic Regression). When evaluating the AUC metric, an AUC close to 1 indicates that the model is a good fit for prediction.44 Table 3 summarizes the different measures used to compare the performance of the Naïve Bayes and Random Forest classifiers across all resampling methods using the 70-30% data split for training and testing. Our results demonstrate that the best-fit model is the Random Forest classifier, as it produces the highest AUC with the oversampling and Both resampling methods.
Table 3

Summary Results of Naïve Bayes and Random Forest Classifiers with All Resampling Methods Using 70–30% Split

Resampling        Naïve Bayes                     Random Forest
                  Over    Under   Both    ROSE    Over    Under   Both    ROSE
AUC               0.64    0.69    0.71    0.66    0.89*   0.68    0.80*   0.70
Accuracy          0.64    0.63    0.65    0.61    0.78    0.60    0.75    0.60
Recall            0.53    0.57    0.59    0.54    0.82    0.58    0.74    0.63
Precision         0.66    0.66    0.73    0.60    0.75    0.63    0.78    0.58
F1 Score          0.59    0.61    0.65    0.57    0.78    0.60    0.77    0.60

Note: *AUC values closest to 1 (80% and above).

To have more confidence in these results, we used 10-K cross-validation for the Random Forest classifier, since it produced the highest AUC with all the resampling methods. The results presented in Table 4 are similar to those of the 70-30% split; however, the undersampling AUC under cross-validation showed a noteworthy improvement, from 0.68 to 0.81.
Table 4

Summary Results of Random Forest Classifier with All Resampling Methods Using 10-K Cross-Validation

Random Forest
Resampling        Over    Under   Both    ROSE
AUC               0.80    0.81    0.81    0.80
Accuracy          0.80    0.82    0.81    0.77
Recall            0.80    0.82    0.82    0.80
Precision         0.80    0.82    0.82    0.79
F1 Score          0.79    0.80    0.82    0.78
For further evaluation, we used ROC curves to compare the different resampling methods with the Random Forest classifier, first with the 70-30% split (Figures 5 and 6) and second with 10-K cross-validation (Figures 7 and 8). The Random Forest model using the Both resampling method with 10-K cross-validation outperformed the other models (Figure 7).
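The paper does not name the plotting package; one common option is pROC, sketched here with placeholder predictions (in the study, the predictor would be the Random Forest probability of the AA class on the held-out data, with one curve per resampling method):

    library(pROC)
    set.seed(1)
    truth   <- factor(sample(c("AB", "AA"), 150, replace = TRUE), levels = c("AB", "AA"))
    prob_AA <- runif(150)                       # placeholder scores standing in for model-predicted P(AA)
    roc_obj <- roc(response = truth, predictor = prob_AA, levels = c("AB", "AA"), direction = "<")
    auc(roc_obj)                                # area under the curve
    plot(roc_obj, main = "ROC curve (illustrative)")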
Figure 5

Receiver operating characteristics (ROC) curve for random forest model using both resampling method and 70–30% split.

Figure 6

Receiver operating characteristics (ROC) curve for random forest model using oversampling method and 70–30% split.

Figure 7

Receiver operating characteristics (ROC) curve for random forest model using both resampling method and 10-K cross-validation.

Figure 8

Receiver operating characteristics (ROC) curve for random forest model using undersampling method and 10-K cross-validation.


Discussion

The objective of this study was to build a model that predicts whether a patient's postoperative length of stay (PLoS) following iCABG will be longer or shorter than average. We used the area under the ROC curve (AUC) as the main metric of model performance, as it identifies the best-performing classifier.45 Each model was also evaluated in terms of accuracy, recall, precision, and F1 score. The Random Forest classifier, used with different resampling methods, produced the models with the highest AUC (Table 5).

The Random Forest classifier with the Both resampling method and the 10-K cross-validation technique was selected as the best-fit model based on several factors. First, it produced an acceptable AUC score (0.81). Second, the Both resampling method introduces less bias than the oversampling method: in the Both method, the minority class AA, which represents the positive events, was increased by replacement with only a very limited number of instances (only 16 records were replicated), whereas the oversampling method increased the AA class by 189 records. Third, this model is highly sensitive, with a recall score of 0.82, meaning that the model has an 82% probability of predicting AA events correctly. Finally, it has an F1 score of 0.82, indicating a good balance between precision and recall.
Table 5

Random Forest Models with Highest AUC, F1 and Recall

                  70-30% Split                    10-K Cross-Validation
                  Both    Over    ROSE            Under   Both    Over    ROSE
AUC               0.80    0.89    0.70            0.81    0.81    0.80    0.80
F1                0.77    0.79    0.60            0.80    0.82    0.79    0.78
Recall            0.74    0.82    0.63            0.82    0.82    0.80    0.80
Even though Random Forest with oversampling and the 70-30% split produced the highest AUC (0.89), it was not selected as the best-fit model. This high AUC is due to overfitting, because we used oversampling with random replacement before the data splitting. Tree-based classification models are particularly susceptible to this kind of overfitting: as explained by Chawla et al,37 the tree splits become highly concentrated in the replicated minority class, limiting the split boundaries in the majority class and increasing bias. In addition, because random oversampling with replacement was applied before splitting the data, the same data point could appear in both the training set and the testing set. This leads to a tightly fitted model with high apparent prediction power but reduced generalizability.46 The same effect could also underlie the high recall (sensitivity) of the oversampled model relative to its F1 measure, reflecting the model's ability to predict replicated Above Average (AA) points rather than the balanced predictive ability measured by the F1 score.47 Thus, the high AUC for the oversampling method with the Random Forest classifier and the 70-30% split could be overoptimistic, especially when compared with the AUC of the oversampling model using cross-validation, where the AUC dropped from 0.89 to 0.80.

Moreover, Khalilia, Chakraborty and Popescu51 developed a model to predict disease risk using random forest with oversampling. To overcome the oversampling effect, they used repeated random sub-sampling, which is similar in mechanism to cross-validation; the resulting classifier produced an AUC of 0.89, the highest among the classifiers compared in that study. Other related studies have also found that models developed using Random Forest with cross-validation tend to perform best. Daghistani et al39 found that a model developed with Random Forest and cross-validation produced the highest AUC (0.94) among the classifiers compared for predicting in-hospital stay for patients with cardiac problems. Similarly, Alghamdi et al48 and Sakr et al49 built models to predict diabetes mellitus and hypertension, respectively, and both studies found that Random Forest provided the highest AUC (0.92 and 0.93, respectively) compared to other classifiers. Likewise, several studies have found that the use of synthetic records in resampling tends to produce more robust models.37 In our study, ROSE sampling with Random Forest using cross-validation produced an AUC of 0.80. This agrees with the findings of Navaz et al50 showing that the SMOTE method (a method similar to ROSE resampling) improved their model's ability to predict length of stay for all types of ICU admissions.51 In summary, our research found that the best-fit model produced an AUC of 0.81, which is comparable to other similar studies (Table 6).
Table 6

Matrix of Studies with Relative Medical Conditions to Compare with The Study Results

Article: Predictors of in-hospital length of stay among cardiac patients: A machine learning approach. Authors: Daghistani et al.39 Year: 2019. Country: Saudi Arabia. Setting: King Abdulaziz Medical City Complex in Riyadh. Medical condition: predict LoS for cardiac patients. Data balance: SMOTE. Model evaluation: cross-validation. Classifier: Random Forest. AUC: 0.94.
Article: Neural Network Prediction of ICU Length of Stay Following Cardiac Surgery Based on Pre-Incision Variables. Authors: LaFaro et al.41 Year: 2015. Country: USA. Setting: New York Medical College. Medical condition: predict ICU LoS after cardiac surgery. Model evaluation: cross-validation. Classifier: ensemble of neural networks. AUC: 0.90.
Article: Using machine learning for predicting severe postoperative complications after cardiac surgery. Authors: Lapp et al.57 Year: 2018. Country: UK. Setting: Golden Jubilee National Hospital. Medical condition: predict complications after cardiac surgery. Classifier: Random Forest. AUC: 0.71.
Article: Prediction of In-Hospital Mortality And Length of Stay in Acute Coronary Syndrome Patients Using Machine-Learning Methods. Authors: Yakovlev et al.40 Year: 2018. Country: Russia. Medical condition: predict mortality and LoS for acute coronary syndrome patients. Model evaluation: cross-validation. Classifier: Naïve Bayes. AUC: 0.90.
Article: This study. Year: 2019. Country: Saudi Arabia. Setting: Saud Albabtain Cardiac Center. Medical condition: predict LoS for iCABG patients. Data balance: Both method. Model evaluation: cross-validation. Classifier: Random Forest. AUC: 0.81.
We found that six attributes had considerable influence in predicting which PLoS category (Above Average (AA) or Average or Below (AB)) a given iCABG patient would fall into: EuroScore II, complications during operation, intra-aortic balloon pump used, pulmonary artery systolic pressure, height, and age. These are similar to risk factors for extended PLoS found in several other studies. Biancari et al52 found that EuroScore II is not only highly predictive of in-hospital mortality for isolated CABG but also predicts PLoS for similar kinds of surgery.52 Experiencing complications during the operation was a powerful predictor in our model, which is consistent with the findings of Lazar et al53 that patients with preoperative risk factors and patients who develop complications postoperatively have the longest PLoS.53 Other studies found that a history of previous heart surgery,54 age, and systolic pressure39 predicted longer PLoS, consistent with our results. However, we found only minimal predictive power associated with diabetes, which contradicts the findings of Ali et al.54

Finally, the model selected as the best performer can be deployed in the study setting after surgery to provide additional insight into the factors contributing to a prolonged length of stay for patients undergoing iCABG. It could therefore be useful for optimizing bed management, resource utilization, and even infection control.55 Being able to predict when a patient is likely to experience a longer-than-average PLoS also presents an opportunity for psychosocial preparation, both for patients and for their families.6 Clinically, it can improve the decision-making process and help provide the proper care needed for patients predicted to stay longer after surgery.56

Limitations and Future Work

The sample size used in this study is relatively small, particularly compared to modern standards of "big data". However, two studies in the literature demonstrate that smaller datasets can result in better model performance than larger sets. Amarasingham et al58 developed a model to identify the risk of 30-day readmission or death for patients with heart failure using electronic medical record data. They used 1372 records to build a model that produced c statistics (equivalent to AUC) of 0.86 for mortality and 0.72 for readmission; this model outperformed the model created by the Center for Medicaid and Medicare Services, which used larger datasets (c statistics of 0.73 and 0.66 for mortality and readmission, respectively). Another study found that well-performing models can be obtained from reduced datasets.59

One of the challenges we faced in conducting this study was the removal of several attributes that were missing more than 80% of their data (ICU stay, anesthesia timings, and duration of the procedures). Though unlikely given the relatively good performance of our final model, removing these attributes may have affected overall performance. Focusing this study only on patients undergoing isolated CABG surgery was advantageous to our proximal objective of specifically predicting outcomes for this patient population; however, doing so limits the generalizability of the final model to other types of open-heart surgery. To address this limitation, future work should consider the PLoS associated with other open-heart procedures. Future studies might also add several other factors hypothesized to be important, such as those related to physicians' profiles (qualifications, experience, etc.). Another promising feature to examine would be the procedure volume of the hospital: Shinjo and Fushimi60 found that hospitals with a high volume of iCABG procedures have a shorter length of stay for patients undergoing open-heart surgery. Finally, we designed this model to predict PLoS in binary terms, ie, Above Average versus Average or Below. Arguably, the ability to predict PLoS on a finer scale, in the form of continuous data, might enhance efforts to mitigate the impact of prolonged PLoS on these patients.

Conclusion

In this research, our main objective was to develop the best fit model that would predict the level of postoperative length of stay of patients undergoing isolated coronary artery bypass grafting using supervised machine learning classifiers. The result of this study showed that Random Forest classifier with Both resampling using 10-K cross-validation outperformed other classifiers.
References (24 in total)

1.  Validation of EuroSCORE II in patients undergoing coronary artery bypass surgery.

Authors:  Fausto Biancari; Francesco Vasques; Reija Mikkola; Marta Martin; Jarmo Lahtinen; Jouni Heikkinen
Journal:  Ann Thorac Surg       Date:  2012-04-18       Impact factor: 4.330

2.  An automated model to identify heart failure patients at risk for 30-day readmission or death using electronic medical record data.

Authors:  Ruben Amarasingham; Billy J Moore; Ying P Tabak; Mark H Drazner; Christopher A Clark; Song Zhang; W Gary Reed; Timothy S Swanson; Ying Ma; Ethan A Halm
Journal:  Med Care       Date:  2010-11       Impact factor: 2.983

3.  Preoperative state of mind among patients undergoing CABG: effect on length of stay and postoperative complications.

Authors:  Linda S Halpin; Scott D Barnett
Journal:  J Nurs Care Qual       Date:  2005 Jan-Mar       Impact factor: 1.597

4.  An original model to predict Intensive Care Unit length-of stay after cardiac surgery in a competing risk framework.

Authors:  Fabio Barili; Nicoletta Barzaghi; Faisal H Cheema; Antonio Capo; Jeffrey Jiang; Enrico Ardemagni; Michael Argenziano; Claudio Grossi
Journal:  Int J Cardiol       Date:  2012-10-03       Impact factor: 4.164

5.  Concurrence of big data analytics and healthcare: A systematic review.

Authors:  Nishita Mehta; Anil Pandit
Journal:  Int J Med Inform       Date:  2018-03-26       Impact factor: 4.046

6.  Hyperglycemia predicts mortality after CABG: postoperative hyperglycemia predicts dramatic increases in mortality after coronary artery bypass graft surgery.

Authors:  Kent W Jones; A Steven Cain; John H Mitchell; Roger C Millar; Holly L Rimmasch; Thomas K French; Samuel L Abbate; Colleen A Roberts; Shane R Stevenson; Diane Marshall; Donald L Lappé
Journal:  J Diabetes Complications       Date:  2008-04-16       Impact factor: 2.852

7.  Neural Network Prediction of ICU Length of Stay Following Cardiac Surgery Based on Pre-Incision Variables.

Authors:  Rocco J LaFaro; Suryanarayana Pothula; Keshar Paul Kubal; Mario Emil Inchiosa; Venu M Pothula; Stanley C Yuan; David A Maerz; Lucresia Montes; Stephen M Oleszkiewicz; Albert Yusupov; Richard Perline; Mario Anthony Inchiosa
Journal:  PLoS One       Date:  2015-12-28       Impact factor: 3.240

8.  Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

Authors:  Manal Alghamdi; Mouaz Al-Mallah; Steven Keteyian; Clinton Brawner; Jonathan Ehrman; Sherif Sakr
Journal:  PLoS One       Date:  2017-07-24       Impact factor: 3.240

9.  Using machine learning on cardiorespiratory fitness data for predicting hypertension: The Henry Ford ExercIse Testing (FIT) Project.

Authors:  Sherif Sakr; Radwa Elshawi; Amjad Ahmed; Waqas T Qureshi; Clinton Brawner; Steven Keteyian; Michael J Blaha; Mouaz H Al-Mallah
Journal:  PLoS One       Date:  2018-04-18       Impact factor: 3.240

10.  Measuring performance on the Healthcare Access and Quality Index for 195 countries and territories and selected subnational locations: a systematic analysis from the Global Burden of Disease Study 2016.

Authors: 
Journal:  Lancet       Date:  2018-06-01       Impact factor: 202.731

Cited by (2 in total)

1.  Parsimonious machine learning models to predict resource use in cardiac surgery across a statewide collaborative.

Authors:  Arjun Verma; Yas Sanaiha; Joseph Hadaya; Anthony Jason Maltagliati; Zachary Tran; Ramin Ramezani; Richard J Shemin; Peyman Benharash
Journal:  JTCVS Open       Date:  2022-04-20

2.  Personalized Preoperative Prediction of the Length of Hospital Stay after TAVI Using a Dedicated Decision Tree Algorithm.

Authors:  Maria Zisiopoulou; Alexander Berkowitsch; Ralf Neuber; Haralampos Gouveris; Stephan Fichtlscherer; Thomas Walther; Mariuca Vasa-Nicotera; Philipp Seppelt
Journal:  J Pers Med       Date:  2022-02-24
