Machine learning methods for perioperative anesthetic management in cardiac surgery patients: a scoping review.

Santino R Rellum1,2, Jaap Schuurmans1,2, Ward H van der Ven1,2, Susanne Eberl1, Antoine H G Driessen3, Alexander P J Vlaar2, Denise P Veelo1.   

Abstract

BACKGROUND: Machine learning (ML) is developing fast with promising prospects within medicine and already has several applications in perioperative care. We conducted a scoping review to examine the extent and potential limitations of ML implementation in perioperative anesthetic care, specifically in cardiac surgery patients.
METHODS: We mapped the current literature by searching three databases: MEDLINE (Ovid), EMBASE (Ovid), and Cochrane Library. Articles were eligible if they reported on perioperative ML use in the field of cardiac surgery with relevance to anesthetic practices. Data on the applicability of ML and comparability to conventional statistical methods were extracted.
RESULTS: Forty-six articles on ML relevant to the work of the anesthesiologist in cardiac surgery were identified. Three main categories emerged: (I) event and risk prediction, (II) hemodynamic monitoring, and (III) automation of echocardiography. Prediction models based on ML tend to behave similarly to conventional statistical methods. Using dynamic hemodynamic or ultrasound data in ML models, however, shifts the potential to promising results.
CONCLUSIONS: ML in cardiac surgery is increasingly used in perioperative anesthetic management. The majority is used for prediction purposes similar to conventional clinical scores. Remarkable ML model performances are achieved when using real-time dynamic parameters. However, beneficial clinical outcomes of ML integration have yet to be determined. Nonetheless, the first steps introducing ML in perioperative anesthetic care for cardiac surgery have been taken. 2021 Journal of Thoracic Disease. All rights reserved.

Keywords:  Cardiac surgery; anesthesiology; artificial intelligence; machine learning; perioperative care

Year:  2021        PMID: 35070381      PMCID: PMC8743411          DOI: 10.21037/jtd-21-765

Source DB:  PubMed          Journal:  J Thorac Dis        ISSN: 2072-1439            Impact factor:   2.895


Introduction

Cardiac surgery has seen many advances since the first use of cardiopulmonary bypass in 1953 (1). Milestones have been achieved with mechanical circulatory support devices and improved surgical techniques, such as the introduction of minimally invasive hybrid cardiac surgery. These innovations have enabled the inclusion of elderly and higher-risk patients. In addition, improvements in perioperative management have further reduced the risk of complications. For example, the assessment of cardiovascular performance has improved with continuous ventricular function monitoring using miniaturized transesophageal echocardiography (TEE) probes (2). Also, the implementation of goal-directed therapy (GDT) has yielded beneficial effects in cardiac surgery (3). These innovations provide crucial diagnostic information needed to aid perioperative supportive and preventive care.

In recent years, machine learning (ML), a subset of artificial intelligence (AI), has driven a revolution in several medical fields (4-6), suggesting new possibilities within cardiac surgery. ML is a computer-controlled technique that automates analytical model building, and its exponential growth in medicine has been made possible by the availability of large datasets and improvements in computing power. Three main types of machine learning are distinguished (7):

(I) Supervised ML is concerned with training a model towards a known target variable (outcome). By varying the weighted effect of given labeled inputs (e.g., age, sex, cholesterol level, smoking status), it minimizes the prediction error of the desired output (for example, having cardiovascular disease or not). Most applications in medicine apply this principle of machine learning, using either a classification or regression model.

(II) Unsupervised ML is when the algorithm obtains unlabeled data (e.g., large sets of radiological or histological images) and attempts to find patterns. This is a more exploratory method, as the algorithm decides what classes and patterns best describe the data.

(III) Reinforcement learning is the technique that is perhaps key to surpassing human capability. This method learns what actions lead to the highest possible reward. This reward is predefined and usually custom-tailored to the problem at hand. In this case, no training set is provided in advance; instead, it is created from the inputs the model receives through interaction with the environment. An example of such a reward is each time an autonomous vehicle stays within its lane. Through positive and negative reinforcement, the self-driving model learns what the required behavior is and what actions lead to that scenario.

For now, the use of ML in medicine is mainly limited to supervised methods. The large quantities of data obtained during the perioperative phases of cardiac surgery are potentially suitable for a wide variety of ML applications. Therefore, we have conducted a scoping review with the goal of identifying the full extent of current machine learning applications and their possible limitations for perioperative anesthetic management and risk assessment in cardiac surgery patients. We present the following article in accordance with the PRISMA-ScR reporting checklist (available at https://dx.doi.org/10.21037/jtd-21-765).
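The supervised principle described above (adjusting the weights of labeled inputs to minimize prediction error) can be sketched in a few lines. The example below is a minimal illustration and not a method taken from any of the reviewed studies: it fits a logistic regression by gradient descent on made-up data. The function names, the two features (a scaled age and a smoking flag), the learning rate, and the number of epochs are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, x):
    """Predicted probability of the outcome for one patient's inputs."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(features, outcomes, learning_rate=0.1, epochs=2000):
    """Fit weights by repeatedly stepping against the gradient of the log-loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, outcomes):
            error = predict_risk(weights, bias, x) - y  # prediction error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical toy data: [scaled age, smoker flag] -> outcome (1 = disease).
toy_features = [[0.2, 0.0], [0.3, 0.0], [0.4, 1.0],
                [0.7, 1.0], [0.8, 0.0], [0.9, 1.0]]
toy_outcomes = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(toy_features, toy_outcomes)
```

After training, `predict_risk(weights, bias, [0.9, 1.0])` should exceed `predict_risk(weights, bias, [0.2, 0.0])`, reflecting the pattern in the toy labels.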

Methods

We followed the scoping review methodology defined by Arksey and O’Malley (8) to examine the extent and nature of the currently employed machine learning methods within perioperative management of cardiac surgery patients. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist (9) was used to guide detailed reporting.

Search strategy

Searches were compiled by a clinical librarian for three databases [MEDLINE (Ovid), EMBASE (Ovid), and Cochrane Library], using the following keywords: cardiothoracic surgery, anesthesiology, and artificial intelligence, including all synonyms (complete list of keywords is included in Appendix 1). Three reviewers (SR, JS, and WV) independently selected the articles, reaching a consensus on all included studies.

Study selection

Articles were included in the review if the following criteria were met: (I) reported on perioperative applications of ML in the field of cardiac surgery with relevance to the work of the anesthetist or the postoperative intensive care unit (ICU) admission; (II) evaluated the performance of the applied AI technique in non-simulated datasets; (III) available in English; (IV) published in the last thirty years (1991 to present). A traditional statistical technique (i.e., conventional method) was also considered a machine learning application (i.e., advanced method) if it was trained and subsequently validated in different datasets. We excluded studies that solely focused on conventional methods, studies performed in children (age <18 years), and studies involving animals. In addition, editorials, commentary letters, and case reports were excluded.

Data extraction

For each article, data were extracted regarding: (I) perioperative phase; (II) type of cardiac surgery; (III) size of datasets; (IV) type of ML methods used; (V) area under the receiver operating characteristic curve (AUC). The AUC, which in its generalized form is referred to as the C-index, is used in this paper to compare the performance of different models. An AUC value between 0.7 and 0.8 is considered satisfactory in this scoping review (10). A meta-analysis was not deemed appropriate given the wide variety of included subjects and ML techniques in the included studies.
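Because the AUC is the common yardstick throughout this review, a small worked example may help. For a binary outcome, the AUC equals the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it, which is also how the C-index is read. The sketch below computes it by direct pairwise comparison; the function name `auc` and the toy data are illustrative assumptions, not taken from any included study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 0/1 outcomes; scores: predicted risks (higher = event more likely).
    Ties between an event and a non-event count as half a concordant pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one event and one non-event")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = died, 0 = survived, with made-up predicted risks.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 8 of 9 event/non-event pairs are concordant
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; rank-based implementations are preferable for large registries.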

Results

A total of 1,566 articles were identified, with a remainder of 1,142 articles after deduplication. Of these, 51 full-text studies were assessed for inclusion after the screening of titles and abstracts. Eighteen full-text articles were excluded for various reasons (Figure 1). We found thirteen additional articles through citation tracking and non-systematic searches, resulting in 46 included articles. We identified three distinguishable categories: (I) event and risk prediction, (II) hemodynamic monitoring, and (III) automation of echocardiography.
Figure 1

Flow chart of the literature selection process for the present article.

Preoperative

Risk scores enable the assessment of preoperative risks to help in the stratification of patients. Additionally, they inform and guide patients and their relatives in shared decision-making and are used in cost-benefit analyses. However, a known drawback of widely used risk scores in cardiac surgery is that they do not fit all (sub)populations, and they underperform especially in high-risk patients (11).

Prediction of mortality

The established European System for Cardiac Operative Risk Evaluation (EuroSCORE) and the Society of Thoracic Surgeons (STS) score are based on conventional logistic regression analysis. These two clinical scores are among the most commonly used mortality risk scores, with AUCs ranging from 0.74 to 0.80 (Table 1). It is important to note the predictive discrepancies that persist for these scores in a few cardiothoracic operations and some subpopulations, especially in high-risk patients (33-38). In contrast to this underperformance, several advanced machine learning models demonstrated their added value in elderly and rheumatic heart disease (RHD) subpopulations. Within the elderly population, six perioperative variables (not further specified by the authors) were found to be strongly correlated with mortality. Based on those variables, a logistic regression (LR) model, a Bayesian network (BN), and an artificial neural network (ANN) produced AUCs of 0.854, 0.931, and 0.941, respectively, clearly outperforming the EuroSCORE, which had an AUC of 0.648 in this population (19). Overall, the main mortality predictors in RHD were found to be left atrium size, high creatinine, tricuspid procedure, reoperation, and pulmonary hypertension. Using a random forest (RF) model, a new clinical score, the RheSCORE, was built on those predictors. With an AUC of 0.98, it outperforms the EuroSCORE II, which produces an AUC of 0.857 based on essentially the same predictors (25).
Table 1

Area under the curve values in validation datasets for mortality prediction at different time points

| Study | Surgery | Datasets (training / test) | Phaseᵃ | Model categoryᵇ | Model subtype (clinical score) | AUCᶜ | Definition of mortality |
|---|---|---|---|---|---|---|---|
| Mixed surgical population | | | | | | | |
| Allyn et al. (12) | Mix | N/A | Preoperative | Conventional | LR (EuroSCORE II) | 0.737 | Postoperative, time point not specified |
| | | 4,564 / 1,956 | Preoperative | Advanced | LR | 0.742 | |
| | | | | | RF + NB + GBM + SVM | 0.795 | |
| Nilsson et al. (13) | Mix | N/A | Preoperative | Conventional | LR (EuroSCORE I) | 0.79 | Death during hospitalization or ≤30 days after cardiac surgery |
| | | 13,771 / 4,591 | Preoperative | Advanced | LR | 0.78 | |
| | | | | | ANN | 0.80 | |
| Peng, Peng (14) | Mix | N/A | Preoperative | Conventional | LR (Parsonnet) | 0.829 | Postoperative, time point not specified |
| | | 637 / 315 | Pre-, and postoperative | Advanced | LR | 0.852 | |
| | | | | | ANN | 0.873 | |
| Orr (15) | Mix | 732 / 380 | Pre-, and postoperative | Advanced | PNN | 0.81 | Not specified |
| Benedetto et al. (16) | Mix | N/A | Preoperative | Conventional | LR (EuroSCORE I) | 0.76 | Postoperative, in-hospital |
| | | | | | LR (EuroSCORE II) | 0.77 | |
| | | 20,133 / 8,628 | Preoperative | Advanced | LR | 0.80 | |
| | | | | | RF | 0.80 | |
| | | | | | Naïve Bayes | 0.77 | |
| | | | | | ANN | 0.77 | |
| Fernandes et al. (17) | Mix | 3,761 / 1,254 | Pre-, and intraoperative | Advanced | LR | 0.80 | Postoperative, time point not specified |
| | | | | | RF | 0.83 | |
| | | | | | XGBoost | 0.85 | |
| | | | | | SVM | 0.66 | |
| | | | | | ANN | 0.70 | |
| Zhong et al. (18) | Mix | 5,475 / 1,369 | Pre-, intra-, postoperative | Advanced | LR | 0.86 | 30-day mortality |
| | | | | | RF | 0.88 | |
| | | | | | XGBoost | 0.90 | |
| | | | | | ANN | 0.64 | |
| Celi et al. (19) | Mix in elderlyˆ | N/A | Preoperative | Conventional | LR (EuroSCORE I) | 0.648 | In-hospital, time point not specified |
| | | 116 / 49 | Pre-, intra-, postoperative | Advanced | LR | 0.854 | |
| | | | | | BN | 0.931 | |
| | | | | | ANN | 0.941 | |
| CABG and/or valve surgery | | | | | | | |
| Kilic et al. (20) | CABG + valve | N/A | Preoperative | Conventional | LR (STS PROM) | 0.795 | Death during hospitalization or ≤30 days after cardiac surgery |
| | | 10,071 / 1,119 | Preoperative | Advanced | XGBoost | 0.808 | |
| Lippmann, Shahian (21) | CABG | 40,480 / 40,126 | Preoperative | Advanced | LR | 0.762 | Not specified |
| | | | | | Bayesian model | 0.748 | |
| | | | | | Committee classifier | 0.764 | |
| | | | | | Single-layer MLP | 0.754 | |
| | | | | | Two-layer MLP | 0.761 | |
| | | | | | Three-layer MLP | 0.761 | |
| Mendes et al. (22) | CABG | 1,053 / 262 | Pre-, intra-, postoperative | Advanced | LR | 0.86 | 30-day mortality after CABG |
| | | | | | ANN | 0.85 | |
| Tu, Guerriere (23) | CABG | 4,782 / 5,517 | Preoperative | Advanced | LR | 0.77 | Postoperative, time point not specified |
| | | | | | ANN | 0.78 | |
| Lippmann (24) | CABG | 1,257 | Pre-, intra-, postoperative | Advanced | LR | 0.705 | Not specified |
| | | | | | Single-layer MLP | 0.760 | |
| | | | | | MLP | | |
| | | | | | MLP-Committee | | |
| Mejia et al. (25) | Valve in RHD | N/A | Preoperative | Conventional | LR (B-Parsonnet) | 0.876 | Death during hospitalization or ≤30 days after cardiac surgery |
| | | | | | LR (EuroSCORE II) | 0.857 | |
| | | | | | LR (InsCor) | 0.835 | |
| | | | | | LR (AmblerSCORE) | 0.831 | |
| | | | | | LR (Guaragna) | 0.816 | |
| | | | | | LR (New York) | 0.834 | |
| | | 2,919 | Preoperative | Advanced | RheSCORE¹ | 0.98 | |
| Heart transplantation | | | | | | | |
| Yoon et al. (26) | HTx | N/A | Preoperative | Conventional | LR (DRI) | 0.529 | Generalization of four time points: 3 months, 1, 3, and 10 years |
| | | | | | LR (IMPACT) | 0.527 | |
| | | | | | LR (RSS) | 0.544 | |
| | | 66,306 / 16,576 | Preoperative | Advanced | ToPs/R² | 0.577 | |
| Nilsson et al. (27) | HTx | N/A | Preoperative | Conventional | LR (DRI) | 0.56 | 1-year mortality |
| | | | | | LR (IMPACT) | 0.61 | |
| | | | | | LR (RSS) | 0.61 | |
| | | 41,780 / 8,569 | Preoperative | Advanced | IHTSA³ | 0.650 | |
| Shah et al. (28) | HTx | 4,054 | Preoperative | Advanced | LR | 0.60 | 1-year mortality or retransplantation |
| | | | | | ML model not specified | 0.64 | |
| Villela et al. (29) | HTx | 18,612 | Preoperative | Advanced | LR | 0.62 | 1-year mortality or retransplantation |
| | | | | | Stacking of GBM | 0.66 | |
| Bravo et al. (30) | HTx after LVAD | 7,700 | Preoperative | Advanced | LR | 0.63 | 1-year mortality or retransplantation |
| | | | | | ML model not specified | 0.61 | |
| Miller et al. (31) | HTx | 45,182 / 11,295 | Preoperative | Advanced | LR | 0.65 | 1-year mortality |
| | | | | | Ridge regression | 0.65 | |
| | | | | | LASSO regression | 0.65 | |
| | | | | | RF | 0.63 | |
| | | | | | NB | 0.61 | |
| | | | | | TA-NB | 0.62 | |
| | | | | | SVM | 0.52 | |
| | | | | | SGB | 0.64 | |
| | | | | | ANN | 0.66 | |
| Agasthi et al. (32) | HTx | 12,189 / 3,047 | Pre-, intra-, postoperative | Advanced | GBM | 0.717 | 5-year mortality |

a, perioperative phase: pre-, intra-, postoperative variables used in prediction models; b, distinction between conventional and advanced models is explained in the methods section; c, definitions of both the AUC and C-index are given in the methods section. 1, ensemble of thirteen advanced models; 2, trees of predictors based on three regression methods (Cox regression, linear perceptron, and logistic regression); 3, International Heart Transplant Survival Algorithm based on an artificial neural network model. †, ratio between training and validation set not reported; ‡, not all values are extractable as they are mainly displayed in bar graphs; ˆ, ≥80 years. ANN (1, 2, etc.), artificial neural network (one-layer, two-layer, etc.); AUC, area under the receiver operating characteristic curve for the validation sets; BN, Bayesian network; B-Parsonnet, 2000 Bernstein-Parsonnet score; CABG, coronary artery bypass graft surgery; GBM, gradient-boosted machine; HTx, heart transplantation; LASSO, least absolute shrinkage and selection operator; LVAD, left ventricular assist device; LR, logistic regression; Mix, various cardiac surgery patients with/without heart transplantation; ML, machine learning model; MLP, multilayer sigmoid neural network; TA-NB, tree-augmented NB; NB, Naïve Bayes; PNN, probabilistic neural network; RF, random forest; RHD, rheumatic heart disease; SGB, stochastic gradient boosting; SVM, support-vector machines; Valve, heart valve surgery; XGBoost, extreme gradient boosting.

However, in a mixture of cardiac surgery procedures, the two aforementioned clinical scores perform similarly to or slightly worse than advanced models (12,13,20). An ANN yielded comparable predictive properties to the EuroSCORE (AUC 0.80 vs. 0.79), with only a small advantage in the case of valve procedures (AUC 0.76 vs. 0.72, P=0.0001) (13). Combining four ML models [gradient boosting machines (GBM), RF, support vector machines (SVM), and Naïve Bayes (NB)] into an ensemble created a significant but modest benefit with an AUC of 0.795 versus 0.737 for the EuroSCORE II (12). Similarly, modest advantages in accuracy and AUC were seen when comparing an advanced ML model [extreme gradient boosting machine (XGBoost)] to the STS clinical score. Interestingly, despite both the STS score and the XGBoost model being well-calibrated and having a high AUC (0.795 and 0.808, respectively), they identified a large proportion of different patients as being at risk (20). Even one of the first clinical scores, the Parsonnet score, still holds value in predicting in-hospital mortality with a comparable AUC to an advanced LR and ANN model (0.829, 0.852, and 0.873, respectively) (14). Also, when comparing advanced ML methods, little difference in predictive performance is seen (14-16,21,22,39,40), with only a slight advantage for nonlinear models [ANN, BN, and multilayer sigmoid perceptron (MLP)] over linear LR models (13,19,24). The majority of these studies use a set of preoperative data, including demographic characteristics, medical history, and type of surgery performed. Adding intraoperative hypotension as a dynamic parameter to these preoperative data showed improved AUCs for advanced LR, RF, and XGBoost models. At the same time, an SVM and ANN did not benefit from this added parameter, outputting AUCs of 0.66 and 0.70, respectively (17).
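One common way to combine several models into an ensemble is to average their predicted probabilities per patient (often called soft voting). The text above does not state how the four models were combined in the cited study, so the sketch below should be read as an illustration of the general idea under that assumption; the function name `soft_vote` and all probabilities are made up.

```python
def soft_vote(model_probabilities):
    """Average per-patient risk estimates across models.

    model_probabilities: dict mapping a model name to a list of predicted
    probabilities, one per patient, in the same patient order for every model.
    """
    columns = list(model_probabilities.values())
    if not columns:
        raise ValueError("at least one model is required")
    n_patients = len(columns[0])
    if any(len(col) != n_patients for col in columns):
        raise ValueError("all models must score the same patients")
    return [sum(col[i] for col in columns) / len(columns)
            for i in range(n_patients)]


# Hypothetical predicted mortality risks for three patients from four models.
hypothetical_outputs = {
    "GBM": [0.10, 0.80, 0.40],
    "RF": [0.20, 0.70, 0.50],
    "SVM": [0.05, 0.90, 0.30],
    "NB": [0.25, 0.60, 0.40],
}
ensemble_risk = soft_vote(hypothetical_outputs)
```

Averaging tends to dampen the idiosyncratic errors of individual models, which is one plausible reason an ensemble can show a modest gain over each of its members.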

Risk survival scores in heart transplantation

There are currently three main risk scores for heart transplant patients based on conventional logistic regression methods: the Donor Risk Index (DRI) (41), the Index for Mortality Prediction After Cardiac Transplantation (IMPACT) (42), and the Risk-Stratification Score (RSS) (43). These produce C-indices ranging from 0.55 to 0.57 for overall survival (Table 1). We identified two studies that compared an advanced model directly to these risk scores, obtaining slightly better performances (AUCs between 0.62 and 0.66) (26,27). When comparing only advanced models in their ability to predict 1-year mortality, both linear and nonlinear models show similar results with moderate AUCs consistently ≤0.66 (28-31). Only in predicting 5-year mortality after heart transplantation did an advanced GBM model surpass other machine learning models, generating an AUC of 0.717 (32). A different application of advanced modeling was used to stratify patients on a heart transplant waiting list. The applied neural network only moderately determined the most likely patient status: still waiting, transplanted, or deceased at three different time points (44).

Intra-operative

A staggering eighty percent of all intra-operative alarms in cardiac surgery, mainly hemodynamic warnings, do not require therapeutic intervention (45). Many redundant alarms involve artifacts or expected procedure-specific events. This is undesirable, as it can cause distraction or alarm fatigue (46). The multitude of distracting alarms can be reduced, as shown by a few AI applications in this section.

Predictions of hemodynamic instability

In 1997, Becker et al. (47) developed a monitoring system based on fuzzy logic to provide a continuous intuitive descriptive overview of a patient’s hemodynamic status (e.g., ‘preload is too high’). These hemodynamic interpretations were based on vital parameters and administered anesthetics. The validation process demonstrated promising results with a predictability of 99.5%. Compared to a simple threshold alarm, this system can help the physician to interpret changes quickly. The hypotension prediction index (HPI) (48) is another monitoring application. It is an advanced logistic regression-based model that can predict a hypotensive event (mean arterial pressure <65 mmHg for at least one minute), regardless of current blood pressure, up to 15 minutes in advance (48). The model was developed using large datasets, including cardiac surgery patients. A recently published study demonstrated the high predictive capability of the HPI solely in cardiac surgery (49). ML can also be used to identify relationships between risks, as demonstrated with three advanced RF models that adequately found cardiopulmonary bypass associated factors contributing to a reduction in right ventricular (RV) function (50).
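The hypotensive event targeted by the HPI is defined above as a mean arterial pressure (MAP) below 65 mmHg for at least one minute. The sketch below shows how such events could be labeled in a regularly sampled MAP series; it only applies this definition and does not reproduce the HPI model itself. The function name, the 20-second sampling interval, and the MAP values are assumptions for illustration, and each sample is assumed to represent one full sampling interval.

```python
def hypotensive_events(map_values, sample_interval_s,
                       threshold_mmhg=65.0, min_duration_s=60.0):
    """Find runs of MAP below the threshold lasting at least min_duration_s.

    Returns (start, end) index pairs with end exclusive. A run of n
    consecutive low samples is assumed to span n * sample_interval_s seconds.
    """
    events = []
    run_start = None
    # A sentinel above any threshold closes a run that reaches the last sample.
    for i, value in enumerate(list(map_values) + [float("inf")]):
        if value < threshold_mmhg:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * sample_interval_s >= min_duration_s:
                events.append((run_start, i))
            run_start = None
    return events


# Hypothetical MAP readings (mmHg), one every 20 seconds.
toy_map = [70, 64, 63, 62, 70, 60, 61, 72]
toy_events = hypotensive_events(toy_map, sample_interval_s=20)
# Samples 1-3 stay below 65 mmHg for 60 s and count as an event;
# samples 5-6 last only 40 s and do not.
```

Labels of this kind would be the prediction target; the predictive model would then be trained on the arterial waveform features that precede each event.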

Automation of intraoperative echocardiography (IOE)

Two articles were identified on the automation of ultrasound assessments that have the potential to enable a more efficient intraoperative workflow (51,52). As RV function analysis is both challenging and time-consuming, an AI-based automated RV strain assessment was compared with the most commonly used parameters: tricuspid annular plane systolic excursion (TAPSE), tissue Doppler-derived systolic tricuspid annulus motion velocity (S’), and RV fractional area change (FAC). A strong correlation was found between FAC and global longitudinal strain (GLS) over various RV function measurements on three different ultrasound machines (51). The second AI application in ultrasound automation relates to the analysis of the mitral valve (MV) (52). Patients with normal biventricular function who underwent elective CABG surgery were included for ultrasound imaging to evaluate the clinical applicability and accuracy of an AI-based MV analysis software. An experienced echocardiographer captured three end-systolic frames of the MV in each patient. Postoperatively, these frames were analyzed with the AI software. The software automatically traced the valves, and three experienced examiners independently verified the valve tracings, making minor manual adjustments when deemed necessary. This created three separate datasets for all frames. Subsequently, the software calculated six clinically relevant geometric parameters from the verified MV tracings (annulus anterolateral posteromedial diameter, annulus anteroposterior diameter, annular area, annulus nonplanarity angle, annulus total perimeter, and anterior and posterior leaflet areas). Statistical analyses showed a high precision for the calculated parameters in corresponding end-systolic frames in which only the valve tracings were verified by different examiners, meaning that the examiner performing the verification did not affect the outcome (52).

Postoperative

Traditionally, risk scores were developed for mortality prediction alone. Only recently has morbidity been incorporated in these models, as it provides a marker for the quality of life (53). So far, most risk scores for postoperative outcomes are based on preoperative inputs and lack intraoperative variables that could improve performance (54).

Morbidity in the ICU

The previously mentioned Parsonnet score, initially developed for mortality prediction, also generates acceptable AUCs in morbidity prediction concerning cardiovascular, respiratory, and neurological complications. Addressing these same outcomes, two advanced models (LR and ANN) show even better predictive capability in comparison, with the largest advantage in predictive power for the ANN model with an AUC of 0.85 (14) (Table 2). In a recent study comparing advanced models with each other, an XGBoost model had the upper hand over ANN (18). However, these are outliers in terms of morbidity prediction. Most comparative studies in cardiac surgery show a reasonable predictive value for all advanced ML models with AUCs around 0.77 (55,56,62).
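Table 2

Area under the curve values in validation datasets for postoperative morbidity prediction

| Study | Surgery | Datasets (training / test) | Outcome | Phaseᵃ | Model categoryᵇ | Model subtype (clinical score) | AUCᶜ |
|---|---|---|---|---|---|---|---|
| Miscellaneous¹ | | | | | | | |
| Cevenini et al. (55) | CABG | 545 / 545 | | Pre-, intra-, postoperative | Advanced | LR | 0.781 |
| | | | | | | BL | 0.778 |
| | | | | | | BQ | 0.785 |
| | | | | | | HS | 0.768 |
| | | | | | | DS | 0.779 |
| | | | | | | k-NN | 0.772 |
| | | | | | | ANN1 | 0.776 |
| | | | | | | ANN2 | 0.778 |
| Chong et al. (56) | CABG | N/A | | Preoperative | Conventional | LR (QMMI score) | 0.752 |
| | | 423 / 140 | | Preoperative | Advanced | LR | 0.807 |
| | | | | | | ANN | 0.886 |
| Peng, Peng (14) | Mix | N/A | | Preoperative | Conventional | LR (Parsonnet) | 0.727 |
| | | 637 / 315 | | Pre-, and postoperative | Advanced | LR | 0.789 |
| | | | | | | ANN | 0.852 |
| Secluded morbidities | | | | | | | |
| Zhong et al. (18) | Mix | 5,475 / 1,369 | Septic shock | Pre-, intra-, postoperative | Advanced | LR | 0.93 |
| | | | | | | RF | 0.81 |
| | | | | | | XGBoost | 0.96 |
| | | | | | | ANN | 0.88 |
| | | | Thrombocytopenia | Pre-, intra-, postoperative | Advanced | LR | 0.87 |
| | | | | | | RF | 0.89 |
| | | | | | | XGBoost | 0.89 |
| | | | | | | ANN | 0.83 |
| | | | Liver dysfunction | Pre-, intra-, postoperative | Advanced | LR | 0.82 |
| | | | | | | RF | 0.89 |
| | | | | | | XGBoost | 0.89 |
| | | | | | | ANN | 0.70 |
| Mufti et al. (57) | Mix | 4,476 / 1,117 | Agitated delirium | Pre-, intra-, postoperative | Advanced | LR | 0.814 |
| | | | | | | RF | 0.813 |
| | | | | | | NB | 0.799 |
| | | | | | | BN | 0.774 |
| | | | | | | SVM | 0.811 |
| | | | | | | DT | 0.772 |
| | | | | | | ANN | 0.804 |
| Acute kidney injury | | | | | | | |
| Lei et al. (58) | Aortic arch | 627 / 270 | | Pre-, intra-, postoperative | Advanced | LR | 0.65 |
| | | | | | | RF | 0.71 |
| | | | | | | SVM | 0.64 |
| | | | | | | LGM | 0.80 |
| Tseng et al. (59) | Mix | 470 / 201 | | Pre-, and intraoperative | Advanced | LR | 0.806 |
| | | | | | | RF | 0.839 |
| | | | | | | DT | 0.781 |
| | | | | | | XGBoost | 0.837 |
| | | | | | | SVM | 0.825 |
| | | | | | | RF + XGBoost | 0.843 |
| Lee et al. (60) | Mix | 1,005 / 1,005 | | Pre-, intra-, postoperative | Advanced | LR | 0.70 |
| | | | | | | RF | 0.68 |
| | | | | | | DT | 0.71 |
| | | | | | | XGBoost | 0.78 |
| | | | | | | SVM | 0.69 |
| | | | | | | NN classifier | 0.64 |
| | | | | | | Deep learning | 0.55 |
| Penny-Dimri et al. (61) | Mix | N/A | | Preoperative | Conventional | LR (Cleveland Clinic) | 0.71 |
| | | | | | | LR (Risk score) | 0.74 |
| | | | | | | LR (Risk score) | 0.75 |
| | | 77,322 / 19,331 | | Preoperative | Advanced | LR | 0.76 |
| | | | | | | GBM | 0.76 |
| | | | | | | k-NN | 0.66 |
| | | | | | | ANN | 0.76 |
| | | | | Pre-, and intraoperative | Advanced | LR | 0.77 |
| | | | | | | GBM | 0.78 |
| | | | | | | k-NN | 0.67 |
| | | | | | | ANN | 0.77 |

a, perioperative phase: pre-, intra-, postoperative variables used in prediction models; b, distinction between conventional and advanced models is explained in the methods section; c, definitions of both the AUC and C-index are given in the methods section. 1, mix of cardiovascular, respiratory, neurological, renal, infectious, and hemorrhagic complications. ANN (1, 2, etc.), artificial neural network (one-layer, two-layer, etc.); AUC, area under the receiver operating characteristic curve for the validation sets; BL, Bayes linear; BN, Bayesian network; BQ, Bayes quadratic; CABG, coronary artery bypass graft surgery; DS, direct score; DT, decision trees; GBM, gradient-boosted machine; HS, Higgins score; k-NN, k-nearest neighbor; LGM, light gradient machine; LR, logistic regression; Mix, various cardiac surgery patients with/without heart transplantation; NN, neural network; NB, Naïve Bayes; RF, random forest; SVM, support-vector machines; XGBoost, extreme gradient boosting.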

Acute kidney injury (AKI) is a common complication after cardiac surgery (63). Identifying patients at risk for AKI or renal replacement therapy (RRT) could guide perioperative treatment. Advanced predictive models based on GBM, LR, and ANN showed superior ability in identifying patients at risk for AKI and RRT as opposed to conventional risk scores based on LR (64,65). Overall, reasonable AKI predictions are seen for conventional and advanced models in most articles, with AUCs ranging from 0.66 to 0.84 (61). Looking at individual studies in which various advanced models are directly compared with each other, it is noticeable that both MLP and XGBoost models often perform better (24,58-60). Lastly, promising results have been found in evaluating the need for early continuous venovenous hemofiltration (CVVH) after cardiac surgery, with comparable and accurate predictive results for both an ANN and an advanced LR model (66).

Prevention and early recognition of delirium are essential, as it is associated with poor outcomes (67). We identified one study on this topic. It compared the performance of seven advanced models in an imbalanced dataset (integral dataset) with their performance in a balanced dataset (i.e., 10-fold cross-validation applied). This was done in order to reduce overestimation. In line with the authors’ expectation, the predictive values of the models showed better performance in the balanced sets, with the best predictions for an LR and RF model and the least for a BN model that still performed sufficiently with an AUC of 0.77 (57).
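To illustrate the 10-fold cross-validation mentioned above, the sketch below splits a dataset into ten folds so that every patient is held out exactly once, while keeping the proportion of events similar in each fold (stratification). The text does not describe how the cited study constructed its folds or balanced its data, so the stratified assignment, the function names, and the toy labels are assumptions for illustration; shuffling before assignment is omitted to keep the example deterministic.

```python
def stratified_folds(labels, k=10):
    """Assign sample indices to k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    # Deal each class out round-robin so rare events are spread over folds.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(labels, k=10):
    """Yield (train_indices, test_indices) with each fold held out once."""
    folds = stratified_folds(labels, k)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds)
                       if f != held_out for i in fold)
        yield train, test


# Hypothetical imbalanced outcome: 10 delirium cases among 100 patients.
toy_labels = [1] * 10 + [0] * 90
toy_splits = list(train_test_splits(toy_labels, k=10))
```

With these toy labels, each held-out fold contains one event and nine non-events, so a performance estimate averaged over the folds is not dominated by folds that happen to contain no events.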

Length-of-stay

Accurate estimation of ICU length-of-stay (LOS) is advantageous not only in the counseling of patients and their families but also in the organization of bed capacity and the scheduling of operating rooms, even more so in recent times with the increased scarcity of ICU beds due to the ongoing COVID-19 pandemic (68). The conventional EuroSCORE I is positively correlated with prolonged LOS, making it a suitable tool for predicting LOS (69). We identified one article demonstrating the superiority of an advanced ML model to the EuroSCORE I. It outperformed other advanced models as well and showed similar discrimination to physicians’ LOS predictions (70) (Table 3). Other comparative data suggest that ANNs outperform other advanced models regarding LOS (76). By itself, an ANN developed in 1993 successfully stratified cardiothoracic surgery patients at risk of extended stay (>2 days) with an AUC of 0.70 (23). These promising results are surpassed when ANNs are ensembled (72). Although slightly more modest in performance, advanced regression models still produce acceptable LOS predictions with AUCs ranging from 0.83 to 0.87 (73).
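Table 3

Area under the curve values in validation datasets for prediction of additional-, prolonged-, or re-intervention and/or care

| Study | Surgery | Training set | Test set | Phase^a | Model category^b | Model subtype^b (clinical score) | AUC^c |
|---|---|---|---|---|---|---|---|
| Renal replacement and CVVH | | | | | | | |
| Penny-Dimri et al. (61) | Mix | N/A | N/A | Preoperative | Conventional | LR (Cleveland Clinic) | 0.80^d |
| | | | | | | LR (Risk score) | 0.80^d |
| | | | | | | LR (Risk score) | 0.81^d |
| | | 77,322 | 19,331 | Preoperative | Advanced | LR | 0.82^d |
| | | | | | | GBM | 0.83^d |
| | | | | | | k-NN | 0.68^d |
| | | | | | | ANN | 0.82^d |
| | | | | Pre-, and intraoperative | Advanced | LR | 0.84^d |
| | | | | | | GBM | 0.85^d |
| | | | | | | k-NN | 0.69^d |
| | | | | | | ANN | 0.84^d |
| Bent et al. (66) | CABG + valve surgery | 30 | 35 | Perioperative | Advanced | LR | 0.89^e |
| | | | | | | ANN | 0.90^e |
| Prolonged mechanical ventilation and reintubation | | | | | | | |
| Wise et al. (71) | CABG | N/A | N/A | Preoperative | Conventional | LR | 0.698^f |
| | | 590 | 148 | Preoperative | Advanced | ANN | 0.714^f |
| Mendes et al. (22) | CABG | 1,053 | 262 | Pre-, intra-, postoperative | Advanced | LR | 0.67^f |
| | | | | | | ANN | 0.72^f |
| | | | | | | LR | 0.62^g |
| | | | | | | ANN | 0.65^g |
| Length of stay | | | | | | | |
| Rowan et al. (72) | Mix | 480 | 240 | Pre-, intra-, postoperative | Advanced | Ensemble ANNs | 0.901 |
| Barbini et al. (73) | CABG + valve surgery | 2,605 | 651 | Pre-, intra-, postoperative | Advanced | NB | 0.859 |
| Meyfroidt et al. (70) | Mix | N/A | N/A | Preoperative | Conventional | LR (EuroSCORE I) | 0.726 |
| | | | | Pre-, intra-, postoperative | | Nurses’ prediction | 0.695 |
| | | | | | | Physician’s prediction | 0.758 |
| | | 461 | 499 | | Advanced | Gaussian processes | 0.758 |
| 30-day readmission | | | | | | | |
| Manyam et al. (74) | CABG | 1,042 | 261 | Time-independent^1 | Advanced | XGBoost | 0.627 |
| | | | | Time-dependent + time-independent^1 | Advanced | XGBoost | 0.868 |
| Engoren et al. (75) | CABG | 2,644 | 2,711 | Pre-, intra-, postoperative | Advanced | LR | 0.644 |
| | | | | | | Genetic programs | 0.654 |
| | | | | | | ANN | 0.537 |
| Graft failure at 5 years | | | | | | | |
| Agasthi et al. (32) | HTx | 12,189 | 3,047 | Pre-, intra-, postoperative | Advanced | GBM | 0.716 |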

a, perioperative phase: pre-, intra-, and postoperative variables used in the prediction models; b, the distinction between conventional and advanced models is explained in the methods section; c, definitions of both the AUC and the C-index are given in the methods section; d, need for renal replacement therapy; e, need for early continuous venovenous hemofiltration; f, prolonged mechanical ventilation; g, reintubation. 1, perioperative variables. ANN (1, 2, etc.), artificial neural network (one-layer, two-layer, etc.). AUC, area under the receiver operating characteristic curve for the validation sets; CABG, coronary artery bypass graft surgery; GBM, gradient-boosted machine; HTx, heart transplantation; k-NN, k-nearest neighbor; LGM, light gradient machine; LR, logistic regression; Mix, various cardiac surgery patients with/without heart transplantation; NB, Naïve Bayes; XGBoost, extreme gradient boosting.
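
To make the AUC values in Table 3 concrete, here is a minimal sketch of how an AUC can be computed from observed outcomes and predicted risks using the rank-based (Mann-Whitney) formulation: the fraction of event/non-event pairs in which the event received the higher predicted risk. The function name `auc` and the toy data are illustrative assumptions; the included studies may have used other software or estimators.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pair-counting formulation.

    labels: 0/1 outcomes (1 = event occurred).
    scores: predicted risks, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    # Count event/non-event pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from any included study.
outcomes = [0, 0, 1, 1, 0, 1]
risks = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(round(auc(outcomes, risks), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of events from non-events.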

Mechanical ventilation

We identified two studies on the prediction of prolonged mechanical ventilation and the chance of reintubation. Both studies, performed in CABG populations, show minor differences in accuracy, sensitivity, and specificity in favor of an ANN over an advanced LR model (22,71).

Readmission

Given the high costs associated with readmission after hospital discharge, the ability to stratify readmission risk is essential for targeting preventive measures. Existing conventional LR prediction models are based solely on time-independent variables (e.g., a single postoperative lab value) (77-79). An advanced XGBoost algorithm that also incorporated time-dependent factors (e.g., lab values at several time-points) improved on these models in predictive ability (74). Another, more complex ML tool called genetic programs performed as well as an advanced LR method. In contrast, an advanced ANN model in the same study showed significantly worse predictive ability (75).
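
The difference between time-independent and time-dependent predictors can be sketched as follows. This is not the feature set of the cited XGBoost model, which the source does not detail; the summary statistics chosen here (last value, range, least-squares slope), the function names, and the creatinine series are assumptions for illustration only.

```python
def time_independent_features(values):
    """Single-snapshot feature: only the most recent measurement."""
    return {"last": values[-1]}


def time_dependent_features(times, values):
    """Summaries of a serial lab measurement: last value, range, and trend."""
    n = len(values)
    if n < 2 or n != len(times):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return {
        "last": values[-1],
        "min": min(values),
        "max": max(values),
        # Least-squares slope: change in lab value per unit of time.
        "slope": sxy / sxx,
    }


# Hypothetical postoperative creatinine series (hours after surgery, umol/L).
hours = [0, 12, 24, 48]
creatinine = [80.0, 95.0, 110.0, 140.0]
print(time_independent_features(creatinine))
print(time_dependent_features(hours, creatinine))
```

Two patients with the same last value can have very different slopes, which is the kind of information a model restricted to single time-point variables cannot use.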

Discussion

This scoping review includes forty-six articles describing various ML techniques in cardiac surgery patients with relevance to perioperative anesthetic management and risk assessment. We identified three specific applications: the majority of articles (n=41) concern prediction analyses (e.g., mortality, AKI, readmission), three concern hemodynamic monitoring, including a form of prediction, and two concern ultrasound guidance. The combined data suggest that current applications of ML techniques to stationary variables (e.g., a hemodynamic parameter at one time-point) in the cardiac surgery population perform similarly to conventional statistical methods (which do not use a training and validation set) in predictive capability. Among ML methods, whether complex or straightforward in construction, only GBM shows superior outcomes more often than the others. For one study, this can in part be attributed to the relatively recent registry it used (32). Major differences, however, are absent, and the high correlations between these models suggest that they find similar patterns. Although single ANNs do not bring major predictive improvements, combining them in an ensemble is beneficial (27,72). However, the true strength of ML seems to emerge when it is applied to more complex data, such as full dynamic arterial waveforms or complex ultrasound images, rather than to stationary perioperative variables. Using these data yields real-time, clinically insightful results (48,51,52) that are a valuable addition to current dynamic parameters (e.g., heart rate, stroke volume variation). Contemporary literature lacks data on clinical outcomes from ML implementation in the cardiac surgery population. Still, beneficial results are probably not long in coming, given the effectiveness seen in a different surgical population using real-time dynamic data in an ML model (80).
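
One simple way to combine several models into an ensemble is to average their predicted risks per patient. The sketch below assumes plain unweighted averaging; the cited ensemble studies may have combined their ANNs differently, and the function name `ensemble_average` and the numbers are hypothetical.

```python
def ensemble_average(predictions_per_model):
    """Average per-patient predicted risks across models.

    predictions_per_model: list of lists, one inner list of risks per model,
    all covering the same patients in the same order.
    """
    if not predictions_per_model:
        raise ValueError("need at least one model")
    n_patients = len(predictions_per_model[0])
    if any(len(p) != n_patients for p in predictions_per_model):
        raise ValueError("all models must score the same patients")
    n_models = len(predictions_per_model)
    return [
        sum(model[i] for model in predictions_per_model) / n_models
        for i in range(n_patients)
    ]


# Hypothetical risks from three separately trained networks for four patients.
ann_outputs = [
    [0.20, 0.70, 0.10, 0.55],
    [0.30, 0.60, 0.20, 0.45],
    [0.25, 0.80, 0.15, 0.50],
]
print(ensemble_average(ann_outputs))
```

Averaging tends to dampen the idiosyncratic errors of individual networks, which is one common rationale for ensembling.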
In contrast, the comparable performances or modest improvements seen in prediction models are similar to findings in other medical fields (81,82). One explanation may be that these implementations are often based on manageable datasets with a limited number of variables, whereas the strength of some advanced ML models is attributed to their ability to establish nonlinear relationships between variables in complex datasets, relationships that conventional methods have not previously demonstrated. In line with this, one of our included studies showed that the correlation between the risks predicted by an advanced and a conventional model was very low. Although the models were comparable in prediction, this suggests that they did not base their predictions on the same features. In addition, the high complexity of ML models in small datasets poses a risk of overfitting (83). Overfitting occurs when irrelevant characteristics in the training data are marked as predictive parameters, causing underperformance in the test set (84). This problem might be avoided by implementing cross-validation, as demonstrated for delirium in cardiac surgery (57). Still, traditional linear statistics may be more suitable for risk prediction models (85), as advanced models perform similarly but are more complex to develop. Although not convincing in risk prediction models, ML does excel in datasets consisting of dynamic parameters. Future research should focus on these real-time applications of ML that explore patterns in complex datasets. Promising results can then be achieved, as demonstrated by the effective hypotension early warning system by Hatib et al. (48,49) and the automation of echocardiography in two other studies (51,52). Such research should address not only the development and validation of these models but also their clinical effectiveness in randomized controlled trials.
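
The correlation between the risks predicted by two models, mentioned above, can be quantified with a Pearson coefficient over the per-patient predictions. The source does not state which correlation measure the included study used, so Pearson is an assumption here, as are the function name and the toy risks.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of predicted risks."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant predictions")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical risks for the same four patients from two different models.
risk_conventional = [0.10, 0.20, 0.30, 0.40]
risk_advanced = [0.20, 0.10, 0.40, 0.30]
print(round(pearson(risk_conventional, risk_advanced), 2))
```

Two models can reach similar AUCs while their per-patient risks correlate only weakly, which is why a low correlation hints that they rely on different features.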

Future directions and challenges

Anesthesia is pre-eminently a field in which large amounts of dynamic physiological data can be collected digitally, especially in cardiac surgery, where the mean operative time is about three hours (86). The adoption of anesthesia information management systems (AIMS) is expected to be well above 80% in academic centers (87). Future incorporation of machine learning into AIMS could facilitate the continuous development of these models, unlocking their full potential based on a regularly updated and expanding dataset. Still, it might be safer to update ML models only in controlled research settings, especially as neural networks in particular are not transparent, and at best opaque, in how and which variables are processed (7,88). Not without reason, there is growing interest in explainable artificial intelligence, in which the decisions made by the model are transparent and more interpretable (89). Currently, the US Food and Drug Administration (FDA) and the European Commission approve the application of ML models when the algorithm cannot improve on its capabilities once deployed in patients (so-called locked models). This is done to ensure consistent, reproducible results and safety from the algorithm (90). The FDA is currently assessing possible regulatory modifications (91) that would enable the use of “unlocked” ML, taking into account potential safety issues. There is still plenty of work to be done in the application and clinical evaluation of promising “locked” ML methods based on various perioperative dynamic variables. We suggest that future clinical trials implementing ML models address the following three primary outcomes: (I) whether patient outcomes improve with ML-based diagnostic and treatment guidance, (II) whether workflow efficiency improves, and (III) whether the approach is cost-effective.

Limitations

Although we conducted a systematic search, we might have missed articles due to the broad range of included topics and acronyms in the literature. This may have led to the incorrect exclusion of studies from the initial selection. Another limitation of our article is that we only descriptively summarized the data without a meta-analysis. Therefore, definitive conclusions cannot be drawn about AUC differences across different methodologies. Nevertheless, this article provides an overview of the current ML applications per perioperative phase in cardiac surgery, showcasing where research is still needed.

Conclusion

Machine learning in cardiac surgery is being applied in perioperative anesthetic management and risk assessment. ML models generally yield predictive outcomes comparable to existing clinical scores, with the exception that models implementing dynamic variables obtain promising results. However, there is still a need for data on clinical outcomes after using ML-based models for diagnostic and treatment guidance.