Machine learning methods for perioperative anesthetic management in cardiac surgery patients: a scoping review.

Santino R Rellum^1,2, Jaap Schuurmans^1,2, Ward H van der Ven^1,2, Susanne Eberl^1, Antoine H G Driessen^3, Alexander P J Vlaar^2, Denise P Veelo^1.

Abstract

BACKGROUND: Machine learning (ML) is developing fast with promising prospects within medicine and already has several applications in perioperative care. We conducted a scoping review to examine the extent and potential limitations of ML implementation in perioperative anesthetic care, specifically in cardiac surgery patients.
METHODS: We mapped the current literature by searching three databases: MEDLINE (Ovid), EMBASE (Ovid), and Cochrane Library. Articles were eligible if they reported on perioperative ML use in the field of cardiac surgery with relevance to anesthetic practices. Data on the applicability of ML and comparability to conventional statistical methods were extracted.
RESULTS: Forty-six articles on ML relevant to the work of the anesthesiologist in cardiac surgery were identified. Three main categories emerged: (I) event and risk prediction, (II) hemodynamic monitoring, and (III) automation of echocardiography. Prediction models based on ML tend to perform similarly to conventional statistical methods. Using dynamic hemodynamic or ultrasound data in ML models, however, yields more promising results.
CONCLUSIONS: ML in cardiac surgery is increasingly used in perioperative anesthetic management. The majority is used for prediction purposes similar to conventional clinical scores. Remarkable ML model performances are achieved when using real-time dynamic parameters. However, beneficial clinical outcomes of ML integration have yet to be determined. Nonetheless, the first steps introducing ML in perioperative anesthetic care for cardiac surgery have been taken. 2021 Journal of Thoracic Disease. All rights reserved.

Keywords: Cardiac surgery; anesthesiology; artificial intelligence; machine learning; perioperative care

Year: 2021 PMID: 35070381 PMCID: PMC8743411 DOI: 10.21037/jtd-21-765

Source DB: PubMed Journal: J Thorac Dis ISSN: 2072-1439 Impact factor: 2.895

Introduction

Cardiac surgery has gone through many advancements since the first use of cardiopulmonary bypass in 1953 (1). Milestones have been achieved with mechanical circulatory support devices and improved surgical techniques, such as the introduction of minimally invasive hybrid cardiac surgery. These innovations have enabled the inclusion of elderly and higher-risk patients. In addition, improved perioperative management has further reduced the risk of complications. For example, the assessment of cardiovascular performance has improved with continuous ventricular function monitoring using miniaturized transesophageal echocardiography (TEE) probes (2). Also, the implementation of goal-directed therapy (GDT) has yielded beneficial effects in cardiac surgery (3). These innovations provide crucial diagnostic information needed to aid perioperative supportive and preventive care.

In recent years, machine learning (ML), a subset of artificial intelligence (AI), has driven a revolution in several medical fields (4-6), suggesting new possibilities within cardiac surgery. ML is a computer-controlled technique that automates analytical model building, and its exponential growth in medicine is made possible by the availability of large datasets and improvements in computing power. Three main types of machine learning are distinguished (7). (I) Supervised ML is concerned with training a model towards a known target variable (outcome). By varying the weights given to labeled inputs (e.g., age, sex, cholesterol level, smoking status), it minimizes the prediction error of the desired output (for example, having cardiovascular disease or not). Most applications in medicine apply this principle, using either a classification or a regression model. (II) Unsupervised ML is when the algorithm receives unlabeled data (e.g., large sets of radiological or histological images) and attempts to find patterns. This is a more exploratory method, as the algorithm decides which classes and patterns best describe the data. (III) Reinforcement learning is the technique that is perhaps key to surpassing human capability. This method learns which actions lead to the highest possible reward. This reward is predefined and usually custom-tailored to the problem at hand. In this case, there is no training set in advance; instead, it is created from the inputs the model receives through interaction with the environment. An example of such a reward is one granted each time an autonomous vehicle stays within its lane. Through positive and negative reinforcement, the self-driving model learns what the required behavior is and which actions lead to it.

For now, the use of ML in medicine is mainly limited to supervised methods. The large quantities of data obtained during the perioperative phases of cardiac surgery may be suitable for a variety of ML applications. Therefore, we conducted a scoping review with the goal of identifying the full extent of current machine learning applications and their possible limitations for perioperative anesthetic management and risk assessment in cardiac surgery patients. We present the following article in accordance with the PRISMA-ScR reporting checklist (available at https://dx.doi.org/10.21037/jtd-21-765).
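The supervised principle described in the Introduction (adjusting the weights of labeled inputs to minimize the prediction error for a known outcome) can be illustrated with a minimal logistic regression fitted by gradient descent. This is only a sketch: the function names, the two toy features (a standardized age and a smoking flag), and all data values are invented for illustration and do not come from any study in this review.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, x):
    """Predicted probability of the outcome for one patient's inputs."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_probability(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Move each weight against its average gradient.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical toy data: [standardized age, smoker (0/1)] -> disease (0/1).
toy_inputs = [[-1.0, 0], [-0.5, 0], [-0.8, 0], [0.7, 1], [1.0, 1], [0.4, 1]]
toy_labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(toy_inputs, toy_labels)
for x, y in zip(toy_inputs, toy_labels):
    print(x, y, round(predict_probability(weights, bias, x), 3))
```

In practice a model like this would be evaluated on patients it was not trained on, which is the train/validation distinction this review uses to separate advanced from conventional methods.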

Methods

We followed the scoping review methodology defined by Arksey and O’Malley (8) to examine the extent and nature of the machine learning methods currently employed within the perioperative management of cardiac surgery patients. The Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist (9) was used to guide detailed reporting.

Search strategy

Searches were compiled by a clinical librarian for three databases [MEDLINE (Ovid), EMBASE (Ovid), and Cochrane Library], using the following keywords: cardiothoracic surgery, anesthesiology, and artificial intelligence, including all synonyms (complete list of keywords is included in Appendix 1). Three reviewers (SR, JS, and WV) independently selected the articles, reaching a consensus on all included studies.

Study selection

Articles were included in the review if the following criteria were met: (I) reported on perioperative applications of ML in the field of cardiac surgery with relevance to the work of the anesthetist or the postoperative intensive care unit (ICU) admission; (II) evaluated the performance of the applied AI technique in non-simulated datasets; (III) available in English; (IV) published in the last thirty years (1991 to present). A traditional statistical technique (i.e., conventional method) was also considered a machine learning application (i.e., advanced method) if it was trained and subsequently validated in different datasets. We excluded studies that focused solely on conventional methods, studies performed in children (age <18 years), and studies involving animals. In addition, editorials, commentary letters, and case reports were excluded.

Data extraction

For each article, data were extracted regarding: (I) perioperative phase; (II) type of cardiac surgery; (III) size of datasets; (IV) type of ML methods used; (V) area under the receiver operating characteristic curve (AUC). The AUC, which in its generalized form is referred to as the C-index, is used in this paper to compare the performance of different models. An AUC value between 0.7 and 0.8 is considered satisfactory in this scoping review (10). A meta-analysis was not deemed appropriate given the wide variety of subjects and ML techniques in the included studies.
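For a binary outcome, the AUC equals the concordance probability (C-index): the chance that a randomly chosen patient with the event received a higher predicted risk than a randomly chosen patient without it. The sketch below computes it directly from that definition. The function name and the example numbers are illustrative assumptions, not data from the included studies.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Tied scores count as half a concordant pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one event and one non-event")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted risks for four patients (1 = event occurred).
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(auc_from_scores(example_labels, example_scores))  # 3 of 4 pairs concordant -> 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is why values between 0.7 and 0.8 are treated here as satisfactory rather than excellent.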

Results

A total of 1,566 articles were identified, of which 1,142 remained after deduplication. Of these, 51 full-text studies were assessed for inclusion after the screening of titles and abstracts. Eighteen full-text articles were excluded for various reasons (Figure 1). We found thirteen additional articles through citation tracking and non-systematic searches, resulting in 46 included articles. We identified three distinguishable categories: (I) event and risk prediction, (II) hemodynamic monitoring, and (III) automation of echocardiography.

Figure 1

Flow chart of the literature selection process for the present article.

Preoperative

Risk scores enable the assessment of preoperative risks to help in the stratification of patients. Additionally, they inform and guide patients and their relatives in shared decision-making and are used in cost-benefit analyses. However, a known drawback of widely used risk scores in cardiac surgery is that they do not fit all (sub)populations, and they underperform particularly in high-risk patients (11).

Prediction of mortality

The established European System for Cardiac Operative Risk Evaluation (EuroSCORE) and the Society of Thoracic Surgeons (STS) score are based on conventional logistic regression analysis. These two clinical scores are among the most commonly used mortality risk scores, with AUCs ranging from 0.74 to 0.80 (Table 1). It is important to note that predictive discrepancies persist for these scores in a few cardiothoracic operations and some subpopulations, especially in high-risk patients (33-38). In contrast to this underperformance, several advanced machine learning models demonstrated added value in elderly and rheumatic heart disease (RHD) subpopulations. Within the elderly population, six perioperative variables (not further specified by the authors) were found to be strongly correlated with mortality. Based on those variables, a logistic regression (LR) model, a Bayesian network (BN), and an artificial neural network (ANN) produced AUCs of 0.854, 0.931, and 0.941, respectively, clearly outperforming the EuroSCORE, which had an AUC of 0.648 in this population (19). In RHD, the main mortality predictors were found to be left atrium size, high creatinine, tricuspid procedure, reoperation, and pulmonary hypertension. Using a random forest (RF) model, a new clinical score, the RheSCORE, was built on those predictors. With an AUC of 0.98, it outperforms the EuroSCORE II, which produces an AUC of 0.857 based on essentially the same predictors (25).

Table 1

Area under the curve values in validation datasets for mortality prediction at different time-points

Study	Surgery	Training set (n)	Test set (n)	Phase^a	Model category^b	Model subtype (clinical score)	AUC^c	Definition of mortality
Mixed surgical population
Allyn et al. (12)	Mix	N/A		Preoperative	Conventional	LR (EuroSCORE II)	0.737	Postoperative, time point not specified
		4,564	1,956	Preoperative	Advanced	LR	0.742
			1,956	Preoperative		RF + NB + GBM + SVM	0.795
Nilsson et al. (13)	Mix	N/A		Preoperative	Conventional	LR (EuroSCORE I)	0.79	Death during hospitalization or ≤30 days after cardiac surgery
		13,771	4,591	Preoperative	Advanced	LR	0.78
			4,591	Preoperative		ANN	0.80
Peng, Peng (14)	Mix	N/A		Preoperative	Conventional	LR (Parsonnet)	0.829	Postoperative, time point not specified
		637	315	Pre-, and postoperative	Advanced	LR	0.852
			315	Pre-, and postoperative		ANN	0.873
Orr (15)	Mix	732	380	Pre-, and postoperative	Advanced	PNN	0.81	Not specified
Benedetto et al. (16)	Mix	N/A		Preoperative	Conventional	LR (EuroSCORE I)	0.76	Postoperative, in-hospital
				Preoperative		LR (EuroSCORE II)	0.77
		20,133	8,628	Preoperative	Advanced	LR	0.80
						RF	0.80
						Naïve Bayes	0.77
						ANN	0.77
Fernandes et al. (17)	Mix	3,761	1,254	Pre-, and intraoperative	Advanced	LR	0.80	Postoperative, time point not specified
						RF	0.83
						XGB	0.85
						SVM	0.66
						ANN	0.70
Zhong et al. (18)	Mix	5,475	1,369	Pre-, intra-, postoperative	Advanced	LR	0.86	30-day mortality
						RF	0.88
						XGBoost	0.90
						ANN	0.64
Celi et al. (19)	Mix in elderlyˆ	N/A		Preoperative	Conventional	LR (EuroSCORE I)	0.648	In-hospital, time point not specified
		116	49	Pre-, intra-, postoperative	Advanced	LR	0.854
						BN	0.931
						ANN	0.941
CABG and/or valve surgery
Kilic et al. (20)	CABG + valve	N/A		Preoperative	Conventional	LR (STS PROM)	0.795	Death during hospitalization or ≤30 days after cardiac surgery
Kilic et al. (20)	CABG + valve	10,071	1,119	Preoperative	Advanced	XGBoost	0.808
Lippmann, Shahian (21)	CABG	40,480	40,126	Preoperative	Advanced	LR	0.762	Not specified
						Bayesian model	0.748
						Committee classifier	0.764
						Single-layer MLP	0.754
						Two-layer MLP	0.761
						Three-layer MLP	0.761
Mendes et al. (22)	CABG	1,053	262	Pre-, intra-, postoperative	Advanced	LR	0.86	Death 30-day after CABG
Mendes et al. (22)	CABG		262	Pre-, intra-, postoperative		ANN	0.85	Death 30-day after CABG
Tu, Guerriere (23)	CABG	4,782	5,517	Preoperative	Advanced	LR	0.77	Postoperative, time point not specified
Tu, Guerriere (23)	CABG		5,517	Preoperative		ANN	0.78	Postoperative, time point not specified
Lippmann (24)	CABG	1,257^†		Pre-, intra-, postoperative	Advanced	LR	0.705^‡	Not specified
						Single-layer MLP	0.760^‡
						MLP
						MLP-Committee
Mejia et al. (25)	Valve in RHD	N/A		Preoperative	Conventional	LR (B-Parsonnet)	0.876	Death during hospitalization or ≤30 days after cardiac surgery
						LR (EuroSCORE II)	0.857
						LR (InsCor)	0.835
						LR (AmblerSCORE)	0.831
						LR (Guaragna)	0.816
						LR (New York)	0.834
		2,919^†		Preoperative	Advanced	RheSCORE¹	0.98
Heart transplantation
Yoon et al. (26)	HTx	N/A		Preoperative	Conventional	LR (DRI)	0.529	Generalization of four time point at 3-month, 1-, 3-, and 10-year
						LR (IMPACT)	0.527
						LR (RSS)	0.544
		66,306	16,576	Preoperative	Advanced	ToPs/R²	0.577
Nilsson et al. (27)	HTx	N/A		Preoperative	Conventional	LR (DRI)	0.56	1-year mortality
						LR (IMPACT)	0.61
						LR (RSS)	0.61
		41,780	8,569	Preoperative	Advanced	IHTSA³	0.650
Shah et al. (28)	HTx	4,054^†		Preoperative	Advanced	LR	0.60	1-year mortality or retransplantation
Shah et al. (28)	HTx			Preoperative		ML model not specified	0.64	1-year mortality or retransplantation
Villela et al. (29)	HTx	18,612^†		Preoperative	Advanced	LR	0.62	1-year mortality or retransplantation
Villela et al. (29)	HTx			Preoperative		Stacking of GBM	0.66	1-year mortality or retransplantation
Bravo et al. (30)	HTx after LVAD	7,700^†		Preoperative	Advanced	LR	0.63	1-year mortality or retransplantation
Bravo et al. (30)	HTx after LVAD			Preoperative		ML model not specified	0.61	1-year mortality or retransplantation
Miller et al. (31)	HTx	45,182	11,295	Preoperative	Advanced	LR	0.65	1-year mortality
						Ridge regression	0.65
						Regression LASSO	0.65
						RF	0.63
						NB	0.61
						TA-NB	0.62
						SVM	0.52
						SGB	0.64
						ANN	0.66
Agasthi et al. (32)	HTx	12,189	3,047	Pre-, intra-, postoperative	Advanced	GBM	0.717	5-year mortality

a, perioperative phase: pre-, intra-, and postoperative variables used in prediction models; b, the distinction between conventional and advanced models is explained in the Methods section; c, definitions of both the AUC and C-index are given in the Methods section. 1, ensemble of thirteen advanced models; 2, trees of predictors based on three regression methods (Cox regression, linear perceptron, and logistic regression); 3, International Heart Transplant Survival Algorithm, based on an artificial neural network model. †, ratio between training and validation set not reported; ‡, not all values are extractable as they are mainly displayed in bar graphs; ˆ, ≥80 years. ANN (1, 2, etc.), artificial neural network (one-layer, two-layer, etc.); AUC, area under the receiver operating characteristic curve for the validation sets; BN, Bayesian network; B-Parsonnet, 2000 Bernstein-Parsonnet score; CABG, coronary artery bypass graft surgery; GBM, gradient-boosted machine; HTx, heart transplantation; LASSO, least absolute shrinkage and selection operator; LVAD, left ventricular assist device; LR, logistic regression; Mix, various cardiac surgery patients with/without heart transplantation; ML, machine learning model; MLP, multilayer sigmoid neural network; TA-NB, tree-augmented NB; NB, Naïve Bayes; PNN, probabilistic neural network; RF, random forest; RHD, rheumatic heart disease; SGB, stochastic gradient boosting; SVM, support-vector machines; Valve, heart valve surgery; XGBoost, extreme gradient boosting.

However, in a mixture of cardiac surgery procedures, the two aforementioned clinical scores perform similarly to or slightly worse than advanced models (12,13,20). An ANN yielded comparable predictive properties to the EuroSCORE (AUC 0.80 vs. 0.79), with only a small advantage in the case of valve procedures (AUC 0.76 vs. 0.72, P value =0.0001) (13). Ensembling four ML models [gradient boosting machines (GBM), RF, support vector machines (SVM), and Naïve Bayes (NB)] created a significant but modest benefit, with an AUC of 0.795 versus 0.737 for the EuroSCORE II (12). Similarly, modest advantages in accuracy and AUC were seen when comparing an advanced ML model [extreme gradient boosting machine (XGBoost)] to the STS clinical score. Interestingly, despite both the STS score and the XGBoost model being well-calibrated and having a high AUC (0.795 and 0.808, respectively), they identified a large proportion of different patients as being at risk (20). Even one of the first clinical scores, the Parsonnet score, still holds value in predicting in-hospital mortality, with an AUC comparable to those of an advanced LR and an ANN model (0.829, 0.852, and 0.873, respectively) (14). Also, when advanced ML methods are compared with each other, little difference in predictive performance is seen (14-16,21,22,39,40), with only a slight advantage for nonlinear models [ANN, BN, and multilayer sigmoid perceptron (MLP)] over linear LR models (13,19,24). The majority of these studies use a set of preoperative data, including demographic characteristics, medical history, and type of surgery performed. Adding intraoperative hypotension as a dynamic parameter to these preoperative data improved the AUCs of advanced LR, RF, and XGBoost models. In contrast, an SVM and an ANN did not benefit from this added parameter, with AUCs of 0.66 and 0.70, respectively (17).
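Combining several models, as in the four-model ensemble above (12), can take many forms. One simple, commonly used variant is to average the predicted risks of the individual models per patient (often called soft voting). The sketch below shows that variant only as an illustration. It is an assumption, not the combination rule reported by the cited study, and the function name and risk values are made up.

```python
def average_ensemble(probabilities_per_model):
    """Average the predicted risks of several models, patient by patient.

    probabilities_per_model: one list of predicted risks per model,
    all in the same patient order.
    """
    if not probabilities_per_model:
        raise ValueError("at least one model is required")
    n_patients = len(probabilities_per_model[0])
    if any(len(p) != n_patients for p in probabilities_per_model):
        raise ValueError("every model must score the same patients")
    n_models = len(probabilities_per_model)
    return [
        sum(model[i] for model in probabilities_per_model) / n_models
        for i in range(n_patients)
    ]


# Hypothetical mortality risks from four models for three patients.
model_outputs = [
    [0.10, 0.60, 0.30],
    [0.20, 0.70, 0.20],
    [0.05, 0.50, 0.40],
    [0.15, 0.80, 0.30],
]
print(average_ensemble(model_outputs))
```

Averaging tends to help most when the individual models make different errors, which fits the observation that two well-performing models can flag largely different patients as being at risk.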

Risk survival scores in heart transplantation

There are currently three main risk scores for heart transplant patients based on conventional logistic regression methods: the Donor Risk Index (DRI) (41), the Index for Mortality Prediction After Cardiac Transplantation (IMPACT) (42), and the Risk-Stratification Score (RSS) (43). These produce C-indices ranging from 0.55 to 0.57 for overall survival (Table 1). We identified two studies that compared an advanced model directly to these risk scores, obtaining slightly better performances (AUCs between 0.62 and 0.66) (26,27). When only advanced models are compared in their ability to predict 1-year mortality, linear and nonlinear models show similar results, with moderate AUCs consistently ≤0.66 (28-31). Only in predicting 5-year mortality after heart transplantation did an advanced GBM model surpass other machine learning models, with an AUC of 0.717 (32). A different application of advanced modeling was used to stratify patients on a heart transplant waiting list. The applied neural network was only moderately able to determine the most likely patient status (still waiting, transplanted, or deceased) at three different time points (44).

Intra-operative

A staggering eighty percent of all intra-operative alarms in cardiac surgery, mainly hemodynamic warnings, do not require therapeutic intervention (45). Many redundant alarms involve artifacts or expected procedure-specific events. This is undesirable, as it can cause distraction or alarm fatigue (46). The multitude of distracting alarms can be reduced, as shown by a few AI applications in this section.

Predictions of hemodynamic instability

In 1997, Becker et al. (47) developed a monitoring system based on fuzzy logic to provide a continuous intuitive descriptive overview of a patient’s hemodynamic status (e.g., ‘preload is too high’). These hemodynamic interpretations were based on vital parameters and administered anesthetics. The validation process demonstrated promising results with a predictability of 99.5%. Compared to a simple threshold alarm, this system can help the physician to interpret changes quickly. The hypotension prediction index (HPI) (48) is another monitoring application. It is an advanced logistic regression-based model that can predict a hypotensive event (mean arterial pressure <65 mmHg for at least one minute), regardless of current blood pressure, up to 15 minutes in advance (48). The model was developed using large datasets, including cardiac surgery patients. A recently published study demonstrated the high predictive capability of the HPI solely in cardiac surgery (49). ML can also be used to identify relationships between risks, as demonstrated with three advanced RF models that adequately found cardiopulmonary bypass associated factors contributing to a reduction in right ventricular (RV) function (50).
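The hypotensive event definition quoted above (mean arterial pressure <65 mmHg for at least one minute) can be made concrete with a small labeling routine. The sketch below only marks such events retrospectively in a recorded series. It is not the HPI algorithm, which predicts events ahead of time. The function name, the 20-second sampling interval, and the example values are assumptions for illustration.

```python
def find_hypotensive_events(map_mmhg, sample_interval_s=20.0,
                            threshold_mmhg=65.0, min_duration_s=60.0):
    """Return (start_index, end_index) for each run of samples below the
    threshold that lasts at least min_duration_s.

    Assumes evenly spaced samples, each representing sample_interval_s seconds.
    """
    events = []
    run_start = None
    for i, value in enumerate(map_mmhg):
        if value < threshold_mmhg:
            if run_start is None:
                run_start = i  # a new low-pressure run begins
        elif run_start is not None:
            if (i - run_start) * sample_interval_s >= min_duration_s:
                events.append((run_start, i - 1))
            run_start = None
    # Close a run that is still open at the end of the recording.
    if run_start is not None:
        if (len(map_mmhg) - run_start) * sample_interval_s >= min_duration_s:
            events.append((run_start, len(map_mmhg) - 1))
    return events


# Hypothetical MAP readings (mmHg), one every 20 seconds.
readings = [70, 64, 63, 62, 72, 60, 61, 75, 64, 63, 62, 61]
print(find_hypotensive_events(readings))  # [(1, 3), (8, 11)]
```

The 40-second dip at indices 5 to 6 is ignored because it is shorter than one minute, which is the kind of transient a simple threshold alarm would still announce.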

Automation of intraoperative echocardiography (IOE)

Two articles were identified on the automation of ultrasound assessments that have the potential to enable a more efficient intraoperative workflow (51,52). As RV function analysis is both challenging and time-consuming, an AI-based automated RV strain assessment was compared with the most commonly used parameters: tricuspid annular plane systolic excursion (TAPSE), tissue Doppler-derived systolic tricuspid annulus motion velocity (S’), and RV fractional area change (FAC). A strong correlation was found between FAC and global longitudinal strain (GLS) over various RV function measurements on three different ultrasound machines (51). The second AI application in ultrasound automation relates to the analysis of the mitral valve (MV) (52). Patients with normal biventricular function who underwent elective CABG surgery were included for ultrasound imaging to evaluate the clinical applicability and accuracy of AI-based MV analysis software. An experienced echocardiographer captured three end-systolic frames of the MV in each patient. Postoperatively, these frames were analyzed with the AI software. The software automatically traced the valves, and three experienced examiners independently verified the valve tracings. Because the examiners could make minor manual adjustments when deemed necessary, this created three separate datasets for all frames. Subsequently, the software’s six clinically relevant geometric parameters were calculated from the verified MV tracings (annulus anterolateral posteromedial diameter, annulus anteroposterior diameter, annular area, annulus nonplanarity angle, annulus total perimeter, and anterior and posterior leaflet areas). Statistical analyses showed a high precision for the calculated parameters in corresponding end-systolic frames in which only the valve tracings were verified by different examiners, meaning that the examiners’ verification did not affect the outcome (52).

Postoperative

Traditionally, risk scores were developed for mortality prediction alone. Only recently has morbidity been incorporated in these models, as it provides a marker for quality of life (53). So far, most risk scores for postoperative outcomes are based on preoperative inputs and do not incorporate intraoperative variables, which could improve performance (54).

Morbidity in the ICU

The previously mentioned Parsonnet score, initially developed for mortality prediction, also generates acceptable AUCs in morbidity prediction concerning cardiovascular, respiratory, and neurological complications. For these same outcomes, two advanced models (LR and ANN) show even better predictive capability, with the largest advantage for the ANN model, which had an AUC of 0.85 (14) (Table 2). In a recent study comparing advanced models with each other, an XGBoost model had the upper hand over an ANN (18). However, these are outliers in terms of morbidity prediction. Most comparative studies in cardiac surgery show a reasonable predictive value for all advanced ML models, with AUCs around 0.77 (55,56,62).

Table 2

Area under the curve values in validation datasets for postoperative morbidity prediction

Study	Surgery	Training set (n)	Test set (n)	Phase^a	Model category^b	Model subtype (clinical score)	AUC^c
Miscellaneous¹
Cevenini et al. (55)	CABG	545	545	Pre-, intra-, postoperative	Advanced	LR	0.781
						BL	0.778
						BQ	0.785
						HS	0.768
						DS	0.779
						k-NN	0.772
						ANN1	0.776
						ANN2	0.778
Chong et al. (56)	CABG	N/A		Preoperative	Conventional	LR (QMMI score)	0.752
		423	140	Preoperative	Advanced	LR	0.807
		423	140	Preoperative	Advanced	ANN	0.886
Peng, Peng (14)	Mix	N/A		Preoperative	Conventional	LR (Parsonnet)	0.727
		637	315	Pre-, and postoperative	Advanced	LR	0.789
		637	315	Pre-, and postoperative	Advanced	ANN	0.852
Secluded morbidities
Zhong et al. (18)	Mix	5,475	1,369	Septic shock
				Pre-, intra-, postoperative	Advanced	LR	0.93
						RF	0.81
						XGBoost	0.96
						ANN	0.88
				Thrombocytopenia
				Pre-, intra-, postoperative	Advanced	LR	0.87
						RF	0.89
						XGBoost	0.89
						ANN	0.83
				Liver dysfunction
				Pre-, intra-, postoperative	Advanced	LR	0.82
						RF	0.89
						XGBoost	0.89
						ANN	0.70
Mufti et al. (57)	Mix	4,476	1,117	Agitated delirium
				Pre-, intra-, postoperative	Advanced	LR	0.814
						RF	0.813
						NB	0.799
						BN	0.774
						SVM	0.811
						DT	0.772
						ANN	0.804
Acute kidney injury
Lei et al. (58)	Aortic arch	627	270	Pre-, intra-, postoperative	Advanced	LR	0.65
						RF	0.71
						SVM	0.64
						LGM	0.80
Tseng et al. (59)	Mix	470	201	Pre-, and intraoperative	Advanced	LR	0.806
						RF	0.839
						DT	0.781
						XGBoost	0.837
						SVM	0.825
						RF+XGBoost	0.843
Lee et al. (60)	Mix	1,005	1,005	Pre-, intra-, postoperative	Advanced	LR	0.70
						RF	0.68
						DT	0.71
						XGBoost	0.78
						SVM	0.69
						NN classifier	0.64
						Deep learning	0.55
Penny-Dimri et al. (61)	Mix	N/A		Preoperative	Conventional	LR (Cleveland Clinic)	0.71
						LR (Risk score)	0.74
						LR (Risk score)	0.75
		77,322	19,331	Preoperative	Advanced	LR	0.76
						GBM	0.76
						k-NN	0.66
						ANN	0.76
				Pre-, and intraoperative	Advanced	LR	0.77
						GBM	0.78
						k-NN	0.67
						ANN	0.77

a, perioperative phase: pre-, intra-, and postoperative variables used in prediction models; b, the distinction between conventional and advanced models is explained in the Methods section; c, definitions of both the AUC and C-index are given in the Methods section. 1, Mix of cardiovascular, respiratory, neurological, renal, infectious, and hemorrhagic complications. ANN (1, 2, etc.), artificial neural network (one-layer, two-layer, etc.); AUC, area under the receiver operating characteristic curve for the validation sets; BL, Bayes linear; BN, Bayesian network; BQ, Bayes quadratic; CABG, coronary artery bypass graft surgery; DS, direct score; DT, decision trees; GBM, gradient-boosted machine; HS, Higgins score; k-NN, k-nearest neighbor; LGM, light gradient machine; LR, logistic regression; Mix, various cardiac surgery patients with/without heart transplantation; NN, neural network; NB, Naïve Bayes; RF, random forest; SVM, support-vector machines; XGBoost, extreme gradient boosting.

Acute kidney injury (AKI) is a common complication after cardiac surgery (63). Identifying patients at risk for AKI or renal replacement therapy (RRT) could guide perioperative treatment. Advanced predictive models based on GBM, LR, and ANN showed a superior ability to identify patients at risk for AKI and RRT compared with conventional risk scores based on LR (64,65). Overall, conventional and advanced models predict AKI reasonably well in most articles, with AUCs ranging from 0.66 to 0.84 (61). In individual studies that directly compare various advanced models with each other, both MLP and XGBoost models often perform better (24,58-60). Lastly, promising results have been found in evaluating the need for early continuous venovenous hemofiltration (CVVH) after cardiac surgery, with comparable and accurate predictive results for both an ANN and an advanced LR model (66).

Prevention and early recognition of delirium are essential, as it is associated with poor outcomes (67). We identified one study on this topic. It compared the performance of seven advanced models in an imbalanced dataset (the integral dataset) with their performance in a balanced dataset (i.e., with 10-fold cross-validation applied). This was done in order to reduce overestimation. In line with the authors’ expectation, the models performed better in the balanced sets, with the best predictions for an LR and an RF model and the weakest for a BN model, which still performed sufficiently with an AUC of 0.77 (57).
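How the delirium study (57) built its balanced sets is not detailed here. As a general illustration of 10-fold cross-validation on an imbalanced outcome, the sketch below assigns patients to folds so that each fold keeps roughly the same event rate (stratification). The function names, the stratification step, and the 10% event rate are assumptions, not a description of the cited study's procedure.

```python
import random


def stratified_k_fold(labels, k=10, seed=0):
    """Split patient indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for held_out in range(len(folds)):
        test = list(folds[held_out])
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Hypothetical cohort: 100 patients, 10 of whom developed delirium.
outcome = [0] * 90 + [1] * 10
folds = stratified_k_fold(outcome, k=10)
for train, test in train_test_splits(folds):
    events_in_test = sum(outcome[i] for i in test)
    print(len(train), len(test), events_in_test)
```

Each model is then trained on nine folds and scored on the held-out fold, and the ten held-out scores are averaged, so every patient contributes to validation exactly once.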

Length-of-stay

Accurate estimation of ICU length-of-stay (LOS) is advantageous not only in the counseling of patients and their families but also in the organization of bed capacity and the scheduling of operating rooms, all the more so in recent times, with the increased scarcity of ICU beds due to the ongoing COVID-19 pandemic (68). The conventional EuroSCORE I is positively correlated with prolonged LOS, making it a suitable tool for predicting LOS (69). We identified one article demonstrating the superiority of an advanced ML model over the EuroSCORE I. It also outperformed other advanced models and showed a discriminative ability similar to that of physicians’ LOS predictions (70) (Table 3). Other comparative data suggest that ANNs outperform other advanced models regarding LOS (76). On its own, an ANN developed in 1993 successfully stratified cardiothoracic surgery patients at risk of an extended stay (>2 days), with an AUC of 0.70 (23). These promising results are surpassed when ANNs are ensembled (72). Although slightly more modest in performance, advanced regression models still produce acceptable LOS predictions, with AUCs ranging from 0.83 to 0.87 (73).

Table 3

Area under the curve values in validation datasets for prediction of additional-, prolonged-, or re-intervention and/or care

Study	Surgery	Training set (n)	Test set (n)	Phase^a	Model category^b	Model subtype (clinical score)	AUC^c
Renal replacement and CVVH
Penny-Dimri et al. (61)	Mix	N/A		Preoperative	Conventional	LR (Cleveland Clinic)	0.80^d
						LR (Risk score)	0.80^d
						LR (Risk score)	0.81^d
		77,322	19,331	Preoperative	Advanced	LR	0.82^d
						GBM	0.83^d
						k-NN	0.68^d
						ANN	0.82^d
				Pre-, and intraoperative	Advanced	LR	0.84^d
						GBM	0.85^d
						k-NN	0.69^d
						ANN	0.84^d
Bent et al. (66)	CABG + valve surgery	30	35	Perioperative	Advanced	LR	0.89^e
Bent et al. (66)	CABG + valve surgery	30	35	Perioperative		ANN	0.90^e
Prolonged mechanical ventilation and reintubation
Wise et al. (71)	CABG	N/A		Preoperative	Conventional	LR	0.698^f
Wise et al. (71)	CABG	590	148	Preoperative	Advanced	ANN	0.714^f
Mendes et al. (22)	CABG	1,053	262	Pre-, intra-, postoperative	Advanced	LR	0.67^f
						ANN	0.72^f
						LR	0.62^g
						ANN	0.65^g
Length of stay
Rowan et al. (72)	Mix	480	240	Pre-, intra-, postoperative	Advanced	Ensemble ANNs	0.901
Barbini et al. (73)	CABG + valve surgery	2,605	651	Pre-, intra-, postoperative	Advanced	NB	0.859
Meyfroidt et al. (70)	Mix	N/A		Preoperative	Conventional	LR (EuroSCORE I)	0.726
				Pre-, intra-, postoperative		Nurses’ prediction	0.695
						Physician’s prediction	0.758
		461	499		Advanced	Gaussian processes	0.758
30-day readmission
Manyam et al. (74)	CABG	1,042	261	Time-independent¹	Advanced	XGBoost	0.627
Manyam et al. (74)	CABG	1,042	261	Time-dependent + time-independent¹	Advanced	XGBoost	0.868
Engoren et al. (75)	CABG	2,644	2,711	Pre-, intra-, postoperative	Advanced	LR	0.644
						Genetic programs	0.654
						ANN	0.537
Graft failure at 5 years
Agasthi et al. (32)	HTx	12,189	3,047	Pre-, intra-, postoperative	Advanced	GBM	0.716
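
For readers less familiar with the metric reported in Table 3, the following is a minimal sketch of how an AUC can be computed from predicted risks and observed outcomes. It uses the rank interpretation of the AUC: the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. The function name `auc` and the example labels and scores are hypothetical and are not taken from any of the included studies.

```python
def auc(labels, scores):
    """AUC as the fraction of (event, non-event) pairs that the model
    orders correctly; tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = event) and predicted risks for six patients.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
print(round(auc(labels, scores), 3))  # 8 of the 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which is the scale on which the values in Table 3 should be read.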

Mechanical ventilation

We identified two studies on the prediction of prolonged mechanical ventilation and the risk of reintubation. Both studies, performed in a CABG subpopulation, show minor differences in accuracy, sensitivity, and specificity in favor of an ANN over an advanced LR model (22,71).

Readmission

Given the high costs associated with readmission after hospital discharge, the ability to stratify this risk is essential for preventive measures. Existing conventional LR prediction models are based solely on time-independent variables (e.g., a postoperative lab value at a single time-point) (77-79). Improving upon these, an advanced XGBoost algorithm incorporating time-dependent factors (e.g., lab values at several time-points) demonstrated better predictive accuracy (74). Another, more complex ML tool called genetic programs performed as accurately as an advanced LR method. In contrast, an advanced ANN model in the same study showed significantly worse predictive ability (75).
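
The distinction between time-independent and time-dependent variables can be illustrated with a minimal sketch. It assumes a lab value measured at several postoperative time-points; the feature names (`last`, `min`, `max`, `slope_per_hour`) and the two example patients are hypothetical and do not reproduce the variables used in the cited readmission study (74).

```python
from statistics import mean


def time_independent_features(series):
    """Single time-point view: only the most recent lab value is used."""
    return {"last": series[-1][1]}


def time_dependent_features(series):
    """Trajectory view: summarise repeated measurements over time.
    `series` is a list of (hours_after_surgery, lab_value) pairs."""
    times = [t for t, _ in series]
    values = [v for _, v in series]
    t_bar, v_bar = mean(times), mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    # Least-squares slope of the lab value against time (0 if undefined).
    slope = (
        sum((t - t_bar) * (v - v_bar) for t, v in series) / denom
        if denom
        else 0.0
    )
    return {
        "last": values[-1],
        "min": min(values),
        "max": max(values),
        "slope_per_hour": slope,
    }


# Two hypothetical patients with the same final lab value (arbitrary units).
patient_a = [(0, 110.0), (12, 110.0), (24, 110.0)]  # stable
patient_b = [(0, 70.0), (12, 90.0), (24, 110.0)]    # steadily rising

print(time_independent_features(patient_a), time_independent_features(patient_b))
print(time_dependent_features(patient_a)["slope_per_hour"],
      time_dependent_features(patient_b)["slope_per_hour"])
```

In this sketch the two patients are indistinguishable on the single time-point feature, whereas the trajectory features separate the stable from the rising course. This is one plausible reason why adding time-dependent factors can improve discrimination.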

Discussion

This scoping review includes forty-six articles describing various ML techniques in cardiac surgery patients with relevance to perioperative anesthetic management and risk assessment. We identified three specific applications: the majority of articles (n=41) concern prediction analyses (e.g., mortality, AKI, readmission), three articles address hemodynamic monitoring, including a form of prediction, and two studies address ultrasound guidance. The combined data suggest that current applications of ML techniques to stationary variables (e.g., a hemodynamic parameter at one time-point) in the cardiac surgery population perform similarly to conventional statistical methods (not using a training and validation set) in terms of predictive capability. Among ML methods, whether complex or straightforward in construction, only GBM shows superior outcomes more often than the others. In one study, this can partly be attributed to the relatively up-to-date registry it used (32). Major differences, however, are absent, and the predictions of these models are highly correlated, suggesting that they find similar patterns. Although major predictive improvements are not seen for single ANNs, it is beneficial to use them in an ensemble (27,72). However, the true power of ML seems to emerge when it is applied to more complex data, such as full dynamic arterial waveforms or complex ultrasound images, as opposed to stationary perioperative variables. Using these data yields real-time, clinically insightful results (48,51,52) that are a valuable addition to current dynamic parameters (e.g., heart rate, stroke volume variation). Contemporary literature lacks data on clinical outcomes of ML implementation in the cardiac surgery population. Still, beneficial results are probably not long in coming, given the effectiveness seen in a different surgical population using real-time dynamic data in an ML model (80).
In contrast, the comparable performance or modest improvement seen in prediction models mirrors findings in other medical fields (81,82). One explanation may be that these implementations are often based on manageable datasets with a limited number of variables, whereas the strength of some advanced ML models is attributed to their ability to establish nonlinear relationships between variables in complex datasets, relationships that conventional methods have not previously demonstrated. In line with this, one of our included studies showed that the correlation between the risks predicted by an advanced and a conventional model was very low. Although the two models were comparable in predictive performance, this suggests that they did not base their predictions on the same features. Moreover, the high complexity of ML models poses a risk of overfitting in small datasets (83). Overfitting occurs when irrelevant characteristics of the training data are treated as predictive parameters, causing underperformance in the test set (84). This problem might be avoided by implementing cross-validation, as demonstrated for delirium in cardiac surgery (57). Still, traditional linear statistics may be more suitable for risk prediction models (85), as advanced models perform similarly but are more complex to develop. Although not convincing in risk prediction models, ML does excel in datasets consisting of dynamic parameters. Future research should focus on these real-time applications of ML that explore patterns in complex datasets. Promising results can then be achieved, as demonstrated by the effective hypotension early warning system of Hatib et al. (48,49) and the automation of echocardiography in two other studies (51,52). Such research should address not only the development and validation of these models but also their clinical effectiveness in randomized controlled trials.
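
The overfitting problem and the role of cross-validation can be made concrete with a minimal, self-contained sketch. It is an illustration under stated assumptions, not the procedure of any included study: the data are synthetic noise (the features carry no information about the outcome), the model is a one-nearest-neighbour classifier chosen because it memorises its training data, and accuracy is used instead of AUC for brevity. All function names are hypothetical.

```python
import random


def one_nn_predict(train_X, train_y, x):
    """Return the label of the closest training sample (pure memorisation)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test samples whose label is predicted correctly."""
    hits = sum(one_nn_predict(train_X, train_y, x) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def k_fold_accuracy(X, y, k=5):
    """Mean held-out accuracy over k folds; each sample is tested exactly once."""
    idx = list(range(len(X)))
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        scores.append(accuracy([X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in held_out], [y[i] for i in held_out]))
    return sum(scores) / k


# Synthetic data: 60 "patients", 10 random features, outcome unrelated to them.
rng = random.Random(0)
X = [[rng.random() for _ in range(10)] for _ in range(60)]
y = [rng.randint(0, 1) for _ in range(60)]

train_acc = accuracy(X, y, X, y)        # evaluated on the data it was fitted to
cv_acc = k_fold_accuracy(X, y, k=5)     # evaluated on held-out folds
print(f"apparent (training) accuracy: {train_acc:.2f}")
print(f"5-fold cross-validated accuracy: {cv_acc:.2f}")
```

Because the features are noise, the perfect apparent accuracy reflects memorised irrelevant characteristics, while the cross-validated estimate falls well short of it. Cross-validation therefore exposes the gap between training and held-out performance; it does not by itself remove the overfitting.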

Future directions and challenges

Anesthesia is pre-eminently a field in which large amounts of dynamic physiological data can be collected digitally, especially in cardiac surgery, where the mean operative time is about three hours (86). The current adoption of anesthesia information management systems (AIMS) is expected to be well above 80% in academic centers (87). Future incorporation of machine learning into AIMS could facilitate the continuous development of these models, unlocking their full potential through a regularly updated and expanding dataset. Still, it might be safer to update ML models only in controlled research settings, especially as neural networks in particular are opaque in how, and which, variables are processed (7,88). Not without reason, there is growing interest in explainable artificial intelligence, in which the decisions made by the model are transparent and more readily interpretable (89). Currently, the US Food and Drug Administration (FDA) and the European Commission approve the application of ML models when the algorithm cannot further improve its capabilities while in use in patients (so-called locked models). This is done to ensure consistent, reproducible, and safe results from the algorithm (90). The FDA is currently assessing possible regulatory modifications (91) that would enable the use of “unlocked” ML while taking potential safety issues into account. There is still plenty of work to be done in the application and clinical evaluation of promising “locked” ML methods based on various perioperative dynamic variables. We suggest that future clinical trials implementing ML models address the following three primary questions: (I) do patient outcomes improve with ML-based diagnostic and treatment guidance, (II) does workflow efficiency improve, and (III) is the approach cost-effective?
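
The difference between a “locked” and an “unlocked” model can be sketched in a few lines. This is a toy illustration only: the class `RiskModel`, its single-feature logistic form, the parameter values, and the learning rate are all hypothetical and do not represent any approved device or regulatory definition.

```python
import math


class RiskModel:
    """Toy logistic risk model with one weight and one bias (hypothetical)."""

    def __init__(self, weight, bias, locked=True):
        self.weight = weight
        self.bias = bias
        self.locked = locked

    def predict(self, x):
        """Predicted probability of the event for feature value x."""
        return 1.0 / (1.0 + math.exp(-(self.weight * x + self.bias)))

    def update(self, x, outcome, lr=0.1):
        """One gradient step on the log-loss; refused when the model is locked."""
        if self.locked:
            raise RuntimeError("locked model: parameters are frozen")
        error = self.predict(x) - outcome
        self.weight -= lr * error * x
        self.bias -= lr * error


locked = RiskModel(0.8, -1.0, locked=True)
adaptive = RiskModel(0.8, -1.0, locked=False)

before = adaptive.predict(2.0)
adaptive.update(2.0, outcome=1)   # the unlocked model learns from a new case
after = adaptive.predict(2.0)
print(before < after)             # its output for the same input has shifted
```

The locked instance returns the same prediction for the same input for as long as it is deployed, which is the reproducibility property described above. The adaptive instance drifts with each new case, which is what makes its safety harder to guarantee in advance.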

Limitations

Although we conducted a systematic search, we might have missed articles due to the broad range of included topics and acronyms in the literature. This may have led to the incorrect exclusion of studies from the initial selection. Another limitation of our article is that we only descriptively summarized the data without a meta-analysis. Therefore, definitive conclusions cannot be drawn about AUC differences across different methodologies. Nevertheless, this article provides an overview of the current ML applications per perioperative phase in cardiac surgery, showcasing where research is still needed.

Conclusion

Machine learning in cardiac surgery is being applied in perioperative anesthetic management and risk assessment. ML models generally yield predictive performance comparable to that of existing clinical scores, with the exception that models implementing dynamic variables obtain promising results. However, there is still a need for data on clinical outcomes after the use of ML-based models for diagnostic and treatment guidance.
