
Artificial intelligence and anesthesia: a narrative review.

Valentina Bellini¹, Emanuele Rafano Carnà¹, Michele Russo¹, Fabiola Di Vincenzo², Matteo Berghenti², Marco Baciarello¹, Elena Bignami¹.

Abstract

Background and Objective: The aim of this narrative review is to analyze whether or not artificial intelligence (AI) and its subsets are implemented in current clinical anesthetic practice, and to describe the current state of the research in the field. AI is a general term which refers to all the techniques that enable computers to mimic human intelligence. AI is based on algorithms that give machines the ability to reason and perform functions such as problem-solving, object and word recognition, inference of world states, and decision-making. It includes machine learning (ML) and deep learning (DL).
Methods: We performed a narrative review of the literature on Scopus, PubMed and Cochrane databases. The research string comprised various combinations of "artificial intelligence", "machine learning", "anesthesia", "anesthesiology". The databases were searched independently by two authors. A third reviewer would mediate any disagreement in the results of the two screeners.
Key Content and Findings: The application of AI has shown excellent results both in anesthesia and in operating room (OR) management. In each phase of the perioperative process (pre-, intra- and postoperative), AI is able to perform different and specific tasks using various techniques.
Conclusions: Thanks to the use of these new technologies, anesthesia, like other disciplines, is going through a real revolution, called Anesthesia 4.0. However, AI is not free from limitations and open issues. Unfortunately, the models created, although they show excellent performance, have not yet entered daily practice. Clinical impact analyses and external validations are needed before this happens. Qualitative research will also be needed to better understand the ethical, cultural, and societal implications of integrating AI into clinical workflows.
2022 Annals of Translational Medicine. All rights reserved.


Keywords: Artificial intelligence (AI); anesthesia; deep learning (DL); machine learning (ML); perioperative medicine

Year: 2022 PMID: 35928743 PMCID: PMC9347047 DOI: 10.21037/atm-21-7031

Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839

Introduction

Artificial intelligence (AI) is generally accepted as having started with the invention of robots. The term robot itself entered the international vocabulary through the Czech writer Karel Čapek’s play, “R.U.R.” (Rossumovi univerzální roboti, Rossum’s Universal Robots, 1921) (1). AI is a general expression which refers to all the techniques that enable computers to mimic human intelligence. It is based on algorithms that give machines the ability to reason and perform functions such as problem-solving, object and word recognition, inference of world states, and decision-making (2). It includes machine learning (ML) and deep learning (DL) (3). ML allows a computer to improve its performance with experience and often involves training an algorithm by exposing it to ‘training data’. There are three types of ML algorithms: (I) supervised learning, which uses labeled datasets to train algorithms to classify new data or predict unknown outcomes accurately; (II) unsupervised learning, in which algorithms identify patterns or structure within a dataset where there are no outputs to predict; these algorithms are useful for finding novel ways of classifying patients, drugs, or other groups in order to generate hypotheses for future research; (III) reinforcement learning, in which algorithms attempt certain tasks and learn from their subsequent successes and mistakes (4). DL is a subset of ML that uses artificial neural networks (ANN) organized in several layers. ANN use multiple layers of calculations to imitate how the human brain interprets and draws conclusions from information. DL is characterized by multiple hidden node layers that learn representations of data by abstracting it in many ways. 
DL differs from a simple neural network (NN) in that the number of node layers is increased and the overall size of the network is larger, allowing complex interrelationships to be represented more accurately (4). Medicine is a continually evolving domain, and most medical data are inherently imprecise. For these reasons Boolean or conventional logic, which uses sharp distinctions, i.e., 0 for false and 1 for true, is not always suitable for analyzing medical data. In 1965 Lotfi Zadeh, an engineer from the University of California, popularized ‘fuzzy’ logic, which uses continuous set membership from 0 to 1. Fuzzy logic (FL) is a data handling methodology that permits ambiguity and hence is particularly suited to medical applications (2). The great potential of AI in healthcare is widely recognized. Its intended purposes in this context include precision medicine, optimization of available resources and reduction of inequalities (5). AI is achieving excellent results in every field of medicine, including medical diagnosis, medical treatment, drug production, clinical management and medical education (6). For example, the usefulness of AI in reducing costs and saving time has been demonstrated by applying ML algorithms to patients with osteoporosis and Paget’s disease, which made it possible to identify the best possible therapeutic combination while reducing drug-drug interactions (7). In this regard the field of anesthesia is no exception. Indeed, the wealth of data made available by continuous monitoring makes anesthesia a particularly favorable field for the application of new AI technologies. The aim of this narrative review is to analyze whether or not AI and its subsets are implemented in current clinical anesthetic practice, and to describe the current state of the research in the field. 
We present the following article in accordance with the Narrative Review reporting checklist (available at https://atm.amegroups.com/article/view/10.21037/atm-21-7031/rc).
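
To make the contrast between Boolean and fuzzy logic described in this Introduction more concrete, the following minimal Python sketch compares a sharp threshold with a continuous membership function. The clinical variable (mean arterial pressure), the function names, the 65 mmHg cutoff and the 60 to 70 mmHg ramp are illustrative assumptions chosen for this example; they are not taken from the cited literature.

```python
def boolean_hypotensive(map_mmhg: float, cutoff: float = 65.0) -> int:
    """Conventional (Boolean) logic: 1 if MAP is below the cutoff, else 0."""
    return 1 if map_mmhg < cutoff else 0


def fuzzy_hypotensive(map_mmhg: float, low: float = 60.0, high: float = 70.0) -> float:
    """Fuzzy logic: degree of membership in the set 'hypotensive', from 0 to 1.

    Membership is 1 at or below `low`, 0 at or above `high`, and falls
    linearly in between. Both bounds are hypothetical illustration values.
    """
    if map_mmhg <= low:
        return 1.0
    if map_mmhg >= high:
        return 0.0
    return (high - map_mmhg) / (high - low)


if __name__ == "__main__":
    for value in (58.0, 64.0, 66.0, 72.0):
        print(value, boolean_hypotensive(value), round(fuzzy_hypotensive(value), 2))
```

In this sketch, readings of 64 and 66 mmHg fall on opposite sides of the Boolean cutoff, whereas their fuzzy memberships (0.6 and 0.4) differ only slightly, which is the kind of graded ambiguity that FL is meant to capture.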

Methods

We performed a narrative review of the literature on Scopus, PubMed and Cochrane databases. The research string comprised various combinations of “artificial intelligence”, “machine learning”, “anesthesia”, “anesthesiology”. Two authors searched the databases independently. A third reviewer mediated any disagreements in the results of the two screeners. To be included, papers had to be focused on the application of AI-based algorithms in the practice of anesthesia, including preoperative, intraoperative, postoperative, and operating room (OR) management. All English-language papers from 2015 to December 2021 were eligible. Peer-reviewed, published literature, including narrative review papers, was eligible for inclusion. Studies involving animals, editorials, letters to the editor, and abstracts were excluded. Reference lists of included papers were hand-searched, and additional papers were included if they met the inclusion criteria. The search strategy is summarized in Table 1.

Table 1

The search strategy summary

Items	Specification
Date of search	18/12/2021
Databases and other sources searched	Scopus, PubMed and Cochrane databases
Search terms used	“Artificial intelligence”; “machine learning”; “anesthesia”; “anesthesiology”
Timeframe	From 2015 to December 2021
Inclusion and exclusion criteria	Inclusion criteria: (I) focus on the application of AI-based algorithms in the practice of anesthesia, including preoperative, intraoperative, postoperative, and OR management; (II) English-language papers; (III) peer-reviewed, published literature, including narrative review papers
Inclusion and exclusion criteria	Exclusion criteria: (I) main topic not related to the application of AI in anesthesia; (II) studies involving animals; (III) editorials, letters to the editor, and abstracts; (IV) non-English-language articles
Selection process	Two authors searched the databases independently. A third reviewer mediated any disagreements between the two researchers

OR, operating room.
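
The phrase "various combinations" of the four search terms can be read in several ways, and the exact strings submitted to each database are not reported. As one possible reading only, the sketch below pairs each AI-related term with each anesthesia-related term using a Boolean AND. The grouping of terms, the `AND` operator and the function name `build_queries` are assumptions made for illustration.

```python
from itertools import product

# Search terms as listed in Table 1; the split into two groups is an assumption.
AI_TERMS = ["artificial intelligence", "machine learning"]
FIELD_TERMS = ["anesthesia", "anesthesiology"]


def build_queries(ai_terms, field_terms):
    """Return one quoted 'A AND B' query per (AI term, field term) pair."""
    return [f'"{a}" AND "{b}"' for a, b in product(ai_terms, field_terms)]


if __name__ == "__main__":
    for query in build_queries(AI_TERMS, FIELD_TERMS):
        print(query)
```

Each database has its own query syntax (field tags, truncation, controlled vocabulary), so strings like these would normally need adapting before use.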

Discussion

On the basis of the included articles, we identified the following categories of studies: (I) AI in pre-operative anesthesia; (II) AI in intra-operative anesthesia; (III) AI in post-operative anesthesia; (IV) AI in OR management (Table 2).

Table 2

Comparative table of the most representative studies on the use of AI in anesthesia included in our narrative review

Timing	Type	Main topic	AI technique	Journal/book	Authors	Year	Main results
Pre-operative	OP	Airways evaluation	ML	BMC Anesthesiol	Kim et al.	2021	Random forest algorithm was best AUROC =0.72–0.86, AUC-PR =0.27–0.37
	OP	Airways evaluation	DL	Comput Biol Med	Tavolara et al.	2021	Using convolutional NN and attention-based multiple instance learning models authors obtain AUC of 0.7105
	OP	Risk stratification	ML	Ann Surg	Bihorac et al.	2019	“MySurgeryRisk”: postoperative complications AUC 0.82–0.94, risk for death AUC 0.77–0.83
	OP	Risk stratification	ML	Surgery	Brennan et al.	2019	“MySurgeryRisk”: AUROC 0.73–0.85
	OP	Risk stratification	ML	JAMA Netw Open	Xue et al.	2021	AUROCs for pneumonia (0.903–0.907), AKI (0.846–0.851), DVT (0.878–0.884), pulmonary embolism (0.824–0.839) and delirium (0.759–0.765)
	OP	Risk stratification	ML	J Med Syst	Zhang et al.	2018	Random forest algorithm performs better than the others and achieves AUC of 0.884 for distinguishing ASA PS 1–2 against 3–4
Intra-operative	GA	Closed-loop anesthesia	ML	Anesth Analg	West et al.	2018	The overall control performance indicator, global score, was a median (interquartile range) 18.3 (14.2–27.7) in phase I and 14.6 (11.6–20.7) in phase II (median difference, −3.25; 95% confidence interval: −6.35 to −0.52)
	GA	Managing intraoperative pain	ML	Comput Biol Med	Gonzalez-Cava et al.	2020	Efficiency of the SVM classifier using ANI as a guidance variable: accuracy: 86.21% (83.62–87.93%), precision: 86.11% (83.78–88.57%), recall: 91.18% (88.24–91.18%), specificity: 79.17% (75–83.33%), AUC: 0.89 (0.87–0.90) and kappa index: 0.71 (0.66–0.75)
	GA	Monitoring the DoA	DL and NN	IEEE J Biomed Heal Informatics	Afshar et al.	2021	The proposed method achieves a root mean square error of 5.59±1.04, a mean absolute error of 4.3±0.87 and an AUC of 81.11±5.27
	GA	Monitoring the DoA	NN	Sensors (Basel)	Gu et al.	2019	The accuracy of detecting each state was 86.4% (awake), 73.6% (light anesthesia), 84.4% (GA), and 14% (deep anesthesia). The correlation coefficient between BIS and the index of this method was 0.892 (P<0.001)
	GA	Monitoring the DoA	ML	Stud Health Technol Inform	Syed et al.	2021	XGBoost achieved AUROC of 0.762
	OR and ICU	Predicting adverse events	ML	NPJ Digit Med	Chen et al.	2021	PHASE performance expressed as average precision: hypoxemia (0.241), hypocapnia (0.300), hypotension (0.424), hypertension (0.161), phenylephrine (0.227), and epinephrine (0.129)
	OR	Predicting adverse events	ML	J Surg Res	Datta et al.	2020	ML models incorporating both preoperative and intraoperative data had better performance: postoperative complications and in-hospital mortality (accuracy: 88% vs. 77%; AUROC: 0.93 vs. 0.87; AUC-PR: 0.21 vs. 0.15). Overall reclassification improvement was 2.4–10.0% for complications and 11.2% for in-hospital mortality
	GA	Predicting anesthetic infusion events	ML	Sci Rep	Miyaguchi et al.	2021	Long short-term memory model when predicting the future increase in flow rate of remifentanil after 1 min, was able to predict with scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC
	GA	Predicting hypoxemia	ML	Nat Biomed Eng	Lundberg et al.	2018	Initial risk prediction: anesthesiologists AUC 0.60, with “Prescience” assistance AUC 0.76, “Prescience” alone AUC 0.83. Intraoperative real-time risk prediction: anesthesiologists AUC 0.66, with “Prescience” assistance AUC 0.78, “Prescience” alone AUC 0.81
	RA	Predicting hypotension	NN	BMC Anesthesiol	Gratz et al.	2020	NN approach AUC 0.89, discrete feature quantification approach AUC 0.87
	GA	Predicting hypotension	ML	PLoS One	Kang et al.	2020	Random-forest model showed the best performance AUROC 0.736–0.948; Naïve Bayes 0.65–0.898, logistic regression 0.630–0.881, artificial-neural-network 0.640–0.880
	PACU	Predicting hypotension	ML	Br J Anaesth	Palla et al.	2022	Hypotension prediction AUROC 0.81–0.83, average precision 0.38–0.42. Anesthesiologist performance improvement AUROC from 0.67 to 0.74
	GA	Predicting hypotension	ML	Br J Anaesth	Schenk et al.	2021	HPI-guided care did not reduce the median duration of postoperative hypotension (adjusted median difference vs. standard of care: 0.118). HPI guidance reduced the percentage of time with MAP <65 mmHg by 4.9%
	GA	Predicting hypotension	ML	JAMA	Wijnberge et al.	2020	The median difference time-weighted average of hypotension between the intervention group and the control group was 0.38 mmHg. The median difference time of hypotension was 16.7 min. In the intervention group, 0 serious adverse events resulting in death occurred vs. 2 (7%) in the control group
	GA	Predicting post-operative delirium	ML	CNS Neurosci Ther	Hu et al.	2022	Logistic regression model outperforms other classifier models (AUC 0.804) and achieves the lowest Brier score as well. Age (odds ratio: 1.054), extubation time (odds ratio: 1.027), ICU admission (odds ratio: 2.238), mini-mental state examination score (odds ratio: 0.929), Charlson comorbidity index (odds ratio: 1.197), and postoperative neutrophil-to-lymphocyte ratio (odds ratio: 1.029) were independent risk factors for postoperative delirium
	RA	US anatomical structure detection	AR	Ultrasound Med Biol	Ameri et al.	2019	Procedure success rate with the AR system 100%, US-only guidance 57%
	RA	US anatomical structure detection	NN	Int J Comput Assist Radiol Surg	Hetherington et al.	2017	The convolutional NN successfully discriminates US images achieving 88% 20-fold cross-validation accuracy
	RA	US anatomical structure detection	NN	IEEE Trans Med Imaging	Pesteie et al.	2018	3-D test data set: average lateral error (1 mm), average vertical error (0.4 mm). 2-D test data set: average lateral error (1.7 mm), average vertical error (0.8 mm)
Post-operative	PM	Managing postoperative pain	ML	Advances in Intelligent Systems and Computing	Gonzalez-Cava et al.	2017	In 81% of cases, ANI correctly predicted increase or decrease of drug
	PM	Managing postoperative pain in depressed patient	ML	PLoS One	Parthipan et al.	2019	Prediction of increase or decrease pain scores: discharge AUROC 0.87, 3-week follow-up AUROC 0.81, 8-week follow-up AUROC 0.69
	PACU	Predicting adverse events	ML	Comput Biol Med	Olsen et al.	2018	Algorithm detection ESODs: accuracy 92.2%, sensitivity 90.6%, specificity 93.0%, AUROC 96.9%, reduction in diagnostic time 26.4 min
	PM	Predicting pre-operative APS consultations	ML	Pain Med	Tighe et al.	2012	ML classifiers correctly predicted preoperative requests for APS consultations in 92.3% of all surgical cases. Bayesian methods yielded the highest AUROC 0.84–0.89 and lowest training times 0.0018 s
	PM	Predicting rebound pain after peripheral nerve block	ML	Br J Anaesth	Barry et al.	2021	Incidence of rebound pain was 49.6%. Factors independently associated with rebound pain: younger age (odds ratio: 0.98), female gender (odds ratio: 1.52), surgery involving bone (odds ratio: 1.82), and absence of perioperative i.v. dexamethasone (odds ratio: 1.78). Rates of patient satisfaction (83.2%) and return to daily activities (96.5%)
	MW	Predicting respiratory events	FL	J Clin Monit Comput	Ronen et al.	2017	IPI sensitivity 0.83–1.00 and specificity 0.96–0.74
OR management	OR	Predicting operating times	ML	Surg Endosc	Huang et al.	2017	Mean turnover time was 36 min, time from patient identification to procedure start was 11 min, time to bring a patient into the room after surgeon identification was 22 min on average
OR management	OR	Predicting operating times	ML	Can J Surg	Rozario et al.	2020	Reduction in nursing overtime of 21%, a theoretical cost savings of $469,000 over 3 years

OP, outpatient; ML, machine learning; AUROC, area under the receiver operating characteristics; AUC-PR, area under the precision-recall curve; DL, deep learning; NN, neural network; AUC, area under the curve; AKI, acute kidney injury; DVT, deep vein thrombosis; ASA PS, American Society of Anesthesiologists Physical Status; GA, general anesthesia; SVM, support vector machine; ANI, Analgesia Nociception Index; DoA, depth of anesthesia; BIS, bispectral index; OR, operating room; ICU, intensive care unit; RA, regional anesthesia; PACU, post anesthesia care unit; HPI, Hypotension Prediction Index; MAP, mean arterial pressure; US, ultrasound; AR, augmented reality; ESODs, early signs of deterioration; PM, pain management; APS, acute pain service; MW, medical ward; FL, fuzzy logic; IPI, Integrated Pulmonary Index.
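
Because most results in Table 2 are reported as AUROC, a short sketch of what that number measures may help when comparing rows. The AUROC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The function below computes it directly from that definition; the toy labels and scores are invented for illustration and do not come from any study in the table.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = complication occurred, 0 = no complication.
Y_TRUE = [0, 0, 1, 0, 1, 1]
Y_SCORE = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

if __name__ == "__main__":
    print(auroc(Y_TRUE, Y_SCORE))
```

A value of 0.5 corresponds to scores that carry no ranking information and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so it is suited to illustration rather than to large datasets.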

AI in pre-operative anesthesia

Preoperative risk stratification is a fundamental step for every anesthetist, and the use of AI is achieving excellent results in this context. Among the most widely used scores is the American Society of Anesthesiologists Physical Status (ASA PS). This classification is subjective, requires manual clinician review to score, and has limited granularity. Zhang et al. (8) aimed to develop a system that automatically generates an ASA PS with finer granularity. Supervised ML methods were used to create a model which predicts a patient’s ASA PS on a continuous scale using the patient’s home medications and comorbidities. Three different types of predictive models were employed: regression models, ordinal models, and classification models. To assess model performance on the continuous ASA PS, model rankings were compared with those of two anesthesiologists on a subset of ASA PS 3 case pairs. The results suggest that the random forest split classification model can predict ASA PS with agreement similar to that reported in the literature for anesthesiologists, and can produce a continuous score whose agreement with anesthesiologists in judging granularity is fair to moderate. The authors concluded that the continuous score may help anesthesiologists identify high-risk patients who could benefit from additional preoperative assessment. The new technologies not only have the capability to improve existing scores, but also seem able to provide new, highly personalized risk scores. One of the first is called “MySurgeryRisk” (9). It was derived from the data of 51,457 surgical patients undergoing major inpatient surgery. 
The score is able to predict the risk for 8 postoperative complications [acute kidney injury (AKI), sepsis, venous thromboembolism, intensive care unit (ICU) admission >48 h, mechanical ventilation >48 h, wound, neurologic, and cardiovascular complications], with area under the curve (AUC) values ranging between 0.82 and 0.94, and the risk for death at 1, 3, 6, 12, and 24 months, with AUC values between 0.77 and 0.83. Furthermore, the “MySurgeryRisk” algorithm has been shown to have superior performance when compared to that of 20 physicians (10). Another excellent example is provided by the group of Xue et al. (11). The authors developed ML models capable of identifying the risk for 5 postoperative complications (AKI, delirium, deep vein thrombosis, pulmonary embolism and pneumonia) using only preoperative data, only intraoperative data, or both combined. Another essential aspect of the preoperative evaluation is airway assessment. Various scores have been proposed in the literature over time. However, in adults with no apparent anatomical airway abnormalities, who represent most of the patients we deal with, these tests are not very effective. A specific review published in 2018 found that all investigated index tests, although having good specificity, had relatively low sensitivity (12). AI could also be useful in this area. Tavolara et al. (13), starting from frontal facial images, developed a DL model capable of identifying difficult-to-intubate patients, with performance superior to that of two conventional tests, the Mallampati test and thyromental distance. In addition, the model can operate at high sensitivity and low specificity (0.9079 and 0.4474) or at low sensitivity and high specificity (0.3684 and 0.9605), overcoming the low sensitivity that limits current tests. Kim et al. (14) proposed a predictive model of difficult laryngoscopy, defined as Grade 3 and 4 by the Cormack-Lehane classification. 
In this single-center study, the Balanced Random Forest (BRF) algorithm showed the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.79 (0.72–0.86). In this case too, the authors identified both models with high sensitivity (90% for BRF) and models with high specificity and accuracy (91% and 83%, respectively, for light gradient boosting machines, LGBM). Finally, Hayasaka et al. (15) developed a convolutional neural network (CNN) algorithm capable of evaluating the difficulty of intubation with an excellent AUC of 0.864 just by evaluating patients’ facial pictures, making it a promising tool for predicting difficult intubations in advance.
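
The airway models discussed in this section are described as working either at high sensitivity and low specificity or the reverse. This follows from where the decision threshold is placed on the model's output score. The sketch below illustrates the mechanism on invented data; the labels, scores, thresholds and function name are assumptions for this example and do not reproduce any of the cited models.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Classify score >= threshold as 'difficult airway' and return
    (sensitivity, specificity). Assumes 0/1 labels with both classes present."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = difficult airway, 0 = easy airway.
LABELS = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
SCORES = [0.9, 0.7, 0.45, 0.3, 0.6, 0.5, 0.35, 0.2, 0.15, 0.05]

if __name__ == "__main__":
    # A low threshold favors sensitivity; a high threshold favors specificity.
    print(sensitivity_specificity(LABELS, SCORES, threshold=0.25))
    print(sensitivity_specificity(LABELS, SCORES, threshold=0.65))
```

On this toy data the low threshold detects every difficult airway at the cost of flagging half of the easy ones, while the high threshold flags no easy airway but misses half of the difficult ones. The same trained model can therefore be tuned toward screening or toward confirmation, depending on the clinical use.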

AI in intra-operative anesthesia

In the literature, the phase of the perioperative pathway that has aroused the most interest in the use of AI, together with the pre-operative one, is certainly the intra-operative phase. AI has carried out several tasks with good results, in particular: identification of anatomical structures during regional anesthesia, sedation management, depth of anesthesia (DoA) monitoring, automated drug administration, intraoperative pain management, prediction of adverse events such as hypotension and hypoxemia, and prediction of the risk of postoperative complications. Regarding the identification of anatomical structures, image-guided procedures have become a standard for diagnosis and treatment in many areas of medicine. Of all imaging modalities, ultrasound (US) is ubiquitous because it is real-time, low-cost, and radiation-free. Anesthesiologists largely use US to perform regional anesthesia safely and efficiently. However, nerve tracking and accurate needle localization remain challenging tasks due to noise, artifacts, and variability in anatomic structures (16,17). One study investigated the application of augmented reality (AR) to detect anatomical landmarks during simulated epidural anesthesia; the US transducer and the needle were viewed in a 3D-augmented environment, and the epidural space was identified using a single-element transducer at the needle tip. All attempts were successful in a phantom compared with only 50% of attempts using US alone (18). Pesteie et al. (19) used convolutional NN to automate identification of the anterior base of the vertebral lamina, whereas Hetherington et al. (20) used convolutional NN to automatically identify the sacrum and the L1-L5 vertebrae and vertebral spaces from US images in real time with up to 95% accuracy. 
One of the greatest difficulties for anesthesiologists lies in performing subarachnoid or epidural anesthesia in obese patients, particularly in obese pregnant women, where pregnancy-induced changes in the spine further reduce the chances of success. In this context, In Chan et al. (21) developed an ML algorithm to determine the needle insertion point using automated spinal landmark US imaging of the lumbar spine; their results were quite impressive, with a first-attempt success rate of around 80%. The management of patients undergoing gastroenterological procedures often requires sedation to improve patient comfort and facilitate endoscopic performance. In 2021 Syed et al. (22) created an ML model (XGBoost) that predicts the grade of sedation required to successfully conduct a colonoscopy, with an AUC of 0.762 after being tested on 10,025 colonoscopies. Many other surgical procedures require general anesthesia, which provides patients with absence of consciousness, analgesia, and relaxation. Titrating hypnotic drugs prevents over- and under-sedation, avoiding unwanted intraoperative awareness or excessive hemodynamic instability. Accurate DoA monitoring reduces mortality, morbidity, and postoperative recovery time. Unfortunately, the administered hypnotic dose does not have a linear relationship with DoA, for either volatile or intravenous anesthetics (23). The field of ML offers many different algorithms that could be used to build a reliable index to monitor the DoA (24). In this context Afshar et al. (25) proposed a combinatorial DL structure involving CNN, bidirectional long short-term memory (LSTM), and an attention layer. The proposed model uses the EEG signal to continuously predict the bispectral index (BIS) (Medtronic, Minneapolis, MN, USA). It is trained on a large dataset, mostly from patients under general anesthesia, with a few cases receiving sedation/analgesia and spinal anesthesia. 
The resulting DoA values are discretized into four levels of anesthesia, and the results demonstrated a strong inter-subject classification accuracy of 88.7%. Similarly, Gu et al. (26) proposed a method that combines multiple EEG-based features with ANN to assess the DoA. The correlation coefficient between BIS and the index of this method was 0.892 (P<0.001). The results showed that the proposed method could distinguish well between the awake state and other anesthesia states. Closed-loop control of anesthesia involves continual adjustment of drug infusion rates according to measured clinical effect. In real surgical situations, however, many environmental interferences can affect the reliability of the BIS signal, with potential complications of total intravenous anesthesia (TIVA) arising from the discrepancy between the predicted effect-site concentration and the measured BIS index during ongoing propofol and remifentanil infusion (27). Considering this aspect, it could be useful to have other tools capable of providing useful information during drug administration. In this context, West et al. (28) used the NeuroSENSE monitor, which provides an electroencephalographic measure of depth of hypnosis [wavelet-based anesthetic value for central nervous system (WAVCNS) monitoring], to evaluate the feasibility of a closed-loop system for robust control of propofol and remifentanil infusions using WAVCNS feedback. The results demonstrated that this controller design offers a robust method to optimize the control of 2 drugs using a single sensor, but further research is required to determine the optimal constraints for safe conditions. Moreover, Miyaguchi et al. (29) recently compared the performance of six ML methods [logistic regression, support vector machine (SVM), random forest, light gradient boosting machine (LGBM), ANN, and LSTM] in predicting remifentanil increase events. 
The results demonstrated that, when predicting a future increase in the flow rate of remifentanil after 1 min, the LSTM model achieved scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC; according to the authors, these results demonstrate the future potential of ML to predict the decisions made by anesthesiologists. Regarding intraoperative pain management, quantifying the nociception level of patients and adjusting analgesic drug infusion during anesthesia is still challenging. To this end, ML algorithms could be used to build indices that help anesthesiologists manage intraoperative pain, as done by Gonzalez-Cava et al. (30). They evaluated the Analgesia Nociception Index (ANI) as a guidance variable for opioid infusion rate modulation. The ANI monitor performs a heart rate variability (HRV) analysis to measure the effect of respiratory sinus arrhythmia (RSA). The ANI value, together with hemodynamic information, outperformed non-specific traditional signs such as heart rate and blood pressure in quantifying the nociception level, and may anticipate a dose change to prevent hemodynamic events before they happen (30). AI is the basis of new clinical tools for predicting intraoperative adverse events. One of the most common is intraoperative hypotension, which is associated with increased morbidity and mortality. For this reason, predicting intraoperative blood pressure patterns has been a recent target of ML approaches in the intraoperative setting (31). In a South Korean study, the authors found that ML models are able to predict hypotension occurring during the period between tracheal intubation and incision. In particular, the random forest model showed the best performance, with an AUC of 0.842 (32). 
In another study, the authors compared the capabilities of a single-hidden-layer NN of 12 nodes to those of a discrete-feature discrimination approach in predicting significant hypotension under spinal anesthesia during cesarean section (33). The results suggested that an NN approach may be superior to a discrete feature quantification approach. Moreover, a preliminary unblinded randomized clinical trial performed in a tertiary center in Amsterdam, the Hypotension Prediction (HYPE) trial, tested an ML-derived early warning system, the Hypotension Prediction Index (HPI), which predicts hypotension shortly before it occurs (34). Patients were randomly assigned to receive either the early warning system or standard care, with a goal mean arterial pressure (MAP) of at least 65 mmHg in both groups. The median time of hypotension per patient was significantly shorter in the intervention group than in the control group; the depth and duration of intraoperative hypotension were reduced without excess use of intravenous fluid, vasopressor, and/or inotropic therapies. However, in a sub-study of the HYPE trial, HPI-guided care did not reduce the median duration of postoperative hypotension (35). For these reasons, we believe that further studies are required to understand the real usefulness of these tools in daily clinical practice and their impact on postoperative outcomes. Similarly, an ensemble-model-based ML tool named “Prescience” can assist anesthesiologists in predicting intraoperative hypoxemia during anesthesia, and it was able to delineate the risk factors that contributed to the prediction (36). Moreover, when provided with information generated by “Prescience”, anesthesiologists were able to significantly improve their ability to predict intraoperative hypoxemia. This represents an example of a complementary relationship between humans and machines, in which ML assistance improves on anesthesiologists’ unaided predictions. More recently Chen et al. 
(37) tested a transferable embedding method (i.e., a method to transform time series signals into input features for predictive ML models) named PHASE (PHysiologicAl Signal Embeddings) on a large dataset from OR and ICU settings. The results indicated that PHASE outperforms other state-of-the-art approaches in predicting six distinct outcomes: hypoxemia, hypocapnia, hypotension, hypertension, phenylephrine administration, and epinephrine administration (37). The integration of preoperative and intraoperative data to improve risk prediction is becoming a matter of debate in the literature. It is certainly logical to assume that exploiting information from the intraoperative phase can improve knowledge of the perioperative period. This concept emerges from several studies, with the exception of a very recent article in which the addition of intraoperative data did not increase the performance of the model in predicting mortality after intra-abdominal surgery (38). By contrast, in several other papers, the exploitation of intraoperative data provided relevant clinical information, as in the case of the “MySurgeryRisk PostOp Extension” (39). In another recent article, Palla et al. (40) presented a model capable of predicting hypotension in the post anesthesia care unit (PACU) using preoperative and intraoperative data of 88,446 surgical patients, with an AUROC of 0.82. The integration of the two data timings was also exploited to create effective automated ML prediction models of postoperative delirium (41). A different, but equally effective, example is provided by the models of Xue et al. (11), previously cited. Their ability to employ data from different phases of surgery makes these models particularly adaptable to several clinical situations, from elective surgery, where all the data are available, to emergency surgery, where the only data available are the intraoperative ones. 
It is evident how precision medicine, enabled by intelligent clinical tools, goes beyond the classic concept of patient tailoring and also manages to account for the diversity of clinical conditions.
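For readers less familiar with the AUROC values quoted in this section, the following minimal Python sketch shows one common way to read the metric: the probability that a randomly chosen patient who developed the outcome received a higher predicted risk than a randomly chosen patient who did not. The function name and the toy data are hypothetical and are not taken from any of the cited studies.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = the outcome occurred, 0 = it did not.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8, 0.1, 0.4]
toy_auroc = auroc(labels, scores)
print(f"AUROC on toy data: {toy_auroc:.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups; the pairwise loop is chosen here for readability rather than speed.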

AI in post-operative anesthesia

In the field of pain medicine, AI is turning out to be a valuable ally (42,43). Thanks to its complex analyses, a better understanding of pain pathophysiology is becoming possible. Gonzalez-Cava et al. (44) used ML to analyze differences in functional magnetic resonance imaging data collected from human volunteers who were exposed to painful and nonpainful thermal stimuli. They demonstrated that ML analysis of whole-brain scans identified pain more accurately than analysis of the individual brain regions traditionally associated with nociception. Good results are also being achieved in postoperative pain, probably because of the complexity of the variables responsible for its development, in terms of both their number and the relationships between them. Barry et al. (45), analyzing the factors associated with rebound pain after peripheral nerve block, showed that the ML technique, in particular the ‘logistic model tree attribute-selected classifier’, performed best compared with the other analyses, in particular the multivariate logistic regression model, and included new variables not previously considered. Parthipan et al. (46), instead, used ML techniques to better understand the relationships between postoperative pain and depression. They concluded that, thanks to these new analytical techniques, it was demonstrated for the first time that the known ability of selective serotonin reuptake inhibitors (SSRIs) to inhibit the effectiveness of prodrug opioids is associated with worse pain control. ML has proved useful not only in pain risk prediction, but also in supporting clinical decisions regarding the acute pain service (APS). In the study by Tighe et al. (47), ML classifiers successfully predicted a preoperative APS consultation in 92.5% of surgical cases. 
Another phase characterized by continuous monitoring, and therefore capable of providing large amounts of important data, is admission to the PACU. It is another delicate phase for the surgical patient, in which careful monitoring is maintained with the aim of identifying complications early. Olsen et al. (48) presented a predictive algorithm for detecting early signs of deterioration (ESODs) in the PACU; the system identified ESODs with an accuracy of 92.2%, together with an important reduction in false alarms and missed ESODs. Unfortunately, the data collected in this setting have not yet been widely exploited, and this is one of the very few studies on the subject. Another important evaluation to be performed in the PACU is the assessment of post-surgical in-hospital mortality. In this context, Lee et al. (49) developed a generalized additive model with neural networks (GAM-NNs) capable of predicting mortality in patients undergoing general anesthesia with a notable AUC of 0.921. The model has numerous advantages over the simple models, such as LR, used in previous studies; for example, it is able to learn nonlinear patterns in the data, which is more clinically intuitive, and it can be interpreted easily. Acute renal failure after liver transplantation is a serious complication that frequently affects these patients in the postoperative period. With this in mind, a retrospective single-center study set out to create an ML-based risk prediction tool; the results are promising, with about 55% of cases predicted correctly, although the number of cases was not very high and further multicenter studies are, in our opinion, necessary before such algorithms are implemented in clinical practice (50). 
Remaining in the field of renal failure, in a recent article the authors used a random forest model to evaluate postoperative complications in patients with end-stage renal failure, identifying some of the most relevant factors, such as anesthesia time, operation time, and crystalloid and colloid use. The model reached an F1 score of 0.797, indicating good reliability of its predictions and making it a feasible guide for doctors in therapeutic choices for these patients (51). Respiratory status is a cornerstone of patient management. Continuous respiratory monitoring using both capnography (etCO2, end-tidal CO2; RR, respiration rate) and pulse oximetry (SpO2, arterial oxygen saturation; PR, pulse rate) can reduce the number of severe respiratory events. Using a mathematical algorithm based on an FL inference model, it is possible to combine these ventilatory and respiratory parameters into a single value, the Integrated Pulmonary Index (IPI). The IPI demonstrates high sensitivity in recognizing significant and severe respiratory events. Its high specificity helps prevent caregivers’ desensitization to alarm sounds, a phenomenon called ‘alarm fatigue’ (52). What certainly emerges is that in the preoperative, intraoperative and postoperative phases alike, AI is currently able to perform tasks, even very complex ones, with excellent results, making it possible to build intelligent clinical decision-making tools. New technologies are allowing us to enter a new era of anesthesia, which we call Anesthesia 4.0. Because of the many similarities that anesthesia shares with other professions, concepts derived from other disciplines have always been adopted. As in industry, anesthesiology too is undergoing a real technological breakthrough. 
Industry 4.0 is not considered merely an investment in new technology and tools to improve manufacturing efficiency; rather, it is about transforming the way the entire business organization thinks and operates. It is primarily a cultural revolution, not only a technological one. In our case too, a change in anesthesiological thinking must take place. Similarly, we can say that we are facing a new phase, Anesthesia 4.0, definable as the propensity of today’s anesthesiology to introduce smart and autonomous systems, fueled by solid big data systems, to improve quality, safety and efficiency (Figure 1). Just think of the reactive maneuvers that have always characterized risk management in anesthesia; in the near future, they will be outclassed by clinical decision support systems based on AI models designed to intervene before the adverse event occurs, rather than once it has already happened (53). For these reasons, we propose that AI should become an essential technical and non-technical skill for future anesthesiologists, in order to keep up with the current technological and cultural revolution.

Figure 1

Parallelism between the industrial revolution and the anesthesiological one. From numbers 1 to 4 all the stages in progression are identified. It should be noted that in the fourth revolution, both disciplines are characterized by the use of intelligent tools.

OR management

It is well known that surgery is one of the most expensive items for any hospital. Making the most of available resources and spaces is a fundamental objective in the management of ORs. However, optimization does not only concern the economic aspect; above all, it involves the safety and quality of the work performed. For example, the cancellation of an operation due to a scheduling mistake results not only in a waste of economic resources, but also in a postponement that could compromise the safety of that specific patient. Furthermore, optimizing the available resources means increasing the quality of the care provided, always getting the most out of the resources available at that specific moment. However, the management of an operating department is far from simple. It involves multidisciplinary management of healthcare professionals and instrumentation, coupled with a degree of unpredictability typical of medicine. For this reason, the systems and logics currently in use do not always produce optimal results. AI appears able to provide valuable help. A dedicated review found that ML appears useful for three important tasks: identification of surgical case cancellations, prediction of PACU occupancy, and estimation of surgical case duration (54). Rozario et al. (55) have shown how the high-level programming language Python, combined with the open-source OR-Tools software suite from Google AI, is able to accurately predict operational booking times in the COVID-19 era, when the resources available for surgery are even scarcer. The same authors also demonstrate the potential economic impact of these technologies compared with the conventional method currently in use. 
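To give a concrete idea of how estimated case durations can feed a scheduling step, the following minimal Python sketch distributes cases across rooms with a simple greedy rule (longest case first, into the least-loaded room). It is only an illustration under stated assumptions: it does not use OR-Tools, it is not the method of Rozario et al., and the function name, case labels and durations are hypothetical.

```python
import heapq


def assign_cases(predicted_minutes, n_rooms):
    """Greedy longest-case-first assignment of cases to rooms.

    predicted_minutes: dict mapping a case label to its predicted duration.
    Returns (schedule, loads): the cases per room and total minutes per room.
    """
    if n_rooms < 1:
        raise ValueError("at least one room is required")
    # Min-heap of (current load in minutes, room index).
    heap = [(0, room) for room in range(n_rooms)]
    heapq.heapify(heap)
    schedule = {room: [] for room in range(n_rooms)}
    # Place the longest cases first, each into the least-loaded room.
    ordered = sorted(predicted_minutes.items(),
                     key=lambda item: (-item[1], item[0]))
    for case, minutes in ordered:
        load, room = heapq.heappop(heap)
        schedule[room].append(case)
        heapq.heappush(heap, (load + minutes, room))
    loads = {room: load for load, room in heap}
    return schedule, loads


# Hypothetical predicted durations (minutes), e.g. output of a duration model.
cases = {"case_A": 180, "case_B": 120, "case_C": 90,
         "case_D": 60, "case_E": 45, "case_F": 30}
schedule, loads = assign_cases(cases, n_rooms=2)
for room, room_cases in schedule.items():
    print(f"OR {room + 1}: {room_cases} ({loads[room]} min)")
```

A real scheduling model would also have to respect constraints that this sketch ignores, such as surgeon availability, equipment, urgency and PACU capacity, which is where a constraint solver becomes useful.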
The potential of AI in OR management can be amplified further when it is combined with other technologies. A major current limitation appears to be manual data entry. This practice is not only prone to errors, but also often involves an inherent delay in recording times. Automatic timing systems, on the other hand, could provide precise and accurate information, potentially in real time. The combined use of new technologies, such as ML, intelligent sensors and tracking systems, could therefore have a further significant impact on both quality of care and patient safety (56) (Figure 2). Huang et al. (57), already in 2017, presented their SmartOR, i.e., a sensor network capable of identifying operating times autonomously. Our research group has also addressed the problem. We have an ongoing study called BLOC-OP (NCT05106621), which combines an indoor tracking system with ML analysis. The former is intended to detect times automatically; its architecture comprises indoor localization of Bluetooth low energy (BLE) tags via Raspberry Pi v4 modules with their antennas, which communicate with each other through a reserved Local Area Network (LAN) (Figure 3). The ML analysis, on the other hand, is intended to make accurate predictions of operating times, based on surgical and anesthetic information, which will be the basis for an intelligent scheduling model.
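As a rough illustration of how automatically detected locations can be turned into operating times, the Python sketch below converts a sequence of tag sightings into the minutes spent in each room. It is a simplified sketch, not the BLOC-OP implementation: the event format, the room names and the function name are assumptions made for the example.

```python
from datetime import datetime


def room_durations(events):
    """Minutes spent in each room, from (timestamp, room) sightings.

    Each sighting marks the moment a tag is detected entering a room; the
    stay lasts until the next sighting. The final sighting only closes the
    previous stay, so the last room gets no duration of its own.
    """
    ordered = sorted(events)
    durations = {}
    for (start, room), (end, _next_room) in zip(ordered, ordered[1:]):
        minutes = (end - start).total_seconds() / 60
        durations[room] = durations.get(room, 0.0) + minutes
    return durations


# Hypothetical sightings of one patient's tag during a surgical pathway.
events = [
    (datetime(2022, 1, 10, 8, 0), "holding_area"),
    (datetime(2022, 1, 10, 8, 25), "OR_1"),
    (datetime(2022, 1, 10, 10, 40), "PACU"),
    (datetime(2022, 1, 10, 11, 55), "ward"),
]
durations = room_durations(events)
print(durations)
```

Durations obtained in this way could serve as training targets for a duration-prediction model, replacing manually entered times.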

Figure 2

The new technologies in OR management, in addition to being able to optimize resources from an economic point of view, are able to improve both the quality and the safety of the services provided. PACU, post anesthesia care unit; OR, operating room.

Figure 3

Logical architecture diagram of the BLOC-OP study. BLE sensors worn by patient are detected using Raspberry Pi v4 modules, positioned in each OR and recovery room. All data flows into a single server that will be used to create an intelligent scheduling model of surgical procedures using AI techniques. BLE, Bluetooth low energy; AI, artificial intelligence.

Limitations of AI and future directions

AI in medicine is not free from perplexities and limitations. Recently, Jotterand and Bosco defined it as a sword of Damocles (58): if on the one hand it manages to overcome some current human limits, on the other it could manipulate human nature. Data ethics is the foundation of AI, and its key areas include informed consent, privacy and data protection, ownership, objectivity and transparency. Legislation is adapting to the new requirements imposed by new technologies, and scrupulous adherence to it is essential. AI is a tool that must be deployed in the right situation to answer an appropriate question or solve an applicable problem; the data used must therefore be strictly connected to this specific purpose (2). Data quality is another fundamental requirement for building accurate and trustworthy AI algorithms, but data are also susceptible to bias (gender, sexual orientation, race, etc.) (2,59). AI’s task should primarily be to abolish these inequities, not to exacerbate them. It is therefore essential to apply all possible methods to eliminate bias and to overcome the differences between governments. To this end, multidisciplinary collaborations around AI and ML technologies should be encouraged. To facilitate shared work, international laws and policies should also be adopted (60). Not only the quality of the data is important, but also their coding. Often, when big data systems are available, the data must be encoded before useful information can be extracted from them. This is a crucial step which, in our opinion, must be carried out by a multidisciplinary team composed of both healthcare professionals and data scientists. The COVID-19 pandemic has taught us this concept well (61). Having a lot of data means neither having quality data nor knowing how to use them correctly. The data scientist currently plays a crucial role. 
Data science skills are fundamental not only for the phase described above, but for every step of a project in which big data are used. Van Poucke clearly defined these phases: problem definition, hypothesis generation, data collection/extraction, model building, model implementation (62). As pointed out by the author, it is important to involve the data scientist right from the initial problem definition stage, in order to translate a clinical problem into a data problem. Beyond the challenges mentioned, it is important that all these models are translated into useful tools for daily clinical practice. For this to happen, a number of conditions must be met (63). The model must be rigorously validated externally by means of prospective validation studies and paired with an easily usable tool. In addition, healthcare professionals must begin to be trained in this field now (64). Without proper training, the use of new technologies in medicine could come to a rapid halt. Admittedly, what we have presented is a review of the application of AI in anesthesia only, but all branches of our discipline, Intensive Care, Pain Medicine and Emergency Medical Services, have been affected (Figure 4). New technologies, in particular advanced simulation techniques, telemedicine and, of course, AI, are profoundly changing the discipline in all its aspects, from clinical practice to research, organization and medical education. Thanks to the interaction of technologies, the effect can be synergistic rather than merely additive. Think, for example, of what intelligent real-time alarm systems combined with telemedicine techniques might entail when applied to postoperative monitoring. 
It wasn’t long ago that the FDA approved the use of the first AI software system in medicine, a system capable of analyzing ocular fundus images to help doctors diagnose diabetic retinopathy (65). To date, the number of FDA-approved AI/ML medical devices has already risen to 343 (66). Among these, a software called “Nervetrack” (Samsung Medison, Seoul, South Korea and Intel Corp., Santa Clara, CA, USA) recently received US FDA clearance (67); it is an example of AI applied to ultrasound (US) to recognize deep structures during peripheral nerve blocks in anesthesia, and it is currently available in clinical practice.

Figure 4

Graphic representation of the fields of anesthesiology affected by new technologies. Advanced simulation techniques, telemedicine and AI are the main drivers of the current technological revolution of anesthesia. AI, artificial intelligence.

However, it is important always to remember that AI algorithms will never be able to surpass human performance in the overall care of the patient. Although algorithms may one day exceed human capabilities in integrating complex, very large, structured datasets, much of the data that clinicians gather from patients comes from the clinician-patient relationship that is established when patients place their trust in their doctor. A machine, however sophisticated, can never replace the holistic vision of the patient that only the healthcare professional is able to have. Rather, AI must be exploited as a tool with specific purposes, leaving doctors free to devote themselves more to the human component that distinguishes the doctor-patient relationship. Therefore, qualitative research will be needed to better understand the ethical, cultural, and societal implications of integrating AI into clinical workflows (2).

61 in total

1. Machine learning in anaesthesia: reactive, proactive… predictive!

Authors: Pedro L Gambus; Sebastian Jaramillo
Journal: Br J Anaesth Date: 2019-08-20 Impact factor: 9.166

2. eDoctor: machine learning and the future of medicine. (Review)

Authors: G S Handelman; H K Kok; R V Chandra; A H Razavi; M J Lee; H Asadi
Journal: J Intern Med Date: 2018-09-03 Impact factor: 8.989

3. Factors associated with rebound pain after peripheral nerve block for ambulatory surgery.

Authors: Garrett S Barry; Jonathan G Bailey; Joel Sardinha; Paul Brousseau; Vishal Uppal
Journal: Br J Anaesth Date: 2020-12-31 Impact factor: 9.166

4. Effect of a Machine Learning-Derived Early Warning System for Intraoperative Hypotension vs Standard Care on Depth and Duration of Intraoperative Hypotension During Elective Noncardiac Surgery: The HYPE Randomized Clinical Trial.

Authors: Marije Wijnberge; Bart F Geerts; Liselotte Hol; Nikki Lemmers; Marijn P Mulder; Patrick Berge; Jimmy Schenk; Lotte E Terwindt; Markus W Hollmann; Alexander P Vlaar; Denise P Veelo
Journal: JAMA Date: 2020-03-17 Impact factor: 56.272

5. MySurgeryRisk: Development and Validation of a Machine-learning Risk Algorithm for Major Complications and Death After Surgery.

Authors: Azra Bihorac; Tezcan Ozrazgat-Baslanti; Ashkan Ebadi; Amir Motaei; Mohcine Madkour; Panagote M Pardalos; Gloria Lipori; William R Hogan; Philip A Efron; Frederick Moore; Lyle L Moldawer; Daisy Zhe Wang; Charles E Hobson; Parisa Rashidi; Xiaolin Li; Petar Momcilovic
Journal: Ann Surg Date: 2019-04 Impact factor: 12.969

10. Automated machine learning-based model predicts postoperative delirium using readily extractable perioperative collected electronic data.

Authors: Xiao-Yi Hu; He Liu; Xue Zhao; Xun Sun; Jian Zhou; Xing Gao; Hui-Lian Guan; Yang Zhou; Qiu Zhao; Yuan Han; Jun-Li Cao
Journal: CNS Neurosci Ther Date: 2021-11-18 Impact factor: 5.243