Artificial Learning and Machine Learning Applications in Spine Surgery: A Systematic Review.

Cesar D Lopez, Venkat Boddapati, Joseph M Lombardi, Nathan J Lee, Justin Mathew, Nicholas C Danford, Rajiv R Iyer, Marc D Dyrszka, Zeeshan M Sardar, Lawrence G Lenke, Ronald A Lehman.

Abstract

OBJECTIVES: This current systematic review sought to identify and evaluate all current research-based spine surgery applications of AI/ML in optimizing preoperative patient selection, as well as predicting and managing postoperative outcomes and complications.
METHODS: A comprehensive search of publications was conducted through the EMBASE, Medline, and PubMed databases using relevant keywords to maximize the sensitivity of the search. No limits were placed on level of evidence or timing of the study. Findings were reported according to the PRISMA guidelines.
RESULTS: After application of inclusion and exclusion criteria, 41 studies were included in this review. Bayesian networks had the highest average AUC (.80), and neural networks had the best accuracy (83.0%), sensitivity (81.5%), and specificity (71.8%). Preoperative planning/cost prediction models (.89, 82.2%) and discharge/length of stay models (.80, 78.0%) each reported significantly higher average AUC and accuracy compared to readmissions/reoperation prediction models (.67, 70.2%) (P < .001, P = .005, respectively). Model performance also significantly varied across postoperative management applications for average AUC and accuracy values (P < .001, P = .027, respectively).
CONCLUSIONS: Generally, authors of the reviewed studies concluded that AI/ML offers a potentially beneficial tool for providers to optimize patient care and improve cost-efficiency. More specifically, AI/ML models performed best, on average, when optimizing preoperative patient selection and planning and predicting costs, hospital discharge, and length of stay. However, models were not as accurate in predicting postoperative complications, adverse events, and readmissions and reoperations. An understanding of AI/ML-based applications is becoming increasingly important, particularly in spine surgery, as the volume of reported literature, technology accessibility, and clinical applications continue to rapidly expand.

Keywords:  artificial intelligence; deep learning; machine learning; orthopedic surgery; predictive modeling; spine surgery

Year:  2022        PMID: 35227128      PMCID: PMC9393994          DOI: 10.1177/21925682211049164

Source DB:  PubMed          Journal:  Global Spine J        ISSN: 2192-5682


Introduction

Machine learning (ML) is increasingly reported in health care, including orthopedics, especially for its applications in predictive analytics. ML is a form of artificial intelligence (AI) that employs algorithms and mathematical models that can learn from data, identify patterns and complex relationships, and make automated decisions, often with minimal human intervention.[1,2] These algorithms find patterns in the data and apply those patterns to new problems. Algorithms include artificial neural networks (ANN), decision trees (DT), boosting/ensemble learning models (BEL), Bayesian networks (BN), logistic regression (LR), and support vector machines (SVM). Neural networks are modeled on neurons in the brain and are used to untangle extremely complex relationships. Across various medical specialties, AI/ML has been shown to be beneficial in guiding clinical decision-making, and artificial neural networks used as outcome prediction models have been applied in diagnosing various medical conditions.[1-4] Within orthopedics, demonstrated applications of AI/ML include surgical risk stratification and optimization, clinical outcome prediction and diagnostics, cost-efficiency analyses, and, in the total joint arthroplasty literature, proposed risk-adjusted insurance reimbursement models. Spine surgery, in particular, is a field that involves high-risk procedures and continually seeks to improve surgical planning and outcomes and to reduce complications. With its powerful predictive capabilities, AI/ML has the potential to be used in new and innovative applications that may improve the safety and outcomes of spine surgery. The use of AI/ML is rapidly expanding in health care and has the potential to improve surgical care and reduce costs, especially for high-cost and complex spine surgery procedures.
As such, it is important for spine surgeons to better understand the current applications of AI/ML, especially in light of the burgeoning literature regarding this topic in recent years. The purpose of this review is to identify and evaluate all current research-based spine surgery applications of AI/ML, namely, in optimizing preoperative patient selection, as well as predicting and managing postoperative outcomes and complications.

Materials and Methods

Search Strategy

A comprehensive search of publications, up to February 2020, was conducted using the EMBASE, Medline, and PubMed databases in accordance with PRISMA guidelines. Sample search query keywords and MeSH terms are provided in Supplementary Table 1. Screening of reference lists of retrieved articles also yielded additional studies.

Eligibility Criteria

Inclusion criteria consisted of original clinical studies, including studies which evaluate spine surgery applications of AI/ML in guiding clinical decision-making. Exclusion criteria consisted of studies that did not evaluate spine surgery applications of AI/ML, studies involving oncologic spine surgery or infectious etiologies, studies involving applications for design and development of hardware or implants, medical imaging analysis studies without explicit reference or application to spine surgery, studies with non-human subjects, non-English-language studies, inaccessible articles, conference abstracts, reviews, and editorials. No limits were placed on level of evidence or timing of the study since the majority of the reviewed studies were published within the last 10 years.

Study Selection

Article titles and abstracts were screened initially by two reviewers, and full-text articles were subsequently screened based on the selection criteria. The studies were rated by their level of evidence, based on the Oxford Centre for Evidence-based Medicine Levels of Evidence. Two authors reviewed each individual article that was included. Discrepancies in inclusion studies were discussed and resolved by consensus.

Data Extraction and Categorization

A database was generated from all included studies and consisted of the journal of publication, publication year, country of origin, study design, level of evidence, study duration, blinding of the study, number of involved institutions, AI/ML methods and clinical applications, surgical domain, data sources, input variables and output variables, sample size, average patient age, percent female patients, and any additional pertinent findings from the study. The reviewed articles were sorted into different, non-mutually exclusive categories based on AI/ML clinical application. AI/ML clinical applications were divided into two major groups: (1) administrative and clinical decision support and (2) postoperative prediction and management of complications and outcomes. The former group contained the following prediction and optimization sub-categories: preoperative planning and cost prediction, hospital discharge and length of stay (LOS), readmissions, and reoperations. The latter group included postoperative cardiovascular complications, other complications, mortality, and functional and clinical outcomes.
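As a rough illustration of the non-mutually exclusive categorization described above, the following Python sketch stores each study as a record whose `applications` field is a set, so one study can count toward several application categories. The class name `StudyRecord`, its field names, and the helper `count_by_application` are hypothetical and are not taken from the review; the three example rows use values listed in Table 1.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class StudyRecord:
    """One row of an extraction database (field names are illustrative only)."""
    author: str
    year: int
    algorithms: List[str]
    # A set, because a study may belong to more than one application category.
    applications: Set[str] = field(default_factory=set)
    n_patients: Optional[int] = None
    data_source: Optional[str] = None


def count_by_application(records):
    """Count studies per application; a study with two categories counts twice."""
    counts = Counter()
    for rec in records:
        counts.update(rec.applications)
    return counts


records = [
    StudyRecord("Siccoli", 2019, ["boosting/ensemble learning", "decision tree"],
                {"discharge/LOS", "readmissions/reoperations"}, 635, "Single center"),
    StudyRecord("Bekelis", 2014, ["regression analysis"],
                {"discharge/LOS", "readmissions/reoperations"}, 2732, "ACS-NSQIP database"),
    StudyRecord("Kuo", 2018, ["regression analysis", "SVM", "decision tree", "Bayesian networks"],
                {"cost prediction"}, 532, "Single center"),
]

counts = count_by_application(records)
for application, n_studies in sorted(counts.items()):
    print(application, n_studies)
```

Because the categories overlap, the per-category counts can sum to more than the number of studies, which is why category percentages in a review of this kind need not add up to 100%.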

Data Analysis

Descriptive statistics were employed to summarize important findings and results from the selected articles and to describe trends in AI/ML techniques, clinical applications, and relevant findings associated with their use. Summary data were presented using simple means, frequencies, standard deviations (for normally distributed data on a decimal scale), and proportions. AI/ML model performance within the reviewed studies was summarized using various metrics, including the area under the curve (AUC) of receiver operating characteristic (ROC) curves, accuracy (%), sensitivity (%), and specificity (%). AUC is a measure of an ML model’s discriminative ability (i.e., accurately identifying true positives and true negatives while minimizing false positives and false negatives).[9,10] AUC values range from .50 to 1, with a higher AUC value signifying better ability of the model to correctly place a patient into an outcome category. A model with an AUC of 1.0 is a perfect discriminator, .90 to .99 is considered excellent, .80 to .89 is good, .70 to .79 is fair, and .51 to .69 is considered poor. Accuracy is simply the proportion of cases a model classifies correctly (true positives and true negatives), without distinguishing between false positives and false negatives. Reported model performance metrics for each AI/ML algorithm type and for each clinical application category were aggregated across the reviewed studies. A formal bias assessment for each study was performed based on the Cochrane Handbook for Systematic Reviews methodology (Supplementary Table 1). One-way ANOVA with post hoc Tukey tests was performed, with statistical significance set at P < .05. All statistical analyses were performed using Stata (version 16.1, Stata Corporation, College Station, Texas, USA).
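The performance metrics defined above can be sketched in a few lines of Python. This is a minimal illustration on made-up labels and scores, not the code used in any reviewed study; the function names are hypothetical, the 0.5 decision threshold is an assumption, and AUC is computed with the standard pairwise (rank-based) formulation, in which it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The grading function follows the bands quoted in the text and, as an assumption, labels any value below .70 as poor.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fp, tn, fn):
    return tp / (tp + fn)  # true positive rate


def specificity(tp, fp, tn, fn):
    return tn / (tn + fp)  # true negative rate


def roc_auc(y_true, scores):
    """Pairwise AUC: share of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_grade(auc):
    """Qualitative bands from the text; values below .70 are treated as poor."""
    if auc >= 1.0:
        return "perfect"
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    return "poor"


# Made-up example: 1 = event occurred, scores = predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), sensitivity(*counts), specificity(*counts))
print(roc_auc(y_true, scores), auc_grade(roc_auc(y_true, scores)))
```

Note that accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC is computed from the ranking of scores and does not.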

Results

Search Results and Study Selection

Our pre-defined search terms yielded 335 articles, of which 67 duplicate articles were removed. The remaining 268 articles were screened by title and abstract according to inclusion and exclusion criteria. Ultimately, there were 44 articles included for full review, of which 41 met full inclusion and exclusion criteria (Figure 1). Over 83% of studies had level of evidence III, and the median number of patients in each study was 964 (mean 2784, standard deviation [SD] 3122). Although there were no limitations on publication dates in the selection process, the majority of studies (77.5%) were published during the last 2 years (2018–2020) (Figure 2). AUC was the most frequently reported performance metric, appearing in 37 out of the 41 total reviewed studies (90.2%). In comparison, accuracy was reported less frequently (16 studies, 39.0%), as were sensitivity and specificity (11 studies, or 26.8%).

Figure 1. PRISMA flowchart showing systematic review search strategy.

Figure 2. Trends in the annual number of AI/ML publications in spine surgery (2011 to 2020).

Administrative and Clinical Decision Support Applications

A total of 18 reviewed studies (43.9%) evaluated the use of AI/ML applications in optimizing preoperative patient selection or projecting surgical costs, through prediction of hospital length of stay, discharges, readmissions, and other cost-contributing factors (Table 1, Supplementary Table 3). Eleven studies (26.8%) evaluated AI/ML applications in accurately predicting patient reoperations, operating time, hospital length of stay, discharges, readmissions, or surgical and inpatient costs.[13-23] Four studies (9.8%) used patients’ preoperative risk factors and other patient-specific variables to optimize the patient selection and surgical planning process through the use of AI/ML-based predictions of surgical outcomes and postoperative complications.[24-27] Four studies (9.8%) investigated the use of AI/ML in improving preoperative planning, through accurate identification of previously implanted anterior cervical spinal implants and a decision support system for spine fusion surgery that enhances surgical planning through prediction of pedicle screw pullout strength for any combination of patient-specific factors.[24,28-30] The authors of 17 studies mentioned the potential of AI/ML applications in bringing down the costs of spine surgery and optimizing cost-efficient, value-based care delivery.

Table 1. Reviewed Studies of Preoperative Patient Selection and Planning in Spine Surgery.

| Author, year | Pathology or procedure | ML algorithms | Prediction outputs | Number of patients | Avg. age | % female | Data source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Kalagara, 2018 | Lumbar laminectomy | Boosting/ensemble learning | Readmissions/reoperations | 4030 | 63 | -- | ACS-NSQIP database |
| Stopa, 2019 | Spine fusion | Deep learning/ANN | Discharge/LOS | 144 | 50 | 45.10 | Single center |
| Ames, 2019 | ASD | Cluster analysis | Pre-op selection/planning | 570 | 56.8 | 78.80 | Multicenter ASD databases |
| Goyal, 2019 | Spine fusion | Regression analysis, boosting/ensemble learning, deep learning/ANN, decision tree, and Bayesian networks | Discharge/LOS and readmissions/reoperations | 8872 | 57 | 48.50 | ACS-NSQIP database |
| Ogink, 2019 | Spondylolisthesis | Deep learning/ANN, SVM, decision tree, and Bayesian networks | Discharge/LOS | 1868 | 63 | 63.00 | ACS-NSQIP database |
| Kuo, 2018 | Spinal fusion | Regression analysis, SVM, decision tree, and Bayesian networks | Cost prediction | 532 | 62.4 | 58.60 | Single center |
| Lerner, 2019 | Posterior lumbar spinal fusion | Cluster analysis | Pre-op selection/planning | 18770 | 51.3 | 56.10 | IBM MarketScan® commercial database |
| Siccoli, 2019 | Lumbar decompression | Boosting/ensemble learning and decision tree | Discharge/LOS and readmissions/reoperations | 635 | 62 | 48.00 | Single center |
| Chia, 2017 | Cerebral palsy | Deep learning/ANN | Pre-op selection/planning | 242 | -- | -- | Single center |
| Huang, 2019 | ACDF | Bayesian networks, SVM, and regression analysis | Pre-op selection/planning | 321 | -- | -- | Single center |
| Varghese, 2018 | Spinal fusion | Decision tree and regression analysis | Pre-op selection/planning | -- | -- | -- | Single center |
| Karhade, 2018 | Lumbar degeneration | Deep learning/ANN, decision tree, SVM, and Bayesian networks | Discharge/LOS | 5273 | 53 | 46.90 | ACS-NSQIP database |
| Hopkins, 2019 | Posterior lumbar spinal fusion | Deep learning/ANN | Readmissions/reoperations | 5816 | -- | -- | ACS-NSQIP database |
| Ogink, 2019 | Lumbar spinal stenosis | Deep learning/ANN, decision tree, SVM, and Bayesian networks | Discharge/LOS | 9338 | 67 | 47.30 | ACS-NSQIP database |
| Karnuta, 2019 | Spinal fusion | Bayesian networks | Discharge/LOS and cost prediction | 3807 | -- | 57.80 | New York state SPARCS database |
| Khatri, 2019 | Spinal fusion | Decision tree | Pre-op selection/planning | -- | -- | -- | Single center |
| Bekelis, 2014 | ACDF | Regression analysis | Discharge/LOS and readmissions/reoperations | 2732 | 55.7 | 46.30 | ACS-NSQIP database |
| Assi, 2014 | Scoliosis | Regression analysis | Pre-op selection/planning | 141 | -- | -- | Single center |

Abbreviations: ASD, adult spinal deformity; ACDF, anterior cervical discectomy and fusion; ANN, artificial neural network; SVM, support vector machine; LOS, length of stay; ACS-NSQIP, American College of Surgeons National Surgical Quality Improvement Program; SPARCS, Statewide Planning and Research Cooperative System.

The majority of the decision support studies evaluated AI/ML model performance using ROC/AUC, accuracy, sensitivity, and specificity. Two studies did not test model performance, but instead optimized preoperative patient selection using cluster analysis to classify patients based on preoperative risk factors and other variables. Four studies each evaluated different AI/ML-based predictive models of readmissions and reoperations, with an average AUC of .67 (SD .08) across 11 models and five different ML methods (Table 2). Predictive models of LOS and discharges were used in eight studies, with an average AUC of .80 (SD .08) across 23 models and six different AI/ML methods. Applications of preoperative patient selection/planning and cost prediction were used in 11 models across nine studies, reporting an average AUC of .89 (SD .08). ANOVA testing found statistically significant variability in model AUC, accuracy, and specificity across the different decision support applications (P < .001, P = .005, P < .001, respectively), and preoperative planning/cost prediction models and discharge/LOS models each reported significantly higher average accuracy (82.2% and 78.0%, respectively) compared to readmissions/reoperation prediction models (P = .009, P = .019, respectively). The same relationships were confirmed in comparisons of model specificity by Tukey post hoc testing. There were no significant differences in model sensitivity between the applications (Table 2).

Table 2. Statistical Comparisons of Reported Model Performance Metrics, by Administrative/Clinical Decision Support Application. Performance metrics are reported as mean (SD, N).

| Administrative or clinical decision support applications | AUC | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- | --- |
| Preoperative planning and cost prediction | .89 (.08, 11) | 82.2 (4.8, 7) | 70.5 (10.9, 6) | 87.7 (5.1, 6) |
| Discharge, LOS | .80 (.08, 23) | 78.0 (7.7, 9) | 69.1 (19.8, 7) | 76.6 (7.8, 7) |
| Readmissions and reoperations | .67 (.08, 11) | 70.2 (11.8, 8) | 56.0 (16.5, 7) | 59.0 (16.5, 7) |
| ANOVA | P < .001 | P = .005 | P = .472 | P < .001 |
| Tukey post hoc tests | 1 vs 2 (P = .005) | 1 vs 3 (P = .009) | -- | 1 vs 3 (P < .001) |
| | 1 vs 3 (P < .001) | 2 vs 3 (P = .019) | -- | 2 vs 3 (P = .002) |
| | 2 vs 3 (P < .001) | -- | -- | -- |

Abbreviations: AUC, area under the curve; SD, standard deviation; N, number of models; LOS, length of stay.
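
A one-way ANOVA F statistic can be approximated directly from the summary statistics in Table 2, without the per-model data. The sketch below is only an illustration, not the authors' Stata analysis: the function name `anova_from_summary` is hypothetical, it assumes the reported SDs are sample standard deviations and N is the number of models in each group, and it works from rounded table values, so the result is approximate.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (mean, sd, n) triples, one per group."""
    k = len(groups)
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / total_n

    # Between-group variability: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    # Within-group variability: pooled from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)

    df_between = k - 1
    df_within = total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# AUC rows of Table 2: (mean, SD, N) for the three application groups.
table2_auc = [
    (0.89, 0.08, 11),  # preoperative planning and cost prediction
    (0.80, 0.08, 23),  # discharge, LOS
    (0.67, 0.08, 11),  # readmissions and reoperations
]

f_stat, df_b, df_w = anova_from_summary(table2_auc)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With these inputs the statistic is roughly 21 on (2, 42) degrees of freedom, which lies far in the upper tail of the F distribution and is in line with the reported P < .001 for AUC. An exact p-value would require an F-distribution survival function from a statistics library.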

Prediction and Management of Postoperative Outcomes and Complications

A total of 24 reviewed studies used various AI/ML models to predict outcomes, complications, and adverse events (Supplementary Table 2, Table 3).[19,23,31-52] Average AUC and accuracy values varied significantly across applications (P < .001 and P = .027, respectively) (Table 4). AUC for predicting postoperative cardiovascular complications averaged .69 (SD .12, 21 models), other postoperative complications averaged .68 (SD .12, 31 models), postoperative mortality models averaged .82 (SD .08, 30 models), and postoperative functional and clinical outcome models averaged .75 (SD .09, 30 models). Tukey post hoc testing found statistically significant differences between postoperative mortality models (average AUC of .82) and each of the other prediction models (Table 4). Average accuracy also differed significantly between other postoperative complications and postoperative functional and clinical outcomes (85.8% vs 72.2%, respectively; P = .027), and there was no significant variation in reported sensitivity and specificity values (Table 4).

Table 3. Reviewed Studies of Postoperative Outcome Prediction in Spine Surgery.

| Author, year | Pathology or procedure | ML algorithms | Prediction outputs | Number of patients | Avg. age | % female | Data source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Arvind, 2018 | ACDF | Deep learning/ANN, regression analysis, and SVM | Cardiac, VTE, wound infection, and 30-day mortality | 6264 | 53 | 52 | Multi-center database |
| Kim, 2018 | Lumbar decompression | Deep learning/ANN and regression analysis | Cardiac, VTE, wound infection, and 30-day mortality | 6789 | 60 | 55 | ACS-NSQIP database |
| Kim, 2018 | ASD | Deep learning/ANN and regression analysis | Cardiac, VTE, wound infection, and 30-day mortality | 1746 | 60 | 59 | ACS-NSQIP database |
| Karhade, 2019 | ACDF | Deep learning/ANN, SVM, decision tree, regression analysis, and boosting/ensemble learning | Sustained opioid use | 2737 | 51 | 53 | Multi-center database |
| Han, 2019 | General | Regression analysis | Adverse events, cardiac/CHF, neurologic, pulmonary/pneumonia, and overall medical/surgical complication | 331870 | 63 | 54 | IBM MarketScan, CMS Medicaid, Medicare databases |
| Durand, 2018 | ASD | Decision tree | Postoperative blood transfusion | 205 | 54 | 66 | ACS-NSQIP database |
| Karhade, 2019 | Spinal metastatic disease | Deep learning/ANN, regression analysis, SVM, decision tree, and boosting/ensemble learning | 90-day mortality and 1-year mortality | 732 | 61 | 42 | Single center |
| Scheer, 2017 | ASD | Decision tree | Adverse events and major complications | 557 | 58 | 79 | Multi-center ASD databases |
| Janssen, 2018 | Thoracolumbar spine surgery | Regression analysis | Wound infection | 898 | 52 | 51 | NCI SEER registry |
| Karhade, 2019 | Spinal epidural abscess | Deep learning/ANN, regression analysis, SVM, decision tree, and boosting/ensemble learning | 90-day mortality | 1053 | 59 | 39 | Multi-center database |
| Karhade, 2019 | Lumbar spine surgery | Regression analysis | Sustained opioid use | 8435 | 60 | 46 | Multi-center database |
| Ryu, 2018 | Spinal ependymoma | Decision tree and regression analysis | 5-year mortality and 10-year mortality | 2822 | -- | 47 | NCI SEER registry |
| Khan, 2020 | Degenerative cervical myelopathy (DCM) | Decision tree, regression analysis, SVM, and boosting/ensemble learning | PRO/functional outcomes (SF-36 MCS, PCS) | 193 | 52 | 35 | Multi-center AOSpine CSM clinical trials |
| Staartjes, 2019 | Single-level tubular microdiscectomy for lumbar disc herniation | Deep learning/ANN and regression analysis | Clinical improvement (leg pain, back pain, and functional disability) | 422 | 49 | 49 | Single center |
| Hoffman, 2015 | Cervical spondylotic myelopathy | SVM and regression analysis | PRO/functional outcomes (post-op ODI) | 20 | 60 | 45 | Single center |
| Shamim, 2009 | Lumbar disc surgery | Cluster analysis | Post-op poor outcomes | 501 | 41 | 31 | Single center |
| Azimi, 2014 | Lumbar spinal stenosis | Deep learning/ANN and regression analysis | Post-op patient satisfaction | 168 | 60 | 65 | Single center |
| Azimi, 2015 | Lumbar disk herniation | Deep learning/ANN and regression analysis | Post-op recurrent lumbar disc herniation | 402 | 50 | 54 | Single center |
| Azimi, 2017 | Lumbar disk herniation | Deep learning/ANN | Post-op successful outcomes | 203 | 48 | 53 | Single center |
| Buchlak, 2017 | ASD | Regression analysis | Post-op complications | 136 | 63 | 74 | Single center |
| Karhade, 2018 | Spinopelvic chordoma surgery | Deep learning/ANN, decision tree, SVM, and Bayesian networks | 5-year mortality | 265 | 64 | 39 | NCI SEER registry |
| Khor, 2018 | Lumbar fusion | Regression analysis | Clinical improvement | 1965 | 61 | 60 | Multi-center database |
| Bekelis, 2014 | ACDF | Regression analysis | Cardiac, VTE, wound infection, and 30-day mortality | 2732 | 56 | 46 | ACS-NSQIP database |
| Siccoli, 2019 | Lumbar decompression | Deep learning/ANN, decision tree, and Bayesian networks | Clinical improvement (6wk, 12wk) | 635 | 62 | 48 | Single center |
| Ames, 2019 | Adult spinal deformity | Regression analysis, decision tree, and boosting/ensemble learning | PRO/functional outcomes (SRS-22R) | 561 | 54.4 | 75.9 | Single center |

Abbreviations: ANN, artificial neural network; SVM, support vector machine; LOS, length of stay; ASD, adult spinal deformity; ACDF, anterior cervical discectomy and fusion; CHF, congestive heart failure; VTE, venous thromboembolism; UTI, urinary tract infection; PRO, patient-reported outcomes; SF-36, short-form 36 questionnaire; MCS, mental health composite score; PCS, physical health composite score; ODI, Oswestry disability index; ACS-NSQIP, American College of Surgeons National Surgical Quality Improvement Program; CMS, Centers for Medicare and Medicaid Services; NCI SEER, National Cancer Institute Surveillance, Epidemiology, and End Results database; AOSpine CSM, AOSpine North America cervical spondylotic myelopathy study.

Table 4. Statistical Comparisons of Reported Model Performance Metrics, by Postoperative Prediction/Management Application. Performance metrics are reported as mean (SD, N).

| Postoperative prediction/management applications | AUC | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- | --- |
| Postoperative cardiovascular complications | .69 (.12, 21) | -- | 81.0 (4.2, 2) | 52.0 (1.4, 2) |
| Other postoperative complications | .68 (.12, 31) | 85.8 (7.9, 4) | 77.6 (4.4, 5) | 51.6 (.5, 5) |
| Postoperative mortality | .82 (.08, 30) | -- | -- | -- |
| Postoperative functional or clinical outcomes | .75 (.09, 30) | 72.2 (11.2, 28) | 73.8 (15.5, 24) | 60.9 (17.5, 24) |
| ANOVA | P < .001 | P = .027 | P = .487 | P = .278 |
| Tukey post hoc tests | 1 vs 3 (P < .001) | -- | -- | -- |
| | 2 vs 3 (P < .001) | -- | -- | -- |
| | 3 vs 4 (P = .035) | -- | -- | -- |

Abbreviations: AUC, area under the curve; SD, standard deviation; N, number of models; LOS, length of stay.

Comparison of AI/ML Algorithms

The most commonly applied AI/ML algorithm in the reviewed studies was logistic regression (24 studies, 58.5%), while cluster analysis was used in only 3 studies (7.3%) (Supplementary Table 2, Table 5). When comparing AI/ML model performance across various algorithm types, there was statistically significant variation confirmed by one-way ANOVA testing. Bayesian networks had the highest average AUC (.80, SD .09, 13 models), while support vector machines (SVMs) had the lowest average AUC (.63, SD .18, 17 models). Tukey post hoc testing found significant differences in AUC between SVM and several AI/ML algorithms, including Bayesian network (P = .009), decision tree (P = .018), and deep learning/ANN (P = .019) (Table 5). There was significant variation in reported average sensitivity of the AI/ML algorithms (P = .006). Deep learning/ANN had the highest reported average sensitivity of 81.5% (SD 12.1%, 8 models), while Bayesian network and boosting/ensemble learning had the lowest reported sensitivity, with 63.7% (SD 11.0%, 4 models) and 55.7% (SD 21.7%, 7 models), respectively. ANOVA testing did not detect significant differences in accuracy or specificity between the AI/ML algorithms (P = .083 and P = .554, respectively) (Table 5).

Table 5. Statistical Comparisons of Reported Model Performance Metrics, by AI/ML Algorithm. Performance metrics are reported as mean (SD, N).

| AI/ML algorithm | AUC | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- | --- |
| Bayesian network (BN) | .80 (.09, 13) | 76.9 (11.9, 8) | 63.7 (11.0, 4) | 67.4 (17.7, 4) |
| Boosting/ensemble learning (BEL) | .76 (.10, 13) | 74.1 (9.6, 8) | 55.7 (21.7, 7) | 71.7 (11.4, 7) |
| Decision tree (DT) | .77 (.11, 29) | 74.0 (8.7, 13) | 75.4 (13.7, 12) | 62.5 (21.7, 12) |
| Deep learning/artificial neural network (ANN) | .77 (.11, 34) | 83.0 (10.7, 10) | 81.5 (12.1, 8) | 71.8 (10.1, 8) |
| Logistic regression (LR) | .74 (.11, 56) | 70.4 (10.6, 13) | 70.6 (12.4, 19) | 61.0 (12.4, 19) |
| Support vector machines (SVM) | .63 (.18, 17) | 67.5 (12.9, 3) | 72.3 (18.3, 3) | 56.0 (42.9, 3) |
| ANOVA | P = .007 | P = .083 | P = .006 | P = .554 |
| Tukey post hoc tests | BN vs SVM (P = .009) | -- | BEL vs ANN (P = .002) | -- |
| | DT vs SVM (P = .018) | -- | -- | -- |
| | ANN vs SVM (P = .019) | -- | -- | -- |

Abbreviations: AUC, area under the curve; SD, standard deviation; N, number of models; AI/ML, artificial intelligence and machine learning.
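
As a rough cross-check of how per-algorithm averages relate to an overall average, the sketch below pools the Table 5 means, weighting each algorithm by its number of models. Whether the review derived its overall figures this way is an assumption; the function name `pooled_mean` is hypothetical, and the inputs are the rounded values from Table 5.

```python
def pooled_mean(rows):
    """Weighted mean of group means, weighting each group by its model count."""
    total_n = sum(n for _, n in rows)
    return sum(mean * n for mean, n in rows) / total_n


# (mean, N) pairs from Table 5, in table order: BN, BEL, DT, ANN, LR, SVM.
auc_rows = [(0.80, 13), (0.76, 13), (0.77, 29), (0.77, 34), (0.74, 56), (0.63, 17)]
accuracy_rows = [(76.9, 8), (74.1, 8), (74.0, 13), (83.0, 10), (70.4, 13), (67.5, 3)]

overall_auc = pooled_mean(auc_rows)
overall_accuracy = pooled_mean(accuracy_rows)
print(f"pooled AUC = {overall_auc:.2f}, pooled accuracy = {overall_accuracy:.1f}%")
```

Under this weighting the pooled values come out at about .75 for AUC and about 74.9% for accuracy, which matches the overall averages quoted in the Discussion.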

Discussion

This systematic review is the first to evaluate and summarize AI/ML applications in optimizing patient selection and predicting surgical outcomes and complications in spine surgery. Our review included 41 studies from the literature which tested AI/ML-based prediction and optimization models that may help guide clinical decision-making and surgical planning. Among all the reviewed studies, AI/ML models were fairly accurate, averaging 74.9% overall accuracy and AUC of .75, across all AI/ML methods. In particular, AI/ML models performed best in optimizing preoperative patient selection and planning and predicting costs, hospital discharge, and length of stay. Model performance was also good or fair (AUC between .70 and .89) in predicting postoperative mortality and functional and clinical outcomes. However, model performance was considered poor (AUC between .50 and .69) in predicting postoperative complications (including cardiovascular complications), adverse events, and readmissions and reoperations, which may be due to the difficulty in predicting random events that are out of the surgeon’s control in the postoperative period. In addition, model performance metrics such as AUC must be carefully interpreted, especially because AUC summarizes the trade-off between a model’s sensitivity and specificity (and the resulting false negatives and false positives) across classification thresholds, and in certain clinical applications such as cancer screening or prediction of potentially fatal complications after spine surgery, providers may prefer a model with a lower AUC that minimizes false negatives.[4,53] Although AI/ML models did not perform well in predicting postoperative complications, they offer a potentially beneficial tool for providers to optimize preoperative planning and improve cost-efficiency.
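The preference for fewer false negatives in high-stakes predictions can be illustrated with a small threshold-selection sketch. The data below are made up, and the function names are hypothetical; the sketch simply lowers the decision threshold until a target sensitivity is reached and reports the specificity that results, which is one simple way (among others) to trade false positives for fewer false negatives.

```python
def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = fn = tn = fp = 0
    for truth, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if truth == 1 and predicted_positive:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def highest_threshold_meeting_sensitivity(y_true, scores, target):
    """Scan candidate thresholds from high to low; return the first meeting the target."""
    for threshold in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(y_true, scores, threshold)
        if sens >= target:
            return threshold, sens, spec
    return None


# Made-up example: 1 = complication occurred, scores = predicted risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.25, 0.3, 0.2, 0.6, 0.7, 0.1]

default = sens_spec(y_true, scores, 0.5)
strict = highest_threshold_meeting_sensitivity(y_true, scores, target=1.0)
print("threshold 0.5 ->", default)
print("no missed cases ->", strict)
```

In this toy example, requiring that no true case is missed lowers the threshold from 0.5 to 0.25 and raises sensitivity from 0.75 to 1.0, at the cost of specificity falling from 0.75 to 0.5.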
For example, a practicing surgeon may use an electronic medical record system with an integrated AI/ML application that accurately predicts which patients will almost certainly require inpatient rather than outpatient surgery, ensuring that these high-risk patients have access to specialized care and supervision postoperatively. As a result, surgeons can have an accurate aid in patient selection, thus ensuring that patients are treated in the appropriate setting. In a systematic review of AI/ML applications in neurosurgery, Buchlak et al. reported model performance for deep learning/ANN and logistic regression models similar to that in our study. However, they reported an average AUC of .80 and accuracy of 81.8% for SVM, which is substantially higher than the SVM results in our study. It appears that SVM may be more accurate in certain non-spine neurosurgical applications, such as image classification,[55-60] but is perhaps less accurate for guided decision-making in spine surgery. AI/ML-based predictive modeling may be especially beneficial in spine surgery, which usually involves complex procedures with potentially high complication rates in an often highly comorbid patient population. Ames et al. showed that an AI/ML-based classification system for ASD surgical candidates optimizes personalized treatment plans based on patient-specific risk factors. This application may aid surgeons with preoperative decision-making by informing them about which treatment options may offer optimal clinical improvement and value with the lowest risk of adverse events.
In our review, several of the AI/ML prediction and optimization models that were used to improve patient care and postoperative outcomes also showed the potential to reduce unnecessary healthcare expenditures and even provide risk-adjusted reimbursement models for providers and hospitals.[13,14,22,26] The predictive capabilities of AI/ML models enable decision makers to forecast costs related to postoperative outcomes and complications, pain medication use, patient discharges and discharge placements, length of stay, unplanned readmissions, and other postoperative interventions. The authors highlighted the potential of AI/ML to improve clinical decision making and patient care by predicting likely postoperative outcomes, which enables providers to optimize resource allocation for post-surgical monitoring and focused care of high-risk patients.[16,17,20,22] Of particular relevance to curtailing rising inpatient costs, accurate forecasting of hospital length of stay has important implications for management of bed utilization and other hospital resources. Kalagara et al. analyzed hospital readmissions after laminectomy and used patient-specific variables to develop predictive models for identifying readmitted patients with over 95% accuracy. AI/ML may also aid surgeons and clinical decision makers to more efficiently plan for surgery and select patients for the optimal surgical setting (for instance, outpatient vs inpatient) that will produce the best care outcomes while improving cost-efficiency. 
Several studies highlighted the potential value of predictive modeling during the preoperative period in helping surgeons optimize patient selection and surgical planning, which also allows providers to efficiently allocate needed hospital resources and plan for possible postoperative interventions to ensure the best possible outcomes.[22,25-27] The recent shift toward value-based health care has likely also spurred the recent spike in research on AI/ML applications for optimizing cost-efficiency and resource allocation, especially because post-surgical inpatient care and other associated hospital costs are major drivers of US healthcare expenditures and spine surgery costs.[62-67] In contrast, outpatient surgical procedures have been shown to be less costly than inpatient treatment, and treating suitable surgical candidates in the outpatient setting may offer significant cost savings.[68-73] Development of well-defined and accurate patient selection criteria for outpatient surgery, along with optimized anesthesia and postoperative pain management protocols, is associated with reduced patient readmission risk and surgical costs.[74-76] Predictive modeling of patient length of stay, based on medical comorbidities, demographic profile, and other variables, may aid surgeons in the selection of outpatient surgical candidates, and has been shown to be effective in selecting patients for outpatient posterior spinal fusions. Through the use of patient-specific risk factors, AI/ML applications may also enable the development of risk-adjusted insurance reimbursement models that compensate providers and hospitals commensurate with case complexity, patient complication risk, and comorbidities, providing a potential solution to the unwillingness to treat medically complex patients. 
However, data privacy and security remain a major challenge for AI/ML that must be addressed, as patients may feel uncomfortable with their personal health information being used on such a large scale. Although the number of AI/ML publications in spine surgery has recently increased significantly, there remains a general lack of large, adequately powered, and externally validated studies that would provide more information on the efficacy of these applications in spine surgery practice. In addition, it is important to note that although our review included studies only through February 2020, it still provides a detailed overview of recent trends in the literature and the potential early applications of AI/ML. The reviewed studies involved different spine procedures that vary in complexity and risk, and their models varied in the quality and quantity of their training and validation data. As such, any conclusions about the efficacy of AI/ML applications in spine surgery require further investigation. This study does not establish conclusive relationships between AI/ML and clinical efficacy, but instead presents statistical findings and trends from recent studies. Future research on AI/ML applications in spine surgery, and in health care more broadly, must focus on developing externally validated and commercially viable systems that can be easily and cost-efficiently integrated with existing hospital systems. In addition, future studies should evaluate optimal methods that aid in determining surgical candidates and that can use a wide range of preoperative data. An understanding of AI/ML-based applications is becoming increasingly important, particularly in spine surgery, as the volume of reported literature, technology accessibility, and clinical applications continue to rapidly expand.
  73 in total

1.  Lumbar Ultrasound Image Feature Extraction and Classification with Support Vector Machine.

Authors:  Shuang Yu; Kok Kiong Tan; Ban Leong Sng; Shengjin Li; Alex Tiong Heng Sia
Journal:  Ultrasound Med Biol       Date:  2015-06-26       Impact factor: 2.998

2.  Predicting discharge placement after elective surgery for lumbar spinal stenosis using machine learning methods.

Authors:  Paul T Ogink; Aditya V Karhade; Quirina C B S Thio; William B Gormley; Fetullah C Oner; Jorrit J Verlaan; Joseph H Schwab
Journal:  Eur Spine J       Date:  2019-04-02       Impact factor: 3.134

3.  Artificial Intelligence Based Hierarchical Clustering of Patient Types and Intervention Categories in Adult Spinal Deformity Surgery: Towards a New Classification Scheme that Predicts Quality and Value.

Authors:  Christopher P Ames; Justin S Smith; Ferran Pellisé; Michael Kelly; Ahmet Alanay; Emre Acaroğlu; Francisco Javier Sánchez Pérez-Grueso; Frank Kleinstück; Ibrahim Obeid; Alba Vila-Casademunt; Christopher I Shaffrey; Douglas Burton; Virginie Lafage; Frank Schwab; Christopher I Shaffrey; Shay Bess; Miquel Serra-Burriel
Journal:  Spine (Phila Pa 1976)       Date:  2019-07-01       Impact factor: 3.468

4.  Neer Award 2016: Outpatient total shoulder arthroplasty in an ambulatory surgery center is a safe alternative to inpatient total shoulder arthroplasty in a hospital: a matched cohort study.

Authors:  Tyler J Brolin; Ryan P Mulligan; Frederick M Azar; Thomas W Throckmorton
Journal:  J Shoulder Elbow Surg       Date:  2016-08-31       Impact factor: 3.019

5.  Use of artificial neural networks to predict surgical satisfaction in patients with lumbar spinal canal stenosis: clinical article.

Authors:  Parisa Azimi; Edward C Benzel; Sohrab Shahzadi; Shirzad Azhari; Hasan Reza Mohammadi
Journal:  J Neurosurg Spine       Date:  2014-01-17

6.  Trends in Utilization and Cost of Cervical Spine Surgery Using the National Inpatient Sample Database, 2001 to 2013.

Authors:  Caterina Y Liu; Corinna C Zygourakis; Seungwon Yoon; Tamara Kliot; Christopher Moriates; John Ratliff; R Adams Dudley; Ralph Gonzales; Praveen V Mummaneni; Christopher P Ames
Journal:  Spine (Phila Pa 1976)       Date:  2017-08-01       Impact factor: 3.468

7.  Development of machine learning algorithms for prediction of discharge disposition after elective inpatient surgery for lumbar degenerative disc disorders.

Authors:  Aditya V Karhade; Paul Ogink; Quirina Thio; Marike Broekman; Thomas Cha; William B Gormley; Stuart Hershman; Wilco C Peul; Christopher M Bono; Joseph H Schwab
Journal:  Neurosurg Focus       Date:  2018-11-01       Impact factor: 4.047

8.  Imaging Surrogates of Infiltration Obtained Via Multiparametric Imaging Pattern Analysis Predict Subsequent Location of Recurrence of Glioblastoma.

Authors:  Hamed Akbari; Luke Macyszyn; Xiao Da; Michel Bilello; Ronald L Wolf; Maria Martinez-Lage; George Biros; Michelle Alonso-Basanta; Donald M OʼRourke; Christos Davatzikos
Journal:  Neurosurgery       Date:  2016-04       Impact factor: 4.654

9.  The Seattle spine score: Predicting 30-day complication risk in adult spinal deformity surgery.

Authors:  Quinlan D Buchlak; Vijay Yanamadala; Jean-Christophe Leveque; Alicia Edwards; Kellen Nold; Rajiv Sethi
Journal:  J Clin Neurosci       Date:  2017-07-01       Impact factor: 1.961

10.  Identifying appropriate candidates for ambulatory outpatient shoulder arthroplasty: validation of a patient selection algorithm.

Authors:  Matthew N Fournier; Tyler J Brolin; Frederick M Azar; Raj Stephens; Thomas W Throckmorton
Journal:  J Shoulder Elbow Surg       Date:  2018-08-10       Impact factor: 3.019
