
Feasibility and Assessment of a Machine Learning-Based Predictive Model of Outcome After Lumbar Decompression Surgery.

Arthur André1,2,3, Bruno Peyrou3, Alexandre Carpentier2, Jean-Jacques Vignaux3.   

Abstract

STUDY DESIGN: Retrospective single-center study.
OBJECTIVE: The aim of this study is twofold: to develop a virtual patient model for lumbar decompression surgery, and to evaluate the precision of an artificial neural network (ANN) model designed to predict the clinical outcomes of lumbar decompression surgery.
METHODS: We performed a retrospective study of complete Electronic Health Records (EHR) to identify potential unfavorable criteria for spine surgery (predictors). A cohort of synthetic EHR was created to classify patients by surgical success (green zone) or partial failure (orange zone) using an Artificial Neural Network that screens all the available predictors.
RESULTS: The actual cohort comprised 60 patients with complete EHR allowing efficient analysis: 26 patients were in the orange zone (43.4%) and 34 were in the green zone (56.6%). The average number of positive criteria for actual patients was 8.62 for the green zone (SD +/- 3.09) and 10.92 for the orange zone (SD +/- 3.38). The classifier (a neural network) was trained using 10,000 virtual patients, and 2,000 virtual patients were used for testing. The 12,000 virtual patients were generated from the 60 EHR, with half in the green zone and half in the orange zone. The model showed an accuracy of 72% and a ROC score of 0.78. The sensitivity was 0.885 and the specificity 0.59.
CONCLUSION: Our method can be used to identify patients likely to have a favorable outcome after lumbar decompression surgery. However, its ability to analyze patients in the "failure of treatment" zone still needs further development in order to offer precise management of patient health before spinal surgery.

Keywords:  ROC curve; lumbar decompression surgery; machine learning; retrospective study; synthetic electronic medical record

Year:  2020        PMID: 33207969      PMCID: PMC9344503          DOI: 10.1177/2192568220969373

Source DB:  PubMed          Journal:  Global Spine J        ISSN: 2192-5682


Introduction

Lumbar spinal disorders are among the most disabling conditions, particularly in developed countries, due to the increase in sedentary lifestyles and aging populations. When conservative treatment is insufficient or pharmaceutical options cause too many side effects (dependency, misuse), surgery is a valid option to relieve pain and improve function.[2-4] However, patient selection remains very complex and the benefits of surgical interventions are sometimes uncertain. Indeed, between 2% and 23% of patients undergoing back surgery will experience an adverse event or complication after surgery.[6,7] Around 30% to 50% of patients will obtain little or no relief from surgery and will maintain their morphine intake, with the side effects and costs that this entails. Surgical success is well evaluated by validated indicators such as patient-reported outcome measures (PROMs). This protocol is based on the standardized collection of patient well-being and health status after a surgical procedure. It is used on large cohorts to study the set of factors contributing to clinical outcomes after surgical treatment (see Table 1).
Table 1.

Predictors.

AuthorYearSignificant predictorPositive predictive factorNegative predictive factorArea
Katz et al 10 1999Low cardiovascular comorbidity*GREEN ZONE
Hägg et al 11 2003Severe disc degeneration, Neuroticism, Pre-operative sick leave*ORANGE ZONE
Kohlboeck et al 12 2004Straight leg raise test, Depression, Sensory pain*ORANGE ZONE
Trief et al 13 2006Better emotional health*GREEN ZONE
Slover et al 14 2006Active compensation case, Self-rated poor health, Smoking, Headaches, Depression, Nervous system disorders*ORANGE ZONE
Braybrooke et al 15 2007Time to surgery*ORANGE ZONE
Mannion et al 16 2007Pain duration, Re-operations, Multilevel surgery, Depression, FABQ Score*ORANGE ZONE
Park et al 17 2008Minimally invasive surgery*GREEN ZONE
Park et al 17 2008Age, BMI > 25, Hypertension, Coronary artery diseases, Diabetes*RED ZONE
Garcia et al 18 2008Weight reduction program*GREEN ZONE
Vaidya et al 19 2009Obesity, Multiple level fusions*RED ZONE
Chen et al 20 2009Diabetes*RED ZONE
Abbott et al 21 2011Catastrophizing, Pain intensity, Bad expectations*ORANGE ZONE
Senker et al 22 2011Minimally invasive surgery*GREEN ZONE
Chaichana et al 23 2011Depression, Decreased perception scale anxiety*ORANGE ZONE
Sinikallio et al 24 2011Depression*ORANGE ZONE
Kalanithi et al 25 2012Morbid obesity*RED ZONE
Sørlie et al 26 2012MODIC type 1 smoking*ORANGE ZONE
Hellum et al 28 2012Long duration Low back pain high fear avoidance for work, MODIC changes*ORANGE ZONE
Gaudelli and Thomas 29 2012Instrumented fusion*RED ZONE
Mehta et al 30 2012Obesity*RED ZONE
Sharma et al 31 2013Diabetes*RED ZONE
Takahashi et al 32 2013Diabetes of more than 20 years*RED ZONE
Bekelis et al 33 2014Age, Extensive operations, Medical deconditioning (weight loss, dialysis, peripheral vascular disease) BMI, Neurologic deficit, Bleeding disorders*RED ZONE
Lee et al 34 2014Opioid consumption, Modified somatic perception, Depression*ORANGE ZONE
Pakarinen et al 27 2014Depression*ORANGE ZONE
Kim et al 35 2018Back pain, Pain sensitivity*ORANGE ZONE
Coronado et al 36 2015Increased pain sensitivity Increased pain catastrophizing*ORANGE ZONE
McGirt et al 37 2015Functional score opioid use, Hypertension, Atrial fibrillation, extremity pain, myocardial infarction, Diabetes, Osteoporosis, Smoking*ORANGE ZONE
Anderson et al 38 2015Chronic opioid therapy, Additional lumbar surgery, depression, work loss*ORANGE ZONE
Chotai et al 39 2015Insurance status, Functional score, BP/NP Scores*ORANGE ZONE
Schöller et al 40 2016Re-operation, Duration of pain, Spondylisthesis, Smoking, gender, Age, BMI*ORANGE ZONE
Archer et al 41 2016Cognitive-behavioral based physicaltherapy (CBPT)*GREEN ZONE
Asher et al 42 2017ASA score, disability, education, Unemployment, Insurance status*ORANGE ZONE
Mummaneni et al 43 2017Open surgery*ORANGE ZONE
Crawford et al 44 2017DiscopathyORANGE ZONE
Suri et al 45 2017Smoking, Depression*ORANGE ZONE
McGirt et al 5 2017Education, Employment status, Baseline EQ5D, Fusion*ORANGE ZONE
Sharma et al 46 2018Prior opioid dependence, Younger age*ORANGE ZONE
Dunn et al 47 2018Catastrophizing, depression*ORANGE ZONE
Chan et al 48 2018Symptom duration*ORANGE ZONE
O’Donnell et al 49 2018Opioid use, Time to surgery, Legal representation, Psychiatric comorbidity*ORANGE ZONE
Khor et al 50 2018Age, Gender, Ethnic, Insurance Status, ASA Score, functional score*ORANGE ZONE
Dobran et al 51 2019Age, BMI*RED ZONE
Staub et al 52 2020Obesity, Re-operation, insurance status*ORANGE ZONE
Mauro et al 53 2020BMI*ORANGE ZONE
Rudolfsen et al 54 2020Quality of life score, Functional score*GREEN ZONE
Most of these studies are based on the analysis of electronic health records (EHR) from a single institution or a large national database, and describe statistically relevant risk factors for adverse events or surgical failure at the population level.[5,55] There is growing interest in predictive factors influencing the individual response after surgery, especially in terms of individual PROMs. Furthermore, some promising predictive models exist for disk herniation recurrence or fusion,[50,56,57] but practical models for lumbar spine decompression in general are lacking. “4P” (predictive, preventive, personalized, and participative) medicine benefits from the support of artificial intelligence (AI), machine learning, and synthetic patient models.[59,60] In spine surgery, tools are already capable of improving the quality of spinal diagnosis. Some algorithms can estimate the average duration of sick leave and the risk of prolonged postoperative opioid dependence, and can predict postoperative adverse events up to 30 days after spinal surgery[64-66] (see Table 2).
Table 2.

Predictive Model for Spine Surgery.

Author | Year | Data collection (center) | Number of patients | Classifier used | Prediction / AUC
Azimi et al 67 | 2014 | Database (single-center) | 168 | ANN, Logistic regression analysis | 2-year surgical satisfaction (AUC 0.80)
Azimi et al 68 | 2014 | Database (single-center) | 203 | ANN, Logistic regression analysis | Successful surgery outcome for disk herniation (AUC 0.82)
Azimi et al 69 | 2015 | Database (single-center) | 402 | ANN, Logistic regression analysis | Successful ANN model to predict recurrent lumbar disk herniation (AUC 0.84)
Ratliff et al 70 | 2016 | Database (National) | 279 135 | LASSO (GLMnet), multivariate logistic regression | Adverse events (AUC 0.61)
Azimi et al 56 | 2017 | Database (single-center) | 346 | ANN | Optimal treatment choice for LSCS patients (AUC 0.89)
Oh et al 71 | 2017 | Database (Multi-center) | 234 | C5.0 algorithm (type of decision tree model) | Post-operative improvement AUC (0.96)
Scheer et al 72 | 2017 | Database (Multi-center) | 557 | C5.0 algorithm (type of decision tree model) | Major intra- or perioperative complications (AUC 0.89)
Staarjes et al 73 | 2018 | Registry (single-center) | 422 | TensorFlow ANN | Favorable outcome (AUC 0.87)
Khor et al 50 | 2018 | Database (Multi-center) | 1 965 | Multivariate analysis | Predicting lower ODI: nonprivate insurance workers’ compensation (0.20), current smoking (0.43) or previous smoking (0.66), asthma (0.54), and a lower baseline score (1.05)
Iderberg et al 62 | 2018 | Registry (Multi-center) | 19 131 | Multivariate, regression analysis / GLM | Predicting Clinical outcomes: Odds ratios: Social welfare (1.34) / Living Alone (1.14) / Educational level (-2.39) / Disposable income (-2.58)
Kim et al 35 | 2018 | Registry (Multi-center) | 22 629 | ANNs and multivariate logistic regression | Wound complications and mortality (AUC 0.6 to 0.71)
Karhade et al 74 | 2018 | Registry (Multi-center) | 26 364 | SVM, ANN | Prediction of anormal discharges (AUC 0.82)
Kuo et al 75 | 2018 | Database (Single-center) | 532 | SVMs, logistic regression, C4.5 decision tree | Medical costs (AUC 0.90)
Kalagara et al 65 | 2018 | Registry (Multi-center) | 26 869 | R Foundation for statistical computing/ GBM | Readmission (AUC 0.69)
Goyal et al 76 | 2019 | Registry (Multi-center) | 59 145 | GLM/ GMB/ ANN/ RF / pLDA/ VarBayes | Discharge to non-home facility (AUC >0.80)
Han et al 66 | 2019 | MarketScan & Medicaid Databases (Multi-center) | 1 106 234 | Multivariate logistic regression analysis | Predicting the risk of a pulmonary complication (AUC 0.76)
Siccoli et al 64 | 2019 | Registry | 635 | Random forests, extreme gradient boosting (XGBoost), Bayesian generalized linear models (GLMs), boosted trees, k-nearest neighbor, simple GLMs, artificial neural networks with a single hidden layer | Extended hospital stay with an accuracy of 77% (AUC 0.58)
Shah et al 77 | 2019 | Database (single-center) | 367 | Logistic regression analysis, Stochastic gradient boosting, Random Forest, Support Vector machine | Failure of nonoperative management. Random Forest (AUC 0.56), Logistic Regression (AUC 0.79)
Karhade et al 78 | 2019 | Database (single-center) | 1 053 | Logistic regression analysis, Stochastic gradient boosting, Random Forest, Support Vector machine | Prediction of 90-day mortality in spinal epidural abscess (AUC 0.89)
Hopkins et al 79 | 2019 | Registry (Multi-center) | 23 264 | ANN (7 layers) | Readmissions (AUC > 0.60)
Nelson et al 80 | 2019 | Database (Single-center) | 22 318 appointments | ANN, Logistic regression analysis, Support vector machine, Random Forest | Scheduled appointment attendance in healthcare ANN AUC (0.81)
Karhade et al 63 | 2019 | Database (Multi-center) | 5 413 | Logistic regression analysis, Stochastic gradient boosting, Random Forest, Support Vector machine | Prolonged postoperative opioid prescription (AUC 0.81)
Hopkins et al 81 | 2020 | Database (single-center) | 4046 | ANN (9 layers deep neural network) | Prediction of infections (AUC 0.78)

Notes: ACC = accuracy; ACS-NSQIP = American College of Surgeons National Surgical Quality Improvement Program; ANN = artificial neural networks; AUC = area under the receiver operating characteristic curve; COPD = chronic obstructive pulmonary disease; DNN = deep neural networks; EHR = electronic health records; GBM = gradient boosting machine; GLM = generalized linear model; GLMnet = elastic-net GLM; LSS = lumbar spinal stenosis; MCID = minimum clinically important difference; ML = machine learning; NPV = negative predictive value; NRS = numeric rating scale; NRS-BP = NRS for back pain; NRS-LP = NRS for leg pain; ODI = Oswestry Disability Index; PHC = predictive hierarchical clustering; PPV = positive predictive value; PROMs = patient-reported outcome measures; RF = random forest; ROC = receiver operating characteristic

Among these machine learning methods, we found multivariate logistic regression, stochastic gradient boosting, support vector machines, and, more recently, artificial neural networks and their extension to deep neural networks[60,77] used to support decision-making. Despite the current focus on EHR as the standard for developing machine learning algorithms, it can be very difficult to gather all the data needed to train such models. Likewise, for technical reasons (interoperability, data exchange, and the operator's ability to use information technologies) or because of legal and ethical issues, it is difficult to access full records in academic and industrial research. Generating synthetic patients from EHR addresses many of the problems related to processing real patient data.
Data-driven methods based on synthetic EHR have therefore been developed along 3 lines: using synthetic health records to help overcome confidentiality issues,[62,85] modeling disease progression and interventions for prospective analysis of large-scale virtual cohorts, and completing EHR data for imbalanced cohorts (see Table 3).
Table 3.

Synthetic Patient Models.

Study | Year | Patient synthetic model and technology | Key point
He et al 87 | 2008 | Adaptive Synthetic Sampling Method for Imbalanced Data (ADASYN) | Reducing the bias introduced by the class imbalance, and promote recognition of complex patients
Teutonico et al 88 | 2015 | Discrete re-sampling and multivariate normal distribution (MVND) methodologies in the creation of virtual patient population | The multivariate distribution method produces realistic covariate correlations, comparable to the real population. Moreover, it allows simulation of patient characteristics beyond the limits of inclusion and exclusion criteria in historical protocols.
McLachlan et al 89 | 2016 | The CoMSER method takes a constraint-based approach involving: (1) formalizing clinical practice guidelines into the CareMap constraint and the CareMap into the State Transition Machine (STM), (2) incorporating published Health Incidence Statistics based constraints into the STM, and (3) exploiting domain expertise in verifying domain knowledge and creating the reusable library of clinical notes | Production of synthetic EHR that is considered realistic. The main contribution of this work is the approach that uses a CareMap for generating synthetic EHR with neither access to the real EHR nor using anonymized EHR.
Kim et al 90 | 2018 | ADASYN | Adaptive synthetic sampling approach to imbalanced learning (ADASYN) was used to generate positive synthetic complications for training model
Kim et al 35 | 2018 | ADASYN | ADASYN utilizes examples from the minority class that are difficult to learn and generates synthetic new cases based on these examples to improve model learning and generalizability
Baowaly et al 83 | 2019 | MedWGAN / MedBGAN (modified Generating Adversarial network) | Learn the distribution of real-world EHRs and exhibit remarkable performance in generating realistic synthetic EHRs for both binary and count variables.
Pollack et al 91 | 2019 | 5 Steps Generating Synthetic Patient Data* | Steps to generate EHR for testing and evaluation of Health information technology

Objective

The aim of this study is twofold: to develop a virtual patient model for lumbar decompression surgery, and to evaluate the precision of an artificial neural network (ANN) model designed to predict the clinical outcomes of lumbar decompression surgery.

Materials and Methods

Our machine learning model is reported following guidelines for the transparent reporting of a multivariable prediction model for individual prognosis in biomedical research.

Institutional Review Board

The EHR screening was approved by the review board of the Department of Neurosurgery, Pitié-Salpêtrière University Hospital. All other data were reported anonymously, with no specific approval.

Population

All patients who underwent lumbar decompression surgery from January 2019 to April 2019 in the Department of Neurosurgery, Pitié-Salpêtrière University Hospital, were included. We retrospectively analyzed the local EHR.

Data Collection

Data were collected through automated queries of patient EHR from our center (Orbis, Agfa Healthcare). Pre-operative criteria were collected, including the patient’s age, sex, body mass index (BMI), demographic and radiological criteria, the presence of comorbidities (diabetes, sleep apnea syndrome, kidney disease), the type of work and the duration of sick leave, socio-professional problems, psychological disorders (anxiety or depressive syndrome), and drug consumption (NSAIDs, opioids). Immediate post-operative criteria were also collected, such as radiological criteria, improvement in sleep or food intake, return to work, or admission to an inpatient rehabilitation center. Patients were classified into 3 categories according to their surgical outcome: Green (significant improvement of pain and function without level 2 or 3 analgesics or other symptoms), Orange (no significant improvement and/or significant medication intake, anxiety-depression, and/or persistent lumbar pain), and Red (early adverse event or complication).

Predictors

The potential predictive factors were identified from a comprehensive literature review (see Table 1) of the PubMed Central library, combined with screening of the preoperative data available in our EHRs (see Table 4). The following MeSH terms were used:
Table 4.

Patient Baseline Predictors.

VariableBinary criteria (1;0)Baseline Strength established
Day of surgerySame day; day before0%
Length of stay (LOS)> 4 days: < 4 days10%
Timing for procedure (1st,2nd,3 rd, 4th, 5th positioning in the day)3 rd, 4th, 5th in the day; 1st, 2nd,3rd10%
Type of job: sedentaryPresence; absence30%
Type of job: heavy workerPresence; absence30%
Work stopping duration before surgery-sedentary >1, 0< 1 day10%
Work stopping duration before surgery-heavy worker >3, 0< 3 days10%
Work stopping duration before surgery-moderate >14, 0< 14 days10%
Work stopping duration before surgery-light worker >35, 0< 35 days10%
Sleep disorderPresence; absence15%
Professional conflictPresence; absence30%
Family conflictPresence; absence15%
Specific physical activityPresence; absence30%
General physical activityAbsence; presence30%
AppetiteAbsence; presence5%
Age> 65 years15%
BMI> 3050%
Smoking> 10 pack-year10%
Pre-operative walking distance reductionPresence; absence15%
Prior to surgery opioid consumptionPresence; absence20%
Cauda equina syndromePresence; absence30%
Transit disordersPresence; absence5%
Pre-operative motor deficitPresence; absence20%
Pre-operative sensitive deficitPresence; absenceIndication
Impulsive movement or pushing effortPresence; absence30%
Pre-operative inflammatory painPresence; absence30%
LimpPresence; absence10%
Acute lumbar painPresence; absence5%
Chronic lumbar painPresence; absence30%
Lumbar stiffnessPresence; absence20%
Sphincter dysfunctionPresence; absence40%
DiabetesPresence; absence10%
Pre-operative anxiety or depressive syndromePresence; absence20%
Sleep apnea syndromePresence; absence10%
COPDPresence; absence5%
PneumopathyPresence; absence20%
Liver disorderPresence; absence15%
AtheromaPresence; absence15%
Kidney DiseasePresence; absence5%
Pre-operative MODIC ImagesPresence; absence30%
Pre-operative CalcificationPresence; absence30%
Pre-operative stenosisPresence; absenceIndication
Pre-operative protrusionPresence; absence0%
Pre-operative excluded disc herniationAbsence; presence50%
Pre-operative disc herniationPresence; absenceDiscrete
L1L2 LevelPresence; absence30%
L2L3 LevelPresence; absence30%
L3L4 LevelPresence; absence30%
Pre-operative arthritisPresence; absence0%
Pre-operative hypertrophic facet diseasePresence; absence0%
Pre-operative osteophytePresence; absence0%
Pre-operative spondylolysisPresence; absence0%
Explicit pre-operative explanationsAbsence; Presence50%
Favorable operator experienceAbsence;presence70%
Food intake improvement> 3 days10%
Sleep improvement> 2 days20%
Return to work sedentary >42> 42 days30%
Return to work light >42> 42 days30%
Return to work moderate >75> 75 days30%
Return to work heavy workers >90> 90 days30%
InfectionPresence; absence15%
Autonomous walking recovery> 2 days20%
Anti-inflammatory drugs post-operativelyPresence; absence10%
Post-operative anxiety or depressive syndromePresence; absence20%
Post-operative disc calcificationPresence; absence20%
Post-operative stenosisPresence; absence40%
Post-operative fibrosisPresence; absence50%
Rehabilitation inpatients centerConvalescent home; home20%
Operative recurrencePresence; absence50%
Search query: “Machine Learning”[Mesh] OR “Artificial Intelligence”[Mesh] OR “Natural Language Processing”[Mesh] OR “Neural Networks (Computer)”[Mesh] OR “Support Vector Machine”[Mesh] OR Machine learning[Title/Abstract] OR Artificial Intelligence[Title/Abstract] OR Neural network[Title/Abstract] OR Neural networks[Title/Abstract] OR Natural language processing[Title/Abstract] OR deep learning[Title/Abstract] OR machine intelligence[Title/Abstract] OR computational intelligence[Title/Abstract] OR computer reasoning[Title/Abstract]))) AND (((“Neurosurgery”[Mesh] OR “Neurosurgical Procedures”[Mesh] OR “Intervertebral Disc Displacement”[Mesh] OR “Spinal Stenosis”[Mesh] OR neurosurgery[Title/Abstract] OR neurosurgeries[Title/Abstract] OR neurosurgical[Title/Abstract] OR neurosurgically[Title/Abstract] OR spinal[Title/Abstract] OR lumbar[Title/Abstract] AND (“Surgical Procedures, operative”[Mesh] OR “Postoperative Complications”[Mesh] OR “surgery” [Subheading] OR “Postoperative Period”[Mesh] OR “Perioperative Period”[Mesh] OR “Preoperative Period”[Mesh] OR surgery[Title/Abstract] OR surgeries[Title/Abstract] OR surgical[Title/Abstract] OR postoperative*[Title/Abstract] OR post-operative*[Title/Abstract] OR preoperative*[Title/Abstract] OR preoperative*[Title/Abstract] OR perioperative*[Title/Abstract] OR peri-operative*[Title/Abstract] OR operative procedure*[Title/Abstract])))) NOT (Comment[Publication Type] OR editorial[Publication Type] OR letter[Publication Type] OR case reports[Publication Type]).

From Predictors to Criteria Tables

The potential predictors had to be usable in a neural network algorithm (see the section Training and Validation of the Model). In the input table, each criterion is a binary value (1 or 0) representing its presence or absence. Each predictor was therefore transformed into a discrete criterion to fill the binary value tables.
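The binarization step can be illustrated with a minimal sketch. The thresholds (age > 65 years, BMI > 30, smoking > 10 pack-years) come from Table 4; the record layout, field names, and the function `to_binary_criteria` are hypothetical and are not taken from the study.

```python
# Hypothetical raw record -> binary criteria table (1 = present, 0 = absent).
# Thresholds follow Table 4; field and function names are illustrative assumptions.

def to_binary_criteria(record):
    """Map raw pre-operative values to a dict of 0/1 criteria."""
    return {
        "age_over_65": int(record["age_years"] > 65),
        "bmi_over_30": int(record["bmi"] > 30),
        "smoking_over_10_pack_years": int(record["pack_years"] > 10),
        "sleep_disorder": int(bool(record["sleep_disorder"])),
        "chronic_lumbar_pain": int(bool(record["chronic_lumbar_pain"])),
    }

# Made-up patient, for illustration only.
example = {"age_years": 71, "bmi": 27.4, "pack_years": 15,
           "sleep_disorder": False, "chronic_lumbar_pain": True}
criteria = to_binary_criteria(example)
n_positive = sum(criteria.values())  # number of positive criteria in the table
```

Counting the positive entries of such a table gives the "number of criteria" per patient that is reported for each zone.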

Statistical Analysis

Criteria for real and synthetic patients were compared. The mean percentage of presence for each criterion for each zone (green and orange), as well as the mean number of criteria for each category of patients and each zone were reported.

Synthetic Patient Model

Our synthetic patient model allows us to generate as many virtual patients as desired in order to train the classifier without the need for real patients. The proposed model can help bootstrap a new model without long and costly data collection; it could also be used to boost under-represented categories in a classification problem. It is a statistical approach designed to create a virtual cohort that is statistically representative of the real patient population. Our method was to create patients that fall into the 2 zones that we defined (orange or green). To do so, we generated tables of random pre-operative symptoms based on the input data defined above. Each input (criterion) takes the value 1 or 0 (present or absent), drawn with a probability of presence based on a uniform distribution. Each criterion was then associated with a strength, determined by a cross-professional group including spine surgeons, clinical registry experts, and statisticians. In the input table, each criterion's strength was added to the total strength of the table. This total strength was compared to a threshold, classifying the patient in the orange zone (above the threshold) or the green zone (below it). Tables were generated for 10000 virtual patients, of which 5000 are green and 5000 are orange.
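The generation procedure described above can be sketched as follows. The strengths are taken from Table 4 for a handful of criteria; the presence probability, the threshold value, the rule that only present criteria contribute to the total, and the rejection-sampling loop used to balance the zones are all assumptions, since the text does not specify them.

```python
import random

# Strengths (in %) from Table 4 for a few criteria. The presence probability
# and the threshold are NOT given in the text; both are assumptions.
STRENGTHS = {
    "bmi_over_30": 50,
    "chronic_lumbar_pain": 30,
    "sleep_disorder": 15,
    "age_over_65": 15,
    "copd": 5,
}
P_PRESENT = 0.3   # assumed probability that a criterion is present
THRESHOLD = 40    # assumed total-strength cutoff between green and orange

def total_strength(table):
    """Sum the strengths of the criteria that are present (assumed rule)."""
    return sum(STRENGTHS[name] for name, present in table.items() if present)

def draw_patient(rng):
    """Draw one binary criteria table and label it by its total strength."""
    table = {name: int(rng.random() < P_PRESENT) for name in STRENGTHS}
    zone = "orange" if total_strength(table) > THRESHOLD else "green"
    return table, zone

def generate_balanced(n_per_zone, seed=0):
    """Keep drawing until each zone holds n_per_zone virtual patients."""
    rng = random.Random(seed)
    cohort = {"green": [], "orange": []}
    while min(len(tables) for tables in cohort.values()) < n_per_zone:
        table, zone = draw_patient(rng)
        if len(cohort[zone]) < n_per_zone:
            cohort[zone].append(table)
    return cohort

cohort = generate_balanced(n_per_zone=50)
```

Calling `generate_balanced(5000)` would give a 5000/5000 cohort of the size used in the study, under the same assumptions.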

Artificial Neural Network Architecture

Our classifier is an artificial neural network whose architecture is based on our criteria (see Figure 1). Each input neuron represents a pre-operative criterion, and its value is the presence or absence of that criterion.
Figure 1.

Architecture of our artificial neural network.

Activation functions for the input and hidden layers are Rectified Linear Units (ReLU). The activation function of the output layer is a sigmoid; the output value is then converted to a Boolean: 1 if green, 0 if orange (see Figure 1). We use the Keras TensorFlow framework to build and train our model.
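To make the role of the two activation functions concrete, here is a framework-free sketch of a forward pass with a ReLU hidden layer and a single sigmoid output unit. It is not the Keras model used in the study: the layer sizes, the random untrained weights, the 0.5 cutoff, and all function names are assumptions chosen for illustration.

```python
import math
import random

def relu(x):
    return x if x > 0.0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each output unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict(criteria, layers):
    """Forward pass: ReLU on hidden layers, sigmoid on the single output unit."""
    x = [float(c) for c in criteria]
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, relu)
    weights, biases = layers[-1]
    prob_green = dense(x, weights, biases, sigmoid)[0]
    return prob_green, int(prob_green >= 0.5)  # 1 = green, 0 = orange

rng = random.Random(0)
N_CRITERIA, N_HIDDEN = 8, 4  # assumed sizes, not those of the study
layers = [init_layer(N_CRITERIA, N_HIDDEN, rng), init_layer(N_HIDDEN, 1, rng)]
prob, label = predict([1, 0, 0, 1, 0, 1, 0, 0], layers)
```

The sigmoid keeps the output in the open interval (0, 1), so it can be read as a probability of the green zone before it is cut to a Boolean.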

Training and Validation of the Model

The classifier is trained on 80% of the virtual patient data set, and the remaining 20% is used for testing. The sets are randomly drawn from the virtual patient data set while preserving the 50/50 split between green and orange. Binary cross-entropy is used as the loss function and the Adam optimizer for backpropagation. Two indicators are used on real data: the accuracy of the model, i.e., correct classification of a given table into the green or orange zone, and the ROC curve, i.e., the true positive rate plotted against the false positive rate at different thresholds. The ANN is validated against real patient tables using the area under the receiver operating characteristic curve (AUC).
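The loss and the two evaluation indicators named above can be written out in a few lines. This is a minimal sketch on made-up labels and predicted probabilities, not the study's data. The AUC is computed through its rank interpretation (the probability that a positive case receives a higher score than a negative one), which is equivalent to the area under the ROC curve. Function names and the 0.5 threshold are assumptions.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps clips probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = green, 0 = orange) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
loss = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes performance over all thresholds, which is why the two indicators are reported together.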

Results

Population and EHR Data Set

In the actual cohort, we included 60 patients with complete EHR allowing sufficient analysis: 26 patients are in the orange zone (43.4%) and 34 are in the green zone (56.6%) (see Figure 2). The average number of positive criteria for actual patients is 8.5 for the green zone (SD +/- 3.09) and 10.47 for the orange zone (SD +/- 3.38). Results are presented in Figures 2 and 3.
Figure 2.

Real patient distribution according to the number of pre-operative criteria and their outcome (green: success/orange: failure).

Figure 3.

Statistical presence of criteria for each group orange / green (EHR).

A total of 68 unfavorable predictors were collected and included in the initial training of the predictive model (see Table 4). These 68 criteria correspond to 58 types of criteria and their variants. Among the 68 criteria, 54 are pre-operative and 14 are peri-operative (from surgery to 1-month follow-up). Missing criteria are also counted. Five other criteria relate to patient-related outcomes and allow us to assess the improvement in quality of life (see Table 5). The presence of any one of these criteria places the patient's outcome in the orange zone. Our machine learning model was then evaluated on its ability to correctly classify patients in the orange zone.
Table 5.

Patient’s Clinical Outcomes (orange zone).

Clinical characteristic evaluated | Binary criteria (1;0) | Area
Walking distance still limited at 1 month | Presence; absence | Orange zone
Partial recovery from post-operatively motor deficit at 1 month | Presence; absence | Orange zone
Partial recovery from post-operatively sensory deficit at 1 month | Presence; absence | Orange zone
Post-operative neuropathic pain at 1 month | Presence; absence | Orange zone
Post-operative anxiety-depression syndrome at 1 month | Presence; absence | Orange zone

Synthetic Data Set

We generated 10000 virtual patients for training our classifier: 5000 were allocated to the green zone and 5000 to the orange zone. We chose a 50/50 split so as not to introduce a distribution bias between the 2 zones during training. We also generated 2000 tables for testing (20% of the training set). Figure 4 shows a Gaussian distribution of the number of criteria for the 2 zones.
Figure 4.

Number of patient criteria for the 2 zones (syn-EHRs).

For patients in the green zone we found a mean of 7.92 symptoms per table (median: 9, SD +/- 1.71); for patients in the orange zone the mean is 10.93 (median: 11, SD +/- 1.81). These numbers are consistent with what we observe in the real patient distributions (see Figure 2). Submitting the number of criteria to Welch’s test gives a statistic of -71.31715 with a p-value of 0.0, confirming that the number of criteria differs significantly between the 2 zones. Indeed, patients in the orange zone tend to have more criteria. Moreover, the higher the strength of a criterion, the higher the probability of presence of that symptom in the orange category. For instance, the predictor “BMI >30” is more represented in orange tables (16.88%) than in green ones (1.84%). Conversely, most of the criteria with low strength are represented in nearly the same proportion in the 2 categories (difference <2%): age, appetite, COPD, transit disorders, sleep apnea, work stopping duration before surgery-light worker>35, kidney disease, and diabetes. The statistical presence of each criterion in each zone is plotted in Figure 5.
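For readers unfamiliar with Welch's test, the statistic compares two group means without assuming equal variances. The sketch below computes it from summary statistics using toy numbers. It does not attempt to reproduce the value reported above, because the sample sizes entering the authors' test are not stated; the function name `welch_t` is an assumption.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = sd_a ** 2 / n_a  # squared standard error of group A's mean
    vb = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Toy summary statistics (not the study's data):
# group A = [1, 2, 3, 4], group B = [2, 4, 6, 8, 10].
t_stat, df = welch_t(2.5, math.sqrt(5 / 3), 4, 6.0, math.sqrt(10), 5)
```

A negative statistic means the first group has the smaller mean, which matches the sign convention of the reported value if the green zone is taken as the first group.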
Figure 5.

Statistical presence of criteria for each group (syn-EHRs).

The combination of several criteria leads from the green to the orange zone, i.e., the presence of 1 or 2 criteria is not sufficient in itself to classify the patient outcome. In our synthetic population, 5 criteria are present more than 20% of the time, but these criteria alone do not determine the zone.

Comparison of Criteria Between Real Patient and Synthetic Patient

The criteria proportions in each cohort are compared in Table 6. In order to assess the relevance of the virtually generated patients and their representativeness, we used an open-clustering approach.
Table 6.

Real and Synthetics Patient’s Predictors Distribution (%).

| # | Criteria | Green_real (%) | Orange_real (%) | Green_synth (%) | Orange_synth (%) |
|---|---|---|---|---|---|
| 0 | Day of surgery | 52.94 | 61.54 | 17.6 | 14.02 |
| 1 | Length of stay (LOS) | 35.29 | 42.31 | 12.96 | 15.02 |
| 2 | Timing for procedure (1st, 2nd, 3rd, 4th, 5th in the day) | 67.65 | 61.54 | 12.5 | 14.94 |
| 3 | Type of job sedentary | 8.82 | 19.23 | 12.7 | 26.84 |
| 4 | Type of job worker | 14.71 | 3.85 | 7.14 | 13.32 |
| 5 | Work stopping duration before surgery-sedentary>1 | 0 | 0 | 37.12 | 38.02 |
| 6 | Work stopping duration before surgery-heavy worker>3 | 0 | 0 | 18.18 | 18.74 |
| 7 | Work stopping duration before surgery-moderate>14 | 0 | 0 | 9.04 | 9.44 |
| 8 | Work stopping duration before surgery-light worker>35 | 0 | 0 | 4.72 | 5.16 |
| 9 | Sleep disorder | 2.94 | 30.77 | 10.18 | 14.24 |
| 10 | Professional conflict | 5.88 | 11.54 | 5.9 | 16.14 |
| 11 | Family conflict | 5.88 | 11.54 | 10.42 | 14.62 |
| 12 | Specific physical activity | 0 | 0 | 5.94 | 15.74 |
| 13 | General physical activity | 0 | 0 | 5.82 | 15.72 |
| 14 | Appetite | 0 | 0 | 15.16 | 14.88 |
| 15 | Age | 32.35 | 57.69 | 14.12 | 14.56 |
| 16 | BMI | 50 | 69.23 | 1.84 | 16.88 |
| 17 | Smoking | 23.53 | 11.54 | 12.26 | 15.1 |
| 18 | Pre-operative walking distance | 38.24 | 42.31 | 10.86 | 14.82 |
| 19 | Prior to surgery opioid consumption | 0 | 0 | 9.46 | 15.58 |
| 20 | Cauda equina syndrome | 0 | 7.69 | 5.38 | 14.76 |
| 21 | Transit disorders | 2.94 | 3.85 | 14.58 | 14.1 |
| 22 | Pre-operative motor deficit | 11.76 | 19.23 | 9.42 | 15.3 |
| 23 | Pre-operative sensitive deficit | 23.53 | 30.77 | 16.88 | 14.06 |
| 24 | Impulsive movement or pushing effort | 14.71 | 15.38 | 6.1 | 16.34 |
| 25 | Pre-operative inflammatory pain | 2.94 | 7.69 | 5.72 | 15.54 |
| 26 | Limp | 100 | 100 | 12.8 | 14.98 |
| 27 | Acute lumbar pain | 29.41 | 34.62 | 14.64 | 14.76 |
| 28 | Chronic lumbar pain | 73.53 | 88.46 | 5.78 | 15.36 |
| 29 | Lumbar stiffness | 23.53 | 38.46 | 9.06 | 14.98 |
| 30 | Sphincter dysfunction | 2.94 | 7.69 | 3.54 | 15.42 |
| 31 | Diabetes | 8.82 | 11.54 | 12.5 | 14.48 |
| 32 | Pre-operative anxiety or depressive syndrome | 0 | 3.85 | 8.76 | 15.16 |
| 33 | Sleep apnea syndrome | 2.94 | 19.23 | 13.68 | 15.18 |
| 34 | COPD | 8.82 | 3.85 | 14.52 | 13.58 |
| 35 | Pneumopathy | 0 | 0 | 8.84 | 15.64 |
| 36 | Liver disorder | 0 | 0 | 11.1 | 14.54 |
| 37 | Atheroma | 0 | 0 | 11.48 | 14.72 |
| 38 | Kidney Disease | 5.88 | 3.85 | 13.94 | 15.2 |
| 39 | Pre-operative MODIC Images | 2.94 | 3.85 | 5.38 | 15.5 |
| 40 | Pre-operative Calcification | 8.82 | 0 | 5.32 | 15.86 |
| 41 | Pre-operative stenosis | 52.94 | 50 | 17.58 | 13.84 |
| 42 | Pre-operative protrusion | 5.88 | 3.85 | 18.16 | 13.22 |
| 43 | Pre-operative excluded disc herniation | 5.88 | 0 | 29.26 | 24.4 |
| 44 | Pre-operative disc herniation | 38.24 | 23.08 | 14.26 | 12.1 |
| 45 | L1L2 Level | 0 | 3.85 | 20.58 | 33.54 |
| 46 | L2L3 Level | 2.94 | 30.77 | 10.82 | 16.62 |
| 47 | L3L4 Level | 17.65 | 50 | 5.22 | 8.26 |
| 48 | Pre-operative arthrosis | 26.47 | 23.08 | 17.44 | 14.5 |
| 49 | Pre-operative hypertrophic facet disease | 29.41 | 26.92 | 17.14 | 14.12 |
| 50 | Pre-operative osteophyte | 0 | 3.85 | 17.46 | 13.86 |
| 51 | Pre-operative spondylolysis | 8.82 | 11.54 | 17.98 | 13.66 |
| 52 | Explicit pre-operative explanations | 0 | 0 | 2.08 | 16.02 |
| 53 | Operator experience (years of practice) | 0 | 0 | 16.04 | 14.42 |
| 54 | Food intake improvement | 0 | 0 | 13.52 | 15.18 |
| 55 | Sleep improvement | 0 | 0 | 8.28 | 16.04 |
| 56 | Return to work sedentary >42 | 0 | 0 | 28.54 | 40.1 |
| 57 | Return to work light >42 | 0 | 0 | 15.14 | 18.42 |
| 58 | Return to work moderate >75 | 0 | 0 | 6.86 | 9.5 |
| 59 | Return to work heavy workers >90 | 0 | 0 | 3.84 | 4.86 |
| 60 | Infection | 2.94 | 3.85 | 11.2 | 15.46 |
| 61 | Autonomous walking recovery | 0 | 3.85 | 8.8 | 16.2 |
| 62 | Anti-inflammatory drugs | 0 | 0 | 12.6 | 14.7 |
| 63 | Post-operative anxiety or depressive syndrome | 0 | 0 | 9.28 | 15.4 |
| 64 | Post-operative disc calcification | 0 | 0 | 9.36 | 15.58 |
| 65 | Post-operative stenosis | 2.94 | 0 | 4.12 | 16.8 |
| 66 | Post-operative fibrosis | 5.88 | 0 | 2.4 | 16.22 |
| 67 | Rehabilitation inpatients center | 0 | 0 | 9.12 | 14.9 |
| 68 | Operative recurrence | 0 | 34.62 | 1.72 | 16.04 |

Because the real patient cohort lacks exhaustive data for many criteria, we presume that several criteria that do not differ significantly between zones could prove relevant if correctly assessed. We therefore retain them, to keep as much meaningful data as possible for training our machine learning model and to increase the reliability of our synthetic population.
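
To make the per-zone comparison concrete, the sketch below computes the orange-minus-green gap, in percentage points, for a handful of synthetic-cohort rows taken from Table 6. The function name `zone_gap` and the selection of rows are illustrative choices, not part of the study; the 2-point cut-off mirrors the "<2%" figure quoted earlier for low-strength criteria.

```python
# Synthetic-cohort presence (%) per zone, taken from Table 6: (green, orange).
# Only a few rows are included; the selection is illustrative.
synthetic_presence = {
    "Age": (14.12, 14.56),
    "Appetite": (15.16, 14.88),
    "COPD": (14.52, 13.58),
    "Diabetes": (12.5, 14.48),
    "BMI": (1.84, 16.88),
    "Chronic lumbar pain": (5.78, 15.36),
    "Operative recurrence": (1.72, 16.04),
}

def zone_gap(presence):
    """Return {criterion: orange minus green, in points}, largest absolute gap first."""
    gaps = {name: round(orange - green, 2) for name, (green, orange) in presence.items()}
    return dict(sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True))

for name, gap in zone_gap(synthetic_presence).items():
    label = "discriminating" if abs(gap) >= 2 else "near-equal"
    print(f"{name:22s} {gap:+6.2f}  {label}")
```

In this excerpt, BMI, operative recurrence and chronic lumbar pain separate the zones, while age, appetite, COPD and diabetes stay within 2 points of each other.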

Training and Validation of the Model (ANN Results)

The classifier is trained on 10,000 synthetic patients (the training set) and evaluated on 2,000 synthetic patients (the test set). The batch size is 2000 and the model is trained for 100 epochs. The loss decreases rapidly and the accuracy rises quickly. After 50 epochs the model is already close to convergence (see Figure 6).
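
As a small worked example of what these settings imply, the sketch below counts the mini-batches per epoch and the total number of parameter updates. It assumes one update per mini-batch (ordinary mini-batch gradient descent), which the text does not state explicitly; `iter_batches` is a hypothetical helper name.

```python
def iter_batches(n_samples, batch_size):
    """Yield (start, stop) index pairs that cover n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

# Values stated in the text: 10,000 training patients, batch size 2000, 100 epochs.
N_TRAIN, BATCH_SIZE, EPOCHS = 10_000, 2_000, 100

batches_per_epoch = sum(1 for _ in iter_batches(N_TRAIN, BATCH_SIZE))
total_updates = batches_per_epoch * EPOCHS  # ASSUMES one update per mini-batch
print(batches_per_epoch, total_updates)  # 5 500
```

With only 5 large batches per epoch, each update averages the gradient over a fifth of the training set, which is consistent with the smooth, fast convergence described above.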
Figure 6.

Training model evolution (Accuracy and loss / Number of epochs).

The test set is also synthetic and does not provide a solid way of stopping training before overfitting, because it converges in the same way as the training set. Thus, we use the real data to test our model and to decide when to stop training. After 100 epochs the test on real data gives an accuracy of 72% and a ROC score of 0.78 (see Figure 6). The sensitivity of our model is 88.5%, specificity is 59%, PPV is 62% and NPV is 87%; these numbers are reported for each zone in Table 7.
Table 7.

ANN Model for Predicting Successful Spine Surgery.

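| | Precision | Recall | f1-score | Support |
|---|---|---|---|---|
| Orange Zone | 0.62 | 0.885 | 0.73 | 26 |
| Green Zone | 0.87 | 0.59 | 0.70 | 34 |
| Accuracy | | | 0.72 | 60 |
| Macro average | 0.75 | 0.74 | 0.72 | 60 |
| Weighted avg | 0.76 | 0.72 | 0.71 | 60 |

ANN Model global performance

| ROC AUC Score | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| 0.78 | 0.885 | 0.59 | 0.62 | 0.87 |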

Notes: PPV = Positive Predictive Value; NPV = Negative Predictive Values
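
The figures in Table 7 are mutually consistent with a single confusion matrix on the 60 real patients. The sketch below uses counts inferred from the reported recalls and supports (23 of 26 orange-zone patients and 20 of 34 green-zone patients correctly classified); these counts are an inference, not values reported by the authors, and the orange zone is treated as the positive class.

```python
# Counts INFERRED from Table 7 (not reported by the authors); orange = positive class.
TP, FN = 23, 3    # 26 orange-zone patients, recall 0.885
TN, FP = 20, 14   # 34 green-zone patients, recall 0.59

def rate(numerator, denominator):
    """Plain ratio, kept as a helper so each metric reads like its definition."""
    return numerator / denominator

metrics = {
    "sensitivity": rate(TP, TP + FN),  # orange-zone recall
    "specificity": rate(TN, TN + FP),  # green-zone recall
    "ppv": rate(TP, TP + FP),          # orange-zone precision
    "npv": rate(TN, TN + FN),          # green-zone precision
    "accuracy": rate(TP + TN, TP + FN + TN + FP),
}
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

Rounded, these give sensitivity 0.885, specificity 0.59, PPV 0.62, NPV 0.87 and accuracy 0.72, matching the table.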


Discussion

Our results identify risk factors similar to those reported in other cohorts. In our real patient cohort, age > 65 years, BMI > 30, surgery on the same day as hospital entry, and chronic low back pain are strongly predictive of the orange zone. In our virtual cohorts, sedentary job, L1L2 level, return to work to sedentary job >42 days, and work stopping duration before surgery-sedentary>1 are the strongest predictors of the orange zone, i.e., treatment failure or poor improvement. However, on their own, they cannot determine one outcome or the other. This illustrates the need for an individual predictive tool based on several predictors, each with its own degree of influence (strength) on the outcome. Our model was statistically representative of the real data. We also used the real data as the validation set of the classifier, in order to better fit the real world. Our machine learning model correctly classifies the orange population in 88.5% of cases, whereas the green zone is correctly classified in 59% of cases. The overall discriminative performance, measured by the area under the ROC curve (AUC), is 0.78 (see Figure 7).[35,56,63,74,67-76,78-81] This model is particularly suitable for screening patients who respond poorly to lumbar surgery, with sensitivity similar to that of other recently published predictive tools. Nevertheless, specificity remains limited, possibly because of the 23 criteria missing from the database, which prevents our model from evaluating their impact as clinical predictors. Although the ANN shows very promising performance, it was trained on virtual patients generated by our model, which limits the precision of its response in real cases. Moreover, the study sample of real patients was small; this study will therefore need to be repeated with larger, multicentre datasets and external validation to convincingly demonstrate its validity and predictive power.
Figure 7.

AUC of our ANN-models using EHRs and syn-EHRs.
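
For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen orange-zone patient receives a higher score than a randomly chosen green-zone patient. The minimal sketch below computes it that way (the Mann-Whitney formulation, with ties counted as one half). The labels and scores are hypothetical values for illustration only, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# HYPOTHETICAL scores for illustration only; 1 = orange zone, 0 = green zone.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.4]
print(round(roc_auc(labels, scores), 3))  # 0.792
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two zones.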

The goal of our method was to obtain a reproducible, repeatable, and usable tool that can fit various databases, deal with missing data, and be applied to similar problems. Indeed, incomplete electronic patient data, the difficulty of accessing it, and the inability to standardize and exploit it make the development of an omniscient prediction tool challenging. Thus, we increase the number of exploitable variables (including those below the significance threshold) to obtain an individual response, we generate virtual patients to increase the size of our training cohort, and we use medical know-how to design the architecture of our virtual patients, in order to address a data quantity problem.

Our algorithm is based on deep learning, whose goal is to use as much data as possible to increase its accuracy and precision. The more intensively the algorithm is used, the better its accuracy on cases statistically farther and farther from the center of the Gaussian. Indeed, the amount of data influences its variability: more data increases the number of “rare” cases far from the median value, making it less necessary to use techniques that boost their number (data augmentation). The real cases collected by retrospective analysis of the data will gradually replace the augmented data in the training set, and the model will become more robust. This approach is common in supervised machine learning: successive versions are improved by enlarging the dataset as actual data is captured.

As we move toward personalized medicine and value-based care, there is an increasing need to collect and use patient-reported outcome (PRO) scores not just in research settings, but also in routine clinical care and quality improvement activities. The progressive digital transformation of healthcare facilities should allow us to collect more precise and valuable clinical data.
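
To give a feel for how virtual patients can enlarge a training cohort, the sketch below draws binary criteria vectors by sampling each criterion independently with a fixed probability. This is a deliberately simplified stand-in, not the authors' generator, which also relies on criterion strength and medical know-how. The three probabilities are the orange-zone synthetic values from Table 6; the function name, cohort size and seed are assumptions.

```python
import random

# Orange-zone synthetic presence (%) for three criteria, taken from Table 6.
ORANGE_PRESENCE = {"BMI": 16.88, "L1L2 Level": 33.54, "Chronic lumbar pain": 15.36}

def sample_patients(presence_pct, n_patients, seed=0):
    """Draw binary criteria vectors, one independent Bernoulli draw per criterion.

    SIMPLIFICATION: independence between criteria is an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    names = list(presence_pct)
    return [
        {name: int(rng.random() < presence_pct[name] / 100.0) for name in names}
        for _ in range(n_patients)
    ]

patients = sample_patients(ORANGE_PRESENCE, n_patients=5000)  # cohort size is assumed
for name, target in ORANGE_PRESENCE.items():
    observed = 100.0 * sum(p[name] for p in patients) / len(patients)
    print(f"{name:20s} target {target:5.2f}%  observed {observed:5.2f}%")
```

As real records accumulate, draws like these would be replaced by observed patients, which is the gradual substitution described above.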

Conclusion

Our method can be used to predict the outcome of lumbar decompression surgery. There is still a need to further develop its ability to analyze patients in the “failure of treatment” zone in order to offer precise management of patient health before spinal surgery. By exploiting a larger database that becomes more representative over time, we think that our model will be able to improve classification of the orange zone. This model is in concordance with already published machine-learning tools in spine surgery, which successfully predict the improvement of post-operative symptomatology[64,94] and the reduction of drug consumption.[38,95,96] Thus, it will be possible to manage the patient’s health monitoring so as to reduce post-operative risks and, above all, to promote recovery after surgery with appropriate therapies. In addition, a software suite could help surgical practice by reducing the surgical gesture to its anatomical usefulness, avoiding the psychological or iatrogenic undesirable effects inherent in the medico-social framework of the intervention.
  89 in total

Review 1.  Complications in spine surgery.

Authors:  Rani Nasser; Sanjay Yadla; Mitchell G Maltenfort; James S Harrop; D Greg Anderson; Alexander R Vaccaro; Ashwini D Sharan; John K Ratliff
Journal:  J Neurosurg Spine       Date:  2010-08

2.  The prediction of successful surgery outcome in lumbar disc herniation based on artificial neural networks.

Authors:  Parisa Azimi; Edward C Benzel; Sohrab Shahzadi; Shirzad Azhari; Hassan R Mohammadi
Journal:  J Neurosurg Sci       Date:  2016-06       Impact factor: 2.279

3.  Using machine learning to predict 30-day readmissions after posterior lumbar fusion: an NSQIP study involving 23,264 patients.

Authors:  Benjamin S Hopkins; Jonathan T Yamaguchi; Roxanna Garcia; Kartik Kesavabhotla; Hannah Weiss; Wellington K Hsu; Zachary A Smith; Nader S Dahdaleh
Journal:  J Neurosurg Spine       Date:  2019-11-29

4.  An analysis from the Quality Outcomes Database, Part 2. Predictive model for return to work after elective surgery for lumbar degenerative disease.

Authors:  Anthony L Asher; Clinton J Devin; Kristin R Archer; Silky Chotai; Scott L Parker; Mohamad Bydon; Hui Nian; Frank E Harrell; Theodore Speroff; Robert S Dittus; Sharon E Philips; Christopher I Shaffrey; Kevin T Foley; Matthew J McGirt
Journal:  J Neurosurg Spine       Date:  2017-05-12

5.  Perioperative morbidity and complications in minimal access surgery techniques in obese patients with degenerative lumbar disease.

Authors:  Wolfgang Senker; Christian Meznik; Alexander Avian; Andrea Berghold
Journal:  Eur Spine J       Date:  2011-01-25       Impact factor: 3.134

6.  Use of artificial neural networks to predict surgical satisfaction in patients with lumbar spinal canal stenosis: clinical article.

Authors:  Parisa Azimi; Edward C Benzel; Sohrab Shahzadi; Shirzad Azhari; Hasan Reza Mohammadi
Journal:  J Neurosurg Spine       Date:  2014-01-17

7.  Weight loss in overweight and obese patients following successful lumbar decompression.

Authors:  Ryan M Garcia; Patrick J Messerschmitt; Christopher G Furey; Henry H Bohlman; Ezequiel H Cassinelli
Journal:  J Bone Joint Surg Am       Date:  2008-04       Impact factor: 5.284

8.  Diabetes associated with increased surgical site infections in spinal arthrodesis.

Authors:  Sam Chen; Matt V Anderson; Wayne K Cheng; Montri D Wongworawat
Journal:  Clin Orthop Relat Res       Date:  2009-02-19       Impact factor: 4.176

9.  Deep Learning-Based Risk Model for Best Management of Closed Groin Incisions After Vascular Surgery.

Authors:  Bora Chang; Zhifei Sun; Prabath Peiris; Erich S Huang; Ehsan Benrashid; Ellen D Dillavou
Journal:  J Surg Res       Date:  2020-03-17       Impact factor: 2.192

10.  Patient characteristics of smokers undergoing lumbar spine surgery: an analysis from the Quality Outcomes Database.

Authors:  Anthony L Asher; Clinton J Devin; Brandon McCutcheon; Silky Chotai; Kristin R Archer; Hui Nian; Frank E Harrell; Matthew McGirt; Praveen V Mummaneni; Christopher I Shaffrey; Kevin Foley; Steven D Glassman; Mohamad Bydon
Journal:  J Neurosurg Spine       Date:  2017-09-29
