
Can We Geographically Validate a Natural Language Processing Algorithm for Automated Detection of Incidental Durotomy Across Three Independent Cohorts From Two Continents?

Aditya V Karhade, Jacobien H F Oosterhoff, Olivier Q Groot, Nicole Agaronnik, Jeffrey Ehresman, Michiel E R Bongers, Ruurd L Jaarsma, Santosh I Poonnoose, Daniel M Sciubba, Daniel G Tobert, Job N Doornberg, Joseph H Schwab.

Abstract

BACKGROUND: Incidental durotomy is an intraoperative complication of spine surgery that can lead to postoperative complications, increased length of stay, and higher healthcare costs. Natural language processing (NLP) is an artificial intelligence method for interpreting free-text notes, and it may be useful for the automated surveillance of adverse events in orthopaedic surgery. A previously developed NLP algorithm detected incidental durotomy with high accuracy on internal validation and on external validation in an independent cohort from the same country. External validation in a cohort with linguistic differences, referred to as geographical validation, is required to assess the transportability of the developed algorithm. Ideally, the performance of a prediction model (here, the NLP algorithm) is constant across geographic regions, which supports reproducibility and model validity.
QUESTION/PURPOSE: Can we geographically validate an NLP algorithm for the automated detection of incidental durotomy across three independent cohorts from two continents?
METHODS: Patients 18 years or older undergoing primary (thoraco)lumbar spine surgery were included. In Massachusetts, between January 2000 and June 2018, 1000 patients were included from two academic and three community medical centers. In Maryland, between July 2016 and November 2018, 1279 patients were included from one academic center. In Australia, between January 2010 and December 2019, 944 patients were included from one academic center. The authors retrospectively studied the free-text operative notes of included patients for the primary outcome, intraoperative durotomy. Incidental durotomy occurred in 9% (93 of 1000), 8% (108 of 1279), and 6% (58 of 944) of patients in the Massachusetts, Maryland, and Australia cohorts, respectively. No reports were missing. Three datasets (Massachusetts, Australian, and combined Massachusetts and Australian) were each divided into training and holdout test sets in an 80:20 ratio. An extreme gradient boosting NLP algorithm (extreme gradient boosting is an efficient and flexible tree-based method) was trained separately on each training set. The performance of the three resulting NLP algorithms (American, Australian, and combined, respectively) was assessed by discrimination via the area under the receiver operating characteristic curve (AUC-ROC; a measure of the model's ability to distinguish patients who had the outcome from those who did not), calibration metrics (which compare predicted with observed probabilities), and the Brier score (a composite of discrimination and calibration). In addition, sensitivity (the true-positive rate, also known as recall), specificity (the true-negative rate), positive predictive value (also known as precision), negative predictive value, F1-score (a composite of precision and recall), positive likelihood ratio, and negative likelihood ratio were calculated.
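The performance measures named in the methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the ten-note toy dataset are assumptions made purely for illustration, and the sketch uses only the Python standard library.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent metrics; assumes no denominator is zero."""
    sensitivity = tp / (tp + fn)          # recall, true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    ppv = tp / (tp + fp)                  # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_roc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = durotomy documented in the note, 0 = not documented.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02]

metrics = threshold_metrics(*confusion_counts(y_true, y_prob))
auc = auc_roc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
```

The sketch separates threshold-dependent measures (sensitivity, specificity, predictive values, F1-score, likelihood ratios) from threshold-free ones (AUC-ROC, Brier score), which is why the abstract reports both groups.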
RESULTS: The combined NLP algorithm (trained on the combined Massachusetts and Australian data) achieved excellent performance on independent testing data from Australia (AUC-ROC 0.97 [95% confidence interval (CI) 0.87 to 0.99]), Massachusetts (AUC-ROC 0.99 [95% CI 0.80 to 0.99]), and Maryland (AUC-ROC 0.95 [95% CI 0.93 to 0.97]). The NLP algorithm developed on the Massachusetts cohort had excellent performance in the Maryland cohort (AUC-ROC 0.97 [95% CI 0.95 to 0.99]) but worse performance in the Australian cohort (AUC-ROC 0.74 [95% CI 0.70 to 0.77]).
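The abstract does not state how the 95% confidence intervals around the AUC-ROC values were obtained. One common approach is a percentile bootstrap, sketched below as an assumption rather than as the authors' method; the function names, the number of resamples, the seed, and the toy data are all hypothetical.

```python
import random
from typing import Sequence, Tuple


def auc_roc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true: Sequence[int], y_prob: Sequence[float],
                     n_boot: int = 500, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUC-ROC (illustrative assumption)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # AUC is undefined unless both classes are present in the resample.
        if 0 < sum(ys) < n:
            stats.append(auc_roc(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = durotomy documented in the note, 0 = not documented.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02]

lower, upper = bootstrap_auc_ci(y_true, y_prob)
```

With few positive cases in a test set, resampled AUC values vary widely, so an interval computed this way tends to be broad for small holdout sets and narrower for larger ones.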
CONCLUSION: We demonstrated the clinical utility and reproducibility of an NLP algorithm for detecting incidental durotomy: the algorithm trained on combined datasets retained excellent performance in each individual country relative to algorithms developed in that country alone. Further multi-institutional, international collaborations can facilitate the creation of universal NLP algorithms that improve the quality and safety of orthopaedic surgery globally. The combined NLP algorithm has been incorporated into a freely accessible web application (https://sorg-apps.shinyapps.io/nlp_incidental_durotomy/). Clinicians and researchers can use the tool to apply the model to spine registries or within quality and safety departments, automating the detection of incidental durotomy and helping to optimize prevention efforts.
LEVEL OF EVIDENCE: Level III, diagnostic study.
Copyright © 2022 by the Association of Bone and Joint Surgeons.

Year: 2022; PMID: 35412473; PMCID: PMC9384904; DOI: 10.1097/CORR.0000000000002200

Source DB: PubMed; Journal: Clin Orthop Relat Res; ISSN: 0009-921X; Impact factor: 4.755


