Florian R Schroeck (1), Olga V Patterson (2), Patrick R Alba (2), Erik A Pattison (3), John D Seigne (4), Scott L DuVall (2), Douglas J Robertson (5), Brenda Sirovich (5), Philip P Goodney (5)
Affiliations:
1. VA Outcomes Group, White River Junction VA Medical Center, White River Junction, VT; Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Hanover, NH. Electronic address: florian.r.schroeck@dartmouth.edu.
2. Department of Internal Medicine, VA Salt Lake City Health Care System and University of Utah, Salt Lake City, UT.
3. VA Outcomes Group, White River Junction VA Medical Center, White River Junction, VT; Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH.
4. Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH.
5. VA Outcomes Group, White River Junction VA Medical Center, White River Junction, VT; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Hanover, NH.
Abstract
OBJECTIVE: To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.
METHODS: Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer.
RESULTS: When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer.
CONCLUSION: NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data.
Published by Elsevier Inc.
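The three performance measures named in METHODS (accuracy, positive predictive value, and sensitivity) can be illustrated for a single binary variable, such as presence vs absence of carcinoma in situ. The sketch below is only an illustration under stated assumptions: the function name `binary_metrics` and the toy label lists are hypothetical, the data are invented for the example, and nothing here reproduces the authors' NLP engine or their results.

```python
from typing import Dict, Optional, Sequence


def binary_metrics(gold: Sequence[bool],
                   predicted: Sequence[bool]) -> Dict[str, Optional[float]]:
    """Compare predicted labels against gold-standard labels for one
    binary variable (hypothetical helper, not the authors' code)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Tally the four cells of the 2x2 confusion table.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    total = tp + tn + fp + fn

    return {
        # Share of all reports where prediction and gold standard agree.
        "accuracy": (tp + tn) / total if total else None,
        # Of the reports flagged positive, the share that are truly positive.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        # Of the truly positive reports, the share that were flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
    }


# Toy labels, invented for illustration only (True = finding present).
toy_gold = [True, True, True, False, False, False, False, True]
toy_pred = [True, True, False, False, False, True, False, True]
toy_result = binary_metrics(toy_gold, toy_pred)
print(toy_result)
```

For a multi-class variable such as depth of invasion, one plausible approach (again an assumption, not something the abstract specifies) is to compute the positive predictive value per class by treating each class in turn as "positive" and all others as "negative".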