Sobia Nasir Laique (1), Umar Hayat (2), Shashank Sarvepalli (3), Byron Vaughn (2), Mounir Ibrahim (4), John McMichael (4), Kanza Noor Qaiser (5), Carol Burke (4), Amit Bhatt (4), Colin Rhodes (6), Maged K Rizk (4).
1. Division of Gastroenterology and Hepatology, Mayo Clinic, Phoenix, Arizona, USA.
2. Division of Gastroenterology, University of Minnesota, Minneapolis, Minnesota, USA.
3. Department of Hospital Medicine, Cleveland Clinic, Cleveland, Ohio, USA; Department of Bioinformatics, Vanderbilt University, Nashville, Tennessee, USA.
4. Digestive Disease Institute, Cleveland Clinic, Cleveland, Ohio, USA.
5. Department of Hospital Medicine, Cleveland Clinic, Cleveland, Ohio, USA.
6. eHealth Technologies, West Henrietta, New York, USA.
Abstract
BACKGROUND AND AIMS: Colonoscopy is commonly performed for colorectal cancer screening in the United States. Reports are often generated in a non-standardized format and are not always integrated into electronic health records. Thus, this information is not readily available for streamlining quality management, participating in endoscopy registries, or reporting of patient- and center-specific risk factors predictive of outcomes. We aim to demonstrate the use of a new hybrid approach using natural language processing of charts that have been elucidated with optical character recognition processing (OCR/NLP hybrid) to obtain relevant clinical information from scanned colonoscopy and pathology reports, a technology co-developed by Cleveland Clinic and eHealth Technologies (West Henrietta, NY, USA).
METHODS: This was a retrospective study conducted at Cleveland Clinic, Cleveland, Ohio, and the University of Minnesota, Minneapolis, Minnesota. A randomly sampled list of outpatient screening colonoscopy procedures and pathology reports was selected. Desired variables were then collected. Two researchers first manually reviewed the reports for the desired variables. Then, the OCR/NLP algorithm was used to obtain the same variables from 3 electronic health records in use at our institution: Epic (Verona, Wisc, USA), ProVation (Minneapolis, Minn, USA) used for endoscopy reporting, and Sunquest PowerPath (Tucson, Ariz, USA) used for pathology reporting.
RESULTS: Compared with manual data extraction, the accuracy of the hybrid OCR/NLP approach to detect polyps was 95.8%, adenomas 98.5%, sessile serrated polyps 99.3%, advanced adenomas 98%, inadequate bowel preparation 98.4%, and failed cecal intubation 99%. Comparison of the dataset collected via NLP alone with that collected using the hybrid OCR/NLP approach showed that the accuracy for almost all variables was >99%.
CONCLUSIONS: Our study is the first to validate the use of a unique hybrid OCR/NLP technology to extract desired variables from scanned procedure and pathology reports contained in image format with an accuracy >95%.
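The per-variable accuracy figures reported above compare automated extraction against manual review of the same reports. A minimal sketch of that kind of comparison is shown here, assuming each report has been reduced to yes/no flags per variable. The function name, the variable names, and the four sample reports are hypothetical and chosen for illustration only; they are not the study's data or its algorithm.

```python
from typing import Dict, List

# One report reduced to yes/no findings, e.g. {"polyp": True, "adenoma": False}.
Report = Dict[str, bool]


def per_variable_accuracy(manual: List[Report], automated: List[Report]) -> Dict[str, float]:
    """Fraction of reports where automated extraction agrees with manual review.

    Both lists are assumed to describe the same reports in the same order.
    """
    if len(manual) != len(automated):
        raise ValueError("both lists must cover the same reports, in the same order")
    if not manual:
        raise ValueError("at least one report is required")

    accuracy: Dict[str, float] = {}
    for variable in manual[0]:
        # Count reports where the two extractions give the same answer.
        matches = sum(1 for m, a in zip(manual, automated) if m[variable] == a[variable])
        accuracy[variable] = matches / len(manual)
    return accuracy


# Hypothetical toy data: manual review is treated as the reference standard.
manual_review = [
    {"polyp": True, "adenoma": True},
    {"polyp": True, "adenoma": False},
    {"polyp": False, "adenoma": False},
    {"polyp": False, "adenoma": False},
]
ocr_nlp_output = [
    {"polyp": True, "adenoma": True},
    {"polyp": False, "adenoma": False},  # automated extraction missed a polyp
    {"polyp": False, "adenoma": False},
    {"polyp": False, "adenoma": False},
]

result = per_variable_accuracy(manual_review, ocr_nlp_output)
for variable, value in result.items():
    print(f"{variable}: {value:.1%}")
```

Accuracy here is simple agreement with the manual reference; a fuller evaluation would likely also report sensitivity and specificity for each variable.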