Alexander P. Glaser, Brian J. Jordan, Jason Cohen, Anuj Desai, Philip Silberman, Joshua J. Meeks

Alexander P. Glaser, Brian J. Jordan, Jason Cohen, Anuj Desai, and Joshua J. Meeks, Feinberg School of Medicine, Northwestern University; Alexander P. Glaser, Brian J. Jordan, and Joshua J. Meeks, Robert H. Lurie Comprehensive Cancer Center, Northwestern University; and Philip Silberman, Clinical and Translational Sciences Institute, Northwestern University, Chicago, IL.
Abstract
PURPOSE: Bladder cancer is initially diagnosed and staged with a transurethral resection of bladder tumor (TURBT). Patient survival depends on appropriate sampling of the layers of the bladder, but pathology reports are dictated as free text, which makes large-scale data extraction for quality improvement challenging. We sought to automate extraction of stage, grade, and quality information from TURBT pathology reports using natural language processing (NLP).

METHODS: Patients undergoing TURBT were retrospectively identified using the Northwestern Enterprise Data Warehouse. An NLP algorithm was then created to extract information from free-text pathology reports and was iteratively improved using a training set of manually reviewed TURBTs. NLP accuracy was then validated using a separate set of manually reviewed TURBTs, and reliability was calculated using Cohen's κ.

RESULTS: Of 3,042 TURBTs identified from 2006 to 2016, 39% were classified as benign, 35% as Ta, 11% as T1, 4% as T2, and 10% as isolated carcinoma in situ. Of 500 randomly selected, manually reviewed TURBTs, NLP correctly staged 88% of specimens (κ = 0.82; 95% CI, 0.78 to 0.86). Of 272 manually reviewed T1 tumors, NLP correctly categorized grade in 100% of tumors (κ = 1), correctly determined whether muscularis propria was reported by the pathologist in 98% of tumors (κ = 0.81; 95% CI, 0.62 to 0.99), and correctly determined whether muscularis propria was present or absent in the resection specimen in 82% of tumors (κ = 0.62; 95% CI, 0.55 to 0.73). Discrepancy analysis identified pathologist notes and deeper resection specimens as frequent reasons for NLP misclassification.

CONCLUSION: We developed an NLP algorithm that demonstrates a high degree of reliability in extracting stage, grade, and presence of muscularis propria from TURBT pathology reports. Future iterations can continue to improve performance, but automated extraction of oncologic information holds promise for improving quality and assisting physicians in the delivery of care.
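The METHODS describe checking an NLP extractor against manual review and summarizing reliability with Cohen's κ. The abstract does not say how the algorithm parses reports, so the Python sketch below assumes a simple ordered, rule-based (regular expression) approach. The functions `classify_stage` and `cohens_kappa`, the stage patterns, and the six example reports and labels are hypothetical illustrations, not the authors' algorithm or study data. κ is computed as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies.

```python
import re
from collections import Counter

# Hypothetical keyword rules, checked from most to least invasive so the first
# match wins. Illustrative assumptions only, not the published algorithm's rules.
STAGE_RULES = [
    ("T2", re.compile(r"invad\w*\s+(?:into\s+)?(?:the\s+)?muscularis\s+propria", re.I)),
    ("T1", re.compile(r"invad\w*\s+(?:into\s+)?(?:the\s+)?lamina\s+propria", re.I)),
    ("Ta", re.compile(r"non-?invasive\s+papillary", re.I)),
    ("CIS", re.compile(r"carcinoma\s+in\s+situ", re.I)),
]


def classify_stage(report: str) -> str:
    """Return the first stage whose pattern matches the report, else 'benign'."""
    for stage, pattern in STAGE_RULES:
        if pattern.search(report):
            return stage
    return "benign"


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the two raters' label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy reports with hypothetical manual-review labels.
reports = [
    "Invasive high grade urothelial carcinoma invading lamina propria. "
    "Muscularis propria present, not involved.",
    "Urothelial carcinoma invading the muscularis propria (detrusor muscle).",
    "Noninvasive papillary urothelial carcinoma, low grade.",
    "Carcinoma in situ. No papillary tumor identified.",
    "Benign urothelium with chronic inflammation. Negative for carcinoma.",
    "Papillary urothelial carcinoma, low grade, without evidence of invasion.",
]
manual = ["T1", "T2", "Ta", "CIS", "benign", "Ta"]
nlp = [classify_stage(r) for r in reports]
kappa = cohens_kappa(manual, nlp)
print(nlp)              # ['T1', 'T2', 'Ta', 'CIS', 'benign', 'benign']
print(round(kappa, 2))  # 0.79
```

In this toy data the rules do not anticipate the phrase "without evidence of invasion," so one Ta tumor is labeled benign. Raw agreement is 5 of 6 (about 0.83), while κ is about 0.79 because it discounts agreement expected by chance. Ordered keyword rules are easy to audit, but any phrasing they do not anticipate becomes a misclassification.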