Sami-Ramzi Leyh-Bannurah, Zhe Tian, Pierre I. Karakiewicz, Ulrich Wolffgang, Guido Sauter, Margit Fisch, Dirk Pehrke, Hartwig Huland, Markus Graefen, Lars Budäus.
Affiliations: Sami-Ramzi Leyh-Bannurah, Dirk Pehrke, Hartwig Huland, Markus Graefen, and Lars Budäus, Prostate Cancer Center Hamburg-Eppendorf; Sami-Ramzi Leyh-Bannurah, Margit Fisch, and Guido Sauter, University Medical Center Hamburg-Eppendorf, Hamburg; Ulrich Wolffgang, University of Muenster, Muenster, Germany; and Zhe Tian and Pierre I. Karakiewicz, University of Montreal Health Center, Montreal, Canada.
Abstract
PURPOSE: Entering all information from narrative documentation for clinical research into databases is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics, so the conclusions drawn from them may be limited. A new, viable automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), which extracts detailed information from narratively written electronic health records (EHRs), for example pathologic radical prostatectomy (RP) reports.
METHODS: Within an RP pathologic database, 3,679 RP EHRs were randomly split into 70% training and 30% test data sets. First, training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. Variables of interest were primary and secondary Gleason pattern, corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin. Second, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, accuracy of the named entity extractors was compared with the gold standard encodings.
RESULTS: Agreement rates (95% confidence interval) for primary and secondary Gleason patterns were each 91.3% (89.4 to 93.0). For the corresponding Gleason percentages, agreement was 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3). Agreement for the remaining variables was as follows: tumor stage, 99.3% (98.6 to 99.7); nodal stage, 98.7% (97.8 to 99.3); total volume, 98.3% (97.3 to 99.0); tumor volume, 93.3% (91.6 to 94.8); maximum diameter, 96.3% (94.9 to 97.3); and surgical margin, 98.7% (97.8 to 99.3). Cumulative agreement was 91.3%.
CONCLUSION: Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.
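The evaluation design described in the abstract (a random 70%/30% split, followed by per-variable agreement with gold standard encodings and a 95% confidence interval) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the fixed random seed, and the choice of the Wilson score interval are assumptions made for illustration, and the abstract does not state which interval method was used, so the bounds produced here need not match the published ones.

```python
import math
import random


def split_records(record_ids, train_fraction=0.7, seed=0):
    """Randomly split record identifiers into training and test lists.

    The seed and the rounding rule are illustrative assumptions.
    """
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def agreement_rate(predicted, gold):
    """Return (matches, total) for exact agreement between two encodings."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches, len(gold)


def wilson_interval(matches, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, z=1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = matches / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # 3,679 records, as in the abstract; the identifiers are placeholders.
    train_ids, test_ids = split_records(range(3679))
    print(len(train_ids), len(test_ids))

    # Toy encodings for one variable (hypothetical values, not study data).
    matches, total = agreement_rate(["3", "4", "3", "4"], ["3", "4", "4", "4"])
    low, high = wilson_interval(matches, total)
    print(f"{matches}/{total} agree, 95% CI {low:.3f} to {high:.3f}")
```

In practice, `agreement_rate` would be applied once per variable of interest (for example tumor stage or surgical margin) on the held-out test records, with the extracted value compared against the gold standard encoding for the same report.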